Save netCDF files to EOSC (CESNET) bucket#

It is important to save your results in a place that can last longer than a few days/weeks!
import os
import pathlib
import s3fs
import xarray as xr

Get a sample file#

ds = xr.tutorial.open_dataset("air_temperature.nc").rename({"air": "Tair"})

Save the sample dataset to local files#

  • The file is small (< 5 GB), so it is not an issue to do so.

ds.load().to_netcdf('1air_temperature.nc')
ds.load().to_netcdf('2air_temperature.nc')
/tmp/ipykernel_1162/1018954315.py:1: SerializationWarning: saving variable Tair with floating point data as an integer dtype without any _FillValue to use for NaNs
  ds.load().to_netcdf('1air_temperature.nc')
/tmp/ipykernel_1162/1018954315.py:2: SerializationWarning: saving variable Tair with floating point data as an integer dtype without any _FillValue to use for NaNs
  ds.load().to_netcdf('2air_temperature.nc')
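
The SerializationWarning appears because the tutorial file stores Tair with an integer on-disk encoding but no _FillValue for NaNs. If you want to avoid it, one option is to override the encoding when writing, for example storing the variable as float32 (a minimal sketch; adapt the encoding to your own data):

# Store Tair as float32 on disk so no integer _FillValue is needed for NaNs.
ds.load().to_netcdf(
    '1air_temperature.nc',
    encoding={'Tair': {'dtype': 'float32'}},
)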

Save your results to remote object storage#

  • If you have not done so already, create your credentials by following this link

  • Verify your credentials in /home/jovyan/.aws/credentials. It should look like:

[default]
aws_access_key_id=xxxxx
aws_secret_access_key=yyyy
target = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={
        'endpoint_url': 'https://object-store.cloud.muni.cz'
    }
)
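
If you prefer not to rely on /home/jovyan/.aws/credentials, s3fs also accepts the access key and secret directly (a minimal sketch; the placeholder values below are not real credentials, and you should never commit real ones to a repository):

target = s3fs.S3FileSystem(
    anon=False,
    key='xxxxx',     # your aws_access_key_id (placeholder)
    secret='yyyy',   # your aws_secret_access_key (placeholder)
    client_kwargs={
        'endpoint_url': 'https://object-store.cloud.muni.cz'
    }
)
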
It is important to save your results under 'your own' prefix. [The credentials created here](../EOSC_to_bucket.md) give access to a common space for pangeo-eosc cloud users, so take care not to overwrite other users' data.
your_name='put-yourname'

Set the bucket and the prefix where you’ll copy your data to

s3_prefix = "s3://escience/" + your_name
print(s3_prefix)
s3://escience/mschulz
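
Before uploading, you can optionally check whether anything is already stored under your prefix, so you do not overwrite earlier results (a minimal sketch):

# List what already sits under the prefix; s3fs raises FileNotFoundError
# when nothing has been written there yet.
try:
    print(target.ls(s3_prefix))
except FileNotFoundError:
    print('Nothing stored under', s3_prefix, 'yet')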

List the files you want to copy

import glob
list_files = glob.glob("*.nc")
list_files
['1air_temperature.nc', '2air_temperature.nc']

Upload them to the S3 storage

for file in list_files:
    s3_path_file = os.path.join(s3_prefix, os.path.basename(file))
    print(file, s3_path_file)
    target.put(file, s3_path_file)
1air_temperature.nc s3://escience/mschulz/1air_temperature.nc
2air_temperature.nc s3://escience/mschulz/2air_temperature.nc
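
If your results live in a folder rather than in loose files, you can also upload a whole directory in one call (a sketch assuming a hypothetical local folder named results/, which this notebook does not create):

# Recursively upload the (hypothetical) local folder 'results' under your prefix.
target.put('results/', s3_prefix + '/results/', recursive=True)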

You can now use the remote file#

remote_path ='escience/'+your_name
target.ls(remote_path)
['escience/mschulz/1air_temperature.nc',
 'escience/mschulz/2air_temperature.nc']
# Remove one of the two files (demonstrates deleting a remote object)
target.rm(remote_path+'/1air_temperature.nc')
s3path = remote_path+'/2air_temperature.nc'
ds_check = xr.open_dataset(target.open(s3path))
ds_check
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    Tair     (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
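
If you want to share one of these files with someone who has no credentials, s3fs can generate a time-limited presigned URL (a sketch; whether the link is usable depends on how the object store is configured):

# Create a download link that expires after one hour (3600 seconds).
print(target.url(s3path, expires=3600))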