Save netCDF files to EOSC (CESNET) bucket#
It is important to save your results in a place that can last longer than a few days or weeks!
When you have saved data on your https://pangeo-eosc.vm.fedcloud.eu/jupyterhub/user/todaka/JupyterLab instance, you may want to download it to your PC and/or copy it to the object storage at https://object-store.cloud.muni.cz.
import os
import pathlib
import s3fs
import xarray as xr
Get a sample file#
ds = xr.tutorial.open_dataset("air_temperature.nc").rename({"air": "Tair"})
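Before saving, you can check how big the dataset actually is in memory. This is an optional sanity check added here, using xarray's standard nbytes attribute:
# Size of the loaded dataset in megabytes
print(f"{ds.nbytes / 1e6:.1f} MB")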
Save the sample file to a local file#
The file is small (< 5 GB), so it is not an issue to do that.
ds.load().to_netcdf('1air_temperature.nc')
ds.load().to_netcdf('2air_temperature.nc')
/tmp/ipykernel_1162/1018954315.py:1: SerializationWarning: saving variable Tair with floating point data as an integer dtype without any _FillValue to use for NaNs
ds.load().to_netcdf('1air_temperature.nc')
/tmp/ipykernel_1162/1018954315.py:2: SerializationWarning: saving variable Tair with floating point data as an integer dtype without any _FillValue to use for NaNs
ds.load().to_netcdf('2air_temperature.nc')
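The warning appears because the tutorial file stores the temperature as packed integers, and that encoding is carried over when writing. A minimal sketch of one way to avoid it, assuming you are happy to store Tair as plain floats instead:
# Drop the packed-integer encoding inherited from the source file,
# so the data is written as float32 and no _FillValue is needed
ds["Tair"].encoding = {}
ds.load().to_netcdf('1air_temperature.nc')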
Save your results to remote object storage#
If you have not done so already, create your credentials by following this link.
Verify your credentials in
/home/jovyan/.aws/credentials
It should look like
[default]
aws_access_key_id=xxxxx
aws_secret_access_key=yyyy
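As a small added check (not part of the original notebook), you can verify from Python that the file is in place, using the pathlib import from above:
# Check that the credentials file exists
creds = pathlib.Path('/home/jovyan/.aws/credentials')
print(creds.exists())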
target = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={'endpoint_url': 'https://object-store.cloud.muni.cz'},
)
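As an optional connectivity test (an addition here; it assumes your credentials grant list access to the shared escience bucket used below):
# Quick sanity check that the endpoint answers and the bucket is visible
print(target.exists('escience'))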
It is important to save your results in 'your' own location within the bucket. [The credential created here](../EOSC_to_bucket.md) gives access to a space shared by all pangeo-eosc cloud users, so take care not to overwrite data from other users (see the check after setting s3_prefix below).
your_name = 'put-yourname'
Set the bucket and place where you’ll copy your data to
s3_prefix = "s3://escience/" + your_name
print(s3_prefix)
s3://escience/mschulz
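Because the bucket is shared, it is worth checking whether your prefix already contains files before writing into it; this precaution is an addition, not part of the original notebook.
# Warn if the target prefix already holds objects that put() could overwrite
if target.exists(s3_prefix):
    print('Prefix already exists; these files may be overwritten:')
    print(target.ls(s3_prefix))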
List files you want to copy
import glob
list_files = glob.glob("*.nc")
list_files
['1air_temperature.nc', '2air_temperature.nc']
Upload them to the S3 storage
for file in list_files:
    s3_path_file = os.path.join(s3_prefix, os.path.basename(file))
    print(file, s3_path_file)
    target.put(file, s3_path_file)
1air_temperature.nc s3://escience/mschulz/1air_temperature.nc
2air_temperature.nc s3://escience/mschulz/2air_temperature.nc
You can now use the remote file#
remote_path = 'escience/' + your_name
target.ls(remote_path)
['escience/mschulz/1air_temperature.nc',
'escience/mschulz/2air_temperature.nc']
# Remove the first test file from the bucket
target.rm(remote_path + '/1air_temperature.nc')
# Open the remaining file directly from object storage
s3path = remote_path + '/2air_temperature.nc'
ds_check = xr.open_dataset(target.open(s3path))
ds_check
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    Tair     (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day). These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
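When you are done testing, consider removing your test files so the shared space stays tidy (an added cleanup step, not part of the original notebook):
# Delete the remaining test file from the bucket
target.rm(remote_path + '/2air_temperature.nc')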