Save files to EOSC (CESNET)#

It is important to save your results in a place that persists longer than a few days or weeks!
import os
import pathlib
import s3fs
import xarray as xr

Get a sample file#

ds = xr.tutorial.open_dataset("air_temperature.nc").rename({"air": "Tair"})

Save sample file into local file#

  • The file is small (< 5 GB), so saving it locally is not an issue

# write two copies so there is more than one file to upload later
ds.load().to_netcdf('1air_temperature.nc')
ds.load().to_netcdf('2air_temperature.nc')
/tmp/ipykernel_4103/1018954315.py:1: SerializationWarning: saving variable Tair with floating point data as an integer dtype without any _FillValue to use for NaNs
  ds.load().to_netcdf('1air_temperature.nc')
/tmp/ipykernel_4103/1018954315.py:2: SerializationWarning: saving variable Tair with floating point data as an integer dtype without any _FillValue to use for NaNs
  ds.load().to_netcdf('2air_temperature.nc')
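The warning above appears because the tutorial dataset carries an integer on-disk encoding but no `_FillValue` for NaNs. A minimal sketch of how to avoid it, assuming a standard netCDF backend is installed, is to pass an explicit `_FillValue` alongside the packing parameters in `encoding`; the stand-in dataset, scale factor, and fill value below are illustrative:

```python
import numpy as np
import xarray as xr

# a tiny stand-in dataset with one NaN, mimicking the Tair variable
demo = xr.Dataset({"Tair": ("x", np.array([290.0, np.nan, 295.5]))})

# give the packed integer encoding an explicit _FillValue so NaNs
# have a well-defined on-disk representation (no SerializationWarning)
demo.to_netcdf(
    "demo_air.nc",
    encoding={"Tair": {"dtype": "int16", "scale_factor": 0.01, "_FillValue": -32767}},
)

roundtrip = xr.open_dataset("demo_air.nc")
# the NaN survives the round trip instead of being silently corrupted
```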

Save your results to Remote object storage#

  • If you have not done so already, create your credentials by following this link

  • Verify your credentials in /home/jovyan/.aws/credentials. It should look like:

[default]
aws_access_key_id=xxxxx
aws_secret_access_key=yyyy
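The credentials file is plain INI, so a quick stdlib sanity check can confirm it parses. The sample string below mirrors the placeholder keys above; on the pangeo-eosc JupyterHub you would read `/home/jovyan/.aws/credentials` instead:

```python
import configparser

# placeholder credentials, identical in shape to ~/.aws/credentials
sample = """\
[default]
aws_access_key_id=xxxxx
aws_secret_access_key=yyyy
"""

config = configparser.ConfigParser()
config.read_string(sample)  # config.read("/home/jovyan/.aws/credentials") for the real file
print(config["default"]["aws_access_key_id"])
```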
target = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={'endpoint_url': 'https://object-store.cloud.muni.cz'},
)
It is important to save your results under your own prefix: [the credentials created here](../EOSC_to_bucket.md) give access to a space shared by all pangeo-eosc cloud users, so take care not to overwrite other users' data.
your_name='tinaok'

Set the bucket and place where you’ll copy your data to

s3_prefix = "s3://tmp/" + your_name
print(s3_prefix)
s3://tmp/tinaok

List files you want to copy

import glob
list_files = glob.glob("*.nc")
list_files
['2air_temperature.nc', '1air_temperature.nc']

Upload them to the S3 storage

for file in list_files:
    s3_path_file = os.path.join(s3_prefix, os.path.basename(file))
    print(file, s3_path_file)
    target.put(file, s3_path_file)
2air_temperature.nc s3://tmp/tinaok/2air_temperature.nc
1air_temperature.nc s3://tmp/tinaok/1air_temperature.nc
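The same path handling can be written with `pathlib` (imported at the top of this page but not used) plus `posixpath`, which is safer than `os.path` for building object-store keys because it always uses forward slashes. The scratch directory and file names below are illustrative:

```python
import pathlib
import posixpath
import tempfile

s3_prefix = "s3://tmp/tinaok"

# illustrative scratch directory containing the two files written earlier
workdir = pathlib.Path(tempfile.mkdtemp())
for name in ("1air_temperature.nc", "2air_temperature.nc"):
    (workdir / name).touch()

list_files = sorted(workdir.glob("*.nc"))
# posixpath guarantees '/' separators regardless of the local OS
s3_paths = [posixpath.join(s3_prefix, p.name) for p in list_files]
print(s3_paths)
# → ['s3://tmp/tinaok/1air_temperature.nc', 's3://tmp/tinaok/2air_temperature.nc']
```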

You can now use the remote file#

remote_path = 'tmp/' + your_name
target.ls(remote_path)
['tmp/tinaok/1air_temperature.nc',
 'tmp/tinaok/2air_temperature.nc',
 'tmp/tinaok/Tair_temperature.nc',
 'tmp/tinaok/tTair_temperature.nc',
 'tmp/tinaok/zarr-demo']
# to delete a remote file when you no longer need it:
#     target.rm(remote_path + '/1air_temperature.nc')
# (not run here, since we open that same file just below)
s3path = remote_path + '/1air_temperature.nc'
ds_check = xr.open_dataset(target.open(s3path))
ds_check
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    Tair     (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...