From xarray to pandas#

Import Python packages#

import xarray as xr
xr.set_options(display_style='html')
import intake
import cftime
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pandas as pd
import dask
%matplotlib inline

Open CMIP6 online catalog#

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col
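intake-esm exposes the catalog as a pandas DataFrame through its df attribute (also used below), so you can inspect it directly. A quick sketch:

# The catalog is backed by a plain pandas DataFrame
print(f"The catalog references {len(col.df)} assets")
col.df.head()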

Search for the corresponding data#

cat = col.search(source_id=['CESM2-WACCM'], experiment_id=['historical'],
                 table_id=['AERmon'], variable_id=['so2'], member_id=['r1i1p1f1'])
cat.df
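Before loading anything, you can check how many zarr stores matched the query:

# Number of matching assets in the search result
print(f"Found {len(cat.df)} matching zarr store(s)")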

Create dictionary from the list of datasets we found#

  • This step may take several minutes, so be patient!

dset_dict = cat.to_dataset_dict(zarr_kwargs={'use_cftime':True})
lconf = list(dset_dict.keys())
print(lconf)

Open dataset#

  • Use the xarray Python package to analyze the netCDF dataset

  • open_dataset lets us read all the metadata without loading the data into memory.

  • With xarray, we only load into memory what is actually needed; the small check after the next cell illustrates this.

dset = dset_dict[lconf[0]]  # pick the first (and only) dataset in the dictionary
dset = dset.squeeze()       # drop singleton dimensions such as member_id
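Nothing has actually been read yet: the values behind so2 are held in a chunked dask array, and xarray can report how large the variable would be without loading it. A quick sketch to check this:

# The data values are still lazy: they live in a chunked dask array
print(type(dset['so2'].data))
# Size the variable would occupy if fully loaded into memory
print(f"so2 would occupy about {dset['so2'].nbytes / 1e9:.2f} GB")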

Get metadata corresponding to the whole dataset#

dset

Get metadata corresponding to SO2#

print(dset['so2'])

Select one level and compute the zonal mean#

%%time
# Select the single level nearest to lev=-1000, average over all
# longitudes (zonal mean), and load the result into memory
dset_selection = dset['so2'].sel(lev=-1000, method='nearest').mean('lon').load()
dset_selection
# Plot the zonal-mean profile at the date nearest to 15 October 2003
dset_selection.sel(time=cftime.DatetimeNoLeap(2003, 10, 15), method="nearest").plot()
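Since dset_selection is now a 2D (time, lat) array, xarray can also plot its full time evolution in a single call (2D data defaults to a pcolormesh). A minimal sketch, assuming the nc-time-axis package is available so matplotlib can handle the cftime time coordinate:

# Hovmöller-style view of the zonal-mean SO2: time vs. latitude
# (plotting a cftime axis requires the nc-time-axis package)
dset_selection.plot(x='time', y='lat')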

Convert to a pandas DataFrame#

%%time
pdf = dset_selection.to_dataframe()
pdf.head()

Drop a column#

# member_id is constant for this selection, so it carries no information
pdf.drop('member_id', axis=1, inplace=True)
pdf.head()
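Once the data are in pandas, the usual split-apply-combine tools are available. For example, a sketch that averages over all times for each latitude (grouping on the 'lat' index level created by to_dataframe):

# Time-mean of the zonal-mean SO2 at each latitude
pdf.groupby('lat')['so2'].mean().head()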

Save to local file#

# Write the DataFrame as tab-separated values
pdf.to_csv("CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv", sep='\t')
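To make sure the file was written as expected, you can read it straight back; remember the tab separator chosen above:

# Round-trip check: read the tab-separated file back with pandas
check = pd.read_csv("CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv", sep='\t')
check.head()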

Save your results to remote private object storage#

  • Your credentials are in $HOME/.aws/credentials

  • Check with your instructor to get the secret access key (replace the XXXs below with the right key)

[default]
aws_access_key_id=forces2021-work
aws_secret_access_key=XXXXXXXXXXXX
aws_endpoint_url=https://forces2021.uiogeo-apps.sigma2.no/
It is important to save your results in a place that can last longer than a few days or weeks!

# Connect to the remote object storage with the credentials above
import s3fs
fsg = s3fs.S3FileSystem(anon=False,
      client_kwargs={
         'endpoint_url': 'https://forces2021.uiogeo-apps.sigma2.no/'
      })
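The resulting filesystem object behaves much like a local one; for instance, you can list the work bucket (assuming your credentials grant access to it):

# List the contents of the remote 'work' bucket
fsg.ls('work')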

Upload local file to remote storage#

s3_path = "s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv"
print(s3_path)
fsg.put('CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv', s3_path)
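Finally, you can confirm the upload succeeded:

# Check that the file now exists on the remote object storage
print(fsg.exists(s3_path))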