From xarray to pandas#
Import Python packages#
import xarray as xr
xr.set_options(display_style='html')
import intake
import cftime
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import pandas as pd
import dask
%matplotlib inline
Open CMIP6 online catalog#
cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col
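Before searching, it can help to look at the catalogue's underlying pandas dataframe; the two lines below are just a quick sanity check (column names can differ between catalogue versions).
# Inspect the catalogue dataframe: available columns and number of models
print(col.df.columns)
print(col.df['source_id'].nunique())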
Search corresponding data#
cat = col.search(source_id=['CESM2-WACCM'], experiment_id=['historical'], table_id=['AERmon'], variable_id=['so2'], member_id=['r1i1p1f1'])
cat.df
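If you are not sure of the exact facet values, you can pass several candidates at once and browse the resulting dataframe. The sketch below is only an illustration: 'mmrso4' is a hypothetical extra variable name and may not be available for this model.
# Search with several candidate variable names and browse the matches
cat_extra = col.search(source_id=['CESM2-WACCM'], experiment_id=['historical'],
                       table_id=['AERmon'], variable_id=['so2', 'mmrso4'],
                       member_id=['r1i1p1f1'])
cat_extra.df[['source_id', 'variable_id', 'member_id']]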
Create a dictionary from the list of datasets we found#
This step may take several minutes, so be patient!
dset_dict = cat.to_dataset_dict(zarr_kwargs={'use_cftime':True})
lconf = list(dset_dict.keys())
print(lconf)
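The dictionary maps one key per matching dataset to an xarray Dataset; as a quick check you can loop over it (only one entry is expected here):
# One key per dataset found by the search; each value is an xarray Dataset
for name, ds in dset_dict.items():
    print(name, list(ds.data_vars))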
Open dataset#
We use the
xarray
Python package to analyse the netCDF dataset. Opening a dataset only reads its metadata, without loading the data into memory: with
xarray
, we only load into memory what is actually needed.
dset = dset_dict[lconf[0]]
dset = dset.squeeze()
Get metadata corresponding to the whole dataset#
dset
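The same metadata can also be accessed programmatically; this is only a sketch and the attribute names may differ slightly between models.
# Dimensions, a few vertical levels and a global attribute
print(dset.dims)
print(dset['lev'].values[:5])
print(dset.attrs.get('experiment_id'))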
Get metadata corresponding to SO2#
print(dset['so2'])
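Individual attributes such as the units and the long name can be read from the variable's attrs dictionary:
# Variable-level attributes (keys may vary slightly between models)
print(dset['so2'].attrs.get('units'))
print(dset['so2'].attrs.get('long_name'))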
Select one level and compute the zonal mean#
%%time
dset_selection = dset['so2'].sel(lev=-1000, method='nearest').mean('lon').load()
dset_selection
dset_selection.sel(time=cftime.DatetimeNoLeap(2003, 10, 15), method="nearest").plot()
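As an additional, optional view of the same selection, you can average over time and plot the resulting zonal-mean profile against latitude:
# Time-mean of the zonal mean, shown as a function of latitude
dset_selection.mean('time').plot()
plt.show()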
Convert to a pandas dataframe#
%%time
pdf = dset_selection.to_dataframe()
pdf.head()
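The dataframe is indexed by (time, lat); a quick look at the index and a statistical summary of the so2 column can be useful before going further:
# Index structure and a summary of the values
print(pdf.index.names)
print(pdf['so2'].describe())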
Drop a column#
pdf.drop('member_id', axis=1, inplace=True)
pdf.head()
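Depending on what you want to do next, a wide table (one row per time step, one column per latitude) can be more convenient; this is just one possible reshaping:
# Pivot the (time, lat) index into a time x latitude table
pdf_wide = pdf['so2'].unstack('lat')
pdf_wide.head()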
Save to local file#
pdf.to_csv("CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv", sep='\t')
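To make sure the file round-trips correctly, you can read it back with the same separator (the index column names are assumed to be time and lat, as written by to_csv):
# Read the file back as a quick check
pdf_check = pd.read_csv("CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv",
                        sep='\t', index_col=['time', 'lat'])
pdf_check.head()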
Save your results to remote private object storage#
Your credentials are in
$HOME/.aws/credentials
Check with your instructor to get the secret access key (replace the XXXX below with the right key):
[default]
aws_access_key_id=forces2021-work
aws_secret_access_key=XXXXXXXXXXXX
aws_endpoint_url=https://forces2021.uiogeo-apps.sigma2.no/
It is important to save your results in a place that will last longer than a few days or weeks!
import s3fs
fsg = s3fs.S3FileSystem(anon=False,
client_kwargs={
'endpoint_url': 'https://forces2021.uiogeo-apps.sigma2.no/'
})
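A quick way to check that your credentials work is to list the bucket (the bucket name work is taken from the upload path used below):
# List the content of the bucket; this fails if the credentials are wrong
fsg.ls("work")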
Upload local file to remote storage#
s3_path = "s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv"
print(s3_path)
fsg.put('CMIP_NCAR_CESM2-WACCM_historical_AERmon_zonal_mean.csv', s3_path)
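You can verify that the upload succeeded and even read the file back directly from object storage; this is a sketch assuming the tab-separated format and index columns used above:
# Check the object exists and read it back from remote storage
print(fsg.exists(s3_path))
with fsg.open(s3_path, 'rb') as f:
    pdf_remote = pd.read_csv(f, sep='\t', index_col=['time', 'lat'])
pdf_remote.head()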