Global Earth Observation imagery and HEALPix#

Authors & Contributors#

Authors#

  • Jean-Marc Delouis, LOPS - Laboratoire d’Oceanographie Physique et Spatiale UMR 6523 CNRS-IFREMER-IRD-Univ.Brest-IUEM (France), @jmdelouis

  • Tina Odaka, LOPS - Laboratoire d’Oceanographie Physique et Spatiale UMR 6523 CNRS-IFREMER-IRD-Univ.Brest-IUEM (France), @tinaok

Contributors#

  • Anne Fouilloux, Simula (Norway), @annefou

  • Justus Magin, LOPS - Laboratoire d’Oceanographie Physique et Spatiale UMR 6523 CNRS-IFREMER-IRD-Univ.Brest-IUEM (France), @keewis

Modelling publication#

Overview

Questions
  • What is Zarr?
  • What is the HEALPix grid?
  • What is the Dask JupyterLab extension?
  • How do I read Copernicus Marine SST data and transform it to the HEALPix grid?
Objectives
  • Learn about HEALPix
  • Learn about Zarr, Dask, Dask Gateway, Dask Client, Scheduler, Workers

Context#

Earth Observation images naturally represent data on a spheroidal surface. In this notebook, we introduce HEALPix, a powerful indexing scheme for organizing spheroidal data, together with the Zarr storage format. By combining these two technologies, we can analyze Earth Observation data in its native form.

Because remote sensing datasets can be quite large, we also use Dask with Xarray to parallelize our data analysis.
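
To get a feel for how HEALPix organises the sphere, it helps to see how the number of cells grows with the refinement level. The sketch below relies only on the standard HEALPix relations (12 base cells, each split into 4 at every level, all cells of equal area). The function names are illustrative and are not part of any library used in this notebook.

```python
import math


def healpix_npix(level: int) -> int:
    """Number of HEALPix cells at a refinement level (nside = 2**level)."""
    nside = 2 ** level
    return 12 * nside * nside


def healpix_cell_size_deg(level: int) -> float:
    """Approximate cell width in degrees: square root of the equal cell area."""
    area_sr = 4.0 * math.pi / healpix_npix(level)
    return math.degrees(math.sqrt(area_sr))


# Print level, number of cells, and approximate cell width
for level in (0, 5, 9, 10):
    print(level, healpix_npix(level), round(healpix_cell_size_deg(level), 4))
```

For a product on a 0.1 degree grid, levels 9 and 10 bracket the native resolution (roughly 0.11 and 0.06 degrees per cell).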

Data#

In this episode, we will be using datasets from Copernicus Marine Service:

The datasets are provided in Zarr and NetCDF formats and will be accessed through S3-compatible object storage or Ifremer’s HTTPS server.

Setup#

This episode uses the following main Python packages:

Please install these packages if not already available in your Python environment.

Packages#

In this episode, Python packages are imported when we start to use them. However, as a matter of good software practice, we recommend that you install and import all the necessary libraries at the top of your Jupyter notebook.

Installation of required packages#

!pip install -U xdggs copernicusmarine healpix_geo healpy flox

Collecting xdggs
  Using cached xdggs-0.3.0-py3-none-any.whl.metadata (5.2 kB)
Collecting copernicusmarine
  Using cached copernicusmarine-2.2.2-py3-none-any.whl.metadata (8.2 kB)
Collecting healpix_geo
  Using cached healpix_geo-0.0.6-cp312-cp312-manylinux_2_34_x86_64.whl.metadata (783 bytes)
Collecting healpy
  Using cached healpy-1.18.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.1 kB)
Requirement already satisfied: flox in /srv/conda/envs/notebook/lib/python3.12/site-packages (0.10.4)
Collecting flox
  Using cached flox-0.10.7-py3-none-any.whl.metadata (5.8 kB)
Requirement already satisfied: arro3-core>=0.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from xdggs) (0.5.1)
Collecting cdshealpix (from xdggs)
  Using cached cdshealpix-0.7.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.3 kB)
Collecting h3ronpy (from xdggs)
  Using cached h3ronpy-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.8 kB)
Requirement already satisfied: lonboard>=0.9.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from xdggs) (0.11.1)
Requirement already satisfied: matplotlib in /srv/conda/envs/notebook/lib/python3.12/site-packages (from xdggs) (3.10.5)
Requirement already satisfied: numpy>=2.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from xdggs) (2.2.6)
Requirement already satisfied: pooch in /srv/conda/envs/notebook/lib/python3.12/site-packages (from xdggs) (1.8.2)
Requirement already satisfied: pyproj>=3.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from xdggs) (3.7.1)
Requirement already satisfied: xarray in /srv/conda/envs/notebook/lib/python3.12/site-packages (from xdggs) (2025.7.1)
Collecting arcosparse<0.5.0,>=0.4.2 (from copernicusmarine)
  Using cached arcosparse-0.4.2-py3-none-any.whl.metadata (5.2 kB)
Requirement already satisfied: boto3>=1.26 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (1.39.11)
Requirement already satisfied: click!=8.2.0,>=8.0.4 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (8.2.1)
Requirement already satisfied: dask>=2022 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (2025.7.0)
Requirement already satisfied: h5netcdf<2.0.0,>=1.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (1.6.4)
Requirement already satisfied: pydantic<3.0.0,>=2.9.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (2.11.7)
Requirement already satisfied: pystac>=1.8.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (1.13.0)
Requirement already satisfied: requests>=2.27.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (2.32.4)
Collecting semver>=3.0.2 (from copernicusmarine)
  Using cached semver-3.0.4-py3-none-any.whl.metadata (6.8 kB)
Requirement already satisfied: setuptools>=68.2.2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (80.9.0)
Requirement already satisfied: tqdm>=4.65.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (4.67.1)
Requirement already satisfied: zarr>=2.13.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from copernicusmarine) (3.1.1)
Requirement already satisfied: pandas<3,>=2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from arcosparse<0.5.0,>=0.4.2->copernicusmarine) (2.3.1)
Requirement already satisfied: pyarrow>=17.0.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from arcosparse<0.5.0,>=0.4.2->copernicusmarine) (20.0.0)
Requirement already satisfied: h5py in /srv/conda/envs/notebook/lib/python3.12/site-packages (from h5netcdf<2.0.0,>=1.4.0->copernicusmarine) (3.14.0)
Requirement already satisfied: packaging in /srv/conda/envs/notebook/lib/python3.12/site-packages (from h5netcdf<2.0.0,>=1.4.0->copernicusmarine) (25.0)
Requirement already satisfied: python-dateutil>=2.8.2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas<3,>=2->arcosparse<0.5.0,>=0.4.2->copernicusmarine) (2.9.0)
Requirement already satisfied: pytz>=2020.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas<3,>=2->arcosparse<0.5.0,>=0.4.2->copernicusmarine) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pandas<3,>=2->arcosparse<0.5.0,>=0.4.2->copernicusmarine) (2025.2)
Requirement already satisfied: annotated-types>=0.6.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pydantic<3.0.0,>=2.9.1->copernicusmarine) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pydantic<3.0.0,>=2.9.1->copernicusmarine) (2.33.2)
Requirement already satisfied: typing-extensions>=4.12.2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pydantic<3.0.0,>=2.9.1->copernicusmarine) (4.14.1)
Requirement already satisfied: typing-inspection>=0.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pydantic<3.0.0,>=2.9.1->copernicusmarine) (0.4.1)
Requirement already satisfied: charset_normalizer<4,>=2 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests>=2.27.1->copernicusmarine) (3.4.3)
Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests>=2.27.1->copernicusmarine) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests>=2.27.1->copernicusmarine) (1.26.19)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from requests>=2.27.1->copernicusmarine) (2025.8.3)
Requirement already satisfied: astropy in /srv/conda/envs/notebook/lib/python3.12/site-packages (from healpy) (7.1.0)
Requirement already satisfied: numpy_groupies>=0.9.19 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from flox) (0.11.3)
Requirement already satisfied: toolz in /srv/conda/envs/notebook/lib/python3.12/site-packages (from flox) (1.0.0)
Requirement already satisfied: scipy>=1.12 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from flox) (1.16.1)
Requirement already satisfied: botocore<1.40.0,>=1.39.11 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from boto3>=1.26->copernicusmarine) (1.39.11)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from boto3>=1.26->copernicusmarine) (1.0.1)
Requirement already satisfied: s3transfer<0.14.0,>=0.13.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from boto3>=1.26->copernicusmarine) (0.13.1)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from python-dateutil>=2.8.2->pandas<3,>=2->arcosparse<0.5.0,>=0.4.2->copernicusmarine) (1.17.0)
Requirement already satisfied: cloudpickle>=3.0.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask>=2022->copernicusmarine) (3.1.1)
Requirement already satisfied: fsspec>=2021.09.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask>=2022->copernicusmarine) (2025.7.0)
Requirement already satisfied: partd>=1.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask>=2022->copernicusmarine) (1.4.2)
Requirement already satisfied: pyyaml>=5.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from dask>=2022->copernicusmarine) (6.0.2)
Requirement already satisfied: anywidget~=0.9.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from lonboard>=0.9.3->xdggs) (0.9.18)
Requirement already satisfied: arro3-compute>=0.4.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from lonboard>=0.9.3->xdggs) (0.5.1)
Requirement already satisfied: arro3-io>=0.4.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from lonboard>=0.9.3->xdggs) (0.5.1)
Requirement already satisfied: ipywidgets>=7.6.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from lonboard>=0.9.3->xdggs) (8.1.7)
Requirement already satisfied: traitlets>=5.7.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from lonboard>=0.9.3->xdggs) (5.14.3)
Requirement already satisfied: psygnal>=0.8.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from anywidget~=0.9.0->lonboard>=0.9.3->xdggs) (0.14.1)
Requirement already satisfied: comm>=0.1.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.2.3)
Requirement already satisfied: ipython>=6.1.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (9.4.0)
Requirement already satisfied: widgetsnbextension~=4.0.14 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (4.0.14)
Requirement already satisfied: jupyterlab_widgets~=3.0.15 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (3.0.15)
Requirement already satisfied: decorator in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (5.2.1)
Requirement already satisfied: ipython-pygments-lexers in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (1.1.1)
Requirement already satisfied: jedi>=0.16 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.19.2)
Requirement already satisfied: matplotlib-inline in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.1.7)
Requirement already satisfied: pexpect>4.3 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (4.9.0)
Requirement already satisfied: prompt_toolkit<3.1.0,>=3.0.41 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (3.0.51)
Requirement already satisfied: pygments>=2.4.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (2.19.2)
Requirement already satisfied: stack_data in /srv/conda/envs/notebook/lib/python3.12/site-packages (from ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.6.3)
Requirement already satisfied: wcwidth in /srv/conda/envs/notebook/lib/python3.12/site-packages (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.2.13)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from jedi>=0.16->ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.8.4)
Requirement already satisfied: locket in /srv/conda/envs/notebook/lib/python3.12/site-packages (from partd>=1.4.0->dask>=2022->copernicusmarine) (1.0.0)
Requirement already satisfied: ptyprocess>=0.5 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pexpect>4.3->ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.7.0)
Requirement already satisfied: donfig>=0.8 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from zarr>=2.13.3->copernicusmarine) (0.8.1.post1)
Requirement already satisfied: numcodecs>=0.14 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from numcodecs[crc32c]>=0.14->zarr>=2.13.3->copernicusmarine) (0.16.1)
Requirement already satisfied: crc32c>=2.7 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from numcodecs[crc32c]>=0.14->zarr>=2.13.3->copernicusmarine) (2.7.1)
Requirement already satisfied: pyerfa>=2.0.1.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from astropy->healpy) (2.0.1.5)
Requirement already satisfied: astropy-iers-data>=0.2025.4.28.0.37.27 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from astropy->healpy) (0.2025.8.11.0.41.9)
Requirement already satisfied: contourpy>=1.0.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->xdggs) (1.3.3)
Requirement already satisfied: cycler>=0.10 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->xdggs) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->xdggs) (4.59.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->xdggs) (1.4.9)
Requirement already satisfied: pillow>=8 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->xdggs) (11.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from matplotlib->xdggs) (3.2.3)
Requirement already satisfied: platformdirs>=2.5.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from pooch->xdggs) (4.3.8)
Requirement already satisfied: executing>=1.2.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack_data->ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (2.2.0)
Requirement already satisfied: asttokens>=2.1.0 in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack_data->ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (3.0.0)
Requirement already satisfied: pure_eval in /srv/conda/envs/notebook/lib/python3.12/site-packages (from stack_data->ipython>=6.1.0->ipywidgets>=7.6.0->lonboard>=0.9.3->xdggs) (0.2.3)
Using cached xdggs-0.3.0-py3-none-any.whl (41 kB)
Using cached copernicusmarine-2.2.2-py3-none-any.whl (114 kB)
Using cached arcosparse-0.4.2-py3-none-any.whl (26 kB)
Using cached healpix_geo-0.0.6-cp312-cp312-manylinux_2_34_x86_64.whl (669 kB)
Using cached healpy-1.18.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.9 MB)
Using cached flox-0.10.7-py3-none-any.whl (79 kB)
Using cached semver-3.0.4-py3-none-any.whl (17 kB)
Using cached cdshealpix-0.7.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Using cached h3ronpy-0.22.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)
Installing collected packages: semver, healpix_geo, h3ronpy, healpy, flox, cdshealpix, arcosparse, copernicusmarine, xdggs
  Attempting uninstall: flox
    Found existing installation: flox 0.10.4
    Uninstalling flox-0.10.4:
      Successfully uninstalled flox-0.10.4
Successfully installed arcosparse-0.4.2 cdshealpix-0.7.1 copernicusmarine-2.2.2 flox-0.10.7 h3ronpy-0.22.0 healpix_geo-0.0.6 healpy-1.18.1 semver-3.0.4 xdggs-0.3.0

Import necessary libraries#

import xarray as xr
import numpy as np
import fsspec

import pint_xarray
import cf_xarray.units  

import healpy as hp
import matplotlib.pyplot as plt

import xdggs

import hvplot.xarray

import copernicusmarine
from copernicusmarine.core_functions import custom_open_zarr  

from healpix_geo.nested import lonlat_to_healpix

Create a local Dask cluster on the local machine#

from dask.distributed import Client

client = Client()   # create a local dask cluster on the local machine.
client

Client

Client-86e42ebe-9cf1-11f0-8144-5ef18bd36582

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

Inspecting the Cluster Info section above gives us information about the created cluster: we have 2 or 4 workers and the same number of threads (e.g. 1 thread per worker).

Go further
  • You can also create a local cluster with the `LocalCluster` constructor and use `n_workers` and `threads_per_worker` to manually specify the number of processes and threads you want to use. For instance, we could use `n_workers=2` and `threads_per_worker=2`.
  • This is sometimes preferable for performance, and when you run this tutorial on your own PC it stops Dask from using all of the machine's resources.
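
As a minimal sketch of the `LocalCluster` option described above, the snippet below asks for two workers with two threads each. The sizes are only an example, and `processes=False` is an assumption made here so that the snippet also runs as a plain script; in a notebook you can drop it to get one process per worker.

```python
from dask.distributed import Client, LocalCluster

# Assumption: two workers with two threads each; tune to your machine.
# processes=False keeps the workers inside the current process.
cluster = LocalCluster(n_workers=2, threads_per_worker=2, processes=False)
client = Client(cluster)

# client.nthreads() maps each worker address to its number of threads
n_workers = len(client.nthreads())
n_threads = sum(client.nthreads().values())
print(n_workers, "workers,", n_threads, "threads in total")
```

Capping the cluster this way leaves the remaining cores free for the browser and the JupyterLab server.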

Dask Dashboard#

Dask comes with a really handy interface: the Dask Dashboard. It is a web interface that you can open in a separate tab of your browser.

We will learn here how to use it through the Dask JupyterLab extension.

To use the Dask Dashboard through the JupyterLab extension on the Pangeo EOSC infrastructure, click Launch dashboard in JupyterLab in the Client output of your notebook and note the Dask dashboard port number, as highlighted in the figure below.

Dashboard link

Dask Lab

Then click the orange icon indicated in the above figure, and type your dashboard link.

You can click the buttons marked with red arrows in the figures above, then drag and drop the resulting panels to arrange them as you like.

Example Dask Lab

The dashboard is really helpful for understanding your computation and how it is distributed.

Data Loading: Get data from Copernicus Marine Services#

Data from Copernicus Marine Service is available in Zarr format via S3-compatible object storage, which makes it easy and efficient to access.

Let's choose the date we want to test.

time_slice = slice('2024-06-01', '2024-06-01')

Load L4 Dataset from Copernicus Marine Services#

This dataset provides a time series of gap-free maps of Sea Surface Temperature (SST) foundation at high resolution on a 0.10 x 0.10 degree grid (approximately 10 x 10 km) for the Global Ocean, updated every 24 hours.

We load the L4 data and select one date.

Geo Chunk and Time Chunk#

Let’s try to load data in ‘geo’ chunked format and ‘time’ chunked format to see the differences.

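# The same product is published as two stores with different chunk layouts
url_geo = "https://s3.waw3-1.cloudferro.com/mdl-arco-geo-045/arco/SST_GLO_PHY_L4_NRT_010_043/cmems_obs-sst_glo_phy_nrt_l4_P1D-m_202303/geoChunked.zarr"
url_time = "https://s3.waw3-1.cloudferro.com/mdl-arco-time-045/arco/SST_GLO_PHY_L4_NRT_010_043/cmems_obs-sst_glo_phy_nrt_l4_P1D-m_202303/timeChunked.zarr"

# Open one store directly; swap in url_geo to compare the chunk layout
L4 = custom_open_zarr.open_zarr(url_time)

# Or let the copernicusmarine toolbox pick the store; this replaces the L4 opened above
L4 = copernicusmarine.open_dataset(
    # dataset_id="cmems_obs-sst_glo_phy_l4_gir_P1D-m",
    dataset_id="cmems_obs-sst_glo_phy_nrt_l4_P1D-m",
)

# Keep only the selected date
L4 = L4.sel(time=time_slice)
L4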
INFO - 2025-09-29T04:26:40Z - Selected dataset version: "202303"
INFO - 2025-09-29T04:26:40Z - Selected dataset part: "default"
<xarray.Dataset> Size: 144MB
Dimensions:           (time: 1, latitude: 1600, longitude: 3600)
Coordinates:
  * latitude          (latitude) float32 6kB -79.95 -79.85 ... 79.85 79.95
  * longitude         (longitude) float32 14kB -179.9 -179.9 ... 179.9 179.9
  * time              (time) datetime64[ns] 8B 2024-06-01
Data variables:
    analysed_sst      (time, latitude, longitude) float64 46MB dask.array<chunksize=(1, 64, 3200), meta=np.ndarray>
    analysis_error    (time, latitude, longitude) float64 46MB dask.array<chunksize=(1, 64, 3200), meta=np.ndarray>
    mask              (time, latitude, longitude) int8 6MB dask.array<chunksize=(1, 64, 3200), meta=np.ndarray>
    sea_ice_fraction  (time, latitude, longitude) float64 46MB dask.array<chunksize=(1, 64, 3200), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.7, ACDD-1.3, ISO 8601
    institution:  Institut Francais de Recherche pour l'Exploitation de la me...
    source:       Odyssea L4 processor
    references:   Product User Manual for L4 Odyssea Product over the Global ...
    contact:      emmanuelle.autret@ifremer.fr;jfpiolle@ifremer.fr
    title:        ODYSSEA Global Sea Surface Temperature Gridded Level 4 Dail...
    history:      Optimally interpolated SST originally produced by Ifremer/C...
%%time
url = 'https://data-cersat.ifremer.fr/data/sea-surface-temperature/odyssea/l4/glob/nrt/data/v3.0/2023/00*/*-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v02.1-fv01.0.nc'

# Create an HTTP filesystem
fs = fsspec.filesystem('http')

# Use fs.glob to expand the wildcard and get all matching files
file_list = fs.glob(url)


# Open the datasets with xarray
L4 = xr.open_mfdataset([fs.open(f) for f in file_list], engine='h5netcdf').chunk({"time":1, "lat":'auto', "lon":'10M'}).persist()

L4
CPU times: user 3.73 s, sys: 1.97 s, total: 5.7 s
Wall time: 59.6 s
<xarray.Dataset> Size: 1GB
Dimensions:           (time: 9, lat: 1600, lon: 3600)
Coordinates:
  * lat               (lat) float32 6kB -79.95 -79.85 -79.75 ... 79.85 79.95
  * lon               (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.9 179.9
  * time              (time) datetime64[ns] 72B 2023-01-01 ... 2023-01-09
Data variables:
    analysed_sst      (time, lat, lon) float64 415MB dask.array<chunksize=(1, 745, 1677), meta=np.ndarray>
    analysis_error    (time, lat, lon) float64 415MB dask.array<chunksize=(1, 745, 1677), meta=np.ndarray>
    mask              (time, lat, lon) int8 52MB dask.array<chunksize=(1, 1600, 3600), meta=np.ndarray>
    sea_ice_fraction  (time, lat, lon) float64 415MB dask.array<chunksize=(1, 745, 1677), meta=np.ndarray>
Attributes: (12/71)
    Conventions:                     CF-1.7, ACDD-1.3, ISO 8601
    standard_name_vocabulary:        Climate and Forecast (CF) Standard Name ...
    naming_authority:                org.ghrsst
    netcdf_version_id:               4.7.4 of Oct 31 2021 03:14:43 $
    title:                           ODYSSEA Global Sea Surface Temperature G...
    id:                              ODYSSEA-IFR-L4-GLOB_002-v3.0
    ...                              ...
    doi:                             https://doi.org/10.48670/mds-00321
    project:                         Copernicus Marine Service
    program:                         Copernicus, GHRSST
    publisher_name:                  Copernicus Marine Service
    publisher_institution:           Copernicus Marine Service
    publisher_url:                   https://marine.copernicus.eu/
Key Points
  • Where do you find attributes?
  • What kind of data variables do you find in this dataset? What are the coordinates and dimensions?
Key Points
  • `geo` and `time`: which chunking is better suited to our computation?
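
One way to reason about the `geo` versus `time` question is to count how many chunks each layout has to read for a given access pattern. The sketch below uses a synthetic Dask array rather than the real stores, and it assumes that the geo-chunked layout keeps whole maps together while the time-chunked layout keeps long time series together. The shapes and chunk sizes are illustrative, not those of the Copernicus product.

```python
import dask.array as da

# Synthetic cube with dimensions (time, latitude, longitude)
shape = (365, 160, 360)

# "geo"-style layout: one time step per chunk, the whole map in one chunk
geo = da.zeros(shape, chunks=(1, 160, 360))
# "time"-style layout: the full time axis per chunk, small spatial tiles
tim = da.zeros(shape, chunks=(365, 16, 36))

# Chunks touched when reading one map (a single time step)
one_map = {"geo": geo[0].npartitions, "time": tim[0].npartitions}
# Chunks touched when reading one time series (a single grid point)
one_series = {
    "geo": geo[:, 80, 180].npartitions,
    "time": tim[:, 80, 180].npartitions,
}

print("single map   :", one_map)
print("single series:", one_series)
```

Since this notebook works on one date at a time, the layout that touches fewer chunks per map is the natural candidate.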

Let's check the units#

L4['analysed_sst'].attrs
{'long_name': 'analysed sea surface temperature',
 'standard_name': 'sea_surface_foundation_temperature',
 'units': 'kelvin',
 'valid_min': np.int16(-300),
 'valid_max': np.int16(4500)}
L4['analysed_sst'].attrs['units']
'kelvin'

With pint-xarray, one can convert units. Let's convert the L4 SST from kelvin to degrees Celsius.

L4['analysed_sst'].pint.quantify().pint.to("degC").pint.dequantify()
<xarray.DataArray 'analysed_sst' (time: 9, lat: 1600, lon: 3600)> Size: 415MB
dask.array<truediv, shape=(9, 1600, 3600), dtype=float64, chunksize=(1, 745, 1677), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float32 6kB -79.95 -79.85 -79.75 -79.65 ... 79.75 79.85 79.95
  * lon      (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 179.9
  * time     (time) datetime64[ns] 72B 2023-01-01 2023-01-02 ... 2023-01-09
Attributes:
    long_name:      analysed sea surface temperature
    standard_name:  sea_surface_foundation_temperature
    valid_min:      -300
    valid_max:      4500
    units:          degree_Celsius
ds = L4[['analysed_sst']].pint.quantify().pint.to("degC").pint.dequantify()
ds
<xarray.Dataset> Size: 415MB
Dimensions:       (time: 9, lat: 1600, lon: 3600)
Coordinates:
  * lat           (lat) float32 6kB -79.95 -79.85 -79.75 ... 79.75 79.85 79.95
  * lon           (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 179.9
  * time          (time) datetime64[ns] 72B 2023-01-01 2023-01-02 ... 2023-01-09
Data variables:
    analysed_sst  (time, lat, lon) float64 415MB dask.array<chunksize=(1, 745, 1677), meta=np.ndarray>
Attributes: (12/71)
    Conventions:                     CF-1.7, ACDD-1.3, ISO 8601
    standard_name_vocabulary:        Climate and Forecast (CF) Standard Name ...
    naming_authority:                org.ghrsst
    netcdf_version_id:               4.7.4 of Oct 31 2021 03:14:43 $
    title:                           ODYSSEA Global Sea Surface Temperature G...
    id:                              ODYSSEA-IFR-L4-GLOB_002-v3.0
    ...                              ...
    doi:                             https://doi.org/10.48670/mds-00321
    project:                         Copernicus Marine Service
    program:                         Copernicus, GHRSST
    publisher_name:                  Copernicus Marine Service
    publisher_institution:           Copernicus Marine Service
    publisher_url:                   https://marine.copernicus.eu/

Let's try persist()#

persist() loads the data from disk, triggers the computation, and keeps the result in memory as Dask arrays. Watch the Dask Lab view carefully while it runs.

ds=ds.persist()
ds
<xarray.Dataset> Size: 415MB
Dimensions:       (time: 9, lat: 1600, lon: 3600)
Coordinates:
  * lat           (lat) float32 6kB -79.95 -79.85 -79.75 ... 79.75 79.85 79.95
  * lon           (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 179.9
  * time          (time) datetime64[ns] 72B 2023-01-01 2023-01-02 ... 2023-01-09
Data variables:
    analysed_sst  (time, lat, lon) float64 415MB dask.array<chunksize=(1, 745, 1677), meta=np.ndarray>
Attributes: (12/71)
    Conventions:                     CF-1.7, ACDD-1.3, ISO 8601
    standard_name_vocabulary:        Climate and Forecast (CF) Standard Name ...
    naming_authority:                org.ghrsst
    netcdf_version_id:               4.7.4 of Oct 31 2021 03:14:43 $
    title:                           ODYSSEA Global Sea Surface Temperature G...
    id:                              ODYSSEA-IFR-L4-GLOB_002-v3.0
    ...                              ...
    doi:                             https://doi.org/10.48670/mds-00321
    project:                         Copernicus Marine Service
    program:                         Copernicus, GHRSST
    publisher_name:                  Copernicus Marine Service
    publisher_institution:           Copernicus Marine Service
    publisher_url:                   https://marine.copernicus.eu/
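
The difference between a lazy array, `persist()` and `compute()` can be seen on a small synthetic Dask array. This is a minimal sketch that runs on Dask's default scheduler; the array and its size are arbitrary.

```python
import dask.array as da
import numpy as np

# Lazy: only a task graph is built here, nothing is computed yet
x = da.ones((1000, 1000), chunks=(250, 250))
y = (x * 2).sum(axis=0)

# persist(): runs the graph and returns a Dask array whose chunks
# are already held in memory
y_persisted = y.persist()

# compute(): runs the graph and returns a concrete NumPy array
y_computed = y.compute()

print(type(y_persisted).__name__, type(y_computed).__name__)
```

Keeping the result as a Dask array is what allows later steps to keep working chunk by chunk, without recomputing what came before.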

Save Dataset to local Zarr#

Zarr is a data format for storing chunked, compressed, N-dimensional arrays.

ds.to_zarr('SST.zarr', mode='w')
/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/api/asynchronous.py:228: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
<xarray.backends.zarr.ZarrStore at 0x739761578e00>
!ls
Bi-linear-plot.png	       agenda.md	   intro.md
SST.zarr		       assets		   landslide_xbatcher.ipynb
SST_AI.ipynb		       conf.py		   pangeo.md
Xbatcher_Sentinel2_wave.ipynb  eo4eu_bids25.ipynb  references.bib
_config.yml		       eosc-pangeo.md	   users-getting-started.md
_toc.yml		       healpix.zarr
afterword		       images
!ls -lart SST.zarr/analysed_sst/c/0
total 3
drwxr-xr-x 11 jovyan jovyan 4096 Sep 29 04:27 ..
drwxr-xr-x  5 jovyan jovyan 4096 Sep 29 04:27 .
drwxr-xr-x  2 jovyan jovyan 4096 Sep 29 04:27 0
drwxr-xr-x  2 jovyan jovyan 4096 Sep 29 04:28 2
drwxr-xr-x  2 jovyan jovyan 4096 Sep 29 04:28 1
!cat SST.zarr/zarr.json 
{
  "attributes": {
    "Conventions": "CF-1.7, ACDD-1.3, ISO 8601",
    "standard_name_vocabulary": "Climate and Forecast (CF) Standard Name Table v79",
    "naming_authority": "org.ghrsst",
    "netcdf_version_id": "4.7.4 of Oct 31 2021 03:14:43 $",
    "title": "ODYSSEA Global Sea Surface Temperature Gridded Level 4 Daily Multi-Sensor Observations",
    "id": "ODYSSEA-IFR-L4-GLOB_002-v3.0",
    "cmems_product_id": "SST_GLO_PHY_L4_NRT_010_043",
    "summary": "This dataset provide a times series of daily multi-sensor optimal interpolation of Sea Surface Temperature (SST) foundation over Global Ocean on a 0.1 degree resolution grid, every 24 hours. It is produced for the Copernicus Marine Service.",
    "references": "Piolle J. F., Autret E., Arino O., Robinson I.S, Le Borgne P., (2010), Medspiration, toward the sustained delivery of satellite SST products and services over regional seas, Proceedings of the 2010 ESA Living Planet Symposium Bergen.",
    "processing_level": "L4",
    "keywords": "Oceans > Ocean Temperature > Sea Surface Temperature",
    "keywords_vocabulary": "NASA Global Change Master Directory (GCMD) Science Keywords",
    "institution": "Institut Francais de Recherche pour l'Exploitation de la mer / Centre d'Exploitation et de Recherche Satellitaire",
    "institution_abbreviation": "Ifremer/CERSAT",
    "license": "These data are available free of charge under the CMEMS data policy, refer to http://marine.copernicus.eu/services-portfolio/service-commitments-and-licence/",
    "citation": "Ifremer / CERSAT. 2022. ODYSSEA Global High-Resolution Sea Surface Temperature Gridded Level 4 Daily dataset (v3.0) for Copernicus Marine Service. Ver. 3.0. Ifremer, Plouzane, France. Dataset accessed [YYYY-MM-DD].",
    "contact": "emmanuelle.autret@ifremer.fr;jfpiolle@ifremer.fr",
    "technical_support_contact": "cersat@ifremer.fr",
    "scientific_support_contact": "emmanuelle.autret@ifremer.fr;jfpiolle@ifremer.fr",
    "creator_email": "cersat@ifremer.fr",
    "creator_type": "institution",
    "creator_institution": "Ifremer / CERSAT",
    "creator_name": "CERSAT",
    "creator_url": "http://cersat.ifremer.fr",
    "format_version": "GHRSST GDS v2.1",
    "gds_version_id": "2.1",
    "processing_software": "Odyssea 3.0",
    "source": "Odyssea L4 processor",
    "geospatial_bounds": "POLYGON ((-180.0 -80.0, 180.0 -80.0, 180.0 80.0, -180.0 80.0, -180.0 -80.0))",
    "geospatial_bounds_crs": "EPSG:4326",
    "geospatial_bounds_vertical_crs": "EPSG:5831",
    "geospatial_lat_max": 80.0,
    "geospatial_lat_min": -80.0,
    "geospatial_lat_resolution": 0.1,
    "geospatial_lat_units": "degrees_north",
    "geospatial_lon_max": 180.0,
    "geospatial_lon_min": -180.0,
    "geospatial_lon_resolution": 0.1,
    "geospatial_lon_units": "degrees_east",
    "geospatial_vertical_min": 0.0,
    "geospatial_vertical_max": 0.0,
    "time_coverage_start": "2022-12-31T12:00:00",
    "time_coverage_end": "2023-01-01T12:00:00",
    "time_coverage_resolution": "P1D",
    "spatial_resolution": "0.1 degree",
    "temporal_resolution": "daily",
    "cdm_data_type": "grid",
    "source_data": "AVHRR_SST_METOP_B-OSISAF-L2P-v1.0 SLSTRA_MAR_L2P_v1.0 SLSTRB_MAR_L2P_v1.0 VIIRS_N20-STAR-L2P-v2.80 VIIRS_NPP-STAR-L2P-v2.80 GOES16-OSISAF-L3C-V1.0 G17-STAR-L3C-V2.71 SEVIRI_SST-OSISAF-L3C-V1.0 SEVIRI_IO_SST-OSISAF-L3C-V1.0 AHI_H08-STAR-L3C-v2.7 AMSR2-REMSS-L2P-v8.2",
    "platform": "Metop-B Sentinel-3A Sentinel-3B NOAA-20 NPP GOES16 GOES17 Meteosat-11 Meteosat-9 Himawari-8 GCOM-W",
    "platform_type": "low_earth_orbit_satellite low_earth_orbit_satellite low_earth_orbit_satellite low_earth_orbit_satellite low_earth_orbit_satellite high_earth_orbit_satellite high_earth_orbit_satellite high_earth_orbit_satellite high_earth_orbit_satellite low_earth_orbit_satellite low_earth_orbit_satellite",
    "platform_vocabulary": "CEOS mission table",
    "instrument": "AVHRR/3 SLSTR SLSTR VIIRS VIIRS ABI ABI SEVIRI SEVIRI AHI AMSR2",
    "instrument_type": "infrared_radiometer infrared_radiometer infrared_radiometer infrared_radiometer infrared_radiometer infrared_radiometer infrared_radiometer infrared_radiometer infrared_radiometer infrared_radiometer microwave_radiometer",
    "instrument_vocabulary": "CEOS instrument table",
    "product_version": " 1.0",
    "date_created": "2023-03-02T15:22:05",
    "date_modified": "2023-03-02T15:22:05",
    "date_issued": "2023-03-02T15:22:05",
    "date_metadata_modified": "2022-03-01T00:00:00",
    "history": "Optimally interpolated SST originally produced by Ifremer/CERSAT with Odyssea processor 3.0",
    "uuid": "4328118f-924a-42b8-b7b8-f9a8beabe86a",
    "file_quality_level": 3,
    "source_version": "3.0",
    "acknowledgment": "This dataset is funded by Copernicus Marine Service",
    "metadata_link": "https://data.marine.copernicus.eu/product/SST_GLO_PHY_L4_NRT_010_043/description",
    "doi": "https://doi.org/10.48670/mds-00321",
    "project": "Copernicus Marine Service",
    "program": "Copernicus, GHRSST",
    "publisher_name": "Copernicus Marine Service",
    "publisher_institution": "Copernicus Marine Service",
    "publisher_url": "https://marine.copernicus.eu/"
  },
  "zarr_format": 3,
  "consolidated_metadata": {
    "kind": "inline",
    "must_understand": false,
    "metadata": {
      "lon": {
        "shape": [
          3600
        ],
        "data_type": "float32",
        "chunk_grid": {
          "name": "regular",
          "configuration": {
            "chunk_shape": [
              3600
            ]
          }
        },
        "chunk_key_encoding": {
          "name": "default",
          "configuration": {
            "separator": "/"
          }
        },
        "fill_value": 0.0,
        "codecs": [
          {
            "name": "bytes",
            "configuration": {
              "endian": "little"
            }
          },
          {
            "name": "zstd",
            "configuration": {
              "level": 0,
              "checksum": false
            }
          }
        ],
        "attributes": {
          "long_name": "longitude",
          "standard_name": "longitude",
          "axis": "X",
          "authority": "CF-1.7",
          "valid_range": [
            -180.0,
            180.0
          ],
          "coverage_content_type": "coordinate",
          "comment": "geographical coordinates, WGS84 projection",
          "units": "degrees_east",
          "_FillValue": "AAAAAAAA+H8="
        },
        "dimension_names": [
          "lon"
        ],
        "zarr_format": 3,
        "node_type": "array",
        "storage_transformers": []
      },
      "lat": {
        "shape": [
          1600
        ],
        "data_type": "float32",
        "chunk_grid": {
          "name": "regular",
          "configuration": {
            "chunk_shape": [
              1600
            ]
          }
        },
        "chunk_key_encoding": {
          "name": "default",
          "configuration": {
            "separator": "/"
          }
        },
        "fill_value": 0.0,
        "codecs": [
          {
            "name": "bytes",
            "configuration": {
              "endian": "little"
            }
          },
          {
            "name": "zstd",
            "configuration": {
              "level": 0,
              "checksum": false
            }
          }
        ],
        "attributes": {
          "long_name": "latitude",
          "standard_name": "latitude",
          "axis": "Y",
          "authority": "CF-1.7",
          "valid_range": [
            -90.0,
            90.0
          ],
          "coverage_content_type": "coordinate",
          "comment": "geographical coordinates, WGS84 projection",
          "units": "degrees_north",
          "_FillValue": "AAAAAAAA+H8="
        },
        "dimension_names": [
          "lat"
        ],
        "zarr_format": 3,
        "node_type": "array",
        "storage_transformers": []
      },
      "time": {
        "shape": [
          9
        ],
        "data_type": "float64",
        "chunk_grid": {
          "name": "regular",
          "configuration": {
            "chunk_shape": [
              9
            ]
          }
        },
        "chunk_key_encoding": {
          "name": "default",
          "configuration": {
            "separator": "/"
          }
        },
        "fill_value": 0.0,
        "codecs": [
          {
            "name": "bytes",
            "configuration": {
              "endian": "little"
            }
          },
          {
            "name": "zstd",
            "configuration": {
              "level": 0,
              "checksum": false
            }
          }
        ],
        "attributes": {
          "long_name": "reference time of field",
          "standard_name": "time",
          "axis": "T",
          "authority": "CF-1.7",
          "coverage_content_type": "coordinate",
          "units": "seconds since 1981-01-01",
          "calendar": "proleptic_gregorian",
          "_FillValue": "AAAAAAAA+H8="
        },
        "dimension_names": [
          "time"
        ],
        "zarr_format": 3,
        "node_type": "array",
        "storage_transformers": []
      },
      "analysed_sst": {
        "shape": [
          9,
          1600,
          3600
        ],
        "data_type": "int16",
        "chunk_grid": {
          "name": "regular",
          "configuration": {
            "chunk_shape": [
              1,
              745,
              1677
            ]
          }
        },
        "chunk_key_encoding": {
          "name": "default",
          "configuration": {
            "separator": "/"
          }
        },
        "fill_value": 0,
        "codecs": [
          {
            "name": "bytes",
            "configuration": {
              "endian": "little"
            }
          },
          {
            "name": "zstd",
            "configuration": {
              "level": 0,
              "checksum": false
            }
          }
        ],
        "attributes": {
          "long_name": "analysed sea surface temperature",
          "standard_name": "sea_surface_foundation_temperature",
          "valid_min": -300,
          "valid_max": 4500,
          "units": "degree_Celsius",
          "add_offset": 273.15,
          "scale_factor": 0.01,
          "_FillValue": -32768
        },
        "dimension_names": [
          "time",
          "lat",
          "lon"
        ],
        "zarr_format": 3,
        "node_type": "array",
        "storage_transformers": []
      }
    }
  },
  "node_type": "group"
}
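The chunk grid in the metadata listing above can be checked with a few lines of arithmetic. The sketch below copies the `shape` and `chunk_shape` values from the `analysed_sst` entry and counts how many chunks the array is split into. It also decodes the `_FillValue` string stored on the coordinate arrays; treating that string as a base64-encoded little-endian float64 is an assumption made here for illustration.

```python
import base64
import math
import struct

# Values copied from the "analysed_sst" entry of the metadata above
shape = (9, 1600, 3600)        # (time, lat, lon)
chunk_shape = (1, 745, 1677)

# Number of chunks along each dimension (the last chunk may be partial)
chunks_per_dim = tuple(math.ceil(s / c) for s, c in zip(shape, chunk_shape))
n_chunks = math.prod(chunks_per_dim)
print(chunks_per_dim, n_chunks)

# "_FillValue" string copied from the lon/lat/time entries.
# Assumption: base64-encoded little-endian float64.
fill_value = struct.unpack("<d", base64.b64decode("AAAAAAAA+H8="))[0]
print(fill_value)
```

With these values the array is stored as 9 x 3 x 3 chunks, and the decoded fill value is NaN.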
Key Points
  • What is 'zarr' format?

Data preparation#

Let's open the Zarr file we prepared, and this time let's use hvplot to plot the sea surface temperature.

ds = xr.open_dataset("SST.zarr", engine="zarr", chunks={})
ds
<xarray.Dataset> Size: 415MB
Dimensions:       (time: 9, lat: 1600, lon: 3600)
Coordinates:
  * lon           (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 179.9
  * lat           (lat) float32 6kB -79.95 -79.85 -79.75 ... 79.75 79.85 79.95
  * time          (time) datetime64[ns] 72B 2023-01-01 2023-01-02 ... 2023-01-09
Data variables:
    analysed_sst  (time, lat, lon) float64 415MB dask.array<chunksize=(1, 745, 1677), meta=np.ndarray>
Attributes: (12/71)
    Conventions:                     CF-1.7, ACDD-1.3, ISO 8601
    standard_name_vocabulary:        Climate and Forecast (CF) Standard Name ...
    naming_authority:                org.ghrsst
    netcdf_version_id:               4.7.4 of Oct 31 2021 03:14:43 $
    title:                           ODYSSEA Global Sea Surface Temperature G...
    id:                              ODYSSEA-IFR-L4-GLOB_002-v3.0
    ...                              ...
    doi:                             https://doi.org/10.48670/mds-00321
    project:                         Copernicus Marine Service
    program:                         Copernicus, GHRSST
    publisher_name:                  Copernicus Marine Service
    publisher_institution:           Copernicus Marine Service
    publisher_url:                   https://marine.copernicus.eu/
ds['analysed_sst'].hvplot(y='lat', x='lon', width=800, height=400, rasterize=True, geo=True)

Convert Data to HEALPix#

Because we work with data covering the whole globe, we need a grid system that respects the shape of the sphere. For this we use HEALPix, one of the Discrete Global Grid Systems (DGGS).

HEALPix stands for Hierarchical Equal Area isoLatitude Pixelation of a sphere. This pixelation produces a subdivision of a spherical surface in which each pixel covers the same surface area as every other pixel.

See https://healpix.sourceforge.io and/or the HEALPix Primer for more information.

The healpy tutorial is also a very good starting point to understand more about HEALPix.

Resolution#

The resolution of the grid is expressed by the parameter Nside, which defines the number of divisions along the side of a base-resolution pixel needed to reach the desired high-resolution partition. In Discrete Global Grid Systems (DGGS), the hierarchical level is most commonly called the refinement level, which we write as refinement_level. The two are related by Nside = 2**refinement_level.

The total number of cells can be computed as

12*4**level, or 12*nside**2
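As a small worked example of these formulas, the sketch below computes Nside, the number of cells, and a rough angular cell size for a few refinement levels. The helper name `healpix_grid_info` is hypothetical, and the angular size is only an approximation taken as the square root of the (equal) cell area.

```python
import math


def healpix_grid_info(refinement_level: int):
    """Return (nside, n_cells, approximate cell size in degrees)."""
    nside = 2 ** refinement_level
    n_cells = 12 * nside ** 2          # same as 12 * 4**refinement_level
    # Every cell covers the same solid angle: 4*pi / n_cells steradians
    cell_area_sr = 4 * math.pi / n_cells
    # Square root of the area, used here as a rough angular size
    approx_size_deg = math.degrees(math.sqrt(cell_area_sr))
    return nside, n_cells, approx_size_deg


for level in (0, 4, 7):
    nside, n_cells, size = healpix_grid_info(level)
    print(f"level={level} nside={nside} n_cells={n_cells} size~{size:.3f} deg")
```

At refinement level 7 this gives Nside = 128 and 196608 cells, each roughly 0.46 degrees across.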

Ordering Systems#

HEALPix supports two pixel ordering systems: nested and ring.

Detailed explanations of the two pixel ordering systems can be found at https://healpix.jpl.nasa.gov/html/intronode4.htm.

In our example we use nested.
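One practical property of the nested scheme is that the hierarchy is encoded in the cell id itself: the four children of cell `p` at one level are cells `4*p` to `4*p + 3` at the next level. The helper functions below are hypothetical names written for this illustration, not part of any library used in this episode.

```python
def parent(cell_id: int) -> int:
    """Cell id of the parent, one refinement level coarser."""
    return cell_id >> 2


def children(cell_id: int) -> list[int]:
    """Cell ids of the four children, one refinement level finer."""
    return [4 * cell_id + k for k in range(4)]


def base_pixel(cell_id: int, refinement_level: int) -> int:
    """Index (0 to 11) of the base-resolution pixel containing the cell."""
    return cell_id >> (2 * refinement_level)


print(children(5))             # cells one level below cell 5
print(parent(23))              # back to the parent cell
print(base_pixel(196607, 7))   # last cell at level 7 lies in the last base pixel
```

Because neighbouring ids share a parent, contiguous ranges of nested ids tend to cover compact regions of the sphere, which is convenient for chunked storage.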

ds = xr.open_dataset("SST.zarr", engine="zarr", chunks={}).compute()
ds
<xarray.Dataset> Size: 415MB
Dimensions:       (time: 9, lat: 1600, lon: 3600)
Coordinates:
  * lon           (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 179.9
  * lat           (lat) float32 6kB -79.95 -79.85 -79.75 ... 79.75 79.85 79.95
  * time          (time) datetime64[ns] 72B 2023-01-01 2023-01-02 ... 2023-01-09
Data variables:
    analysed_sst  (time, lat, lon) float64 415MB nan nan nan ... -1.9 -1.9 -1.9
Attributes: (12/71)
    Conventions:                     CF-1.7, ACDD-1.3, ISO 8601
    standard_name_vocabulary:        Climate and Forecast (CF) Standard Name ...
    naming_authority:                org.ghrsst
    netcdf_version_id:               4.7.4 of Oct 31 2021 03:14:43 $
    title:                           ODYSSEA Global Sea Surface Temperature G...
    id:                              ODYSSEA-IFR-L4-GLOB_002-v3.0
    ...                              ...
    doi:                             https://doi.org/10.48670/mds-00321
    project:                         Copernicus Marine Service
    program:                         Copernicus, GHRSST
    publisher_name:                  Copernicus Marine Service
    publisher_institution:           Copernicus Marine Service
    publisher_url:                   https://marine.copernicus.eu/

Define the HEALPix resolution#

refinement_level=7

nside = 2 ** refinement_level
nest = True
full_cell_ids = range(0, 12*nside**2)
full_cell_ids
range(0, 196608)

Transformation to HEALPix#

# --- single-dataset transform -> grouped by unique HEALPix cell_ids ---
def to_healpix_cells_grouped_mean(
    ds: xr.Dataset, level: int | None = None, ellipsoid: str = "WGS84"
) -> xr.Dataset:
    """
    Returns a dataset with dims (cell_ids), where 'cell_ids' is a
    dimension/coordinate containing unique HEALPix ids (NESTED).
    Values are averaged over all source samples that mapped to the same cell.
    """
    # 1) stack (longitude,latitude) -> cells
    ds=ds.stack(cells=("lon", "lat"))#.chunk({"time":1, "cells":'10M'}).persist()

    # 2) hash each (longitude,latitude) to HEALPix nested cell id
    cell_ids = lonlat_to_healpix(ds.lon, ds.lat, level, ellipsoid=ellipsoid)

    # 3) attach cell_ids coord on 'cells'
    ds = ds.assign_coords(cell_ids=("cells", cell_ids.astype("int64")))
    ds["cell_ids"].attrs.update(
        {
            "grid_name": "healpix",
            "level": level,
            "indexing_scheme": "nested",
        }
    )
    cell_ids_attrs = dict(ds["cell_ids"].attrs)  # keep for after groupby

    # 4) group by cell_ids and average -> new dim named 'cell_ids'
    # **Note** This is a very simplified test conversion;
    # the next example does this in a more sophisticated way.
    # Group by cell and take the mean, skipping NaN values (skipna=True),
    # so that missing samples do not affect the temperature average.
    ds = ds.groupby("cell_ids").mean(skipna=True, keep_attrs=True)

    # 5) restore attrs on the new dimension coordinate
    if "cell_ids" in ds.coords:
        ds["cell_ids"].attrs.update(cell_ids_attrs)

    return ds

ds_healpix = to_healpix_cells_grouped_mean(ds, refinement_level)
ds_healpix
<xarray.Dataset> Size: 16MB
Dimensions:       (cell_ids: 193800, time: 9)
Coordinates:
  * time          (time) datetime64[ns] 72B 2023-01-01 2023-01-02 ... 2023-01-09
  * cell_ids      (cell_ids) int64 2MB 0 1 2 3 4 ... 196604 196605 196606 196607
Data variables:
    analysed_sst  (cell_ids, time) float64 14MB 26.62 26.79 ... 27.97 28.03
Attributes: (12/71)
    Conventions:                     CF-1.7, ACDD-1.3, ISO 8601
    standard_name_vocabulary:        Climate and Forecast (CF) Standard Name ...
    naming_authority:                org.ghrsst
    netcdf_version_id:               4.7.4 of Oct 31 2021 03:14:43 $
    title:                           ODYSSEA Global Sea Surface Temperature G...
    id:                              ODYSSEA-IFR-L4-GLOB_002-v3.0
    ...                              ...
    doi:                             https://doi.org/10.48670/mds-00321
    project:                         Copernicus Marine Service
    program:                         Copernicus, GHRSST
    publisher_name:                  Copernicus Marine Service
    publisher_institution:           Copernicus Marine Service
    publisher_url:                   https://marine.copernicus.eu/
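To make the grouped-mean step of the function above more concrete, here is a minimal pure-Python sketch of the same idea on toy data: samples that fall in the same cell are averaged, and NaN samples are skipped. The function name `grouped_nanmean` and the sample values are made up for illustration; the notebook itself relies on xarray's `groupby(...).mean(skipna=True)`.

```python
import math
from collections import defaultdict


def grouped_nanmean(cell_ids, values):
    """Average the values per cell id, ignoring NaN samples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cid, value in zip(cell_ids, values):
        counts[cid] += 0               # register the cell even if all samples are NaN
        if not math.isnan(value):
            sums[cid] += value
            counts[cid] += 1
    # A cell whose samples are all NaN stays NaN
    return {
        cid: (sums[cid] / n if n else math.nan)
        for cid, n in sorted(counts.items())
    }


# Hypothetical toy samples: five source points falling into three cells
cell_ids = [5, 5, 7, 7, 9]
values = [10.0, 12.0, math.nan, 20.0, math.nan]
result = grouped_nanmean(cell_ids, values)
print(result)
```

Only cells that receive at least one source sample appear in the result. This is consistent with the output above containing 193800 cells rather than the 196608 cells of the full sphere at this level, since the source grid only covers latitudes between -80 and 80 degrees.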

Set chunks#

chunk() resets the chunking, replacing the chunk layout the data had when we loaded it from the Zarr file.

ds_healpix = ds_healpix.chunk(4**refinement_level).persist()
ds_healpix.to_zarr('healpix.zarr', mode='w')
ds_healpix
/srv/conda/envs/notebook/lib/python3.12/site-packages/distributed/client.py:3371: UserWarning: Sending large graph of size 13.31 MiB.
This may cause some slowdown.
Consider loading the data with Dask directly
 or using futures or delayed objects to embed the data into the graph without repetition.
See also https://docs.dask.org/en/stable/best-practices.html#load-data-with-dask for more information.
  warnings.warn(
/srv/conda/envs/notebook/lib/python3.12/site-packages/zarr/api/asynchronous.py:228: UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
<xarray.Dataset> Size: 16MB
Dimensions:       (cell_ids: 193800, time: 9)
Coordinates:
  * time          (time) datetime64[ns] 72B 2023-01-01 2023-01-02 ... 2023-01-09
  * cell_ids      (cell_ids) int64 2MB 0 1 2 3 4 ... 196604 196605 196606 196607
Data variables:
    analysed_sst  (cell_ids, time) float64 14MB dask.array<chunksize=(16384, 9), meta=np.ndarray>
Attributes: (12/71)
    Conventions:                     CF-1.7, ACDD-1.3, ISO 8601
    standard_name_vocabulary:        Climate and Forecast (CF) Standard Name ...
    naming_authority:                org.ghrsst
    netcdf_version_id:               4.7.4 of Oct 31 2021 03:14:43 $
    title:                           ODYSSEA Global Sea Surface Temperature G...
    id:                              ODYSSEA-IFR-L4-GLOB_002-v3.0
    ...                              ...
    doi:                             https://doi.org/10.48670/mds-00321
    project:                         Copernicus Marine Service
    program:                         Copernicus, GHRSST
    publisher_name:                  Copernicus Marine Service
    publisher_institution:           Copernicus Marine Service
    publisher_url:                   https://marine.copernicus.eu/
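The chunk size `4**refinement_level` used above equals the number of cells in one base-resolution pixel, so a dataset covering the full sphere would be split into exactly 12 chunks. The short sketch below checks this arithmetic and shows what happens with the 193800 cells actually present; the cell count is copied from the output above.

```python
import math

refinement_level = 7
chunk_size = 4 ** refinement_level           # cells per base pixel
n_cells_full = 12 * 4 ** refinement_level    # full sphere at this level
n_cells_present = 193800                     # copied from the output above

n_chunks_full = n_cells_full // chunk_size
n_chunks_present = math.ceil(n_cells_present / chunk_size)
last_chunk_size = n_cells_present - (n_chunks_present - 1) * chunk_size

print(chunk_size, n_chunks_full, n_chunks_present, last_chunk_size)
```

Because some cells are missing, the chunks are cut by position along the `cell_ids` dimension rather than by cell id, so chunk boundaries no longer line up exactly with base pixels and the last chunk is smaller than the others.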

Load HEALPix Zarr from local Zarr file, and plot#

xr.open_zarr('healpix.zarr',chunks={}).pipe(xdggs.decode)["analysed_sst"].compute().dggs.explore()

HEALPix and spherical harmonics#

Healpy can represent data on the sphere as a combination of spherical harmonic functions. We will use linear regression against these functions to fill the gaps caused by clouds.
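As a hedged sketch of what such a regression could look like (the symbols below are introduced here for illustration and are not taken from the notebook): let \( d_p \) be the observed value in cell \( p \), \( f_k(p) \) the value of the \( k \)-th spherical harmonic map in that cell, and \( \Omega \) the set of cells with valid data. An ordinary least-squares fit chooses the coefficients \( c_k \) as

\( \hat{c} = \arg\min_{c} \sum_{p \in \Omega} \left( d_p - \sum_{k} c_k f_k(p) \right)^2 \)

The gaps can then be filled by evaluating \( \sum_{k} \hat{c}_k f_k(p) \) in the cells \( p \notin \Omega \).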

How Do Spherical Harmonics Look?#

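We will construct a set of functions \( F \), defined as follows, where \( \mathbb{R}(A_{lm}) \) and \( \mathbb{I}(A_{lm}) \) denote the maps generated by the real and imaginary parts of the coefficient \( A_{lm} \).

\( F = [\mathbb{R}(A_{00}), \mathbb{R}(A_{10}), \mathbb{R}(A_{11}), \mathbb{I}(A_{11}), \ldots, \mathbb{R}(A_{l_{max} m_{max}}), \mathbb{I}(A_{l_{max} m_{max}})] \)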
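The number of functions in this set follows from a simple count: each coefficient with m = 0 contributes one map (real part only), and each coefficient with m > 0 contributes two (real and imaginary parts). The sketch below enumerates the (l, m) pairs with 0 <= m <= l <= lmax; the function name `count_basis_functions` is hypothetical.

```python
def count_basis_functions(lmax: int):
    """Return (number of A_lm coefficients, number of real basis maps)."""
    n_coefficients = 0
    n_maps = 0
    for l in range(lmax + 1):
        for m in range(l + 1):
            n_coefficients += 1
            # m = 0: real part only; m > 0: real and imaginary parts
            n_maps += 1 if m == 0 else 2
    return n_coefficients, n_maps


for lmax in (3, 15):
    print(lmax, count_basis_functions(lmax))
```

In closed form this is (lmax + 1)(lmax + 2)/2 coefficients and (lmax + 1)**2 maps, so lmax = 15 gives 136 coefficients and 256 maps.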

lmax=15
nside=128
#compute Alm to fit
#get the l and m available for l<=lmax
l,m=hp.Alm.getlm(lmax=lmax)

#count the number of alm maps (1 for m=0 and 2 for m>0)
n_alm=(m==0).sum()+2*(m>0).sum()
function=np.zeros([n_alm,12*nside**2])

alm=np.zeros([l.shape[0]],dtype='complex')

i=0

#array to store the l and m values of the A_lm
l_func=np.zeros(n_alm,dtype='int')
m_func=np.zeros(n_alm,dtype='int')
is_real_func=np.zeros(n_alm,dtype='int')

for k in range(l.shape[0]):
    alm[k]=1.0
    function[i]=hp.reorder(hp.alm2map(alm,nside),r2n=True)
    l_func[i]=l[k]
    m_func[i]=m[k]
    is_real_func[i]=1
    i+=1
    if m[k]>0:
        alm[k]=complex(0,1)
        function[i]=hp.reorder(hp.alm2map(alm,nside),r2n=True)
        l_func[i]=l[k]
        m_func[i]=m[k]
        is_real_func[i]=0
        i+=1
    alm[k]=0.0
lm=3
plt.figure(figsize=(12,5))
for k in range(l_func.shape[0]):
    #subplot position: one row per l, real parts to the right of m=0, imaginary parts to the left
    pos=1+l_func[k]*(2*lm+1)+2*(is_real_func[k]-0.5)*m_func[k]-1+(lm+1)
    #raw strings keep the LaTeX backslashes and avoid the invalid escape sequence SyntaxWarning
    if is_real_func[k]==1:
        title=r'$\mathbb{R}(A_{\ell=%d,m=%d})$'%(l_func[k],m_func[k])
    else:
        title=r'$\mathbb{I}(A_{\ell=%d,m=%d})$'%(l_func[k],m_func[k])
    if l_func[k]<=lm:
        hp.mollview(function[k],nest=True,hold=False,sub=(lm+1,2*lm+1,pos)
                    ,title=title,cbar=False,cmap='coolwarm')
[Figure: Mollweide maps of the real and imaginary spherical harmonic functions for l <= 3]
client.close()
Key Points
  • HEALPix
  • Access, read and get metadata from remote and local Zarr

References#

[DAGB22]

J.-M. Delouis, E. Allys, E. Gauvrit, and F. Boulanger. Non-gaussian modelling and statistical denoising of planck dust polarisation full-sky maps using scattering transforms. Astronomy & Astrophysics, 668:A122, December 2022. URL: http://dx.doi.org/10.1051/0004-6361/202244566, doi:10.1051/0004-6361/202244566.

Packages citation#

[AZLSc+24]

Andrea Zonca, Leo Singer, crosset, mreineck, Trygve Leithe Svalheim, Daniel Lenz, Reijo Keskitalo, Walt Ogburn, Alex Drlica-Wagner, Xavier Garrido, Matthew Petroff, Paul Price, Pierre Chanial, Nicolas Tessore, Samuel Wyatt, mlejeune, Duncan Watts, Andrew Pontzen, Eirik Gjerløw, Thomas Robitaille, j-erler, jvavrek, Yu Feng, Duncan Macleod, Craig J Copi, Evert Rol, Maurizio Tomasi, Mathew S. Madhavacheril, and Andrés Asensio Ramos. Healpy/healpy: 1.17.1. 2024. URL: https://zenodo.org/doi/10.5281/zenodo.11337740, doi:10.5281/ZENODO.11337740.

[Cra23]

Fabio Crameri. Scientific colour maps. 2023. URL: https://zenodo.org/record/1243862, doi:10.5281/ZENODO.1243862.

[DFM+24]

JM. Delouis, T. Foulquier, L. Mousset, T. Odaka, F. Paul, and E. Allys. Foscat/foscat: 3.0.33. 2024. URL: jmdelouis/FOSCAT, doi:10.48550/ARXIV.2207.12527.

[FG24]

BRIOL F and Eynard-Bontemps G. Pyinterp/pyinterp: 2024.6.0. 2024. URL: CNES/pangeo-pyinterp.

[HMvdW+20]

Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020. URL: https://doi.org/10.1038/s41586-020-2649-2, doi:10.1038/s41586-020-2649-2.

[HH17]

S. Hoyer and J. Hamman. Xarray: N-D labeled arrays and datasets in Python. Journal of Open Research Software, 2017. URL: https://doi.org/10.5334/jors.148, doi:10.5334/jors.148.

[Hun07]

J. D. Hunter. Matplotlib: a 2d graphics environment. Computing in Science & Engineering, 9(3):90–95, 2007. doi:10.1109/MCSE.2007.55.

[MBS+24]

J. Magin, B. Bovy, R. Scott, A. Kmoch, A. Coca-Castro, and D. Loos. Xdggs/xdggs: 0.0.1. 2024. URL: xarray-contrib/xdggs.

[RSB+20]

Philipp Rudiger, Jean-Luc Stevens, James A. Bednar, Bas Nijholt, Andrew, Chris B, Achim Randelhoff, Jon Mease, Vasco Tenner, maxalbert, Markus Kaiser, ea42gh, Jordan Samuels, stonebig, Florian LB, Andrew Tolmie, Daniel Stephan, Scott Lowe, John Bampton, henriqueribeiro, Irv Lustig, Julia Signell, Justin Bois, Leopold Talirz, Lukas Barth, Maxime Liquet, Ram Rachum, Yuval Langer, arabidopsis, and kbowen. Holoviz/holoviews: version 1.13.3. June 2020. URL: https://doi.org/10.5281/zenodo.3904606, doi:10.5281/zenodo.3904606.