Handling multi-dimensional arrays with xarray#

Authors & Contributors#

Authors#

Pier Lorenzo Marasco, Ispra (Italy), @pl-marasco

Contributors#

Alejandro Coca-Castro, The Alan Turing Institute (United Kingdom), @acocac
Anne Fouilloux, University of Oslo (Norway), @annefou
Guillaume Eynard-Bontemps, CNES (France), @guillaumeeb

Overview

Questions

What is Xarray?
How to open a local file?
How to print metadata information?
How to make a selection?
How to visualize with matplotlib?
How to perform basic computations, statistics and aggregations?
How to mask data?

Objectives

Learn about Xarray Python ecosystem
Learn file handling with xarray
Learn to get metadata information
Learn to select and mask data
Learn to make basic computations, aggregations and statistics
Learn to visualize data

Context#

We will be using the Pangeo open-source software stack for computing and visualizing the Vegetation Condition Index (VCI) [Kog95], a well-established indicator to estimate droughts from remote sensing data.

VCI compares the current normalized difference vegetation index (NDVI) [Wik2)] to the range of values observed in previous years.

Data#

In this episode, we will use Sentinel-3 NDVI Analysis Ready Data (ARD) provided by the Copernicus Global Land Service.

This dataset can be discovered through the OpenEO API from the CGLS distributor, VITO. Access is free of charge but an EGI registration is needed.

The same dataset can also be downloaded from Zenodo: FOSS4G Training Datasets: NDVI

Further info about drought indices can be found in the Integrated Drought Management Programme (see here).

Setup#

This episode uses the following main Python packages:

xarray [HH17] with netCDF4 and h5netcdf engines
pooch [USR+20]
numpy [HMvdW+20]

Please install these packages if they are not already available in your Python environment (see Setup page).

Packages#

In this episode, Python packages are imported when we start to use them. However, for best software practices, we recommend that you install and import all the necessary libraries at the top of your Jupyter notebook.

import xarray as xr

Fetch Data#

For now we will fetch a netCDF file containing Sentinel-3 NDVI Analysis Ready Data (ARD).
The file is available in a Zenodo repository. We will download it using pooch, a very handy Python-based library to download and cache your data files locally (see further info here).
In the Data access and discovery episode, we will learn about different ways to access data, including access to remote data.

import pooch

cgls_file = pooch.retrieve(
    url="https://zenodo.org/record/6969999/files/C_GLS_NDVI_20220101_20220701_Lombardia_S3_2.nc",
    known_hash="md5:bbb25f1865056c886c6f9b37147d8f2f",
    path=".",
)

Downloading data from 'https://zenodo.org/record/6969999/files/C_GLS_NDVI_20220101_20220701_Lombardia_S3_2.nc' to file '/home/runner/work/foss4g-2022/foss4g-2022/tutorial/pangeo101/4d4b4841cc038396b6f78fb014a6b538-C_GLS_NDVI_20220101_20220701_Lombardia_S3_2.nc'.
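
pooch checks the downloaded file against the value passed as known_hash. The following is a minimal sketch of what such a checksum comparison involves, using only the standard library. The helper name md5_of_file and the small temporary file are hypothetical and stand in for the real download; this is not pooch's own implementation.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small temporary file (a stand-in for the netCDF file).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    demo_digest = md5_of_file(demo_path)

print(demo_digest)
```

For the tutorial file, "md5:" + md5_of_file(cgls_file) would be expected to match the known_hash string given to pooch.retrieve.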

What is xarray?#

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multi-dimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.

How is xarray structured?#

Xarray has two core data structures, which build upon and extend the core strengths of NumPy and Pandas libraries. Both data structures are fundamentally N-dimensional:

DataArray is the implementation of a labeled, N-dimensional array. It is an N-D generalization of a pandas.Series. The name DataArray itself is borrowed from Fernando Perez’s datarray project, which prototyped a similar data structure.
Dataset is a multi-dimensional, in-memory array database. It is a dict-like container of DataArray objects aligned along any number of shared dimensions, and serves a similar purpose in xarray as the pandas.DataFrame.
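
As an illustration of the two structures, here is a small sketch that builds a DataArray and a Dataset from synthetic values. The grid, the variable names and the numbers are all made up for the example and are unrelated to the real NDVI file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid: 3 time steps, 2 latitudes, 4 longitudes.
times = np.array(["2022-01-01", "2022-01-02", "2022-01-03"], dtype="datetime64[ns]")
lats = [46.5, 46.0]
lons = [8.5, 9.0, 9.5, 10.0]

# A DataArray: one labeled N-dimensional array with dimensions, coordinates, attributes.
ndvi = xr.DataArray(
    np.arange(24, dtype="float64").reshape(3, 2, 4),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": lats, "lon": lons},
    name="NDVI",
    attrs={"long_name": "toy NDVI values"},
)

# A Dataset: a dict-like container of DataArrays sharing dimensions.
ds = xr.Dataset(
    {
        "NDVI": ndvi,
        "nobs": (("time", "lat", "lon"), np.ones((3, 2, 4), dtype="uint8")),
    }
)

print(ndvi.dims)
print(list(ds.data_vars))

# Label-based selection instead of positional indexing.
value = ds["NDVI"].sel(time=np.datetime64("2022-01-02"), lat=46.0, lon=9.5).item()
print(value)
```

The selection by label returns the same element that positional indexing with [1, 1, 2] would, without the reader having to remember the axis order.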

Plotting#

Data can easily be plotted through the matplotlib.pyplot back-end (see the matplotlib documentation).

NDVI_AOI.isel(time=0).plot(cmap="RdYlGn")

<matplotlib.collections.QuadMesh at 0x7f0a6445feb0>
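
The same kind of plot can be tried without the real data. The sketch below uses a synthetic array as a stand-in for NDVI_AOI (random values, hypothetical coordinates) and draws onto an explicit matplotlib axis so that the title and color limits can be controlled. The Agg backend is selected only so the example runs without a display; in a notebook that line would normally be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Synthetic stand-in for NDVI_AOI (random values, not real NDVI).
rng = np.random.default_rng(0)
toy = xr.DataArray(
    rng.uniform(-0.08, 0.92, size=(2, 5, 6)),
    dims=("time", "lat", "lon"),
    coords={"lat": np.linspace(46.5, 44.7, 5), "lon": np.linspace(8.5, 11.4, 6)},
    name="NDVI",
)

fig, ax = plt.subplots(figsize=(6, 4))
# Fix the color limits to the physical NDVI range so time steps are comparable.
mesh = toy.isel(time=0).plot(ax=ax, cmap="RdYlGn", vmin=-0.08, vmax=0.92)
ax.set_title("Toy NDVI, first time step")
plt.close(fig)
```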

In the next episode, we will learn more about advanced visualization tools and how to make interactive plots using holoviews, a tool part of the HoloViz ecosystem.

Basic maths#

The NDVI values look a little odd compared with the standard NDVI range [-1, +1]. This is consistent with the maximum values reported in the Product User Manual (PUM), where NDVI is stored as digital numbers with a scaling and an offset.

NDVI characteristics from the Product User Manual (PUM)#

layer name	description	physical min	physical max	digital max	scaling	offset	No Data
ndvi	normalized difference vegetation index	-0.08	0.92	250	1/250	-0.08	254, 255
ndvi_unc	uncertainty on ndvi	0	1	1000	1/1000	0	65535
nobs	number of observations	0	32	32	1	0	255
qflag	bitwise quality flag	-	-	254	1	0	255

from: Copernicus Global Land Service NDVI 300 V2.0.1
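
To make the table concrete, the following sketch applies the scaling and offset of the ndvi layer to a few digital numbers. The function name dn_to_physical is hypothetical; the constants are the ones listed in the table.

```python
# Scaling and offset for the ndvi layer, taken from the PUM table.
SCALE = 1 / 250
OFFSET = -0.08


def dn_to_physical(dn):
    """Convert a digital number (DN) to a physical NDVI value."""
    return dn * SCALE + OFFSET


# Digital min, digital max, and the two no-data codes.
for dn in (0, 250, 254, 255):
    print(dn, round(dn_to_physical(dn), 3))
```

The digital range 0 to 250 maps onto the physical range -0.08 to 0.92, while the no-data codes 254 and 255 map to 0.936 and 0.94. This is where the value 0.94 in the scaled arrays of this section comes from, and why such values have to be masked.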

Simple arithmetic operations can be performed without worrying about dimensions and coordinates, using the same notation we use with numpy. Underneath, xarray will automatically vectorize the operations over all the data dimensions.

NDVI_AOI * (1/250) - 0.08

<xarray.DataArray 'NDVI' (time: 20, lat: 612, lon: 984)>
array([[[0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        ...,
        [0.252, 0.32 , 0.316, ..., 0.416, 0.352, 0.94 ],
        [0.404, 0.4  , 0.368, ..., 0.416, 0.412, 0.94 ],
        [0.372, 0.292, 0.292, ..., 0.244, 0.4  , 0.94 ]],

       [[0.94 , 0.94 , 0.94 , ..., 0.94 , 0.508, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.488, 0.48 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.444, 0.508, 0.94 ],
        ...,
        [0.472, 0.432, 0.416, ..., 0.484, 0.464, 0.94 ],
        [0.448, 0.412, 0.392, ..., 0.56 , 0.528, 0.94 ],
        [0.444, 0.424, 0.4  , ..., 0.484, 0.496, 0.94 ]],

       [[0.94 , 0.94 , 0.94 , ..., 0.496, 0.504, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.5  , 0.496, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.552, 0.52 , 0.94 ],
        ...,
...
        ...,
        [0.484, 0.484, 0.54 , ..., 0.424, 0.484, 0.94 ],
        [0.452, 0.424, 0.484, ..., 0.488, 0.52 , 0.94 ],
        [0.432, 0.408, 0.392, ..., 0.72 , 0.64 , 0.94 ]],

       [[0.768, 0.74 , 0.708, ..., 0.752, 0.768, 0.94 ],
        [0.7  , 0.724, 0.724, ..., 0.716, 0.748, 0.94 ],
        [0.776, 0.784, 0.788, ..., 0.728, 0.716, 0.94 ],
        ...,
        [0.54 , 0.56 , 0.604, ..., 0.268, 0.296, 0.94 ],
        [0.488, 0.52 , 0.536, ..., 0.448, 0.516, 0.94 ],
        [0.464, 0.48 , 0.532, ..., 0.564, 0.432, 0.94 ]],

       [[0.82 , 0.82 , 0.804, ..., 0.756, 0.756, 0.94 ],
        [0.712, 0.72 , 0.72 , ..., 0.764, 0.752, 0.94 ],
        [0.812, 0.848, 0.848, ..., 0.76 , 0.756, 0.94 ],
        ...,
        [0.58 , 0.568, 0.568, ..., 0.392, 0.4  , 0.94 ],
        [0.584, 0.532, 0.528, ..., 0.548, 0.548, 0.94 ],
        [0.556, 0.484, 0.488, ..., 0.62 , 0.584, 0.94 ]]])
Coordinates:
  * time     (time) datetime64[ns] 2022-01-01 2022-01-11 ... 2022-07-11
  * lon      (lon) float64 8.502 8.505 8.508 8.511 ... 11.42 11.42 11.42 11.43
  * lat      (lat) float64 46.5 46.5 46.49 46.49 ... 44.69 44.69 44.68 44.68

Universal functions (ufuncs) from numpy and scipy can also be applied directly to the data.

import numpy as np

np.subtract(np.multiply(NDVI_AOI, 0.004), 0.08)

<xarray.DataArray 'NDVI' (time: 20, lat: 612, lon: 984)>
array([[[0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        ...,
        [0.252, 0.32 , 0.316, ..., 0.416, 0.352, 0.94 ],
        [0.404, 0.4  , 0.368, ..., 0.416, 0.412, 0.94 ],
        [0.372, 0.292, 0.292, ..., 0.244, 0.4  , 0.94 ]],

       [[0.94 , 0.94 , 0.94 , ..., 0.94 , 0.508, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.488, 0.48 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.444, 0.508, 0.94 ],
        ...,
        [0.472, 0.432, 0.416, ..., 0.484, 0.464, 0.94 ],
        [0.448, 0.412, 0.392, ..., 0.56 , 0.528, 0.94 ],
        [0.444, 0.424, 0.4  , ..., 0.484, 0.496, 0.94 ]],

       [[0.94 , 0.94 , 0.94 , ..., 0.496, 0.504, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.5  , 0.496, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.552, 0.52 , 0.94 ],
        ...,
...
        ...,
        [0.484, 0.484, 0.54 , ..., 0.424, 0.484, 0.94 ],
        [0.452, 0.424, 0.484, ..., 0.488, 0.52 , 0.94 ],
        [0.432, 0.408, 0.392, ..., 0.72 , 0.64 , 0.94 ]],

       [[0.768, 0.74 , 0.708, ..., 0.752, 0.768, 0.94 ],
        [0.7  , 0.724, 0.724, ..., 0.716, 0.748, 0.94 ],
        [0.776, 0.784, 0.788, ..., 0.728, 0.716, 0.94 ],
        ...,
        [0.54 , 0.56 , 0.604, ..., 0.268, 0.296, 0.94 ],
        [0.488, 0.52 , 0.536, ..., 0.448, 0.516, 0.94 ],
        [0.464, 0.48 , 0.532, ..., 0.564, 0.432, 0.94 ]],

       [[0.82 , 0.82 , 0.804, ..., 0.756, 0.756, 0.94 ],
        [0.712, 0.72 , 0.72 , ..., 0.764, 0.752, 0.94 ],
        [0.812, 0.848, 0.848, ..., 0.76 , 0.756, 0.94 ],
        ...,
        [0.58 , 0.568, 0.568, ..., 0.392, 0.4  , 0.94 ],
        [0.584, 0.532, 0.528, ..., 0.548, 0.548, 0.94 ],
        [0.556, 0.484, 0.488, ..., 0.62 , 0.584, 0.94 ]]])
Coordinates:
  * time     (time) datetime64[ns] 2022-01-01 2022-01-11 ... 2022-07-11
  * lon      (lon) float64 8.502 8.505 8.508 8.511 ... 11.42 11.42 11.42 11.43
  * lat      (lat) float64 46.5 46.5 46.49 46.49 ... 44.69 44.69 44.68 44.68
Attributes:
    long_name:     NDVI
    units:         
    grid_mapping:  crs
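
A small sketch on synthetic data shows that the operator notation and the ufunc notation give the same numbers and that both return a labeled DataArray. The tiny grid and its digital numbers are hypothetical.

```python
import numpy as np
import xarray as xr

# Hypothetical digital numbers on a tiny labeled grid.
dn = xr.DataArray(
    np.array([[0, 125], [250, 255]], dtype="uint8"),
    dims=("lat", "lon"),
    coords={"lat": [46.5, 46.0], "lon": [8.5, 9.0]},
    name="NDVI",
)

via_operators = dn * 0.004 - 0.08
via_ufuncs = np.subtract(np.multiply(dn, 0.004), 0.08)

# Both results are DataArrays with the original dimensions and coordinates.
print(type(via_ufuncs).__name__)
print(via_ufuncs.dims)
print(bool(np.allclose(via_operators.values, via_ufuncs.values)))
```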

NDVI_AOI = NDVI_AOI * (1/250) - 0.08

NDVI_AOI

<xarray.DataArray 'NDVI' (time: 20, lat: 612, lon: 984)>
array([[[0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.94 , 0.94 , 0.94 ],
        ...,
        [0.252, 0.32 , 0.316, ..., 0.416, 0.352, 0.94 ],
        [0.404, 0.4  , 0.368, ..., 0.416, 0.412, 0.94 ],
        [0.372, 0.292, 0.292, ..., 0.244, 0.4  , 0.94 ]],

       [[0.94 , 0.94 , 0.94 , ..., 0.94 , 0.508, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.488, 0.48 , 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.444, 0.508, 0.94 ],
        ...,
        [0.472, 0.432, 0.416, ..., 0.484, 0.464, 0.94 ],
        [0.448, 0.412, 0.392, ..., 0.56 , 0.528, 0.94 ],
        [0.444, 0.424, 0.4  , ..., 0.484, 0.496, 0.94 ]],

       [[0.94 , 0.94 , 0.94 , ..., 0.496, 0.504, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.5  , 0.496, 0.94 ],
        [0.94 , 0.94 , 0.94 , ..., 0.552, 0.52 , 0.94 ],
        ...,
...
        ...,
        [0.484, 0.484, 0.54 , ..., 0.424, 0.484, 0.94 ],
        [0.452, 0.424, 0.484, ..., 0.488, 0.52 , 0.94 ],
        [0.432, 0.408, 0.392, ..., 0.72 , 0.64 , 0.94 ]],

       [[0.768, 0.74 , 0.708, ..., 0.752, 0.768, 0.94 ],
        [0.7  , 0.724, 0.724, ..., 0.716, 0.748, 0.94 ],
        [0.776, 0.784, 0.788, ..., 0.728, 0.716, 0.94 ],
        ...,
        [0.54 , 0.56 , 0.604, ..., 0.268, 0.296, 0.94 ],
        [0.488, 0.52 , 0.536, ..., 0.448, 0.516, 0.94 ],
        [0.464, 0.48 , 0.532, ..., 0.564, 0.432, 0.94 ]],

       [[0.82 , 0.82 , 0.804, ..., 0.756, 0.756, 0.94 ],
        [0.712, 0.72 , 0.72 , ..., 0.764, 0.752, 0.94 ],
        [0.812, 0.848, 0.848, ..., 0.76 , 0.756, 0.94 ],
        ...,
        [0.58 , 0.568, 0.568, ..., 0.392, 0.4  , 0.94 ],
        [0.584, 0.532, 0.528, ..., 0.548, 0.548, 0.94 ],
        [0.556, 0.484, 0.488, ..., 0.62 , 0.584, 0.94 ]]])
Coordinates:
  * time     (time) datetime64[ns] 2022-01-01 2022-01-11 ... 2022-07-11
  * lon      (lon) float64 8.502 8.505 8.508 8.511 ... 11.42 11.42 11.42 11.43
  * lat      (lat) float64 46.5 46.5 46.49 46.49 ... 44.69 44.69 44.68 44.68

Mask#

Not all values are valid, so all those outside the valid range [-0.08, 0.92] must be masked. Masking can be achieved through the method Dataset|DataArray.where(cond, other) or the function xr.where(cond, x, y).

The difference lies in which replacement values can be specified: xr.where(cond, x, y) lets us set the value for both the true case (x) and the false case (y), whereas Dataset|DataArray.where(cond, other) only lets us define the value used where the condition is false (by default np.nan).

NDVI_masked = NDVI_AOI.where((NDVI_AOI >= -0.08) & (NDVI_AOI <= 0.92))

NDVI_masked.isel(time=0).plot()

<matplotlib.collections.QuadMesh at 0x7f0a2cc4bca0>

To better visualize the mask, an ad-hoc variable can be created with the help of xr.where, which lets us specify the value 1 for masked and 0 for unmasked data.

# Strict inequalities, so that the mask is the exact complement of the valid range [-0.08, 0.92].
mask = xr.where((NDVI_AOI < -0.08) | (NDVI_AOI > 0.92), 1, 0)

mask.isel(time=0).plot()

<matplotlib.collections.QuadMesh at 0x7f0a2cb54340>
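
The two masking forms can be compared side by side on a handful of values. This is a minimal sketch with a hypothetical one-dimensional array; the replacement value -9999.0 is chosen arbitrarily for the illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical scaled NDVI values; 0.94 corresponds to a no-data code.
ndvi = xr.DataArray([-0.08, 0.30, 0.92, 0.94], dims="x", name="NDVI")
valid = (ndvi >= -0.08) & (ndvi <= 0.92)

# Method form: keep values where the condition holds, NaN elsewhere (default).
kept = ndvi.where(valid)
# Method form with an explicit replacement for the False case.
filled = ndvi.where(valid, -9999.0)
# Function form: choose a value for both the True and the False case.
flag = xr.where(valid, 0, 1)

print(kept.values)
print(filled.values)
print(flag.values)
```

The method form always keeps the original values where the condition is true, so it suits masking; the function form suits building a new variable such as a 0/1 flag.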

Plot a single point (defined by its latitude and longitude) over the time dimension.

NDVI_masked.sel(lat=45.88, lon=8.63, method='nearest').plot()

[<matplotlib.lines.Line2D at 0x7f0a2cbef3d0>]
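
The effect of method='nearest' can be seen on a coarse synthetic grid, where the requested point does not coincide with any grid cell. The grid and values below are hypothetical; only the requested latitude and longitude are the ones used in this section.

```python
import numpy as np
import xarray as xr

# Hypothetical coarse grid; the real dataset has a much finer one.
toy = xr.DataArray(
    np.arange(2 * 3 * 4, dtype="float64").reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.array(["2022-01-01", "2022-01-11"], dtype="datetime64[ns]"),
        "lat": [46.0, 45.5, 45.0],
        "lon": [8.5, 9.0, 9.5, 10.0],
    },
)

# The requested point is not on the grid; method="nearest" snaps to the closest cell.
series = toy.sel(lat=45.88, lon=8.63, method="nearest")

print(float(series.lat), float(series.lon))
print(series.dims)
print(series.values)
```

The result keeps only the time dimension, and its lat and lon coordinates report which grid cell was actually picked, which is worth checking before interpreting the time series.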

Save xarray Dataset#

It is very often convenient to save intermediate or final results into a local file. We will learn more about the different file formats Xarray can handle, but for now let’s save it as a netCDF file. Check the file size after saving the result into netCDF.

NDVI_masked.to_netcdf('C_GLS_NDVI_20220101_20220701_Lombardia_S3_2_masked.nc')

Advanced Saving methods#

Encoding and Compression#

From the NDVI dataset we already know that values can be encoded and conceptualized as pure Digital Numbers (DN). To revert those values to physical values (PhyVal), the formula PhyVal = DN * scale_factor + add_offset has to be used. To go the other way and transform our PhyVal back to DN when saving, four parameters have to be defined:

dtype: the data type specification, either as a numpy type (np.int16, np.float32) or as a string that can be converted to one. Here we use np.uint8, as values only range up to 255.
_FillValue: a value that substitutes for NaNs. Some casts (for example from float to integer) cannot convert NaNs because the target type has no representation for them, so an alternative value within the range of acceptable values needs to be specified.
scale_factor & add_offset: values are converted through scaling and offset parameters according to the formula decoded = scale_factor * encoded + add_offset.
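
The role of the four parameters can be sketched with plain numpy. The functions encode and decode below are hypothetical helpers written for this illustration; they follow the formula above but are not xarray's internal implementation.

```python
import numpy as np

SCALE_FACTOR = 0.004
ADD_OFFSET = -0.08
FILL_VALUE = 255  # digital number reserved for missing data


def encode(physical):
    """Pack physical values into uint8 digital numbers; NaN becomes FILL_VALUE."""
    physical = np.asarray(physical, dtype="float64")
    dn = np.rint((physical - ADD_OFFSET) / SCALE_FACTOR)
    # Replace NaN before the cast, since uint8 cannot represent it.
    dn = np.where(np.isnan(physical), FILL_VALUE, dn)
    return dn.astype(np.uint8)


def decode(dn):
    """Unpack digital numbers to physical values; FILL_VALUE becomes NaN."""
    dn = np.asarray(dn)
    physical = dn.astype("float64") * SCALE_FACTOR + ADD_OFFSET
    return np.where(dn == FILL_VALUE, np.nan, physical)


values = np.array([-0.08, 0.42, 0.92, np.nan])
packed = encode(values)
unpacked = decode(packed)
print(packed)
print(unpacked)
```

Each value now takes one byte instead of eight, at the price of a resolution limited to steps of scale_factor.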

A compression method can be defined as well; if the format is netCDF4 with the engine set to ‘netcdf4’ or ‘h5netcdf’ there are different compression options. The easiest solution is to stick with the default one for NetCDF4 files.

Note that encoding parameters need to be passed through a nested dictionary, and they have to be defined for each single variable.

NDVI_masked.to_netcdf('C_GLS_NDVI_20220101_20220701_Lombardia_S3_2_mcs.nc',
                      engine='netcdf4',
                      encoding={'NDVI':{"dtype": np.uint8,
                                        '_FillValue': 255,
                                        'scale_factor':0.004,
                                        'add_offset':-0.08,
                                        'zlib': True, 'complevel':4}
                                }
                      )

Key Points

Xarray Dataset and DataArray
Read and get metadata from local raster file
Dataset and DataArray selection
Aggregation and statistics
Masking values

Through the choice of data type and the compression, the file size has been reduced by almost 10 times; as a drawback, reading speed has decreased.
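
One way to check such a size reduction is to compare the two files on disk. The sketch below defines a hypothetical helper, size_ratio, and demonstrates it on two throwaway files of known size so that it runs without the tutorial data.

```python
import os
import tempfile


def size_ratio(plain_path, packed_path):
    """Return how many times larger plain_path is than packed_path."""
    return os.path.getsize(plain_path) / os.path.getsize(packed_path)


# Self-contained demonstration with two throwaway files of known size.
with tempfile.TemporaryDirectory() as tmp_dir:
    big = os.path.join(tmp_dir, "big.bin")
    small = os.path.join(tmp_dir, "small.bin")
    with open(big, "wb") as handle:
        handle.write(b"\0" * 1000)
    with open(small, "wb") as handle:
        handle.write(b"\0" * 100)
    demo_ratio = size_ratio(big, small)

print(demo_ratio)
```

With the files written in this episode, the call would be size_ratio('C_GLS_NDVI_20220101_20220701_Lombardia_S3_2_masked.nc', 'C_GLS_NDVI_20220101_20220701_Lombardia_S3_2_mcs.nc').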

References#

Kog95: F.N. Kogan. Application of vegetation index and brightness temperature for drought detection. Advances in Space Research, 15(11):91–100, 1995. Natural Hazards: Monitoring and Assessment Using Remote Sensing Technique. URL: https://www.sciencedirect.com/science/article/pii/027311779500079T, doi:https://doi.org/10.1016/0273-1177(95)00079-T.
Wik2): Wikipedia. Normalized difference vegetation index. https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index, 2022 (accessed August 7, 2022).

Packages citation#

HMvdW+20: Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fernández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin Sheppard, Tyler Reddy, Warren Weckesser, Hameer Abbasi, Christoph Gohlke, and Travis E. Oliphant. Array programming with NumPy. Nature, 585(7825):357–362, September 2020. URL: https://doi.org/10.1038/s41586-020-2649-2, doi:10.1038/s41586-020-2649-2.
HH17: S. Hoyer and J. Hamman. Xarray: N-D labeled arrays and datasets in Python. Journal of Open Research Software, 2017. URL: https://doi.org/10.5334/jors.148, doi:10.5334/jors.148.
USR+20: Leonardo Uieda, Santiago Rubén Soler, Rémi Rampin, Hugo van Kemenade, Matthew Turk, Daniel Shapero, Anderson Banihirwe, and John Leeman. Pooch: a friend to fetch your data files. Journal of Open Source Software, 5(45):1943, 2020. URL: https://doi.org/10.21105/joss.01943, doi:10.21105/joss.01943.

Pangeo Tutorial at FOSS4G 2022
