A community platform for Big Data geoscience

Join the community!

Information   Links
Website       https://pangeo.io/
GitHub        GitHub
Examples      Gallery
Chat          Gitter Pangeo - Discourse
News          Medium - Blog

Meetings

  • General: Pangeo holds community meetings every Wednesday. The meetings alternate between 12 PM and 4 PM US Eastern Time to encourage participation from a wider range of time zones.

  • Continental meetings: to accommodate different time zones across the globe, continental meetings have been organized in Europe/Africa and Oceania.

  • Showcases: 15-minute talks that give anyone an opportunity to meet other members of the Pangeo community and let them know what they are working on. The talks are recorded, given a DOI, and made available on the Pangeo YouTube Channel. If you are interested in giving a talk, fill out this short form.

Cloud infrastructure

Data life cycle

  • Pangeo Forge: a tool designed to aid the extraction, transformation, and loading of datasets.

Most recent trainings (2021/22)

  • Galaxy training in climate data: contains two modules introducing Pangeo, "Pangeo ecosystem 101 for everyone" and "Pangeo Notebook in Galaxy - Introduction to Xarray", which showcase how the Pangeo stack assists in processing and analysing big climate datasets.

  • BIOGEOMON 2022 Python Pangeo Workshop: led by Landscape Geoinformatics, it includes Jupyter notebooks demonstrating Xarray for working with labeled multi-dimensional arrays of data. The material also shows a few basic steps to improve reproducibility and proactively apply FAIR principles when sharing and archiving data and code online for publishing via GitHub and Zenodo.

  • FOSS4G 2021: focuses on data discovery with SpatioTemporal Asset Catalogs (STAC), data loading with cloud-optimized formats (Cloud-Optimized GeoTIFF, Zarr), and scalable analysis with the Xarray and Dask libraries.

Additional resources/initiatives consuming Pangeo stack

A list of some active initiatives. Find more at https://github.com/pangeo-data.

  • CarbonPlan: non-profit initiative, analyzes climate solutions based on the best available science and data. The team works collaboratively with the Pangeo community to build open tools and resources for the evaluation and deployment of robust climate programs.

  • CliMetLab: package, aims at simplifying access to climate and meteorological datasets, allowing users to focus on science instead of technical issues such as data access and data formats.

  • climpred: package, aims to be the primary package used to analyze output from initialized dynamical forecast models, ranging from short-term weather forecasts to decadal climate forecasts.

  • Digital Earth Africa Sandbox: platform, a cloud-based computational platform that operates through a JupyterLab environment. It provides a limited but free compute resource for technical users and data scientists to explore DE Africa data and products. The platform uses xarray and dask to optimize the processing and analysis of the curated datasets.

  • EOOffshore: research project, presents a case study that demonstrates the utility of the Pangeo software ecosystem in developing offshore wind speed and power density estimates, increasing wind measurement coverage of offshore renewable energy assessment areas in the Irish Continental Shelf region.

  • Fastscape LEM: software stack, aims at making landscape evolution models and topographic analysis algorithms readily accessible to a wide range of users, from experts in landscape evolution modelling to scientists, researchers and teachers in the broader Earth science community.

  • flox: package, explores strategies for fast GroupBy reductions with dask.array. It used to be called dask_groupby.

  • NetCarbon: startup company, offering farmers a free solution for measuring and monetizing their sequestered carbon to contribute towards carbon neutrality.

  • Planetary Computer: platform, a cloud-based computational platform aiming to combine a petabyte catalog of analysis-ready geospatial data, an API that facilitates spatiotemporal querying over that data and a computing environment that simplifies distributed computing workloads.

  • PyGMT: package, facilitates processing geospatial and geophysical data and making publication-quality maps and figures.

  • scivision: package, aims to connect computer vision model developers to image data providers from diverse scientific fields. The project builds upon existing libraries, such as intake to create and manipulate data catalogues and xarray to handle N-dimensional data, for exploring CV models.

  • Urban Grammar AI research project: research project, proposes a conceptual framework to characterize urban structure through the notions of spatial signatures and urban grammar. In addition to consuming the Pangeo stack, the resource includes notebooks that use dask_geopandas to optimize the processing and analysis of spatial operations on geometric types.

  • verde: package, aims at processing spatial data (bathymetry, geophysics surveys, etc.) and interpolating it on regular grids (i.e., gridding).

  • xarray-sentinel: package, facilitates access and exploration of the SAR data products of the Copernicus Sentinel-1 satellite mission.

  • xESMF: package, a regridding tool suited for non-orthogonal grids. xESMF tries to be simple and intuitive.

  • xMIP: package, facilitates the cleaning, organization and interactive analysis of Model Intercomparison Projects (MIPs) within the Pangeo software stack.