Pangeo is a community effort for big data in the geosciences.
There are several mounting crises facing the geoscientific community:
We believe these challenges can all be addressed through a unified effort.
Our mission is to cultivate an ecosystem in which the next generation of open-source analysis tools for the geosciences can be developed, distributed, and sustained. These tools must be scalable to meet the current and future challenges of big data, and they should leverage existing expertise from outside the geoscientific community.
We envision a collection of related but independent open-source packages that meet specific scientific needs within the geoscience fields. These packages will follow modern best practices for software development, including:
At the core of the Pangeo concept is a collection of tools that are already widely used throughout and beyond the geoscientific community. In recent years, the open-source scientific software stack in Python has grown rich and full-featured.
Caption: The Python Data Stack. Source: Jake VanderPlas, “The State of the Stack,” SciPy Keynote (SciPy 2015).
In practice, the “Python data” software stack (see above) currently provides the most stable and powerful foundation layer for our desired tools. In particular, two packages widely used in the geoscientific software community, xarray and dask, provide a mechanism for easily building scalability into scientific analysis. Our vision of future geoscientific software involves adopting these common software layers, together with clear communication among developers to define project scope and dependencies, so that redundancy and fragmentation are eliminated.
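The core idea that dask brings to xarray-based analysis is chunking. A large array is split into blocks, an independent task is computed on each block, and the partial results are combined. The sketch below illustrates that split-apply-combine pattern for a mean using only the Python standard library. It is a conceptual illustration, not dask's API. The helper names `chunk_bounds`, `partial_stats`, and `chunked_mean` are hypothetical. A thread pool is used only to keep the sketch portable. Dask itself can execute such task graphs on threads, on processes, or on a distributed cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from math import fsum


def chunk_bounds(n, chunk_size):
    """Yield (start, stop) pairs that tile range(n) in blocks of chunk_size (hypothetical helper)."""
    for start in range(0, n, chunk_size):
        yield start, min(start + chunk_size, n)


def partial_stats(block):
    """Per-chunk task: return (sum, count) so chunk results can be combined later."""
    return fsum(block), len(block)


def chunked_mean(values, chunk_size, max_workers=4):
    """Mean of `values`, computed as independent per-chunk tasks plus one combine step.

    A conceptual sketch of chunked reduction; it is not the dask implementation.
    """
    if not values:
        raise ValueError("cannot take the mean of an empty sequence")
    blocks = [values[a:b] for a, b in chunk_bounds(len(values), chunk_size)]
    # "Apply" step: each chunk is reduced independently, so the tasks could run anywhere.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(partial_stats, blocks))
    # "Combine" step: merge the partial sums and counts into the global mean.
    total = fsum(s for s, _ in results)
    count = sum(c for _, c in results)
    return total / count
```

For example, `chunked_mean([float(i) for i in range(10)], chunk_size=3)` splits the data into four chunks and returns `4.5`. With xarray backed by dask, the same kind of computation is typically expressed by chunking the data (for instance with `DataArray.chunk`) and calling a reduction such as `.mean()`. Dask then builds and schedules the per-chunk tasks automatically.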
The current Pangeo efforts are focused on addressing some of the most pressing scalability concerns in the software tools shown above. In particular, we are developing tools that support distributed parallel computing in high-performance computing (HPC) and cloud computing environments.
Caption: The Pangeo Platform. Source: Abernathey et al. (2017), “Pangeo: An Open Source Big Data Climate Science Platform,” NSF award 1740648.
The scientific culture of the geoscientific community must be tied to, and evolve from, the community’s software culture. Hence, we depend on contributions from the entire community, both scientific and industrial.
We encourage everyone to get involved by:
For now, community discussion is happening on our