This talk will demonstrate how to perform large-scale raster analysis using GeoPySpark in a Jupyter Notebook. GeoPySpark was created to give people who know Python access to GeoTrellis. GeoTrellis is a geographic data processing library for high-performance applications. It is written in Scala and uses Spark to work with raster and other geospatial data. GeoTrellis 1.0 was recently released under LocationTech, a major milestone for the community that has helped build the project. GeoTrellis is open source under the Apache 2.0 license.
We frequently hear that Scala is difficult. The GeoTrellis team wants to make GeoTrellis accessible to Python developers, which is why we created GeoPySpark: a Python library for accessing GeoTrellis through PySpark.
Jupyter Notebooks are increasingly being adopted by data scientists as a tool to iterate quickly and to share algorithms and results. We are designing GeoPySpark to work smoothly in a Jupyter Notebook.
In this demo we will show, from start to finish, how to use GeoPySpark to create a weighted raster overlay that produces a site suitability layer, viewable in a Leaflet-enabled web map inside the Jupyter Notebook. The specific topic of the demo matters less than the functionality it showcases. As our example, we will build a map of the best places to site a nature reserve based on factors such as population centers, roads, development, and existing green space.
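To make the idea of a weighted overlay concrete, here is a minimal plain-Python sketch of the underlying map algebra. It deliberately does not use GeoPySpark's API; in the demo the same cell-by-cell arithmetic runs on distributed rasters. It assumes each factor has already been reclassified into a suitability score on the same grid (higher is better), and that the weights sum to 1. The function name, layer names, scores, and weights are all hypothetical.

```python
from typing import Dict, List

# A raster here is just a list of rows of cell values (hypothetical, in-memory).
Raster = List[List[float]]


def weighted_overlay(layers: Dict[str, Raster], weights: Dict[str, float]) -> Raster:
    """Combine same-shaped suitability rasters cell by cell as a weighted sum."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    names = list(layers)
    rows = len(layers[names[0]])
    cols = len(layers[names[0]][0])
    result = [[0.0] * cols for _ in range(rows)]
    for name in names:
        grid = layers[name]
        # Every factor must cover exactly the same grid for a local operation.
        if len(grid) != rows or any(len(r) != cols for r in grid):
            raise ValueError(f"layer {name!r} has a different shape")
        w = weights[name]
        for i in range(rows):
            for j in range(cols):
                # Local map algebra: each output cell depends only on the same cell in each input.
                result[i][j] += w * grid[i][j]
    return result


# Hypothetical 2x2 factor layers, already scored 0-10 (10 = most suitable for a reserve).
layers = {
    "far_from_roads": [[10, 2], [6, 0]],
    "green_space": [[8, 4], [10, 2]],
    "far_from_population": [[6, 8], [4, 0]],
}
weights = {"far_from_roads": 0.5, "green_space": 0.3, "far_from_population": 0.2}
suitability = weighted_overlay(layers, weights)
```

A weighted sum keeps the result on the same 0-10 scale as the inputs, so the highest-scoring cells can be read directly as the most suitable sites; changing a weight is how an analyst expresses that one factor matters more than another.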
We believe GeoPySpark offers a new way to leverage big data processing through Map Algebra in a Jupyter Notebook.