
Build for the cloud with Rasterio

Landsat RGB (true color), Peron Islands and Channel Point, near Darwin, Australia. Landsat imagery is courtesy of NASA Goddard Space Flight Center and U.S. Geological Survey

By: Sean Gillies

Rasterio wraps the powerful features of the Geospatial Data Abstraction Library (GDAL) in idiomatic Python functions and classes. It is the most active Python project in Mapbox’s GitHub organization and one of the most active open source Python GIS projects.

Rasterio is in a period of pre-releases leading up to a 1.0.0 next year. Extensive testing and delegation of heavy lifting to GDAL make it one of the more robust alpha packages out there. We’re using it in our raster processing pipelines, and it’s a part of related projects like gbdxtools and Marblecutter.

Rasterio has gained a number of advanced capabilities. Below, we run through the updates we find most exciting for working with public datasets and computing resources in the cloud.

Cloud-friendly packaging

The Rasterio packages (“wheels” in Python parlance) that we are publishing on the Python Package Index are far more production-ready than they were when we introduced them at the beginning of 2015. They include the latest stable versions of open source geospatial workhorses GDAL, GEOS, and PROJ.4. We’ve added support for more formats, including NetCDF so you can access the AWS GOES Public Dataset and JPEG 2000 for access to Sentinel-2 data.

Even better, the new generation of Rasterio wheels for Linux is a third of its former weight, which makes the wheels simpler to deploy in applications on AWS Lambda like Mapbox’s landsat-tiler.

Three generations of NASA Mars rover wheels

Rasterio’s batteries-included wheels are built using the same tools and techniques used to make wheels for NumPy and SciPy. We’re indebted to the Python wheel-builders community for the trail-blazing it has done.

The best source of binary Rasterio packages for Windows and the conda platform is Conda Forge.

New advanced features

Rasterio turns five GDAL features into solid, idiomatic Python patterns suited for building applications that run in the cloud.

  1. Access to datasets stored in RAM
  2. Access to datasets in zipped streams
  3. Efficient access to metadata of rasters served via HTTP
  4. Quick overviews and subsets of cloud-optimized GeoTIFFs
  5. Lazy warping of cloud-optimized GeoTIFFs

When you can read raster data directly from zipped streams uploaded by users and analyze or process in-memory representations of rasters, you can deploy to computers with very limited filesystems. Rasterio makes this easy for developers with spatially-aware drop-in replacements for Python’s BytesIO and ZipFile.
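To make the pattern concrete, here is a minimal sketch of the plain standard-library idiom that Rasterio's replacements mirror: an uploaded zip archive is opened and one member is read without ever touching the filesystem. The helper names, the member name "scene.tif", and its placeholder content are assumptions made for illustration.

```python
import io
import zipfile


def zip_in_memory(members):
    """Build a zip archive entirely in RAM and return its bytes.

    `members` maps archive member names to their byte content.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def read_member(zip_bytes, name):
    """Read one member out of zipped bytes without touching the filesystem."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(name) as member:
            return member.read()


# Stand-in for a user upload. "scene.tif" and its content are hypothetical;
# the first four bytes are the little-endian classic TIFF signature.
upload = zip_in_memory({"scene.tif": b"II*\x00" + bytes(12)})
payload = read_member(upload, "scene.tif")
```

With Rasterio installed and a real GeoTIFF in the archive, rasterio.io.MemoryFile(payload).open() and rasterio.io.ZipMemoryFile(upload).open("scene.tif") would take the places of BytesIO and ZipFile here and hand back dataset objects rather than raw bytes.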

Likewise, the ability to use data stored in the cloud directly without prior download lets you process large amounts of data on computers with little or no permanent data storage. To do this, Rasterio uses identifiers and idioms that will be familiar to users of the AWS CLI or boto3 package.
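One way to picture how such identifiers reach GDAL is as a translation from URL-style names to GDAL virtual file system paths (/vsis3/, /vsicurl/, /vsizip/). The function below is a simplified sketch of that idea, not Rasterio's actual code; the handler table is deliberately incomplete, and the bucket, host, and file names in the usage are hypothetical.

```python
from urllib.parse import urlparse

# Scheme-to-GDAL-handler table for this sketch only; it is not exhaustive.
VSI_HANDLERS = {"s3": "vsis3", "http": "vsicurl", "https": "vsicurl", "zip": "vsizip"}
# vsicurl expects the full URL, scheme included, after its prefix.
KEEP_SCHEME = {"http", "https"}


def to_vsi_path(url):
    """Translate a URL-style dataset identifier into a GDAL-style path."""
    parsed = urlparse(url)
    if not parsed.scheme:
        return url  # plain local path, nothing to translate
    parts = parsed.scheme.split("+")
    unknown = [p for p in parts if p not in VSI_HANDLERS]
    if unknown:
        raise ValueError(f"no handler in this sketch for: {unknown}")
    prefix = "/".join(VSI_HANDLERS[p] for p in parts)
    # "archive.zip!inner.tif" addresses a member inside an archive.
    archive, _, member = (parsed.netloc + parsed.path).partition("!")
    location = archive if not member else f"{archive}/{member.lstrip('/')}"
    if parts[-1] in KEEP_SCHEME:
        location = f"{parts[-1]}://{location}"
    return f"/{prefix}/{location}"


# Hypothetical identifiers, shaped like the ones the AWS CLI accepts.
s3_path = to_vsi_path("s3://example-bucket/scene/B4.TIF")
zipped_path = to_vsi_path("zip+s3://example-bucket/upload.zip!scene.tif")
```

In application code you would normally pass the s3:// or https:// identifier straight to rasterio.open and let the library handle the rest.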

What is a cloud-optimized GeoTIFF? The format and practice are described at http://www.cogeo.org/.

“Cloud Optimized GeoTIFF relies on two complementary pieces of technology. The first is the ability of a GeoTIFF to not only store the raw pixels of the image, but to also organize those pixels in particular ways. The second is HTTP GET range requests, that let clients ask for just the portions of a file that they need. Together these enable fully online processing of data by COG-aware clients, as they can stream the right parts of the GeoTIFF as they need it, instead of having to download the whole file.”
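How the two pieces fit together can be sketched in a few lines. A tiled TIFF records where each tile lives in the file (the TileOffsets and TileByteCounts tags), so a client that wants a pixel window can work out which tiles it needs and ask for only those bytes. The functions and the offsets below are made up for illustration; a real client reads the offsets from the file header and may merge adjacent ranges.

```python
import math


def tiles_for_window(col_off, row_off, width, height, tile_size, image_width):
    """Return row-major indexes of the tiles that intersect a pixel window."""
    tiles_across = math.ceil(image_width / tile_size)
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        row * tiles_across + col
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


def range_headers(tile_indexes, tile_offsets, tile_byte_counts):
    """Build one HTTP Range header value per tile (the end byte is inclusive)."""
    headers = []
    for index in tile_indexes:
        start = tile_offsets[index]
        end = start + tile_byte_counts[index] - 1
        headers.append(f"bytes={start}-{end}")
    return headers


# Hypothetical 1024 x 1024 image cut into four 512 x 512 tiles.
offsets = [1000, 5000, 9000, 13000]
byte_counts = [4000, 4000, 4000, 4000]

# A 200 x 200 window that falls entirely inside the top-right tile.
needed = tiles_for_window(600, 100, 200, 200, tile_size=512, image_width=1024)
headers = range_headers(needed, offsets, byte_counts)
```

Here a single request for 4,000 bytes covers the window, while the remaining tiles are never downloaded.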

This is a new industry best practice 18 years in the making.

There’s a browser in GDAL

There is a web browser in GDAL that can navigate GeoTIFF files on the web. It’s a sophisticated browser that will fetch the fewest bytes required when you call the read method of a Rasterio dataset object. It isn’t especially new or revolutionary: HTTP/1.1 range requests were standardized in 1999 and GDAL’s curl-based virtual file system goes back to 2010. What has changed recently is an increase in our attention, driven by the exploding availability of cheap “serverless” computing resources and the freely-available, massive datasets hosted alongside those resources. More attention means more development. Authentication features have been improved. Preliminary HTTP/2 support is implemented. Finer control over caching of requests is in the works.
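The range-request mechanics that this browser relies on can be seen with nothing but Python's standard library. The sketch below is not GDAL's implementation: it starts a toy local server (the RangeHandler class and the 1024-byte blob are invented for the demo) and then fetches 16 bytes from the middle of the "file" with a Range header.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOB = bytes(range(256)) * 4  # 1024 bytes standing in for a remote GeoTIFF


class RangeHandler(BaseHTTPRequestHandler):
    """Toy server that honors single 'bytes=start-end' ranges only."""

    def do_GET(self):
        header = self.headers.get("Range")
        if header is None:
            body, status = BLOB, 200
        else:
            start, end = (int(n) for n in header.split("=")[1].split("-"))
            body, status = BLOB[start:end + 1], 206
        self.send_response(status)
        if status == 206:
            last = start + len(body) - 1
            self.send_header("Content-Range", f"bytes {start}-{last}/{len(BLOB)}")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch_range(url, start, end):
    """GET only bytes start..end (inclusive) of the resource at `url`."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    # An empty ProxyHandler bypasses any system proxy for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.status, response.headers["Content-Range"], response.read()


server = HTTPServer(("127.0.0.1", 0), RangeHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/scene.tif"
status, content_range, chunk = fetch_range(url, 16, 31)
server.shutdown()
server.server_close()
```

The server answers with 206 Partial Content and a Content-Range header instead of sending the whole resource, which is the behavior a COG-aware client depends on.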

The role that the cloud-optimized GeoTIFF media type plays in this emerging architecture is very important. HTTP knows nothing about raster data, nothing about rows and columns, blocks and stripes. It is blissfully ignorant of NoData. HTTP has requests and responses, with a handful of generic methods including partial retrieval of a resource’s representation. The quick access to overviews and subsets that is a hallmark of this cloud-native architecture is possible due to a well-designed and standardized media type, GeoTIFF, and the GDAL browser’s understanding of the capabilities of a GeoTIFF. Do you see what we have here? HTTP plus a GeoTIFF as the engine of our application’s state [1].

Notebook on advanced features

If you’d like to take a deeper dive into Rasterio’s new advanced features, including a look behind the curtain at HTTP transcripts demonstrating range requests, see our Advanced Rasterio Features Notebook. GitHub displays a non-interactive version of the notebook, but you can download and run it on your own computer after following the instructions therein.

We hope this notebook will help you understand how to use Rasterio’s advanced features, cloud-optimized GeoTIFFs, and the AWS Landsat Public Dataset. We welcome helpful comments on the notebook gist. Please feel free to ping me on Twitter (@sgillies) with any other questions about this post.

[1] That’s right: Representational State Transfer (REST)

Sean Gillies


Build for the cloud with Rasterio was originally published in Points of interest on Medium, where people are continuing the conversation by highlighting and responding to this story.

