By: Vincent Sarago
In recent years, the amount of freely available Earth Observation data has grown significantly. Right now you can access petabytes of high-resolution imagery with a few lines of code or via web interfaces, but there is still progress to be made to make this easier. We are releasing a new open source Rasterio plugin and a serverless demo tool to help process data directly on the cloud.
To demonstrate this approach, I’ll walk you through how to set up a highly customizable serverless tile utility, based on Amazon Web Services (AWS) Lambda functions, that processes and serves USGS Landsat data. It’s open source so that you can deploy it, rewrite it, and customize it for other services.
The data source
As Landsat 8 data is collected, it’s added to an Amazon Public Dataset (PDS), where it can be freely accessed. The data is stored as individual bands for each Landsat scene in a cloud optimized GeoTIFF, instead of keeping all 12 bands zipped together. This decision allows us to access only the three individual bands needed to visualize an RGB image and save a great deal of bandwidth. (You can refer to this post from 2013 for the principles behind making RGB images from Landsat 8 data.)
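To make the per-band layout concrete, here is a minimal sketch of how the three files needed for a natural-color image could be located. The bucket name and key layout (`landsat-pds/L8/<path>/<row>/<scene>/`) and the pre-collection scene ID format are assumptions about how the dataset was organized at the time, not something this post specifies, and `band_url` and `rgb_urls` are hypothetical helpers.

```python
from typing import List

# Assumption: bucket name and key layout of the public dataset.
BUCKET = "landsat-pds"


def band_url(scene_id: str, band: int) -> str:
    """Build the S3 URL of a single band of a Landsat 8 scene.

    Assumes a pre-collection scene ID such as LC81390452014295LGN00,
    where characters 3-5 are the WRS-2 path and 6-8 the WRS-2 row.
    """
    path = scene_id[3:6]
    row = scene_id[6:9]
    return f"s3://{BUCKET}/L8/{path}/{row}/{scene_id}/{scene_id}_B{band}.TIF"


def rgb_urls(scene_id: str) -> List[str]:
    """Return the red, green and blue band URLs (bands 4, 3 and 2)."""
    return [band_url(scene_id, band) for band in (4, 3, 2)]
```

Only these three files need to be touched to render an RGB image; the remaining bands of the scene are never read.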
Petabytes of data and no server
By combining the power of AWS Lambda with Rasterio, our open source library for handling geospatial rasters, we can create and serve Landsat map tiles on the fly without thinking about servers or storage. It’s faster than the desktop scripting approach, and while there is a limit on parallel calls in AWS Lambda, it scales almost infinitely at very low cost.
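Serving a map tile on the fly starts with working out which part of the world a tile address covers. The sketch below uses the standard Web Mercator (XYZ) tile formulas with only the standard library; `tile_bounds` is a hypothetical helper written for illustration, not code from the landsat-tiler repository.

```python
import math


def tile_bounds(x: int, y: int, z: int):
    """Return (west, south, east, north) in degrees for an XYZ map tile."""
    n = 2 ** z  # number of tiles along each axis at zoom level z

    def lon(xt):
        return xt / n * 360.0 - 180.0

    def lat(yt):
        # Inverse Web Mercator projection of the tile's y edge.
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * yt / n))))

    # y grows southwards, so the tile's top edge (y) is its northern bound.
    return (lon(x), lat(y + 1), lon(x + 1), lat(y))
```

A tiler would then read only the pixels that fall inside these bounds from each band and resample them to the tile size.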
AWS Lambda is a service that lets you run functions in Node.js, Python, or Java in response to triggers like API calls, file uploads, database edits, etc. You only pay for the execution of the function. You don’t have to pay for an always-on server, and you don’t have to care about concurrency, because each request is processed independently.
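As a rough illustration of what such a function looks like, here is a minimal Python handler. It assumes the function is triggered through an API Gateway proxy integration, so the event carries `queryStringParameters` and the return value is a dictionary with `statusCode`, `headers`, and `body`. The `scene` parameter is made up for this example.

```python
import json


def handler(event, context):
    """Entry point that Lambda calls once per request."""
    # Assumption: API Gateway proxy event; the key is None when no
    # query string was sent, hence the "or {}".
    params = event.get("queryStringParameters") or {}
    scene = params.get("scene", "unknown")  # hypothetical parameter

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"scene": scene}),
    }
```

Because each invocation receives its own event and returns its own response, nothing in the handler has to coordinate with other requests.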
The good without the bad and the ugly
Creating a Python Lambda package (essentially a script) to process satellite imagery can be quite an undertaking. Usually, to create a package you have to start an AWS EC2 virtual machine, compile the Python modules you need, zip everything, and upload it to AWS S3. You can read Matt Perry’s tutorial to do it “the hard way”. But there is a workaround to avoid those steps. By using Docker you can do everything locally, and if you don’t want to spend 30 minutes compiling GDAL you can even use an existing Docker image to build and create a Lambda package.
The Landsat Lambda Tiler
We’ve open-sourced the code with all the information you need to create your own Landsat tile server! Check out the landsat-tiler repository.
… and a Landsat viewer to use it
When you’re done deploying your Lambda function, it’s time to use it! By combining Mapbox-GL with Development Seed’s satellite API, we can create a simple viewer to access everything in the Landsat public dataset. The HTML + CSS code for the viewer is also open-sourced in the landsat-tiler repository.
Pipelines in the cloud
Serverless architectures are important in our work on the Mapbox satellite team. We have found places where they make our processes much clearer, faster, and more cost effective, and I hope we can inspire you to look for similar opportunities.
We continue to invest in improving the tools we use and the ecosystem around satellite and aerial imagery. Sometimes this is very low-level: for example, we have sponsored Even Rouault’s work on providing random access to geospatial datasets on the web using HTTP range requests. GDAL is the library that underlies Rasterio, and the HTTP range read feature makes it possible to work with geospatial rasters across networks quickly and comfortably. Now you can get a remote GeoTIFF’s metadata very cheaply, and pulling arbitrary data from it is as easy as working locally. This is the kind of efficient abstraction that makes it easy to invent new architectures for solving problems with remote sensing data.
GDAL’s support for HTTP range requests is the feature that makes a serverless tile server so fast. Without random access to pixels in the GeoTIFFs on S3, we would have to download entire files to the Lambda function’s very limited temporary disk space, and we would have to do it for every tile.
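To show what a range request is at the HTTP level, the sketch below builds a request for a slice of a remote file using only the Python standard library. It illustrates the mechanism and is not GDAL’s implementation; `range_request` is a hypothetical helper, and the URL in the usage note is a placeholder.

```python
from urllib.request import Request


def range_request(url: str, start: int, length: int) -> Request:
    """Build (but do not send) a GET request for `length` bytes from `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(url, headers={"Range": f"bytes={start}-{end}"})
```

For example, `range_request("https://example.com/scene_B4.TIF", 0, 16384)` asks for only the first 16 KB of the file, which is typically where a GeoTIFF keeps its header and metadata. Passing the request to `urllib.request.urlopen` should return a `206 Partial Content` response if the server supports ranges.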
Live demo
Understanding the code is always better with a live demo. Based on landsat-tiler, I wrote my own Landsat viewer site, where you can see it working.
The combination of large volumes of excellent free data with serverless architectures is still very new, and there’s a lot of room to explore. I hope this project gives you a helpful starting place for further experiments.
Further reading
In Introducing the AWS Lambda Tiler, Chris Henrik explains how to process data that you can store on S3 to support your own serverless tile server. He also introduces the OpenAerialMap Dynamic Tiler written by Seth Fitzsimmons, a pioneering serverless tiler that uses Rasterio and GDAL.
In GDAL and cloud storage, Even Rouault says more about the new and enhanced cloud-based virtual file systems coming in GDAL 2.3.0.
Combining the power of AWS Lambda and Rasterio was originally published in Points of interest on Medium.