We just updated Mapbox Satellite across the continental United States with 48 TB of open imagery from the National Agriculture Imagery Program (NAIP). The imagery captured by NAIP is incredibly diverse thanks to a variety of factors: different collection times, instruments, altitudes, weather, and lighting conditions. Before even beginning our processing, we reviewed the acquisition dates for all imagery we had on hand and found 333 different collection dates from 2011 to 2013. To address these inconsistencies, we built a pipeline to calibrate NAIP against a consistent reference layer, adjust colors, and reduce seams in the final mosaic.
Calibration
One way to address color inconsistencies in raster data is to calibrate images according to a reference layer. Histogram matching is a common technique where colors from a reference layer are borrowed and applied to the target layer. Applying a single reference across the entire survey reduces most of NAIP’s color inconsistencies, helping all scenes converge towards a more even color palette. We used Landsat images as a reference layer for calibrating the dataset.
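As a rough illustration of the idea (not our production pipeline), here is a minimal per-band histogram matching sketch in NumPy. It assumes the NAIP scene and the Landsat reference have already been resampled onto a common grid; the `naip` and `landsat` array names are placeholders.

```python
import numpy as np

def match_band(source, reference):
    """Remap one band so its cumulative histogram matches the reference band's."""
    src_values, src_counts = np.unique(source.ravel(), return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Normalized cumulative distribution functions of both bands.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source value, find the reference value at the same quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)

    # Map every source pixel through the lookup built above.
    return np.interp(source.ravel(), src_values, matched_values).reshape(source.shape)

# Hypothetical usage: naip and landsat are (bands, rows, cols) arrays on a common grid.
# calibrated = np.stack([match_band(naip[b], landsat[b]) for b in range(naip.shape[0])])
```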
Using Landsat as a calibration layer presented a problem: Landsat imagery has a spatial resolution of approximately 30 meters, whereas NAIP’s resolution is roughly one meter. Many of the fine details present in the NAIP image are not present in the corresponding Landsat image. During histogram matching, these details are washed out by the colors of larger neighboring objects resolved by Landsat.
We used a weighted averaging technique to bring the finer details from the uncalibrated NAIP image back into the calibrated image. This technique allowed us to find a middle ground between color space convergence and fine detail.
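Conceptually, that blend is just a weighted average of the matched and original scenes. A hedged sketch, where the weight value is purely illustrative:

```python
import numpy as np

def restore_detail(calibrated, original, weight=0.7):
    """Weighted average of the histogram-matched scene and the original NAIP scene.
    Higher weights push colors toward the Landsat reference; lower weights keep
    more of NAIP's original fine detail. The 0.7 default is illustrative only."""
    return weight * calibrated + (1.0 - weight) * original
```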
Seamless
Histogram matching moved the entire survey toward a more uniform color palette, but seams were still visible between neighboring images. To reduce them, we adopted a technique similar to the way digital cameras stitch panoramas. When a panorama is captured, the camera positions each new frame so that it shares some overlapping pixels with its neighbor. Ideally those overlapping pixels would have identical values, but exposure inconsistencies mean they rarely do. So for each subsequent frame, a transform function is applied to the overlapping pixels to match them to the corresponding pixels of the preceding image, and that transformation is gradually tapered off until it transitions smoothly into the current image's own response.
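In one dimension, that gradual hand-off can be sketched as a linear ramp across the overlapping columns. This is a simplified, single-band illustration of the idea, not the exact algorithm cameras use:

```python
import numpy as np

def blend_overlap_1d(left, right, overlap):
    """Stitch two aligned frames that share `overlap` columns, ramping
    linearly from the left frame's values to the right frame's."""
    ramp = np.linspace(1.0, 0.0, overlap)  # weight given to the left frame
    seam = ramp * left[:, -overlap:] + (1.0 - ramp) * right[:, :overlap]
    return np.hstack([left[:, :-overlap], seam, right[:, overlap:]])
```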
In the case of satellite and aerial imagery, there are often overlapping pixels between neighboring scenes, and we leverage those overlapping regions in a similar way, but the problem is harder. Panoramas are stitched along a single dimension (the x axis), whereas satellite and aerial scenes are stitched across two (x and y). Also, most panorama strategies adjust each image using values derived from every image in the sequence in order to prevent drift; for satellite and aerial imagery, we can only adjust a scene with knowledge of its immediate neighbors, relying on histogram matching to keep substantial drift in check.
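One standard way to extend the idea to two dimensions is distance-based feathering, where each scene's influence falls off toward its own edges. The sketch below shows that general approach (assuming hypothetical `scenes` and `masks` already reprojected onto a shared grid), not our exact transform:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_weights(valid_mask):
    """Weight each pixel by its distance to the scene's edge or nodata
    boundary, so a scene dominates near its center and fades at seams."""
    return distance_transform_edt(valid_mask)

def mosaic(scenes, masks):
    """Blend aligned single-band scenes with per-pixel feathered weights
    in both x and y."""
    weights = np.stack([feather_weights(m) for m in masks])
    total = weights.sum(axis=0)
    total[total == 0] = 1.0  # avoid dividing by zero where no scene has coverage
    return (weights * np.stack(scenes)).sum(axis=0) / total
```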
Moving forward
By combining the techniques above in our imagery processing pipeline, we were able to ingest a massive, diverse dataset and roll out a beautiful imagery layer from coast to coast. We're continuing to explore new imaging techniques to improve our satellite baselayer, and in the coming months we'll be releasing new imagery. Drop the Satellite team a note if you're interested in building out a robust raster pipeline.