Salzburg is a beautiful city in December. The European LiDAR Mapping Forum coincided with the days when the „Krampus“ (= „Christmas monsters“) roam the Christmas markets in the old town to scare children and adults alike. One gave me a painful whipping across the legs with its leathery tail when I tried to protect a LAStools user … (-;
More to the point, here is my talk at ELMF 2012 on „LASindex – simple spatial indexing of LiDAR data“. I first give a little update on LASzip, then talk about spatial indexing with LAX, before sneak-previewing PulseWaves – our new and open LiDAR format for storing full waveform data.
[youtube=http://www.youtube.com/watch?v=FMcBywhPgdg]
Some more detail:
Airborne LiDAR surveys collect large amounts of elevation samples, often resulting in terabytes of data. The acquired LiDAR points are typically stored and distributed in the LAS format or in its losslessly compressed twin, the LAZ format. However, managing a folder of LAS or LAZ files is not a trivial task when a survey consists, for example, of 500 flight strips containing around 200 million points each. Even a simple area-of-interest (AOI) query requires opening every file to check its bounding box and then loading, in full, all those whose bounding box overlaps the queried AOI. One solution is to copy the survey into a dedicated database such as Oracle Spatial or PostgreSQL. We present a much simpler alternative that works directly on the original LAS or LAZ files.
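To make the cost of the naive approach concrete, here is a minimal Python sketch of the bounding-box test that decides which files an AOI query has to load in full. The `BBox` type, the `candidate_files` helper, and the strip names are made up for illustration; they are not part of LAStools.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned rectangle in x/y (hypothetical helper type)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def overlaps(a: BBox, b: BBox) -> bool:
    """Return True if the two rectangles share any area or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y)


def candidate_files(file_bboxes: Dict[str, BBox], aoi: BBox) -> List[str]:
    """Names of all files whose bounding box overlaps the AOI."""
    return sorted(name for name, box in file_bboxes.items()
                  if overlaps(box, aoi))


# Three overlapping flight strips (invented extents) and a small AOI.
strips = {
    "strip_001.laz": BBox(0, 0, 1000, 200),
    "strip_002.laz": BBox(0, 150, 1000, 350),
    "strip_003.laz": BBox(0, 300, 1000, 500),
}
aoi = BBox(400, 160, 600, 190)
hits = candidate_files(strips, aoi)  # strip_001 and strip_002
```

Without an index, each of the files in `hits` would have to be read from start to end, even though the AOI covers only a small part of each strip.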
Our minimal-effort spatial indexing scheme has very small setup costs, avoids creating a second copy of the data, and is already in use in the LAStools software suite. For each LiDAR file we generate a tiny LAX file that resides in the same folder as the *.las or *.laz file and has the same name but with a *.lax extension. The LAX files are generally as small as 0.01 percent (for a LAS file) or 0.1 percent (for a LAZ file) of the file containing the LiDAR data and they can be generated as fast as the points can be read off disk.
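The naming convention is easy to check from a script. The following Python sketch (the helper names `lax_path_for` and `is_indexed` are my own, not LAStools functions) derives the expected LAX name for a LiDAR file and tests whether the index is present:

```python
from pathlib import Path


def lax_path_for(lidar_file) -> Path:
    """Same folder, same name, but with a .lax extension."""
    return Path(lidar_file).with_suffix(".lax")


def is_indexed(lidar_file) -> bool:
    """True if a LAX file sits next to the given LAS or LAZ file."""
    return lax_path_for(lidar_file).is_file()
```

For example, `lax_path_for("survey/strip_001.laz")` yields `survey/strip_001.lax`.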
The LAX files describe an adaptive quadtree over the x and y coordinates of all points. Each occupied quadtree cell stores a list of point index intervals that together reference all points falling into this cell. By merging all intervals of a cell that are fewer than 1000 points apart in point index space, we significantly reduce the number of intervals, the size of the LAX files, and the number of file seek operations.
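The interval merging can be sketched in a few lines of Python. This is an illustration under assumptions, not the LASlib code: the function name `merge_intervals` is invented, and I assume that "apart" means the difference between the last index of one interval and the first index of the next.

```python
def merge_intervals(indices, max_gap=1000):
    """Turn the point indices of one cell into merged (start, end) intervals.

    Consecutive indices end up in the same interval when they are fewer
    than max_gap apart (assumed gap definition, see above).
    """
    intervals = []
    for i in sorted(indices):
        if intervals and i - intervals[-1][1] < max_gap:
            intervals[-1][1] = i          # extend the current interval
        else:
            intervals.append([i, i])      # start a new interval
    return [tuple(iv) for iv in intervals]


# Six points of one cell, scattered over the file.
cell_intervals = merge_intervals([0, 1, 2, 500, 5000, 5001])
# -> [(0, 500), (5000, 5001)]
```

The merged interval `(0, 500)` also covers points that do not belong to the cell. That is the price paid for fewer intervals and fewer seeks.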
Although individual cells typically reference too many points, this is usually amortized, as a typical AOI query returns the union of all intervals from many quadtree cells. However, our in-place spatial indexing relies on a certain degree of spatial coherency being present in the point order. A simple measure of how well the existing order suits the index is the overhead factor incurred when loading each quadtree cell individually from disk.
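One plausible way to compute such an overhead factor is the ratio of points that must be read (everything covered by a cell's intervals) to points that actually fall into the cell. The Python sketch below uses that definition; both the definition and the function name `overhead_factor` are my assumptions for illustration and may differ from what LASindex reports.

```python
def overhead_factor(intervals, points_in_cell):
    """Points read per point wanted when loading one cell on its own.

    intervals      -- list of inclusive (start, end) point index intervals
    points_in_cell -- number of points that really fall into the cell
    """
    if points_in_cell <= 0:
        raise ValueError("cell must contain at least one point")
    points_read = sum(end - start + 1 for start, end in intervals)
    return points_read / points_in_cell


# A cell whose 10 points are stored contiguously has no overhead ...
coherent = overhead_factor([(100, 109)], 10)    # 1.0
# ... while a cell whose 10 points are spread over 20 indices reads twice as much.
scattered = overhead_factor([(100, 119)], 10)   # 2.0
```

A factor close to 1 suggests that the points are already stored in a spatially coherent order. A large factor suggests that the file would benefit from being reordered before indexing.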
The source code for LASindex is part of the open source library LASlib of LAStools. It has been extensively field-tested in the LiDAR delivery pipeline of Open Topography (OT), where it is used to efficiently gather data from folders of LAZ files in response to area-of-interest queries that users generate via OT’s popular web-based LiDAR download interface. Another important use is on-the-fly point buffering. When batch processing, for example, 2 km by 2 km LiDAR tiles to create DTMs via rasterization of a temporary TIN, it is beneficial to load a 100 meter point buffer around each tile to avoid tile boundary artifacts. The presence of LAX files allows doing so efficiently on-the-fly.