Getting Started With OpenStreetMap Data

In 2004, Steve Coast set up OpenStreetMap (OSM) in the U.K. It subsequently spread worldwide, powered by a combination of donations and volunteers willing to do ground surveys with tools such as handheld GPS units, notebooks, and digital cameras.

OSM Data Usage

OpenStreetMap’s map and data are free to use so long as you provide attribution and share any corrections back to the project. JavaScript libraries and plugins for WordPress, Django, and other content-management systems allow users to display their own maps. Most of these depend on OSM tile servers, but if you want full control, you can download the OSM data and render your own 256 x 256 pixel tiles. (Alternatively, some third-party suppliers will do the rendering for you, sometimes for free.)
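To give a feel for how those 256 x 256 tiles are addressed, here is a minimal Python sketch. It assumes the Web Mercator "slippy map" scheme used by OSM's standard tile servers, in which zoom level z divides the world into 2^z by 2^z tiles. The function name lonlat_to_tile is made up for this illustration.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the 256 x 256 tile that contains a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    # Longitude maps linearly onto the x axis.
    x = int((lon_deg + 180.0) / 360.0 * n)
    # Latitude goes through the Mercator projection before scaling.
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 and near-polar latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Tile containing central London at zoom level 10.
print(lonlat_to_tile(-0.1278, 51.5074, 10))
```

Tile indices count from the top-left corner of the map, so y grows as you move south.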

OpenStreetMap’s full dataset is over half a terabyte in size when unzipped, but unless you need the entire mapped world, subsets for particular countries are more practical. For example, the OSM file for Great Britain is just 14.2 GB.

The older OSM download format is XML. The newer one, PBF, is based on Google protocol buffers and produces smaller files. To speed up processing of OSM files, there’s a utility, osmconvert (written in C), that converts them to .o5m and other formats. An .o5m file keeps the same structure as OSM XML but is stored in a compact binary form, so it is smaller and faster to process.

OSM Data

Geographical Information Systems (GIS) can have very complex data structures, but OSM uses just three data types: Nodes, Ways, and Relations. A Node defines a point with an ID, latitude, and longitude. A Way is an ordered list of 2 to 2,000 nodes, used to represent linear features such as rivers and roads. A Relation defines a relationship between nodes and ways (for example, listing the roads at a traffic junction). All three types can carry tags, which are key/value pairs such as highway=residential. Tags are what make the platform more than just a map.
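The three types are simple enough to model in a few lines. The following Python sketch is only an illustration of the structure described above, not the schema of any particular library. The class layout, the (kind, id) member pairs, and the sample IDs, coordinates, and type=junction tag are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: List[int]  # ordered references to Node IDs
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the 2 to 2,000 node limit on a way.
        if not 2 <= len(self.node_ids) <= 2000:
            raise ValueError("a way needs between 2 and 2,000 nodes")


@dataclass
class Relation:
    id: int
    members: List[Tuple[str, int]]  # (kind, id), kind is "node" or "way"
    tags: Dict[str, str] = field(default_factory=dict)


# Hypothetical sample data: two points, a street joining them, and a
# relation that groups the street with one of its end points.
a = Node(1, 51.5007, -0.1246)
b = Node(2, 51.5010, -0.1240)
street = Way(10, [a.id, b.id], {"highway": "residential"})
junction = Relation(100, [("node", b.id), ("way", street.id)], {"type": "junction"})
print(street.tags["highway"])
```

Ways and relations hold IDs rather than the objects themselves, so a program has to look the referenced nodes up separately.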

Developing With OSM

Osmconvert, which I mentioned earlier, was written in C. Another programming language you can use with OSM is C++11 (pick a compiler such as Clang 3.4 or GCC 4.8, or later). It’s possible to work with it on Windows, but you need either Cygwin or a recent 64-bit version of Visual C++, typically Visual Studio 2015. The osmcode site is a good place to start; it is home to the Osmium library (libosmium). Fetch and build libosmium. On Linux/Unix systems there are a fair number of dependencies that you’ll need as well, and these are listed on the site. If you prefer JavaScript or Python, there are bindings for those. The site also has other tools such as osmcoastline, which extracts coastline data from a PBF file.

As an alternative for Java developers, there’s Osmosis, which is a command-line application for processing OSM data. It includes components for reading/writing databases and files, deriving/applying changes to data sources, and sorting data; it can be extended with new components.

Given the (typically) multi-gigabyte size of the datasets at hand, it’s unlikely you’ll be reading files completely into memory to work on them, unless you have masses of RAM (32 GB or more). More typical are extraction utilities. If you use a GIS (Geographic Information System), such as the excellent open-source QGIS, you can import OSM data into it.
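To illustrate processing a file without loading it all at once, here is a minimal Python sketch that streams OSM’s XML format with the standard library and counts ways by their highway tag. The function name count_highways and the tiny inline sample are made up for the example. A real extract would be opened from disk and would more likely be handled by a dedicated tool.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_highways(source):
    """Stream OSM XML and count ways by the value of their highway tag."""
    counts = Counter()
    # iterparse yields each element as soon as its closing tag is read,
    # so the whole document never has to be parsed up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "way":
            for tag in elem.iter("tag"):
                if tag.get("k") == "highway":
                    counts[tag.get("v")] += 1
        if elem.tag in ("node", "way", "relation"):
            elem.clear()  # discard attributes and children already processed
    return counts


# Hypothetical miniature extract, for demonstration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="51.5007" lon="-0.1246"/>
  <node id="2" lat="51.5010" lon="-0.1240"/>
  <node id="3" lat="51.5013" lon="-0.1235"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="highway" v="residential"/></way>
  <way id="11"><nd ref="2"/><nd ref="3"/><tag k="highway" v="residential"/></way>
  <way id="12"><nd ref="1"/><nd ref="3"/><tag k="highway" v="primary"/></way>
  <way id="13"><nd ref="1"/><nd ref="2"/><tag k="waterway" v="river"/></way>
</osm>"""

print(count_highways(io.BytesIO(SAMPLE)))
```

The same function accepts a filename in place of the BytesIO object. Note that the emptied elements stay attached to the document root, so for very large files you would also want to detach them.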


Given the amazing quantity of data available on OpenStreetMap, the platform is truly a playground for anyone interested in geo-location and mapping.


Image: OpenStreetMap