The tool that imports data into Postgres is called osm2pgsql. OSM parsing is handled by libosmium, a header-only library for reading OSM files. osm2pgsql has multiple output backends; all of them except the Postgres backend are ignored here.

osm2pgsql passes each file to libosmium, which memory-maps the file, decodes each block, and executes a callback for each node, way, and relation as it is decoded.
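
The callback flow can be sketched as follows. This is a minimal illustration in Python, not libosmium's actual API: the Node, Way, and Relation classes, the CountingHandler, and the apply function are all hypothetical names, and a "block" is modelled as a plain list of already-decoded objects.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lon: float
    lat: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list
    tags: dict = field(default_factory=dict)


@dataclass
class Relation:
    id: int
    members: list  # (member_type, ref, role) tuples
    tags: dict = field(default_factory=dict)


class CountingHandler:
    """Hypothetical handler with one callback per object type."""

    def __init__(self):
        self.counts = {"node": 0, "way": 0, "relation": 0}

    def node(self, n):
        self.counts["node"] += 1

    def way(self, w):
        self.counts["way"] += 1

    def relation(self, r):
        self.counts["relation"] += 1


def apply(blocks, handler):
    """Walk the blocks in file order and dispatch each object to its callback."""
    for block in blocks:
        for obj in block:
            if isinstance(obj, Node):
                handler.node(obj)
            elif isinstance(obj, Way):
                handler.way(obj)
            elif isinstance(obj, Relation):
                handler.relation(obj)
```

A caller would build a handler, pass it to apply together with the decoded blocks, and read the results from the handler afterwards. The point of the design is that the parser never needs to hold the whole file in memory: each object is handed off as soon as it is decoded.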

If a block is compressed, it must be decompressed before it can be used. If it is not compressed, much of it can be used "in place", without copying.
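
The difference between the two cases can be illustrated with a small Python sketch. It assumes the payload arrives either as raw bytes or as zlib-compressed bytes (the parameter names raw and zlib_data mirror the PBF blob fields); the blob_payload function itself is hypothetical and is not how libosmium implements this.

```python
import zlib


def blob_payload(raw=None, zlib_data=None):
    """Return the usable payload of a blob.

    Compressed data has to be inflated into a new buffer.
    Uncompressed data is wrapped in a memoryview, which refers to the
    original buffer instead of copying it.
    """
    if zlib_data is not None:
        return zlib.decompress(zlib_data)
    if raw is not None:
        return memoryview(raw)
    raise ValueError("blob has neither raw nor zlib_data")
```

In the uncompressed case the returned view shares memory with the input, which is what "in place" means here: slicing the view to pick out individual fields allocates no new copy of the data.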

As each node is processed, it is added to up to 3 places:

  • An in-memory cache mapping ID -> Location
  • A temporary database table, planet_osm_nodes, mapping ID -> Location. This is the fallback for the cache.
  • Potentially a permanent database table, planet_osm_point, mapping ID -> Location + Tags. The tags are filtered first; if certain tags are present afterwards, the node is added to the table, otherwise it is ignored.
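
The three destinations can be sketched as below. This is a simplified Python model under stated assumptions: the NodeStore class, the fixed cache capacity, and the DROPPED_KEYS and INTERESTING_KEYS sets are all hypothetical. In osm2pgsql the tag filter comes from its configuration, and the real cache is not a plain dictionary.

```python
# Hypothetical tag rules; the real ones come from osm2pgsql's configuration.
DROPPED_KEYS = {"source", "created_by"}
INTERESTING_KEYS = {"amenity", "shop", "place"}


class NodeStore:
    def __init__(self, cache_capacity):
        self.cache_capacity = cache_capacity
        self.cache = {}        # in-memory ID -> location
        self.nodes_table = {}  # stands in for the temporary nodes table
        self.point_table = {}  # stands in for the permanent points table

    def add_node(self, node_id, lon, lat, tags):
        location = (lon, lat)

        # 1. Cache the location while there is room.
        if len(self.cache) < self.cache_capacity:
            self.cache[node_id] = location

        # 2. Always write to the fallback table.
        self.nodes_table[node_id] = location

        # 3. Filter tags, then keep the node only if something interesting is left.
        kept = {k: v for k, v in tags.items() if k not in DROPPED_KEYS}
        if any(k in INTERESTING_KEYS for k in kept):
            self.point_table[node_id] = (location, kept)
```

With a capacity of 1, for example, a second node would miss the cache but would still be reachable through the fallback table, which is why later lookups need both.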

As each way is processed, it is added to a number of potential places:

  • A temporary database table, planet_osm_ways, mapping ID -> List of Node IDs + Tags
  • After the tags are filtered, if certain tags are present, osm2pgsql asks the cache for the locations of all the nodes in the way. If a location can't be found in the cache, it falls back to planet_osm_nodes. Remember that exports may not contain all the data, so it is expected that some locations can never be looked up. In this case, osm2pgsql acts as though the node wasn't in the way at all.
  • If at least 2 nodes are successfully found, the way is then added to either planet_osm_line or planet_osm_polygon, depending on whether it is a line or an area. Some lines are also added to planet_osm_roads, a special table holding a subset of features for rendering at low zoom levels.
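
The lookup-with-fallback and the line/area decision can be sketched like this. It is a minimal Python illustration with assumptions: resolve_locations and classify_way are hypothetical names, MIN_NODES = 2 is taken as the minimum for a line geometry, and the AREA_KEYS rule (a closed way carrying one of a few keys counts as an area) is a stand-in for osm2pgsql's configurable logic.

```python
# Hypothetical rule set; the real area detection is configurable.
AREA_KEYS = {"building", "landuse", "natural"}
MIN_NODES = 2  # assumption: a line needs at least two points


def resolve_locations(node_ids, cache, nodes_table):
    """Look each node up in the cache, then in the fallback table.

    Nodes found in neither place are skipped, as if they were never
    part of the way.
    """
    found = []
    for node_id in node_ids:
        location = cache.get(node_id)
        if location is None:
            location = nodes_table.get(node_id)
        if location is not None:
            found.append(location)
    return found


def classify_way(locations, tags):
    """Return 'polygon', 'line', or None if too few nodes were found."""
    if len(locations) < MIN_NODES:
        return None
    closed = len(locations) >= 4 and locations[0] == locations[-1]
    if closed and any(key in AREA_KEYS for key in tags):
        return "polygon"
    return "line"
```

Note that dropping unresolved nodes can change the shape of a way near the edge of an extract; the sketch follows the described behaviour and makes no attempt to repair that.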

As each relation is processed, it goes through a similar process:

  • It is added to a temporary database table, planet_osm_rels, mapping ID -> Node IDs + Way IDs + Relation IDs + Tags
  • If the relation passes the tag filter, all the ways it contains are pulled from the planet_osm_ways table and processed further, typically as a polygon with holes in it, or as a polygon whose outline consists of multiple ways.
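
How member ways become a polygon can be sketched as follows. This is a deliberately simplified Python model and not osm2pgsql's assembler: stitch_ring and assemble_polygon are hypothetical names, ways are plain lists of node IDs, all outer ways are assumed to form a single ring, and inner ways are only accepted if each one is already closed.

```python
def stitch_ring(ways):
    """Join ways end to end into one closed ring of node IDs.

    Returns None if the ways cannot be joined into a single closed ring.
    """
    remaining = [list(w) for w in ways if len(w) >= 2]
    if not remaining:
        return None
    ring = remaining.pop(0)
    while ring[0] != ring[-1]:
        for i, way in enumerate(remaining):
            if way[0] == ring[-1]:
                ring.extend(way[1:])             # same direction
                break
            if way[-1] == ring[-1]:
                ring.extend(reversed(way[:-1]))  # way is stored backwards
                break
        else:
            return None  # no way continues the ring
        remaining.pop(i)
    return ring if not remaining else None


def assemble_polygon(members, ways_table):
    """Split way members by role and build (outer_ring, holes)."""
    outers, inners = [], []
    for member_type, ref, role in members:
        if member_type != "way":
            continue
        node_ids = ways_table.get(ref)
        if node_ids is None:
            continue  # way is missing from this extract
        (inners if role == "inner" else outers).append(node_ids)
    holes = [w for w in inners if len(w) >= 4 and w[0] == w[-1]]
    return stitch_ring(outers), holes
```

For instance, two outer ways [10, 11, 12] and [12, 13, 10] would be stitched into the ring [10, 11, 12, 13, 10], while a closed inner way would be returned as a hole.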

After the PBF file is fully processed, osm2pgsql optionally drops the planet_osm_nodes, planet_osm_ways, and planet_osm_rels tables and kicks off an indexing job for the remaining 4 tables. (It can instead keep the temporary tables and index all 7, but this is not required for our usage.)

At this point, the import is complete.