Faster OpenStreetMap imports 1/n: introduction
As part of generating GPS trail overlays for timelapse videos, I needed to import part of the OSM dataset into a PostGIS database. After a couple of days struggling with the import, I figured it might be worth understanding the problem space better so I could optimize the process for my needs.
The rough plan:
- Understand OSM PBF files
- Understand the import process
- Change the system to be more efficient (draw the rest of the owl)
This series of posts will document the final solution. If you aren't technical, this isn't going to be very interesting. If you don't care about maps, this isn't going to be very interesting. If you don't care about optimizing things, this isn't going to be very interesting. Can you see the pattern? The pattern is not very interesting.
This is a results-driven project. The correctness of the PostGIS database is not important, only correct maps at the end. The entire process is free to be manipulated.
That being said, everything in this space is C/C++/Java, none of which are languages I have a strong familiarity with, so the less manipulation of the existing tools, the better.
My initial import attempt took approximately 16 hours to fail.
We have two goals:
- don't fail
- do it in less than 16 hours
Side note: the import did succeed when the system had swap enabled, and it took approximately 3 days*. The server is an i7-5820K with 32GB of RAM, with a bunch of other stuff running on it. Disk (including swap) is spinning rust. Lowering the cache size helps with crashing, but that makes us swap**, which is orders of magnitude slower, and I aborted that attempt after a week.
* This journey started early September. I had a complete import of the US months ago, it just took 3 days. But that wasn't good enough.
** I use the term swap throughout this blog to refer to disk thrashing in any form, because it's 4 characters ("swap") instead of 14 characters ("disk thrashing")***. Much of the swapping is paging in memory-mapped files, which is all a swap file is.****
*** It's because of that kind of efficiency that they pay me the big bucks.
**** I know this is wrong, deal with it.