Friday, February 28, 2025

General Purpose to Special Purpose

I've been working on DMS, my driving management system. It's a basic app with one primary goal: when I'm on a road trip, tell me how far along the route each of the cities is.


I started out using the Google Distance Matrix API, but after a heavy-usage month that I had to pay for, I thought: I have computers, maybe I can find a way to do this myself. Enter OpenStreetMap. Cool, but, well... I'm gonna need RAM.
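The core question the app answers — how far along the route is each city — falls out of per-leg distances once a routing engine has done its work. A minimal sketch, where the response shape loosely mimics an OSRM-style route with per-leg distances in meters (the city names and numbers here are made up for illustration):

```python
# Made-up illustration: waypoints plus the distance of each leg between
# them, in meters, roughly the shape OSRM's route service returns.
sample_route = {
    "waypoints": ["Denver", "Limon", "Burlington", "Salina"],
    "legs_m": [119_000, 122_000, 297_000],
}

def miles_along_route(route):
    """Return (city, cumulative miles from the start) for every waypoint."""
    miles = [0.0]
    for leg in route["legs_m"]:
        miles.append(miles[-1] + leg / 1609.344)  # meters -> miles
    return list(zip(route["waypoints"], [round(m, 1) for m in miles]))

for city, dist in miles_along_route(sample_route):
    print(f"{city}: {dist} mi")
```

Running totals like these are what DMS ultimately shows for each city on the trip.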

Historically, I used a Raspberry Pi for the whole operation. Actually, scratch that: in the beginning I was using Python everywhere with single-page apps, which meant I didn't need my own host, but I also had to keep checking in on that app, so I decided to go local.

Ok, back on topic. The title of this post is General Purpose to Special Purpose, and this is the story of my journey from using more general purpose compute than necessary to slowly but surely shrinking it down to my special purpose needs.

If you look up the requirements to run the Open Source Routing Machine (OSRM) for the USA, it's somewhere under 64 GB but over 50 GB of RAM, which, if I'm being honest... is a bit more than my Raspberry Pi has (a Raspberry Pi 4 with 4 GB of RAM).

As a rule of thumb, the RAM needed to run OSRM is about 5x the file size of the map you are using. So for a map of the USA (us-latest.osm.pbf) that is 9.3 GB in size, you'll need about 50 GB of memory.*
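That back-of-envelope rule is simple enough to script. A minimal sketch of the 5x estimate, using a few extract sizes that appear later in this post:

```python
# Rule of thumb from the quote above: estimated RAM ~= 5 x .osm.pbf size.
def estimated_ram_gb(pbf_size_gb: float) -> float:
    return round(pbf_size_gb * 5, 1)

# Sizes in GB for a few extracts (figures from the table in this post).
extracts = {
    "us-latest.osm.pbf": 9.3,
    "us-midwest": 2.0,
    "colorado": 0.304,
}
for name, size in extracts.items():
    print(f"{name}: ~{estimated_ram_gb(size)} GB RAM")
```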

My first iteration was to use my laptop to host both OSRM and Nominatim, a tool that maps lat/lon to city names and vice versa. This worked for my needs, but it meant I had to lug that monster around and waste energy and all that stuff... so what to do, what to do.

Well, first up was trying to figure out if I could use different databases. What if I knew I was only going to be in one state the entire time, so I only needed to load one state's worth of data? Sure, that would save me some memory, but what if I'm crossing multiple states? Then we're back in territory where we're taking up a lot of space. It's still significantly smaller than the entire United States, but still problematic.

This is where things get really interesting to me. You see, you can carve polygons out of this data. In fact, that's all that's being done when we get a subset of the entire world: it's just a polygon cut out of it.

Then I got to thinking: what if I took the route I was planning to drive and cut just that route out? That would be significantly smaller, so I tried it. It didn't work. To be clear, I'm not saying it couldn't work, just that it didn't for me. So what I did instead was add a 50-mile border around my entire route. This still saved me tons of memory and let me work within the bounds of my Raspberry Pi's RAM.
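Carving is typically done with a tool that accepts an Osmosis-format .poly file (e.g. `osmium extract --polygon`). A sketch of generating one, using a bounding box padded by 50 miles as a crude stand-in for a true route corridor — a proper buffer around the route line would need a GIS library like shapely, and the waypoints here are made up:

```python
import math

# Hypothetical route waypoints as (lon, lat) pairs.
route = [(-104.99, 39.74), (-103.69, 39.26), (-102.27, 39.30)]

def padded_bbox_poly(points, miles=50.0, name="route"):
    """Emit an Osmosis-format .poly covering the route's bounding box,
    padded by `miles` on every side. A real corridor would buffer the
    route line itself; this is the crude rectangular version."""
    lons, lats = zip(*points)
    mid_lat = sum(lats) / len(lats)
    dlat = miles / 69.0  # ~69 miles per degree of latitude
    dlon = miles / (69.0 * math.cos(math.radians(mid_lat)))
    w, e = min(lons) - dlon, max(lons) + dlon
    s, n = min(lats) - dlat, max(lats) + dlat
    corners = [(w, s), (e, s), (e, n), (w, n), (w, s)]
    lines = [name, "1"]
    lines += [f"   {lon:.6f}   {lat:.6f}" for lon, lat in corners]
    lines += ["END", "END"]
    return "\n".join(lines)

print(padded_bbox_poly(route))
# The output can then be fed to something like:
#   osmium extract --polygon route.poly us-latest.osm.pbf -o route.osm.pbf
```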

 

| Region | Disk Space Size** | Estimated RAM |
| --- | --- | --- |
| United States of America | 10.2 GB | 51 GB |
| US Midwest | 2.0 GB | 10 GB |
| US Northeast | 1.5 GB | 7.5 GB |
| US Pacific | 152 MB | 760 MB |
| US South | 3.5 GB | 17.5 GB |
| US West | 2.8 GB | 14 GB |
| Colorado | 304 MB | 1.5 GB |
| Utah | 134 MB | 670 MB |
| My route crossing 5 states | 648 MB | 3.2 GB |
| My route crossing 5 states (using route polygon) | 439 MB | 2.1 GB |


What's also interesting is that if I pre-plan all the routes I intend to use ahead of time, I can trade disk space for RAM. When I'm ready to take a particular trip, I just swap in that trip's database and I'm off and running. And let me tell you, I can throw a big hard drive on a Raspberry Pi.
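The swap itself can be as simple as pointing the routing daemon at a different pre-built dataset. A sketch, assuming a hypothetical layout where each pre-planned trip keeps its own preprocessed OSRM files under `/data/trips/<trip>/`:

```python
from pathlib import Path

# Hypothetical layout: each pre-planned trip has its own preprocessed
# OSRM dataset, e.g. /data/trips/spring-2025/route.osrm.*
TRIPS_DIR = Path("/data/trips")

def osrm_command(trip: str) -> list:
    """Build the osrm-routed invocation for one trip's dataset
    (--algorithm mld matches the partition/customize pipeline)."""
    dataset = TRIPS_DIR / trip / "route.osrm"
    return ["osrm-routed", "--algorithm", "mld", str(dataset)]

print(" ".join(osrm_command("spring-2025")))
```

Stopping the daemon, picking a different trip directory, and restarting is the whole "swap the database" step.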

If you look at the table above, you can see that crossing 5 states took 439 MB of disk space with the route polygon, as opposed to 648 MB for the unoptimized version. A 209 MB delta doesn't seem like a lot, but multiply it by the 5x RAM rule and it adds up to roughly 1 GB of memory saved.


 

I should also note that while crossing 5 states with just those states loaded would fit in the Pi's memory, I also need to run Nominatim, which requires around 1 GB of RAM. I did try using the route-polygon-carved version for Nominatim too, but I had lots of trouble finding lat/lon <--> city mappings, which my tool really needs.
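For context on what that lat/lon <--> city lookup looks like: Nominatim's reverse geocoding returns a JSON address object, and the locality can show up under different keys depending on the place's size. A sketch of pulling a city name out of a response (the sample below is abbreviated and made up; real responses carry many more fields):

```python
# Abbreviated, made-up stand-in for a Nominatim reverse-geocode response.
sample_reverse = {
    "display_name": "Limon, Lincoln County, Colorado, United States",
    "address": {"town": "Limon", "county": "Lincoln County",
                "state": "Colorado", "country": "United States"},
}

def place_name(response):
    """The locality may be reported as city, town, or village depending
    on the place's size, so check all three keys."""
    addr = response.get("address", {})
    for key in ("city", "town", "village"):
        if key in addr:
            return addr[key]
    return None

print(place_name(sample_reverse))
```

When the carved-down Nominatim database is missing data for a point, lookups like this come back empty, which is exactly the failure I was hitting.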

By shrinking the data, we went from a general purpose solution to a special purpose one, and it works for our needs.

I hope you enjoyed hearing about my journey!

* https://blog.afi.io/blog/hosting-the-osrm-api-on-amazon-ec2-running-osrm-backend-as-a-web-service/ 

** Source: https://download.geofabrik.de/north-america/us.html