I learned some valuable lessons in high-traffic geocoding this week, all because Google doesn’t offer geocoding services for Google Maps: you must send them latitude/longitude numbers for any point you want to plot.
This raises the question: How do I quickly come up with the lat/lon coordinates for Shanghai, Anchorage, Indianapolis, Portland, and Seattle? Google does provide a handy link to a Google search for “free geocoder” in their maps API documentation, but none of the ones I’ve found have a decent API, work for free, or can sustain the amount of traffic I might request. I’d greatly prefer owning a database and performing the lookups under my own processing power.
The answer I came up with for the low-demand Seattle Emergency Events Map was to screen scrape Google’s own mapping service to see what coordinates they come up with for a given location. It wasn’t pretty, and it wasn’t mine, but I was already using Google so what the heck.
That solution worked beautifully until I got 20,000 visitors to my Maps + RSS package tracking page on Monday of this week. Apparently, Google doesn’t appreciate being hit that much. They temporarily shut down access from my server’s IP to the page I was scraping with a message indicating they’d detected excessive automated behavior. They said something about my tools maybe being a virus. They also kicked my mom in the shin.
When I was notified that my Google-scraping geocoding wasn’t working anymore (never screen scrape without setting up a failure mechanism), I pulled the code and provided a nice message for my site’s visitors. Google dropped the block shortly thereafter, and I hear they gave my mom flowers and apologized for that regrettable shin thing.
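For the curious, here is roughly what a failure mechanism around a scraper could look like. This is a minimal sketch and not the code I actually ran: the `GuardedScraper` name, the `scrape` callable, and the threshold and cooldown values are all made up for illustration. The idea is to stop hitting the remote site for a while after a few failures in a row, instead of hammering it.

```python
import time
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]


class GuardedScraper:
    """Wraps a scraping function and backs off after repeated failures.

    Hypothetical helper: `scrape` is any callable that returns (lat, lon),
    returns None, or raises when the remote page cannot be read.
    """

    def __init__(
        self,
        scrape: Callable[[str], Optional[Coords]],
        max_failures: int = 3,
        cooldown_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._scrape = scrape
        self._max_failures = max_failures
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._blocked_until = 0.0

    def __call__(self, place: str) -> Optional[Coords]:
        # While cooling down, do not touch the remote site at all.
        if self._clock() < self._blocked_until:
            return None
        try:
            coords = self._scrape(place)
        except Exception:
            coords = None
        if coords is None:
            self._failures += 1
            if self._failures >= self._max_failures:
                # Too many failures in a row: pause scraping for a while.
                self._blocked_until = self._clock() + self._cooldown
                self._failures = 0
            return None
        # A success resets the consecutive-failure count.
        self._failures = 0
        return coords
```

A caller that gets `None` back can show visitors a friendly message or fall back to another data source.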
I checked out various solutions, trying to find a geocoding database that suited my needs. The US Census TIGER database was far too in-depth and only dealt with US locations. I ended up deploying a commercial IP-to-location database that contains the coordinates for any city that has an IP range associated with it.
Google employees, please skip the following paragraph.
My current geocoding solution involves a lookup in the ip2location tables. If I cannot find a position from there, I check a database cache of locations obtained from Google. If that fails, I scrape the location from Google Maps and cache it for future reference. If that fails for any reason, I go back to the ip2location database and make a darned good guess as to where to point. This typically means centering on a state or even entire country, but it’s better than nothing. This method results in very low traffic to Google, but my goal is zero external reliance.
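The lookup order above can be sketched in a few lines. This is an illustration under assumptions, not my production code: `city_lookup`, `scrape`, and `region_guess` are hypothetical stand-ins for the ip2location city query, the Google Maps scraper, and the state-or-country fallback, and a plain dict plays the part of the database cache.

```python
from typing import Callable, Dict, Optional, Tuple

Coords = Tuple[float, float]
Lookup = Callable[[str], Optional[Coords]]


def make_geocoder(
    city_lookup: Lookup,          # stand-in for the ip2location city tables
    cache: Dict[str, Coords],     # stand-in for the database cache of scraped results
    scrape: Lookup,               # stand-in for the Google Maps scraper
    region_guess: Lookup,         # stand-in for the state/country "good guess"
) -> Lookup:
    def geocode(place: str) -> Optional[Coords]:
        # 1. Local city database first: no external traffic.
        coords = city_lookup(place)
        if coords is not None:
            return coords
        # 2. Previously scraped results.
        if place in cache:
            return cache[place]
        # 3. Scrape, and remember the answer so the next request stays local.
        try:
            coords = scrape(place)
        except Exception:
            coords = None
        if coords is not None:
            cache[place] = coords
            return coords
        # 4. Last resort: a coarse guess (may itself be None).
        return region_guess(place)

    return geocode
```

Because every successful scrape lands in the cache, each location costs at most one external request, which is what keeps the traffic to Google so low.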
This geocoding method shouldn’t be long-lived. I plan on converting a copy of the TIGER database for US addresses and purchasing a listing of a few million world locations. I’m always in favor of saving money, so if anyone knows of a free world cities geocoding database, or already has the TIGER database converted to a queryable format, please let me know.
Once I’ve got a satisfactory geocoding system built up, I’d like to open up access and offer a public API. That’s a little way down the road, but keep your eyes open for it.
Google now makes this service available for free. Please check out their latest version here.