site banner

Friday Fun Thread for March 3, 2023

Be advised: this thread is not for serious in-depth discussion of weighty topics (we have a link for that), this thread is not for anything Culture War related. This thread is for Fun. You got jokes? Share 'em. You got silly questions? Ask 'em.

4
Jump in the discussion.

No email address required.

I've spent the weekend playing around with geodata. My original idea was simple: I wanted to draw a fantasy map of Russia. For that I needed a vector map of population density. Russian federal subjects are huge, so I needed a more granular map. I couldn't find one I could load into QGIS, so I thought to myself: how hard can it be?

  1. I downloaded a data dump for Russia from OSM.

  2. I converted it into Spatialite to query for admin boundaries.

  3. Turns out the JDBC driver for SQLite doesn't work in DBeaver when you want to load a native extension. Like Spatialite. Or rather, it works in the paid version of DBeaver. DBeaver itself recognizes the geometry field, but I can't run any geo functions.

  4. While I was screwing around with the data in DBeaver I realized it contained pieces of other countries. No, I am not talking about Crimea.

  5. I gave up on Spatialite and installed PostGIS. This allowed me to check for weird intersections between admin boundaries. None found. Good job, OSM mappers!

  6. I also realized I needed to somehow link the data to the census data to get the population. My only connection was going to be Wikidata.

  7. Turns out 12 admin boundaries in OSM don't have a Wikidata link. I actually went and fixed them in OSM.

  8. I then queried Wikidata for each boundary and got a bunch of JSON I parsed.

  9. Then I finally got around to the big problem of Russia having two types of boundaries: admin (OKATO) and municipal (OKTMO). They match, but not quite. Oh well, that's a problem for future orthoxerox to solve.

  10. But at the very least I now had a list of (presumably admin) boundaries from OSM I could join with Wikidata data, netting me a bunch of OKATO and OKTMO codes.

  11. I downloaded the list of OKATO codes and joined it with my boundaries for a sanity check. Found a few more admin boundaries from China and Japan and a few Wikidata entries without OKATO codes. Something for me to fix later.

  12. What's worse, a few small pieces of Russia were missing in the OSM dump. I missed them because I loaded the vector layer in QGIS, took a look at it and thought: "yep, looks like Russia to me".

  13. Now I am trying to find out what went wrong with the dump, but OSM forum authentication decided to die as soon as I registered there.

All in all, pretty entertaining. Now that I'm blocked from proceeding further, I can go look at the way DuckDB reads legacy Parquet timestamps.

I've never worked with GIS software before, so I don't know what they are usually capable of.

But I did work on a nasty boundary polygon problem in the past and the turfpy (I think the original package was made for node) package along with the geojson package in python was tremendously useful for sorting out some of the issues that were similar to yours.

Thank you, I'm pretty happy with the tools I have. PostGIS and QGIS have got me covered, it's the data quality that got me this time.