All posts by Matt

Mapping the melting pot: plotting historical census data on a map

Our recently launched interactive map is based on data from the 1881 census, and offers facilities to search Govan’s immigrant population of the day by surname or birth nationality. Each census record refers to an individual person living at a given address at the time of the census, with records grouped into households. These data, however, presented a number of technical challenges as we worked to plot them on a modern Google Map.

Irish in Govan, 1881

First, the geography of the parish of Govan has changed considerably over the years. While it might be unfair to suggest that Glasgow has a propensity for tearing down its past, it is undeniable that large portions of the parish, which includes the entirely redeveloped Gorbals area in the south east of the city, have been substantially regenerated since the 1881 census. As a result, the modern Google Map does not closely resemble the geography to which the data refer. A straightforward part of the solution was to use the National Library of Scotland’s excellent Historic Maps API*, which allowed us to layer a slightly more contemporaneous map (from the early twentieth century) over the jarringly modern Google version. This gives the presentation a more authentic flavour, but does not address the altogether trickier issue of streets that have been renamed over the years.

Geocoding is the process of converting a street address, such as those recorded in the census, into geographic coordinates, such as latitude and longitude, which can then be plotted on a map. For this to work, the Google Geocoding API, the service we used to generate our latitude and longitude values, must recognise the address. For streets that had been renamed since 1881, then, our research team had to compile a sort of gazetteer linking the original 1881 street names to their modern, Google-friendly equivalents. With this conversion done, we were ready to begin geocoding.
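For readers curious about what applying such a gazetteer involves, here is a minimal sketch in Python, under stated assumptions rather than a record of our actual script. Before an address is sent to the geocoder, its street name is looked up and swapped for the modern equivalent if one exists. The entries in `GAZETTEER`, the helper name `modernise_address`, and the appended `"Glasgow, Scotland"` locality are all hypothetical, chosen purely for illustration.

```python
import re

# Hypothetical gazetteer: lower-cased 1881 street name -> modern, geocodable name.
# These entries are invented for illustration; the real list was compiled by hand.
GAZETTEER = {
    "old lane": "New Lane",
    "mill row": "Mill Street",
}

# Optional house number (e.g. "12" or "12A"), followed by the street name.
ADDRESS_RE = re.compile(r"^\s*(\d+[A-Za-z]?)?\s*(.+?)\s*$")


def modernise_address(address, gazetteer=GAZETTEER, locality="Glasgow, Scotland"):
    """Rewrite a census address into a query string a geocoder is likely to recognise."""
    match = ADDRESS_RE.match(address)
    if match is None:
        raise ValueError(f"Unrecognised address: {address!r}")
    number, street = match.groups()
    # Swap in the modern name if the street has been renamed since 1881;
    # otherwise keep the census spelling unchanged.
    street = gazetteer.get(street.lower(), street)
    first_part = f"{number} {street}" if number else street
    return f"{first_part}, {locality}"
```

For example, `modernise_address("12 Old Lane")` would produce `"12 New Lane, Glasgow, Scotland"`, while a street missing from the gazetteer passes through with only the locality appended. Appending a locality is a hedge against the geocoder matching a same-named street elsewhere.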

Barr in Govan, 1881

Now, however, we faced problems with the sheer quantity of data: our original Govan dataset comprised nearly 200,000 census records, while the Google Geocoding API imposes a daily limit of 2,500 geocoding requests. Maths might not be my strong suit, but one of those numbers is significantly greater than the other. The solution was first to rationalise the data and geocode only unique addresses: many records share an address, since a single address might refer to a six-apartment tenement building, with each apartment housing a family of a different size and composition. Then, we had to be patient. We wrote a script to geocode around 2,500 addresses at a time, and ran it each day until every one of our address records had latitude and longitude values.
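The deduplicate-then-wait approach can be sketched roughly as follows in Python. This is an illustration under assumptions, not the script we actually ran: each census record is assumed to be a dict with a hypothetical `"address"` key, geocoded results are assumed to live in a dict keyed by address, and the function names are invented for the example.

```python
import math
from itertools import islice

# Daily request quota quoted above for the Google Geocoding API.
DAILY_LIMIT = 2_500


def unique_addresses(records):
    """Collapse person-level census records to distinct addresses, keeping first-seen order.

    Assumes each record is a dict with a hypothetical "address" key.
    """
    seen = {}
    for record in records:
        key = " ".join(record["address"].split())  # normalise stray whitespace
        seen.setdefault(key, None)
    return list(seen)


def next_batch(addresses, geocoded, limit=DAILY_LIMIT):
    """Return up to `limit` addresses that do not yet have coordinates: one day's run."""
    pending = (a for a in addresses if a not in geocoded)
    return list(islice(pending, limit))


def days_needed(n_addresses, limit=DAILY_LIMIT):
    """Number of daily runs required to geocode `n_addresses` under the quota."""
    return math.ceil(n_addresses / limit)
```

Each day, a script along these lines would call `next_batch`, send those addresses to the geocoder, and store the results in `geocoded`, so a run that stops part-way simply picks up where it left off the next day. The arithmetic also shows why deduplication mattered: `days_needed(200_000)` is 80, so geocoding every record individually would have taken close to 80 days.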

With the geocoding complete, the remaining challenges related only to determining exactly how we should present the data, and manually correcting some of the latitude and longitude values generated by Google. I say “only”, but it is the finer points of how the map looks and feels that, arguably, require the most significant thought and discussion…
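The post does not say how candidates for manual correction were spotted. One plausible approach, sketched here in Python as an assumption rather than our actual workflow, is to keep hand-checked coordinates in a separate overrides table that always wins over the geocoder's output, and to flag any point that lands outside a rough bounding box around the study area. `BBox`, `apply_corrections`, and the dictionary layouts are hypothetical, and no real coordinates for Govan are implied.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Rough bounding box for the study area, in decimal degrees (values supplied by the caller)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat, lng):
        return self.south <= lat <= self.north and self.west <= lng <= self.east


def apply_corrections(geocoded, overrides, bbox):
    """Overlay hand-checked coordinates on the geocoder's output.

    Returns the corrected mapping plus a sorted list of addresses whose
    coordinates still fall outside `bbox` and may need a manual look.
    """
    corrected = {**geocoded, **overrides}  # manual corrections take precedence
    suspicious = sorted(
        address
        for address, (lat, lng) in corrected.items()
        if not bbox.contains(lat, lng)
    )
    return corrected, suspicious
```

Keeping the overrides separate from the raw geocoder output means the manual corrections survive if the addresses are ever re-geocoded.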

* An application programming interface (API) is a means by which a software-based service such as Google Maps may be accessed by other websites and applications.