Holiday tinkering – R & Tableau – #1

Over the holidays I’ve decided to try to teach myself something new. I rather like data visualisation and also have an interest in maps, so I went looking for something to bring these two areas together. Down the track I might rue not having an actual question to answer, but after listening to a podcast with Ben Wellington from I Quant NY, who described how he sometimes starts without a specific goal in mind, I think I might be OK…

My starting point is the PSMA Geocoded National Address File (G-NAF) dataset, which contains more than 13 million Australian physical address records, including latitude and longitude coordinates. My initial aim is to create a subset of the data for my local area (Hornsby NSW 2077, where proximity and familiarity let me add context) and work with it in both R and Tableau.

Using some of the scripts that come with the data as a guide, I managed to load the address data into a MySQL database on an old home NAS. I then created a view with a subset of fields which might let me add some colour to the initial plots. For now I have extracted the data for a handful of postcodes in my local area into a semicolon-delimited text file. (I should look to connect directly to the database at a later stage.)
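
As a rough illustration of that extract-and-filter step, here is a minimal sketch in Python, chosen only for brevity. The column names, the sample rows and the second local postcode are all made up for the example; they are not the fields of the actual view or real G-NAF records.

```python
import csv
import io

# Made-up sample in the shape of a semicolon-delimited extract.
# Column names and values are illustrative assumptions, not real G-NAF records.
SAMPLE = """address_id;suburb;postcode;latitude;longitude
A1;Suburb A;2077;-33.7040;151.0990
A2;Suburb B;2076;-33.7200;151.1100
A3;Suburb C;2000;-33.8700;151.2100
"""


def load_local_addresses(handle, postcodes):
    """Read semicolon-delimited rows and keep those in the given postcodes."""
    reader = csv.DictReader(handle, delimiter=";")
    rows = []
    for row in reader:
        if row["postcode"] in postcodes:
            # Coordinates arrive as text, so convert them for plotting later.
            row["latitude"] = float(row["latitude"])
            row["longitude"] = float(row["longitude"])
            rows.append(row)
    return rows


# Postcodes are compared as strings, exactly as they appear in the file.
local = load_local_addresses(io.StringIO(SAMPLE), {"2077", "2076"})
for row in local:
    print(row["suburb"], row["latitude"], row["longitude"])
```

With a real extract, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, and the column names by whatever the view actually exposes.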

First Plots

The first plots were in R, using both Google Maps and OpenStreetMap (OSM) as base maps, with the points coloured by suburb.

[Figures: R plot 1 – Google Maps; R plot 1 – OSM]

While Google Maps is more familiar to me, I couldn’t work out how to enlarge the area to include all of the data points without just having a much larger square. The zoom function is a bit coarse. The area for the OSM map, however, could be specified by minimum and maximum latitude and longitude, resulting in a map that contained all the data points for these postcodes.
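
The bounding-box idea is simple enough to sketch. The snippet below is in Python, purely for illustration, and uses made-up coordinates and a hypothetical `bounding_box` helper. It takes the minimum and maximum latitude and longitude of the points and pads them slightly so that no point sits right on the map edge. It only computes the box; how the box is handed to a mapping library depends on the library.

```python
def bounding_box(points, pad_fraction=0.05):
    """Return (min_lat, min_lon, max_lat, max_lon) enclosing all points.

    points is an iterable of (latitude, longitude) pairs; pad_fraction
    widens each side by that fraction of the box's height or width.
    """
    points = list(points)
    if not points:
        raise ValueError("need at least one point")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    lat_pad = (max(lats) - min(lats)) * pad_fraction
    lon_pad = (max(lons) - min(lons)) * pad_fraction
    return (min(lats) - lat_pad, min(lons) - lon_pad,
            max(lats) + lat_pad, max(lons) + lon_pad)


# Made-up coordinates, not real address points.
points = [(-33.70, 151.10), (-33.72, 151.08), (-33.69, 151.12)]
box = bounding_box(points)
print(box)
```

Unlike a zoom level, which grows the map in fixed steps around a centre point, a box like this follows the data, so the map is only as large as the points require.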

What’s next?

There are a couple of things which I’d like to look at next:

  • Does the G-NAF dataset have any interesting features (aside from having every address in the country) that I can investigate?
  • What other datasets can I start to merge in to add more insight? For example, can I get shapefiles that describe the boundaries between suburbs?

If you have any suggestions, please let me know!