The good folks at Geoscience Australia provide a comprehensive set of Australian gazetteer data for free download from their website. Using R and Python, I constructed a simple geolocation application to make use of this data. I used the data in the gazetteer to determine the geographic locations of incidents reported by the Country Fire Authority (CFA) in the RSS feed of current incidents provided on their website.
First, I used the SQLite database facilities provided in R (via the RSQLite package) to construct a new SQLite database (gazetteer.db) containing the downloaded gazetteer data. This could just as easily have been done in Python, but R served my purposes well:
#R code to read in the gazetteer data and build an sqlite database table for it.
library(RSQLite)
gazdata <- read.csv("gazetteer.csv", header=FALSE, stringsAsFactors=FALSE) #file name depends on the Geoscience Australia download
names(gazdata) <- c("ID_num", "ID_code", "Authority_ID", "State_ID", "Name", "Feature_Code", "Status", "Postcode", "Concise_Gazetteer", "Longitude", "LongDeg", "LongMin", "LongSec", "Latitude", "LatDeg", "LatMin", "LatSec", "Map_100K", "CGDN", "Something")
connect <- dbConnect(SQLite(), dbname="gazetteer.db") #creates gazetteer.db if it doesn't already exist
dbWriteTable(connect, "places", gazdata, overwrite=TRUE, row.names=FALSE)
dbDisconnect(connect)
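As a quick sanity check, the new table can be queried from Python with the sqlite3 module. The sketch below builds a tiny in-memory stand-in for the places table (the column names match the R code above; the two sample rows are invented purely for illustration) and runs the same style of LIKE query the incident-matching script uses:

```python
import sqlite3

# Miniature in-memory stand-in for the "places" table;
# the sample rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE places (Name TEXT, State_ID TEXT, "
            "Feature_Code TEXT, Latitude REAL, Longitude REAL)")
cur.executemany(
    "INSERT INTO places VALUES (?, ?, ?, ?, ?)",
    [("Ballarat", "VIC", "URBN", -37.56, 143.85),
     ("Ballarat", "NSW", "HMSD", -33.0, 146.0)],
)

# Same pattern as the incident-matching query: a parameterised LIKE,
# restricted to Victorian places.
cur.execute(
    "SELECT Latitude, Longitude FROM places "
    "WHERE Name LIKE ? AND State_ID='VIC'",
    ("%Ballarat%",),
)
print(cur.fetchone())  # -> (-37.56, 143.85)
```

The parameterised `?` placeholder keeps the query safe and lets the same prepared statement be reused for every incident.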
Next, I wrote a Python script to download the RSS feed and extract the incident locations (both using the feedparser module), match the locations against the place names listed in the gazetteer database (using Python's sqlite3 module), and plot a map (in PNG format) of the incident locations by calling R from Python via the rpy2 module:
#! /usr/bin/env python
import rpy2.robjects as robjects
import sqlite3
import feedparser
from time import strftime

#download incident feed using feedparser module
feed = feedparser.parse("...") #URL of the CFA current-incidents RSS feed goes here
NumInc = len(feed.entries) #number of incidents
updatetime = strftime("%a, %d %b %Y %H:%M", feed.updated_parsed) #time feed was updated

#step through incidents and extract location
incidents = [""]*NumInc
for i in range(NumInc):
    inc = feed.entries[i].title
    inc = inc.split(',')[0] #strip out just what is before the comma (usually town/locality)
    incidents[i] = inc.title() #make first letter of each word UC.

#connect to sqlite database of Australian place names
conn = sqlite3.connect('gazetteer.db')
cur = conn.cursor()

#run query and store lats and longs of incident locations...
lat = [None]*NumInc #storage for latitudes
lon = [None]*NumInc #storage for longitudes
misses = 0 #counter for incident locations not matched in db.
misslist = list() #list to store locations not found in db

#query location of each incident and find latitude and longitude of best-match location
query = 'select Latitude,Longitude from places where \
Name LIKE ? AND State_ID="VIC" AND \
(Feature_Code="RSTA" OR Feature_Code="POPL" OR Feature_Code="SUB" OR Feature_Code="URBN" OR Feature_Code="PRSH" OR Feature_Code="BLDG")'
for k in range(NumInc):
    t = ('%'+incidents[k]+'%',) #match using LIKE, with wild cards for prefix/suffix of string
    cur.execute(query, t)
    get = cur.fetchone()
    if get is not None: #check if any rows returned; only assign result if a match exists
        lat[k], lon[k] = get
    else:
        misses += 1
        misslist.append(incidents[k])
missstring = '\n'.join(misslist) #convert list of unmatched locations to a string

#use rpy2 module and R to plot a nice annotated map of locations to a png file
r = robjects.r
r.png("incident_map.png", width=800, height=600)
#(base map of Victoria drawn here -- omitted)
matched = [(y, x, n) for y, x, n in zip(lat, lon, incidents) if y is not None]
ylat = robjects.FloatVector([m[0] for m in matched])
xlon = robjects.FloatVector([m[1] for m in matched])
labs = robjects.StrVector([m[2] for m in matched])
r.points(y=ylat, x=xlon, col="red", pch=16)
r.text(y=ylat, x=xlon, labels=labs, adj=1.1, col="red", cex=0.85)
r.axis(2, at=r.seq(-34, -39), labels=r.seq(34, 39), las=1)
r.title(r.paste(NumInc, "CFA incidents @", updatetime))
r.text(x=148.5, y=-33.6, labels=r.paste(misses, "unmapped incidents:"))
r.text(x=148.5, y=-33.9, labels=missstring, adj=0) #list unmatched locations beside the map
r['dev.off']()
The script works nicely, although some incident locations aren’t found in the database, due to spelling errors, unusual formatting, or omission of locations from the Geoscience Australia data. I included some code to list the unmatched locations beside the map, for easy reference.
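One possible refinement for the misspelled locations would be to fall back to fuzzy matching against the gazetteer names with Python's difflib before giving up. This is only a sketch of the idea, not part of the original script; place_names stands in for the list of names you would pull from the places table:

```python
import difflib

# Hypothetical list of gazetteer names
# (in practice: SELECT DISTINCT Name FROM places WHERE State_ID='VIC')
place_names = ["Warrnambool", "Warburton", "Wangaratta", "Melbourne"]

def fuzzy_lookup(location, names, cutoff=0.8):
    """Return the closest gazetteer name to a (possibly misspelled)
    location, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(location, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_lookup("Warnambool", place_names))  # -> Warrnambool
```

Any location that still fails the fuzzy lookup would go into misslist as before, so the unmatched list only contains genuinely unknown places.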
Here’s a map of tonight’s incidents: