Applying a circular moving window filter to raster data in R

The raster package for R provides a variety of functions for the analysis of raster GIS data. The focal() function is very useful for applying moving window filters to such data. I wanted to calculate a moving window mean for cells within a specified radius, but focal() did not provide a built-in option for this. The following code generates an appropriate weights matrix, which can then be used as the w argument of focal() to implement such a filter.

require(raster)
#function to make a circular weights matrix of given radius and resolution
#NB radius must be an integer multiple of res!
make_circ_filter<-function(radius, res){
  circ_filter<-matrix(NA, nrow=1+(2*radius/res), ncol=1+(2*radius/res))
  dimnames(circ_filter)[[1]]<-seq(-radius, radius, by=res)
  dimnames(circ_filter)[[2]]<-seq(-radius, radius, by=res)
  sweeper<-function(mat){
    for(row in 1:nrow(mat)){
      for(col in 1:ncol(mat)){
        dist<-sqrt((as.numeric(dimnames(mat)[[1]])[row])^2 +
          (as.numeric(dimnames(mat)[[2]])[col])^2)
        if(dist<=radius) {mat[row, col]<-1}
      }
    }
    return(mat)
  }
  out<-sweeper(circ_filter)
  return(out)
}

This example uses a weights matrix generated by make_circ_filter() to compute a circular moving average on the Meuse river grid data. For a small raster like this the function is more than adequate, though it can be quite slow for large raster datasets.

#make a circular filter with 120m radius and 40m resolution
cf<-make_circ_filter(120, 40)

#test it on the meuse grid data
f <- system.file("external/test.grd", package="raster")
r <- raster(f)

r_filt<-focal(r, w=cf, fun=mean, na.rm=T)

plot(r, main="Raw data") #original data
plot(r_filt, main="Circular moving window filter, 120m radius") #filtered data
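
More recent versions of the raster package also include focalWeight(), which can build an equivalent circular weights matrix directly; here is a minimal sketch, assuming your installed version provides it. Because those weights sum to one, a weighted sum over the window gives the circular mean, although NA cells near the raster edge are handled a little differently from the na.rm=T approach above.

#alternative using focalWeight(), if available in your version of raster
fw<-focalWeight(r, d=120, type="circle")  #weights inside the circle sum to 1
r_filt2<-focal(r, w=fw, fun=sum)          #weighted sum of cell values = circular mean
plot(r_filt2, main="Circular moving window filter via focalWeight()")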


Mapping georss data using R and ggmap

Readers might recall my earlier efforts at using R and python for geolocation and mapping of real-time fire and emergency incident data provided as rss feeds by the Victorian Country Fire Authority (CFA). I have since realised that the CFA's rss feeds are actually implemented using georss (i.e. they already contain locational data in the form of a latitude and longitude for each incident), which makes the crude geolocation process in my earlier python program redundant, if still an interesting learning experience.

Here is a quick R program for mapping current CFA fire and emergency incidents from the CFA's georss feed, using the excellent ggmap package to render the underlying map, with map data from Google Maps.

Here’s the code:

library(ggmap)
library(XML)
library(reshape)

#download and parse the georss data to obtain the incident locations:
cfaincidents<-xmlInternalTreeParse("http://osom.cfa.vic.gov.au/public/osom/IN_COMING.rss")
cfapoints <- sapply(getNodeSet(cfaincidents, "//georss:point"), xmlValue)
cfacoords<-colsplit(cfapoints, " ", names=c("Latitude", "Longitude"))

#map the incidents onto a google map using ggmap
png("map.png", width=700, height=700)
timestring<-format(Sys.time(), "%d %B %Y, %H:%M")
titlestring<-paste("Current CFA incidents at", timestring)
map<-get_map(location = "Victoria, Australia", zoom=7, source="google", maptype="terrain")
ggmap(map, extent="device")+ 
  geom_point(data = cfacoords, aes(x = Longitude, y = Latitude), size = 4, pch=17, color="red")+
  ggtitle(titlestring)
dev.off()

And here’s the resulting map, showing the locations of tonight’s incidents. Note that this is a snapshot of incidents at the time of writing, and should not be assumed to represent the locations of incidents at other times, or used for anything other than your own amusement or edification. The authoritative source of incident data is always the CFA's own website and rss feeds.
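
The item titles in the feed can also be pulled out and used to label the points, continuing from the objects created above; a rough sketch (it assumes the feed lists one title per georss:point, in matching order):

#extract the incident titles and attach them to the coordinates
#(assumes one <title> per georss:point, in matching order)
cfatitles<-sapply(getNodeSet(cfaincidents, "//item/title"), xmlValue)
cfacoords$Title<-cfatitles

ggmap(map, extent="device")+
  geom_point(data = cfacoords, aes(x = Longitude, y = Latitude), size = 4, pch=17, color="red")+
  geom_text(data = cfacoords, aes(x = Longitude, y = Latitude, label = Title), hjust=-0.1, size=3, color="red")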

Using R for spatial sampling, with selection probabilities defined in a raster

The raster package for R provides a range of GIS-like functions for analysing spatial grid data. Together with the sp package and several other spatial analysis packages, R provides a quite comprehensive set of tools for manipulating and analysing spatial data.

I needed to randomly select some locations for field sampling, with inclusion probabilities based on values contained in a raster. The code below did the job very easily.

library(raster)

#an example raster from the raster package
f <- system.file("external/test.grd", package="raster")
r<-raster(f)

plot(r)

#make a raster defining the desired inclusion probabilities 
#for the all locations available for sampling
probrast<-raster(r)
#inclusion probability for cells with value >=400 
#will be 10 times that for cells with value <400
probrast[r>=400]<-10 
probrast[r<400]<-1
#normalise the probability raster by dividing 
#by the sum of all inclusion weights:
probrast<-probrast/sum(getValues(probrast), na.rm=T)

#confirm sum of probabilities is one
sum(getValues(probrast), na.rm=T)

#plot the raster of inclusion probabilities
plot(probrast, col=c(gray(0.7), gray(0.3)))

#a function to select N points on a raster, with 
#inclusion probabilities defined by the raster values.
probsel<-function(probrast, N){
  x<-getValues(probrast)
  #set NA cells in raster to zero
  x[is.na(x)]<-0
  samp<-sample(nrow(probrast)*ncol(probrast), size=N, prob=x)
  samprast<-raster(probrast)
  samprast[samp]<-1 #set value of sampled squares to 1
  #convert to SpatialPoints
  points<-rasterToPoints(samprast, fun=function(x){x>0})
  points<-SpatialPoints(points)
  return(points)
}

#select 300 sites using the inclusion probabilities 
#defined in probrast
samppoints<-probsel(probrast, 300)
plot(probrast, col=c(gray(0.7), gray(0.3)), axes=F)
plot(samppoints, add=T, pch=16, cex=0.8, col="red")

Here’s the result. Note the higher density of sampled points (red) within the parts of the raster with higher inclusion probability (dark grey).
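
If the selected sites are headed for a GPS or a field datasheet, the coordinates can be extracted from the SpatialPoints object and written to file; a minimal sketch (the file name is just an example):

#extract the x/y coordinates of the sampled sites and save them to csv
sampcoords<-coordinates(samppoints)[, 1:2]
colnames(sampcoords)<-c("x", "y")
write.csv(sampcoords, "sample_sites.csv", row.names=FALSE)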

Things to learn this year

ggplot – I think this R package has the potential to make my graphics code more concise, quicker to write, and easier to modify and re-use. Hopefully a bit of time invested in learning the syntax will pay off in the form of nicer graphs and neater code that is easier to modify later.

git – using a proper version control system should make tinkering with existing code, trying new things, and maintaining complex collections of code easier. GitHub will facilitate sharing code where desired.

Rough-and-ready geolocation using python and R

The good folks at Geoscience Australia provide a comprehensive set of Australian gazetteer data for free download from their website. Using R and python, I constructed a simple geolocation application that uses the gazetteer data to determine the geographic locations of incidents reported by the Country Fire Authority in the rss feed of current incidents provided on their website.

First, I used the sqlite database facilities provided in R to construct a new sqlite database (gazetteer.db) containing the downloaded gazetteer data. This could just as easily have been done in python, but R served my purposes well:

#R code to read in the gazetteer data and build an sqlite database table for it.
gazdata<-read.csv("Gazetteer2010_txt.csv", header=FALSE)
names(gazdata)<-c("ID_num", "ID_code", "Authority_ID", "State_ID", "Name", "Feature_Code", "Status", "Postcode", "Concise_Gazetteer", "Longitude", "LongDeg", "LongMin", "LongSec", "Latitude", "LatDeg", "LatMin", "LatSec", "Map_100K", "CGDN", "Something")
library(DBI)
library(RSQLite)
system('sqlite3 gazetteer.db', wait=FALSE)
driver<-dbDriver("SQLite")
connect<-dbConnect(driver, dbname="gazetteer.db")
dbWriteTable(connect, "places", gazdata, overwrite=T, row.names=F, eol="\r\n")
dbListTables(connect)
dbDisconnect(connect)
dbUnloadDriver(driver)
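
As a quick sanity check (not part of the original workflow, just a sketch), the table can be reopened and queried from R before moving on to python:

#optional sanity check: reopen the database and confirm the places table loaded
library(RSQLite)
connect<-dbConnect(SQLite(), dbname="gazetteer.db")
dbGetQuery(connect, "SELECT Name, State_ID, Latitude, Longitude FROM places LIMIT 5")
dbDisconnect(connect)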

Next, I wrote a python script to download the rss feed and extract the incident locations (both using the feedparser module for python), match the locations against the place names listed in the gazetteer database (using python's sqlite3 module), and plot a map (in png format) of the incident locations by calling R from python, using the rpy2 module:

#! /usr/bin/env python
import feedparser
import rpy2.robjects as robjects
from sqlite3 import *
from time import strftime

#download incident feed using feedparser module
feed=feedparser.parse("http://osom.cfa.vic.gov.au/public/osom/IN_COMING.rss")
NumInc=len(feed.entries) #number of incidents
updatetime=strftime("%a, %d %b %Y %H:%M", feed.updated)  #time feed was updated

#step through incidents and extract location
incidents=[""]*NumInc
for i in range(NumInc):
	inc=feed.entries[i].title
	inc =inc.split(',')[0] #strips out just what is before the comma (usually town/locality)
	incidents[i] =inc.title() #make first letter of each word UC.

#connect to sqlite database of Australian place names
conn=connect('gazetteer.db')
curs=conn.cursor()

#run query and store lats and longs of incident locations...
lat=[""]*NumInc #storage for latitudes
long=[""]*NumInc #storage for longitudes
misses=0 #counter for incident locations not matched in db.
misslist=list() #list to store locations not found in db
#query location of each incident and find latitude and longitude of best-match location
query='select Latitude,Longitude from places where \
Name LIKE ? AND State_ID="VIC" AND \
(Feature_Code="RSTA" OR Feature_Code="POPL" OR Feature_Code="SUB" OR Feature_Code="URBN" OR Feature_Code="PRSH" OR Feature_Code="BLDG")'
for k in range(NumInc):
	t=('%'+incidents[k]+'%',) #match using "like" with wild cards for prefix/suffix of string
	curs.execute(query, t)
	get=curs.fetchone()
	if get is not None: #check if any rows returned (i.e. no matched to locations), only assign result if exists
		lat[k] = get[0]
		long[k]=get[1]
	if get is None:
		misslist.append(incidents[k])
		misses=misses+1
missstring='\n'.join(misslist) #convert list of unmatched locations to a string

#use Rpy2 module and R to plot a nice annotated map of locations to a png file
r = robjects.r
r.library("oz")
r.png("incident_map.png", width=800, height=600)
r.vic()
r.points(y=lat, x=long, col="red", pch=16)
r.text(y=lat, x=long, labels=incidents, adj=1.1, col="red", cex=0.85)
r.axis(1)
r.axis(2, at=r.seq(-34, -39),labels=r.seq(34, 39), las=1)
r.title(r.paste(NumInc, "CFA incidents @", updatetime))
r.text(x=148.5, y=-33.6, labels=r.paste(misses," unmapped incidents:"))
r.text(x=148.5, y=-34,labels=r.paste(missstring))
r.box()
r['dev.off']()

The script works nicely, although some incident locations aren’t found in the database due to spelling errors, unusual formatting, or omission of locations from the Geoscience Australia data. I included some code to list the unmatched locations beside the map, for easy reference.

Here’s a map of tonight’s incidents:

State-space occupancy model using PyMC

I’ve continued my experimentation with PyMC, using it to fit occupancy models to wildlife survey data with imperfect detectability. Chris Fonnesbeck has provided code for a simple, single-survey occupancy model here, which provides a good starting point for experimentation. I wanted to construct my model using the alternative, state-space parameterisation for occupancy models described by Royle and Kéry (2007). Unlike the multi-season, dynamic occupancy model described by Royle and Kéry, I am only fitting a single-season occupancy model, where the site states (occupied or unoccupied) are assumed to be constant. The model uses a hierarchical approach: sites are occupied with probability \psi, and the true occupancy states of the sites, z, are inferred from repeated surveys at each site via a probabilistic detection model (in this case a simple Bernoulli model, with conditional probability of detection p at each survey). Fitting this model using MCMC has the advantage that a finite-sample estimate of the occupancy rate among the sampled sites can easily be computed by sampling from \sum z.
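
In equations, with i indexing the 39 sites and k = 5 replicate surveys per site, the model is

z_i \sim \mathrm{Bernoulli}(\psi), \qquad y_i \mid z_i \sim \mathrm{Binomial}(k, \, z_i \, p), \qquad i = 1, \dots, 39

where y_i is the number of detections recorded at site i, and \psi and p are given flat \mathrm{Beta}(1,1) priors. The finite-sample occupancy is then just \sum_i z_i.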

from pymc import *
from numpy import *

"""
Alternative implementation of single season occupancy estimation for the Salamander data (from MacKenzie et al. 2006), using a 
state-space approach.

Modified from original example code and data provided by Chris Fonnesbeck at https://github.com/pymc-devs/pymc/wiki/Salamanders
"""


# Occupancy data - rows are sites, with replicate surveys conducted at each site
salamanders = array([[0,0,0,1,1], [0,1,0,0,0], [0,1,0,0,0], [1,1,1,1,0], [0,0,1,0,0], 
                     [0,0,1,0,0], [0,0,1,0,0], [0,0,1,0,0], [0,0,1,0,0], [1,0,0,0,0], 
                     [0,0,1,1,1], [0,0,1,1,1], [1,0,0,1,1], [1,0,1,1,0], [0,0,0,0,0], 
                     [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], 
                     [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], 
                     [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], 
                     [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], [0,0,0,0,0], 
                     [0,0,0,1,0], [0,0,0,1,0], [0,0,0,0,1], [0,0,0,0,1]])

# Number of replicate surveys at each site
k = 5

#number of detections at each site (row sums of the data)
y=salamanders.sum(axis=1)

#vector of known/unknown occupancies to provide sensible starting values for latent states, z.
#Equal to 1 if at least 1 detection, otherwise zero.
z_start = y>0

# Prior on probability of detection
p = Beta('p', alpha=1, beta=1, value=0.99)

# Prior on probability of occupancy
psi = Beta('psi', alpha=1, beta=1, value=0.01)

#latent states for occupancy
z = Bernoulli('z', p=psi, value=z_start, plot=False)

#Number of truly occupied sites in the sample (finite-sample occupancy)
@deterministic(plot=True)
def Num_occ(z=z):
   out = sum(z)
   return out

#unconditional probabilities of detection at each site (zero for unoccupied sites, p for occupied sites)
@deterministic(plot=False)
def pdet(z=z, p=p):
    out = z*p
    return out

#likelihood
Y = Binomial('Y', n=k, p=pdet, value=y, observed=True)

Fitting of the model was accomplished by running the following code, which constructs the model, collects some MCMC samples for parameters of interest, and generates plots of the results:

from pylab import *
from pymc import *

import model

#get all the variables from the model file
M = MCMC(model)

#draw samples
M.sample(iter=40000, burn=20000, thin=5)

#plot results
Matplot.plot(M)

Here are some summary plots (traces and histograms) of the MCMC samples for the parameters \psi, p and \sum z.

The same model can easily be fitted using OpenBUGS, with comparable results.

model{
  psi~dbeta(1, 1)
  p~dbeta(1, 1)
  for(i in 1:sites){
    z[i]~dbern(psi)
    pdet[i]<-p*z[i]
    for(j in 1:surveys){
      Y[i,j]~dbern(pdet[i])
    }
  }
}

data
list(
surveys=5,
sites=39,
Y=structure(.Data = 
c(0,0,0,1,1, 
0,1,0,0,0, 
0,1,0,0,0, 
1,1,1,1,0, 
0,0,1,0,0, 
0,0,1,0,0, 
0,0,1,0,0, 
0,0,1,0,0, 
0,0,1,0,0, 
1,0,0,0,0, 
0,0,1,1,1, 
0,0,1,1,1, 
1,0,0,1,1, 
1,0,1,1,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,0,0, 
0,0,0,1,0, 
0,0,0,1,0, 
0,0,0,0,1, 
0,0,0,0,1),
 .Dim=c(39,5))
)
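
For anyone who prefers to drive OpenBUGS from R, here is a rough sketch using the R2OpenBUGS package. This is an assumption on my part rather than part of the original workflow: it requires OpenBUGS to be installed, and the BUGS model above saved to a text file, here called "occupancy_model.txt".

#a sketch: fitting the BUGS model above from R with R2OpenBUGS
#(assumes OpenBUGS is installed and the model code is saved as "occupancy_model.txt")
library(R2OpenBUGS)

#detection histories: one row per site, one column per survey (same data as above)
Y <- matrix(c(0,0,0,1,1, 0,1,0,0,0, 0,1,0,0,0, 1,1,1,1,0, 0,0,1,0,0,
              0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 0,0,1,0,0, 1,0,0,0,0,
              0,0,1,1,1, 0,0,1,1,1, 1,0,0,1,1, 1,0,1,1,0, rep(0, 21*5),
              0,0,0,1,0, 0,0,0,1,0, 0,0,0,0,1, 0,0,0,0,1),
            nrow=39, byrow=TRUE)

bugsdata <- list(Y=Y, sites=nrow(Y), surveys=ncol(Y))

#start z at 1 for any site with at least one detection, 0 otherwise
inits <- function() list(psi=0.5, p=0.5, z=as.numeric(rowSums(Y) > 0))

fit <- bugs(data=bugsdata, inits=inits,
            parameters.to.save=c("psi", "p", "z"),
            model.file="occupancy_model.txt",
            n.chains=3, n.iter=40000, n.burnin=20000, n.thin=5)
print(fit)

Adding a derived node such as Nocc <- sum(z[]) to the BUGS model, and including "Nocc" in parameters.to.save, would also give the finite-sample occupancy, matching the Num_occ node in the PyMC version.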