A simple method for including Māori vowels in R plots

My Kiwi buddy Andrew Gormley was having trouble including the Māori language vowels with macrons (ā, ē, ī, ō, ū) in his R plots.

I wrote a quick R function “maorify.r” (code in the gist below), which provides a simple method for including these characters in R plots without having to type out the Unicode escapes in full each time. I’m sure there’s a simpler or more general-purpose way to do this, but it does work. Perhaps it might be useful to anyone analysing Kiwi data with R.
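As a rough sketch of how such a helper could work (not necessarily how maorify.r does it), the function below swaps a marked vowel for its macronised equivalent. The function name `macronise` and the convention of writing an underscore before a long vowel are both assumptions made up for this illustration.

```r
# Hypothetical helper: turn "_a", "_e", ... into the matching macronised vowel.
# The underscore convention is an assumption for this sketch only.
macronise <- function(x) {
  plain  <- c("a", "e", "i", "o", "u", "A", "E", "I", "O", "U")
  macron <- c("\u0101", "\u0113", "\u012b", "\u014d", "\u016b",
              "\u0100", "\u0112", "\u012a", "\u014c", "\u016a")
  for (i in seq_along(plain)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex
    x <- gsub(paste0("_", plain[i]), macron[i], x, fixed = TRUE)
  }
  x
}

label <- macronise("M_aori place names")
# the label can then be passed to any plotting function, e.g.
# plot(1:10, main = label)
```

Text without the marker passes through unchanged, so the helper can be wrapped around every label in a script.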






Tracking #365papers with IFTTT

After vacillating for a little while, I’ve decided to get aboard the #365papers bandwagon on Twitter.

The idea behind this initiative (with credit for originating and popularising it largely due to Jacquelyn Gill and Meghan Duffy) is that researchers make a New Year’s resolution to read a scientific paper each day for a year, and tweet about each paper using the hashtag #365papers. By setting a target for reading papers, participating researchers are encouraged to read more, and to share interesting and topical papers with their followers on Twitter.

It’s now a week into 2016, so I’m a late starter. Maybe #365papers is a little ambitious for me, and I’ll only read #100papers or #52papers, but it seems like a fun and challenging idea I’d like to try.

Having a target is one thing, but how to keep track and measure progress towards the goal? Ideally I’d like to be able to keep track of what papers I have read and when I tweeted about them, and thereby track my progress towards my goal of 365 papers for the year. One simple solution would be to maintain a spreadsheet or database, and to manually enter the details of each paper when tweeting.  That’s how Meghan kept track of her #365papers progress, and it seems to work well. But perhaps there’s a better, more automated way to achieve the same result?

I’ve long been a fan of the web service IFTTT. Simply put, this free service provides a framework for linking together different web services in intelligent ways, so as to automate common (and not-so-common) tasks. There’s an endless range of useful things you can do with IFTTT, from having it email you when you are mentioned on Facebook, to sending you an SMS when there is a storm warning for your area, or alerting you to interesting things to buy on eBay.

The power of IFTTT is its integration with an enormous range of popular web services. Naturally enough, IFTTT includes integration with Twitter, and it also includes integration with Google Drive’s online spreadsheet software. To solve my #365papers tracking needs, I have set up an IFTTT recipe that monitors my Twitter feed for tweets I have made that include the hashtag #365papers. When IFTTT detects such a tweet, it copies the details of the tweet (time, date, text etc.) into a new row of a Google spreadsheet I have set up. The spreadsheet will now contain an automatic, up-to-date record of my #365papers progress (or lack thereof!). If my lack of progress isn’t too embarrassing, I might share some details of how I go in another post.


Details of my IFTTT recipe to track my #365papers progress


...and the resulting spreadsheet on Google Drive
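Once the tweets are logged, measuring progress against the target is simple arithmetic. Here is a minimal sketch in R, assuming the dates of the logged tweets have already been read in as a vector of Dates. The dates below are made up for illustration, and the variable names are my own rather than anything IFTTT produces.

```r
# Hypothetical tweet dates, standing in for the date column of the spreadsheet
tweet_dates <- as.Date(c("2016-01-07", "2016-01-08", "2016-01-08", "2016-01-10"))

today        <- as.Date("2016-01-10")
year_start   <- as.Date("2016-01-01")
days_elapsed <- as.numeric(today - year_start) + 1   # count 1 January as day 1

papers_read <- sum(tweet_dates <= today)
pace        <- papers_read / days_elapsed            # papers per day so far
projected   <- round(pace * 366)                     # 2016 is a leap year

cat(papers_read, "papers in", days_elapsed, "days; projected total:", projected, "\n")
```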


Adding phylopic.org silhouettes to R plots

Over at phylopic.org there is a large and growing collection of silhouette images of all manner of organisms – everything from Emus to Staphylococcus. The images are free (both in cost, and to use), are available in vector (svg) and raster (png) formats at a range of resolutions, and can be searched by common name, scientific name and (perhaps most powerfully) phylogenetically.

[EDIT: as two commenters have pointed out, not all phylopic images are totally free of all restrictions on use or reuse: some require attribution, or are only free for non-commercial use. It’s best to check before using an image, either directly at the phylopic webpage, or by using the phylopic API]

Phylopic images are useful wherever it is necessary to illustrate exactly which taxon a graphical element pertains to, as pictures always speak louder than words.

Below I provide an example of using phylopic images in R graphics. I include some simple code to automatically resize and position a phylopic png within an R plot. The code is designed to preserve the original png’s aspect ratio, and to place the image at a given location within the plot.
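As a minimal sketch of the resizing and positioning step (an illustration using base graphics, not necessarily the code behind the plot shown here), the helper below draws an image centred on a point and works out the height that preserves its aspect ratio. The name `add_image` and its arguments are assumptions for this example, and a small greyscale matrix stands in for a real silhouette. A downloaded file could be read with readPNG() from the png package instead. The sketch assumes linear (not log) axes.

```r
# Draw an image centred at (x, y), 'width' wide in x-axis units, choosing
# the height so that the image keeps its aspect ratio on the device.
add_image <- function(img, x, y, width) {
  img_aspect <- nrow(img) / ncol(img)        # image height / width in pixels
  usr <- par("usr")                          # plot region in user coordinates
  pin <- par("pin")                          # plot region size in inches
  x_units_per_inch <- (usr[2] - usr[1]) / pin[1]
  y_units_per_inch <- (usr[4] - usr[3]) / pin[2]
  # convert width to inches, apply the aspect ratio, convert back to y units
  height <- (width / x_units_per_inch) * img_aspect * y_units_per_inch
  rasterImage(img, x - width / 2, y - height / 2, x + width / 2, y + height / 2)
  invisible(c(width = width, height = height))
}

# Stand-in for a silhouette: a 20 x 40 pixel greyscale matrix (values in 0..1)
img <- matrix(0, nrow = 20, ncol = 40)
img[, 1:5] <- 0.5

plot(1:10, (1:10)^2, xlab = "x", ylab = "y")
size <- add_image(img, x = 3, y = 60, width = 2)
```

Because the height is derived from the current plot region, the function has to be called after the plot has been drawn, and called again if the device is resized.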

A plot with phylopic logos

I should also point readers to Scott Chamberlain’s R package fylopic, which provides the ability to make use of the phylopic API from within R, including the ability to search for and download silhouettes programmatically.

If you find phylopic useful, I’m sure they would appreciate you providing them with silhouettes of your study species. More information on how to submit your images can be found here.

Applying a circular moving window filter to raster data in R

The raster package for R provides a variety of functions for the analysis of raster GIS data. The focal() function is very useful for applying moving window filters to such data. I wanted to calculate a moving window mean for cells within a specified radius, but focal() did not provide a built-in option for this. The following code generates an appropriate weights matrix for implementing such a filter, by using the matrix as the w argument of focal().

#function to make a circular weights matrix of given radius and resolution
#NB radius must be a whole-number multiple of res!
make_circ_filter<-function(radius, res){
  circ_filter<-matrix(NA, nrow=1+(2*radius/res), ncol=1+(2*radius/res))
  dimnames(circ_filter)[[1]]<-seq(-radius, radius, by=res)
  dimnames(circ_filter)[[2]]<-seq(-radius, radius, by=res)
  #set cells whose centres lie within the radius to 1; all others stay NA
  for(row in 1:nrow(circ_filter)){
    for(col in 1:ncol(circ_filter)){
      dist<-sqrt((as.numeric(dimnames(circ_filter)[[1]])[row])^2 +
                 (as.numeric(dimnames(circ_filter)[[2]])[col])^2)
      if(dist<=radius) {circ_filter[row, col]<-1}
    }
  }
  return(circ_filter)
}

This example uses a weights matrix generated by make_circ_filter() to compute a circular moving average on the Meuse river grid data. For a small raster like this, the function is more than adequate. For large raster datasets, though, it’s quite slow.

library(raster)

#make a circular filter with 120m radius, and 40m resolution
cf<-make_circ_filter(120, 40)

#test it on the meuse grid data
f <- system.file("external/test.grd", package="raster")
r <- raster(f)

r_filt<-focal(r, w=cf, fun=mean, na.rm=TRUE)

plot(r, main="Raw data") #original data
plot(r_filt, main="Circular moving window filter, 120m radius") #filtered data
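The weights matrix itself can also be built without explicit loops. The sketch below is an alternative I'm suggesting rather than part of the code above: it uses outer() to compute every cell's distance from the centre in one step, and should give the same pattern of 1s and NAs as make_circ_filter(). The function name `make_circ_filter_vec` is hypothetical.

```r
# Vectorised construction of a circular weights matrix (base R only).
# As above, radius is assumed to be a whole-number multiple of res.
make_circ_filter_vec <- function(radius, res) {
  offsets <- seq(-radius, radius, by = res)           # cell-centre offsets
  dist <- sqrt(outer(offsets^2, offsets^2, "+"))      # distance from centre
  w <- ifelse(dist <= radius, 1, NA)                  # 1 inside circle, else NA
  dimnames(w) <- list(offsets, offsets)
  w
}

cf_vec <- make_circ_filter_vec(120, 40)
print(cf_vec)
```

This only speeds up building the matrix, which is cheap anyway. The time taken by focal() on a large raster is a separate matter.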

Mapping georss data using R and ggmap

Readers might recall my earlier efforts at using R and Python for geolocation and mapping of real-time fire and emergency incident data provided as RSS feeds by the Victorian Country Fire Authority (CFA). I have since realised that the CFA’s RSS feeds are actually implemented using georss (i.e. they already contain locational data in the form of latitudes and longitudes for each incident). This makes the crude geolocation process in my earlier Python program redundant, though it was an interesting learning experience.

I provide here a quick R program for mapping current CFA fire and emergency incidents from the CFA’s georss feed, using the excellent ggmap package to render the underlying map, with map data from Google Maps.

Here’s the code:


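#load the packages used below
library(XML)      #getNodeSet() and xmlValue()
library(reshape2) #colsplit() (also available in the older reshape package)
library(ggmap)    #get_map() and ggmap()

#download and parse the georss data to obtain the incident locations:
#NB cfaincidents must already hold the parsed georss feed (an XML document)
cfapoints <- sapply(getNodeSet(cfaincidents, "//georss:point"), xmlValue)
cfacoords<-colsplit(cfapoints, " ", names=c("Latitude", "Longitude"))

#map the incidents onto a google map using ggmap
png("map.png", width=700, height=700)
timestring<-format(Sys.time(), "%d %B %Y, %H:%M" ) #%M is minutes (%m is the month)
titlestring<-paste("Current CFA incidents at", timestring)
map<-get_map(location = "Victoria, Australia", zoom=7, source="google", maptype="terrain")
ggmap(map, extent="device")+ 
  geom_point(data = cfacoords, aes(x = Longitude, y = Latitude), size = 4, pch=17, color="red")+
  ggtitle(titlestring)
dev.off()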

And here’s the resulting map, showing the locations of tonight’s incidents. Note that this is a snapshot of incidents at the time of writing, and should not be assumed to represent the locations of incidents at other times, or used for anything other than your own amusement or edification. The authoritative source of incident data is always the CFA’s own website and RSS feeds.
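The key step in the code is turning each georss point, a single string holding a latitude and a longitude separated by a space, into two numeric columns. Here is a minimal base R sketch of that step on its own, for anyone who would rather not load a package just for colsplit(). The two coordinate strings are made-up examples for illustration, not real incident data.

```r
# Illustrative georss:point values ("latitude longitude"); not real incidents
cfapoints <- c("-37.8136 144.9631", "-36.7570 144.2794")

# Split each string on whitespace and pull out the two parts
parts <- strsplit(trimws(cfapoints), "[[:space:]]+")
cfacoords <- data.frame(
  Latitude  = as.numeric(vapply(parts, `[`, character(1), 1)),
  Longitude = as.numeric(vapply(parts, `[`, character(1), 2))
)

print(cfacoords)
```

The resulting data frame has the same column names as the one passed to geom_point() above.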

Using R for spatial sampling, with selection probabilities defined in a raster

The raster package for R provides a range of GIS-like functions for analysing spatial grid data. Together with package sp, and several other spatial analysis packages, R provides a quite comprehensive set of tools for manipulating and analysing spatial data.

I needed to randomly select some locations for field sampling, with inclusion probabilities based on values contained in a raster. The code below did the job very easily.


library(raster)

#an example raster from the raster package
f <- system.file("external/test.grd", package="raster")
r <- raster(f)

#make a raster defining the desired inclusion probabilities 
#for all the locations available for sampling
probrast<-r
#inclusion probability for cells with value >=400 
#will be 10 times that for cells with value <400
probrast[r>=400]<-10
probrast[r<400]<-1
#normalise the probability raster by dividing 
#by the sum of all inclusion weights:
probrast<-probrast/sum(getValues(probrast), na.rm=T)

#confirm sum of probabilities is one
sum(getValues(probrast), na.rm=T)

#plot the raster of inclusion probabilities
plot(probrast, col=c(gray(0.7), gray(0.3)))

#a function to select N points on a raster, with 
#inclusion probabilities defined by the raster values.
probsel<-function(probrast, N){
  x<-getValues(probrast)
  #set NA cells in raster to zero
  x[is.na(x)]<-0
  samp<-sample(nrow(probrast)*ncol(probrast), size=N, prob=x)
  #an all-zero raster with the same geometry as probrast
  samprast<-setValues(probrast, rep(0, ncell(probrast)))
  samprast[samp]<-1 #set value of sampled squares to 1
  #convert to SpatialPoints
  points<-rasterToPoints(samprast, fun=function(x){x>0}, spatial=TRUE)
  return(points)
}

#select 300 sites using the inclusion probabilities 
#defined in probrast
samppoints<-probsel(probrast, 300)
plot(probrast, col=c(gray(0.7), gray(0.3)), axes=F)
plot(samppoints, add=T, pch=16, cex=0.8, col="red")

Here’s the result. Note the higher density of sampled points (red) within the parts of the raster with higher inclusion probability (dark grey).
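The core of the method is just sample() with a prob argument, and it does not depend on the raster package at all. Here is a stripped-down sketch of the same idea on a plain matrix, using simulated values and variable names of my own choosing rather than the Meuse data.

```r
set.seed(1)

# A made-up 10 x 10 grid of values, with two rows unavailable for sampling
vals <- matrix(runif(100, min = 100, max = 800), nrow = 10)
vals[1:2, ] <- NA

# Weight of 10 where value >= 400, weight of 1 elsewhere, zero where NA
w <- ifelse(vals >= 400, 10, 1)
w[is.na(w)] <- 0
p <- w / sum(w)                       # normalise so the probabilities sum to one

# Draw 20 distinct cells; cells with zero probability are never selected
samp <- sample(length(p), size = 20, prob = p)

# Convert the cell indices back to row/column positions
rowcol <- arrayInd(samp, dim(p))
head(rowcol)
```

Since sampling is without replacement, the number of sites requested cannot exceed the number of cells with a non-zero probability.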