Mosaic time series in R

I really like this chart as featured on flowingdata.com (from www.weathersealed.com).  Here's my brief attempt to recreate it.

It looks to me like a multivariate time plot where the area above the lines is filled. My only thought was to use a mosaic chart (as in this post on the Learning R blog), but this was the best I could do with a little bit of effort. I think that using geom_ribbon would be better, but I couldn't get the colors to work.

Here's the code. Is there an easier way to do this? How can I make the axes more like the original? What about the white lines between boxes and the gradual change between years? The sort order is also different.

    
    library(XML)
    library(plyr)
    library(ggplot2)
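    # Scrape the page; readHTMLTable returns one data frame per HTML table
    theurl <- "http://en.wikipedia.org/wiki/List_of_Crayola_crayon_colors"
    tables <- readHTMLTable(theurl)
    # Assume the table with the most rows is the main color list
    n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
    crayola <- tables[[which.max(n.rows)]]
    x <- crayola[,c("Hex Code", "Issued", "Retired")]
    colnames(x) <- c("color", "issued", "retired")
    # Convert each column to its natural type (the year columns become integers)
    for (i in seq_len(ncol(x))) x[, i] <- type.convert(as.character(x[, i]), as.is = TRUE)
    # Treat colors that were never retired as still available in 2010
    x[is.na(x[,"retired"]), "retired"] <- 2010
    x$color <- as.character(x$color)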

    # For every year, find the colors on sale and give each an equal slice of [0, 1]
    years <- min(x$issued, na.rm=TRUE):max(x$retired, na.rm=TRUE)
    x2 <- na.omit(ldply(years, function(yr, x) {
      idx <- x$issued <= yr & x$retired >= yr
      x2 <- data.frame(year=yr, color=x[idx,"color"], size=(1/length(which(idx))))
      x2 <- x2[order(x2$color, decreasing=TRUE),]
      # xmin/xmax hold the lower and upper edge of each slice (mapped to ymin/ymax in the plot)
      x2[,"xmin"] <- rep(0, nrow(x2))
      x2[,"xmax"] <- rep(1, nrow(x2))
      x2[-1,"xmin"] <- cumsum(x2$size[-1])
      x2[-nrow(x2),"xmax"] <- cumsum(x2$size[-nrow(x2)])
      x2
    }, x=x))

    # One rectangle per color per year; the hex codes are used directly as fill colors
    p <- ggplot(x2, aes(xmin = year, xmax = year+1, ymin = xmin, ymax = xmax, fill=color))
    # Hide the legend and the grid lines
    p <- p + theme_bw() + theme(legend.position = "none", panel.grid.major = element_blank(),
                panel.grid.minor = element_blank())
    p.rect <- p + geom_rect() + scale_fill_identity()
    p.rect
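
To see what the per-year step is doing without scraping Wikipedia, here is a minimal sketch on made-up data. The `toy` data frame and the `slices_for_year` helper are names I invented for illustration, and the sketch uses base R in place of plyr; it is one way to express the same idea, not a drop-in replacement for the code above.

```r
# Toy data (made up): three colors with issue and retirement years.
toy <- data.frame(
  color   = c("#FF0000", "#00FF00", "#0000FF"),
  issued  = c(1900, 1900, 1902),
  retired = c(1903, 1901, 1903),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one year, return one row per active color
# with the vertical extent of its slice.
slices_for_year <- function(yr, x) {
  active <- x$color[x$issued <= yr & x$retired >= yr]
  n <- length(active)
  if (n == 0) return(NULL)  # no colors on sale this year
  active <- sort(active, decreasing = TRUE)
  data.frame(
    year  = yr,
    color = active,
    ymin  = (seq_len(n) - 1) / n,  # lower edge of each slice
    ymax  = seq_len(n) / n,        # upper edge of each slice
    stringsAsFactors = FALSE
  )
}

years  <- min(toy$issued):max(toy$retired)
slices <- do.call(rbind, lapply(years, slices_for_year, x = toy))
print(slices)
```

Each year's slices tile the interval from 0 to 1, so the result can be passed to the same `geom_rect()` call with `ymin = ymin` and `ymax = ymax`.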



Further improvements

Well, the R community never ceases to amaze. I posted this and within hours a vastly improved version was created by the Learning R blog (with some help from Baptiste on the color sorting). All the code is posted on that site. A suggestion was also made by Tobias to smooth the image with Cairo. Great work!
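
I won't reproduce Baptiste's sorting code here, but as a rough sketch of the general idea, one common way to order hex colors is by hue in HSV space, using base R's `col2rgb` and `rgb2hsv`. The `sort_by_hue` helper below is a hypothetical name of my own, and breaking ties by saturation and then value is an assumption, not necessarily what was used on the Learning R blog.

```r
# Hypothetical helper: order hex colors by hue, then saturation, then value.
sort_by_hue <- function(hex) {
  hsv_vals <- rgb2hsv(col2rgb(hex))  # 3 x n matrix with rows "h", "s", "v"
  hex[order(hsv_vals["h", ], hsv_vals["s", ], hsv_vals["v", ])]
}

cols <- c("#0000FF", "#00FF00", "#FF0000", "#FFFF00")
sort_by_hue(cols)
# [1] "#FF0000" "#FFFF00" "#00FF00" "#0000FF"
```

Note that greys have a hue of 0, so with this ordering they end up next to the reds; a real palette would probably need some special handling for them.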

One crucial difference in his version (besides the vastly cleaner code) is his use of geom_area instead of the geom_rect in my version. That also allows you to set a white border above the image.
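
For comparison, here is a minimal sketch of the `geom_area` approach on made-up data. This is not the Learning R code; the `toy` data frame and its `share` column are assumptions of mine, chosen so that every color has a value in every year (zero when it is not on sale), which stacking needs.

```r
library(ggplot2)

# Toy data (made up): each color's share of the palette in each year.
toy <- data.frame(
  year  = rep(1900:1903, times = 3),
  color = rep(c("#FF0000", "#00FF00", "#0000FF"), each = 4),
  share = c(0.5, 0.5, 0.4, 1/3,
            0.5, 0.5, 0.3, 1/3,
            0.0, 0.0, 0.3, 1/3),
  stringsAsFactors = FALSE
)

# Stacked areas, one per color, each outlined in white.
p.area <- ggplot(toy, aes(x = year, y = share, fill = color, group = color)) +
  geom_area(colour = "white", position = "stack") +
  scale_fill_identity() +
  theme_bw() +
  theme(legend.position = "none",
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
p.area
```

Because the areas are interpolated between years rather than drawn as separate boxes, this also gives the gradual change between years that I was asking about.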

I would go so far as to say that (with the exception of things like better fonts and other touch ups) this R version is actually better than the original because it is more accurate. As I said previously, there were no color changes early in the timeline, despite that implication in the original chart.
