Computational Statistics, Machine Learning, et. al.

Mosaic time series in R

I really like this chart of Crayola crayon colors over time. Here's my brief attempt to recreate it.

It looks to me like a multivariate time series plot where the area above the lines is filled. My only thought was to use a mosaic chart (as in this post on the Learning R blog), but this was the best I could do with a little bit of effort. I think that using geom_ribbon would be better, but I couldn't get the colors to work.

Here's the code. Is there an easier way to do this? How can I make the axes more like the original? What about the white lines between boxes and the gradual change between years? The sort order is also different.

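    library(XML)      # readHTMLTable
    library(plyr)     # ldply
    library(ggplot2)

    # URL of the page containing the color table (left blank in the original post)
    theurl <- ""
    tables <- readHTMLTable(theurl)
    # Pick the largest table on the page
    n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
    crayola <- tables[[which.max(n.rows)]]
    x <- crayola[,c("Hex Code", "Issued", "Retired")]
    colnames(x) <- c("color", "issued", "retired")
    for (i in 1:ncol(x)) x[, i] <- type.convert(as.character(x[, i]), as.is = TRUE)
    # Colors that were never retired are treated as available through 2010
    x[is.na(x[,"retired"]), "retired"] <- 2010
    x$color <- as.character(x$color)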

    years <- min(x$issued):max(x$retired, na.rm=TRUE)
    # For each year, stack the available colors into equal-height boxes
    x2 <- na.omit(ldply(years, function(yr, x) {
      idx <- x$issued <= yr & x$retired >= yr
      x2 <- data.frame(year=yr, color=x[idx,"color"], size=(1/length(which(idx))),
                       stringsAsFactors=FALSE)
      x2 <- x2[order(x2$color, decreasing=TRUE),]
      x2[,"xmin"] <- rep(0, nrow(x2))
      x2[,"xmax"] <- rep(1, nrow(x2))
      x2[-1,"xmin"] <- cumsum(x2$size[-1])
      x2[-nrow(x2),"xmax"] <- cumsum(x2$size[-nrow(x2)])
      x2  # return the data frame, not the value of the last assignment
    }, x=x))

    p <- ggplot(x2, aes(xmin = year, xmax = year+1, ymin = xmin, ymax = xmax, fill=color))
    # opts() and theme_line() have been removed from ggplot2; theme() and element_blank() replace them
    p <- p + theme_bw() + theme(legend.position = "none", panel.grid.major = element_blank(),
                panel.grid.minor = element_blank())
    p.rect <- p + geom_rect() + scale_fill_identity()

Further improvements

Well, the R community never ceases to amaze. I posted this and within hours a vastly improved version was created by the Learning R blog (with some help from Baptiste on the color sorting). All the code is posted on that site. A suggestion was also made by Tobias to smooth the image with Cairo. Great work!

One crucial difference in his version (besides the vastly cleaner code) is his use of geom_area instead of the geom_rect in my version. That also allows you to set a white border above the image.
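
To show roughly what the geom_area approach looks like, here is a minimal sketch on made-up data. It is not the code from the Learning R blog: the `toy` data frame, its three hex codes, and the `share` column are assumptions for illustration only.

```r
library(ggplot2)

# Toy data (hypothetical): three colors, each available in every year.
toy <- expand.grid(year = 1990:1995,
                   color = c("#EE204D", "#1F75FE", "#FCE883"),
                   stringsAsFactors = FALSE)

# Each color gets an equal share of the colors on sale that year.
toy$share <- 1 / ave(seq_along(toy$year), toy$year, FUN = length)

# Stacked areas, one band per color; colour = "white" draws a white
# outline on the bands, and scale_fill_identity() uses the hex codes as-is.
p.area <- ggplot(toy, aes(x = year, y = share, fill = color, group = color)) +
  geom_area(position = "stack", colour = "white") +
  scale_fill_identity() +
  theme_bw() +
  theme(legend.position = "none")
```

Because geom_area connects the yearly values with lines instead of drawing one box per year, the bands change gradually between years, which is closer to the look of the original chart.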

I would go so far as to say that (with the exception of things like better fonts and other touch ups) this R version is actually better than the original because it is more accurate. As I said previously, there were no color changes early in the timeline, despite that implication in the original chart.

4 thoughts on “Mosaic time series in R”

  1. I think the big problem is the discontinuities - you need to somehow smooth over them. I also suspect you could come up with a better ordering (perhaps using hierarchical clustering + 1d PCA).

  2. Thanks for publishing this... unfortunately I get an error on the second last line:

    "Error: When _setting_ aesthetics, they may only take one value. Problems: fill"

    any ideas?

    R 2.8.1 / Ubuntu Jaunty 32-bit
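
The ordering idea from the first comment could be tried along these lines. This is only a sketch: it covers the 1-d PCA part (no hierarchical clustering), works on raw RGB values, and the `hex` vector is a hypothetical stand-in for the `color` column. It is not the sorting used in the Learning R version.

```r
# Hypothetical hex codes standing in for the `color` column.
hex <- c("#EE204D", "#1F75FE", "#FCE883", "#1CAC78", "#FF7538", "#926EAE")

# Convert to an n x 3 matrix of red/green/blue values (0-255).
rgb.mat <- t(col2rgb(hex))

# Score each color on the first principal component and sort by that score.
pc1 <- prcomp(rgb.mat)$x[, 1]
hex.sorted <- hex[order(pc1)]
```

Using `hex.sorted` as the factor levels of the color column would then control the stacking order in the plot.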
