Sunday, 15 April 2012

ggplot2 Time Series Heatmaps

How do you easily get beautiful calendar heatmaps of time series in ggplot2? For example:
From MarginTale
I was impressed by the lattice-based implementation from Paul Bleicher of Humedica, which is referenced in a Revolution Analytics post. When other blogs picked up the topic, I decided to try a ggplot2 implementation. In a comment on that Revolution Analytics post, Hadley had already presented a quick ggplot rendition, upon which I build here.

How do you attack the problem? Looking at the example output above:
  1. We facet_grid by "months" and "years" 
  2. The data itself is plotted by "week of month" and "day of week" and coloured according to the value of interest
So, given a time series, we just have to fiddle with the time indexes to create a data.frame that holds the series together with each observation's "month", "year", "week of month" and "day of week". The rest is then a one-liner with Hadley's wonderful ggplot2 system.

The following code contains step by step comments:
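A minimal sketch of these steps, not the original listing, assuming a daily series held in a data.frame with hypothetical columns `date` and `value`. The simulated data, the column names and the Monday-first week convention are all assumptions made for illustration; the plot call follows the approach described above (tiles faceted by year and month).

```r
# Hypothetical input: two years of simulated daily values (not the post's data)
set.seed(42)
dat <- data.frame(date = seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day"))
dat$value <- cumsum(rnorm(nrow(dat)))

# Break each date into calendar fields; numeric fields avoid locale issues
lt <- as.POSIXlt(dat$date)
dat$year  <- lt$year + 1900
dat$month <- factor(lt$mon + 1, levels = 1:12, labels = month.abb)

# Day of week with Monday = 0 ... Sunday = 6 (POSIXlt counts from Sunday = 0)
wday0 <- (lt$wday + 6) %% 7
# Reverse the levels so that Monday ends up in the top row of each panel
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
dat$weekday <- factor(day_labels[wday0 + 1], levels = rev(day_labels))

# Week of month: the calendar row (1 to 6) within the month, weeks start on Monday
first_wday0 <- (wday0 - (lt$mday - 1)) %% 7   # weekday of the 1st of the month
dat$monthweek <- (lt$mday - 1 + first_wday0) %/% 7 + 1

# The plot itself: one tile per day, faceted by year (rows) and month (columns)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(dat, aes(x = monthweek, y = weekday, fill = value)) +
    geom_tile(colour = "white") +
    facet_grid(year ~ month) +
    scale_fill_gradient(low = "red", high = "yellow") +
    labs(x = "Week of month", y = NULL)
  print(p)
}
```

Deriving the weekday and week of month from the numeric fields of `as.POSIXlt` keeps the result independent of the locale; the colour gradient is an arbitrary choice.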

It should be easy to wrap into a function, and I hope it's useful.

Sunday, 4 March 2012

Boxplots and Day of Week Effects


After following some R-related quant finance blogs like Timely Portfolio, Systematic Investor or Quantitative thoughts (to name some of my favourites), I decided to start my own. I'll first focus on R snippets which come in handy, and will potentially expand to quant trading and backtesting as time allows.

I'll start with a simple graphical boxplot analysis of "day of the week effects", with two R snippets regarding:
  1. How do you adapt the ggplot2 plotting of boxplots to a mundane 50%-box 95%-line 5%-dots view?
  2. How do you split your days into weekdays easily and robustly?

Let's jump directly into the code, which is available for download.

Running the code, we get the following output:
From MarginTale

These boxplots now show 50% of the observations in the box; the vertical lines cover 95%, and the dots mark the remaining 5% (2.5% in each tail). I find this easier to communicate than the standard definition. This is implemented in the functions myBoxPlotSummary and myBoxPlotOutliers, which are in turn called from stat_summary in ggplot2.
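A minimal sketch of how two such functions could look. The function names come from the text above, but their bodies, the simulated returns and the column names are assumptions, not the original code. The summary function returns the five values a boxplot geom expects, taken from quantiles, and the outlier function returns everything outside the central 95%.

```r
# Box and whiskers from quantiles: whiskers at 2.5% / 97.5%, box at 25% / 75%
myBoxPlotSummary <- function(x) {
  q <- quantile(x, probs = c(0.025, 0.25, 0.5, 0.75, 0.975), names = FALSE)
  data.frame(ymin = q[1], lower = q[2], middle = q[3], upper = q[4], ymax = q[5])
}

# Observations outside the central 95%, to be drawn as individual dots
myBoxPlotOutliers <- function(x) {
  q <- quantile(x, probs = c(0.025, 0.975), names = FALSE)
  x[x < q[1] | x > q[2]]
}

# Hypothetical data: 200 simulated daily returns per weekday
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")
ret <- data.frame(
  day   = factor(rep(days, times = 200), levels = days),
  value = rnorm(1000, sd = 0.01)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(ret, aes(x = day, y = value)) +
    stat_summary(fun.data = myBoxPlotSummary, geom = "boxplot") +
    # `fun` is the argument name since ggplot2 3.3.0; older versions call it `fun.y`
    stat_summary(fun = myBoxPlotOutliers, geom = "point")
  print(p)
}
```

On the integers 1 to 101, for instance, the whiskers end at 3.5 and 98.5, so the six values 1, 2, 3, 99, 100 and 101 are plotted as dots.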

A second issue I tripped over is the sorting of days in the above boxplot. If you take the obvious route and just define a factor as "weekdays(index(...))", the plot function will sort the days alphabetically, which is not exactly what you want. If you then try to order the factor levels by name, your solution will depend on how your locale (the language you use) spells and abbreviates the weekdays. A robust solution, shown in the code, is to use the function .indexwday from the package xts.
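A minimal sketch of that idea, assuming a hypothetical two-week series starting on a Monday. `.indexwday` returns the day of the week as a number (0 = Sunday to 6 = Saturday), so the factor can be ordered by number and labelled with fixed strings. The fallback to `as.POSIXlt` when xts is not installed is an addition for illustration only.

```r
# Hypothetical dates: 14 consecutive days starting Monday, 27 February 2012
dates <- seq(as.Date("2012-02-27"), by = "day", length.out = 14)

# The fragile way, for comparison: names depend on the locale and the
# levels come out in alphabetical order
# levels(factor(weekdays(dates)))

# Numeric day of week, 0 = Sunday ... 6 = Saturday
if (requireNamespace("xts", quietly = TRUE)) {
  x <- xts::xts(seq_along(dates), order.by = dates)
  wday_num <- xts::.indexwday(x)
} else {
  # Base R equivalent, used here only if xts is unavailable
  wday_num <- as.POSIXlt(dates)$wday
}

# Order by number (Monday first) and attach fixed, locale-independent labels
day_labels <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
wday <- factor(wday_num, levels = c(1:6, 0), labels = day_labels)

table(wday)
```

Because the levels are set explicitly, any plot that uses `wday` as a grouping variable shows the days in calendar order regardless of the system language.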