But can ravens forecast?

Why forecast sales?

Humans have the magical ability to plan for future events, for future gain. It’s not quite a uniquely human trait, though: apparently ravens can match a 4-year-old.

An abundance of data, and some very nice R packages, make our ability to plan all the more powerful.

A couple of months ago we looked at sales from an historical perspective in “Digital Marketplace. Six months later.” In this post, we’ll use the sales data to March 31st to model a time-series forecast for the next two years. The techniques apply to any time series with characteristics of trend, seasonality or longer-term cycles.

Why forecast sales? Business plans require a budget, e.g. for resources, marketing and office space. A good projection of revenue provides the foundation for the budget. And, for an established business, with historical data, time-series forecasting is one way to deliver a robust projection.

The forecast assumes one continues to do what one’s doing. So, it provides a good starting-point. Then one might, for example, add assumptions about new products or services.

The power of iteration

Businesses typically deal with many product / service lines. So the ability to iteratively forecast multiple time series is very powerful.

We’ll deal first with each government framework: G-Cloud (cloud services), and DOS (Digital Outcomes & Specialists). Then we’ll iterate through G-Cloud’s lot structure: Cloud Hosting, Cloud Software and Cloud Support. Only three child levels, but the principle is easily scaled up.

Cleaning data

G-Cloud suppliers are contractually obliged under the government’s framework to report monthly on their buyer invoicing. If a supplier missed some months, there would be a one-time catch-up in a later month, which could show up as the odd outlier in the Digital Marketplace sales data. However, as revealed in the more detailed analysis (with code), none was discovered.
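
As a rough illustration of how a catch-up month might be spotted, here is a minimal base-R sketch. It is not the code behind this post (whose toolkit lists tsclean from the forecast package). The sales figures, the injected catch-up and the 3 x IQR fence are all assumptions made up for the example.

```r
# Hypothetical monthly sales: trend + March spike + noise (not the real data).
set.seed(7)
n_months <- 60
month_of_year <- rep(1:12, length.out = n_months)  # series starts in January
values <- 200 + 1.5 * seq_len(n_months) +
  30 * (month_of_year == 3) +
  rnorm(n_months, sd = 4)

# Simulate late reporting: month 40 is under-reported and caught up in month 41.
values[40] <- values[40] - 80
values[41] <- values[41] + 80
sales <- ts(values, start = c(2014, 1), frequency = 12)

# Remove trend and seasonality robustly, then inspect what is left over.
fit <- stl(sales, s.window = "periodic", robust = TRUE)
remainder <- as.numeric(fit$time.series[, "remainder"])

# Flag remainders lying more than 3 interquartile ranges outside the quartiles.
quartiles <- unname(quantile(remainder, c(0.25, 0.75)))
fence <- 3 * IQR(remainder)
is_outlier <- remainder < quartiles[1] - fence | remainder > quartiles[2] + fence
flagged_months <- which(is_outlier)
print(flagged_months)
```

Months 40 and 41 carry the simulated under-report and catch-up, so they should appear among the flagged positions.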

Seasonal decomposition

By decomposing the historical data we can tease out the underlying trend and seasonality:

    • Trend: G-Cloud sales have grown over time as more suppliers have added their services to the government frameworks, and as more Public Sector organizations have found the benefits of purchasing Cloud services this way. Why? Because it’s a faster, simpler, more transparent and competitive contracting vehicle.
    • Seasonality: Suppliers often manage their sales and financials on a quarterly cycle, with particular emphasis on a strong close to the financial year (often December 31st for commercial enterprises). Government buyers, in turn, may want to make optimal use of their budgets at the close of their financial year (March 31st). Consequently, we see quarterly seasonality with an extra spike in March, and a secondary peak in December.
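
To make the idea concrete, the following base-R sketch decomposes a made-up monthly series with stl(). The data are simulated with an upward trend, a March spike and a smaller December peak, purely as assumptions that mimic the pattern described above. The post’s toolkit lists mstl from the forecast package; stl is swapped in here only to keep the example free of dependencies.

```r
# Hypothetical monthly sales (not the real Digital Marketplace data).
set.seed(42)
n_months <- 72
month_of_year <- rep(1:12, length.out = n_months)  # series starts in January
values <- 100 + 2 * seq_len(n_months) +
  40 * (month_of_year == 3) +    # financial year-end spike in March
  15 * (month_of_year == 12) +   # secondary peak in December
  rnorm(n_months, sd = 3)
sales <- ts(values, start = c(2013, 1), frequency = 12)

# Split the series into seasonal, trend and remainder components.
components <- stl(sales, s.window = "periodic")$time.series
trend <- as.numeric(components[, "trend"])
seasonal <- as.numeric(components[, "seasonal"])

# Average seasonal effect for each calendar month, January to December.
seasonal_by_month <- as.numeric(tapply(seasonal, month_of_year, mean))
names(seasonal_by_month) <- month.abb
peak_month <- month.abb[which.max(seasonal_by_month)]

print(round(seasonal_by_month, 1))
print(peak_month)
```

With the simulated spikes above, the largest seasonal effect lands in March and the second largest in December, while the trend component rises steadily.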

Forecasting sales for each framework

Using AutoRegressive Integrated Moving Average (ARIMA) modelling, we can select from close to 100 models to describe the autocorrelations in the data. Then we can use the generated model to forecast future sales.
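
The toolkit below lists auto.arima from the forecast package, which automates the model search. As a simplified stand-in, the base-R sketch here compares just nine non-seasonal candidates by AIC on a simulated series. The data, the candidate grid and the choice of AIC are assumptions for illustration, and the loop is not a reimplementation of auto.arima.

```r
# Hypothetical series: the cumulative sum of an AR(1) process, so that one
# round of differencing (d = 1) makes it stationary.
set.seed(123)
increments <- as.numeric(arima.sim(model = list(ar = 0.6), n = 120))
sales <- ts(100 + cumsum(increments), start = c(2010, 1), frequency = 12)

# Candidate ARIMA(p, 1, q) orders to compare.
candidates <- expand.grid(p = 0:2, d = 1, q = 0:2)

# Fit each candidate; record its AIC, or NA if the fit fails.
candidates$aic <- vapply(seq_len(nrow(candidates)), function(i) {
  tryCatch(
    arima(sales,
          order = c(candidates$p[i], candidates$d[i], candidates$q[i]),
          method = "ML")$aic,
    error = function(e) NA_real_
  )
}, numeric(1))

# The candidate with the lowest AIC is treated as the best description.
best <- candidates[which.min(candidates$aic), ]
print(candidates[order(candidates$aic), ])
print(best)
```

A real search would also consider seasonal terms and the degree of differencing, which is where an automated routine earns its keep.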

In the plot below, we project two years ahead with 80% and 95% prediction intervals. This means the darker-shaded 80% range should include the future sales value with 80% probability. Likewise, the wider range that adds the lighter-shaded area should include it with 95% probability.
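
For readers curious about where such bands come from, here is a minimal sketch under stated assumptions. The AirPassengers dataset bundled with R stands in for the sales data, the seasonal ARIMA order is picked by hand rather than selected, and the intervals use a normal approximation around the point forecast.

```r
# Stand-in data and a hand-picked seasonal ARIMA model (both assumptions).
fit <- arima(AirPassengers,
             order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast two years (24 months) ahead, with a standard error per month.
horizon <- 24
prediction <- predict(fit, n.ahead = horizon)
point <- as.numeric(prediction$pred)
se <- as.numeric(prediction$se)

# Normal quantiles: 80% leaves 10% in each tail, 95% leaves 2.5%.
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

intervals <- data.frame(
  months_ahead = seq_len(horizon),
  point = point,
  lower_80 = point - z80 * se,
  upper_80 = point + z80 * se,
  lower_95 = point - z95 * se,
  upper_95 = point + z95 * se
)
print(head(round(intervals, 1), 3))
```

The 95% band always encloses the 80% band, and both widen as the horizon lengthens because uncertainty accumulates month by month.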

The DOS framework (for project-related services) was launched more recently, in June 2016. It exhibits different time-series characteristics, and hence calls for a different ARIMA model.

Forecasting sales for the component lots

The G-Cloud framework comprises three lots. There are different ways of forecasting multiple time series. We will do so in one shot, with the best model tailored to each lot. The possible approaches, and code used, are detailed here.
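
One way to picture the “one shot” idea is the base-R sketch below, which fits a separately chosen model to each lot in a single lapply call. It is not the post’s code (the toolkit suggests purrr and sweep were used for that). The lot names come from the post, but the simulated series, the fit_best_arima helper and the small AIC grid are hypothetical.

```r
# Hypothetical sales per lot, each simulated with a different structure.
set.seed(2024)
n_months <- 60
lot_models <- list(
  "Cloud Hosting"  = list(ar = 0.7),
  "Cloud Software" = list(ma = 0.5),
  "Cloud Support"  = list(ar = 0.3, ma = 0.3)
)
lot_series <- lapply(lot_models, function(model) {
  increments <- as.numeric(arima.sim(model = model, n = n_months))
  ts(100 + cumsum(increments), start = c(2014, 1), frequency = 12)
})

# Hypothetical helper: try a small grid of ARIMA(p, d, q) orders and keep
# the fit with the lowest AIC, skipping any order that fails to fit.
fit_best_arima <- function(y, max_p = 2, max_q = 2, d = 1) {
  best_fit <- NULL
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      fit <- tryCatch(arima(y, order = c(p, d, q), method = "ML"),
                      error = function(e) NULL)
      if (!is.null(fit) && (is.null(best_fit) || fit$aic < best_fit$aic)) {
        best_fit <- fit
      }
    }
  }
  best_fit
}

# One pass over all lots: choose a model, then forecast 24 months ahead.
lot_forecasts <- lapply(lot_series, function(y) {
  fit <- fit_best_arima(y)
  list(
    order = fit$arma[c(1, 6, 2)],  # (p, d, q) of the chosen model
    forecast = as.numeric(predict(fit, n.ahead = 24)$pred)
  )
})
print(lapply(lot_forecasts, function(x) x$order))
```

Because the loop body never refers to a specific lot, the same call would handle three series or three hundred.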

Ravens aren’t yet ready for forecasting with R. But then neither are 4-year-olds, are they?

R toolkit

R packages and functions (excluding base) used in this analysis.

purrr: map[6]; map2_df[1]; possibly[1]; set_names[1]; simplify[1]; some[1]; when[1]
readr: guess_encoding[3]; locale[2]; read_csv[2]; parse_number[1]
dplyr: mutate[12]; filter[6]; group_by[6]; if_else[4]; summarise[4]; desc[3]; first[3]; select[3]; arrange[1]; as_tibble[1]; between[1]; bind_rows[1]; case_when[1]; collapse[1]; count[1]; data_frame[1]; n[1]; summarize[1]
tibble: as_tibble[1]; data_frame[1]; enframe[1]
stringr: str_c[7]; fixed[2]; str_remove[2]; str_count[1]; str_detect[1]; str_extract[1]; str_replace[1]
rebus: or[4]; alpha[1]; literal[1]; whole_word[1]
lubridate: month[11]; year[4]; ceiling_date[1]; date[1]; days_in_month[1]; myd[1]; parse_date_time[1]; tz[1]; ymd[1]
sweep: sw_glance[4]; sw_sweep[1]
tidyr: fill[5]; unnest[3]; nest[1]
forecast: forecast[14]; auto.arima[8]; autoplot[7]; BoxCox[4]; mstl[2]; ndiffs[2]; nsdiffs[2]; tsclean[2]; BoxCox.lambda[1]; seasonal[1]
ggplot2: autoplot[7]; xlab[6]; ylab[6]; aes[5]; theme[5]; labs[4]; element_rect[3]; geom_ribbon[2]; unit[2]; alpha[1]; element_line[1]; element_text[1]; facet_wrap[1]; geom_line[1]; geom_path[1]; geom_text[1]; ggplot[1]; margin[1]; scale_x_date[1]
scales: or[4]; alpha[1]; literal[1]; whole_word[1]
cowplot: draw_label[1]; plot_grid[1]
kableExtra: kable[3]; kable_styling[2]
knitr: kable[3]; opts_chunk[1]

View the code here.


R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Hyndman R, Athanasopoulos G, Bergmeir C, Caceres G, Chhay L, O’Hara-Wild M, Petropoulos F, Razbash S, Wang E, Yasmeen F (2018). forecast: Forecasting functions for time series and linear models. R package version 8.4, http://pkg.robjhyndman.com/forecast.

Hyndman RJ, Khandakar Y (2008). “Automatic time series forecasting: the forecast package for R.” Journal of Statistical Software, 27(3), 1–22. http://www.jstatsoft.org/article/view/v027i03.

Contains public sector information licensed under the Open Government Licence v3.0.

2 Replies to “But can ravens forecast?”

  1. This is the first time I’ve run across thinkr. Always love when I find new great sources for modeling in R!

    I wanted to suggest looking into using the case_when() function (from dplyr) in your tidy step in place of the nested if_else() statements to assign your “lot” variable. Just like all other dplyr functions, it allows the code to be more linear.

    Thanks for taking the time to post such helpful and informative information!
