What are the best practices for dealing with shifting, inconsistent seasonality when making daily forecasts?
This question is related to a post I've looked at on CrossValidated (http://stats.stackexchange.com/questions/3249/calculation-of-seasonality-indexes-for-complex-seasonality), but it deals with more granular data (daily instead of weekly) and with holiday seasons that change shape (rather than single holidays), which raises a larger question about the reasoning behind dummy variables.

As an introduction, I'm looking to forecast daily retail sales for a highly cyclical e-commerce company. Each year, buying patterns shift for several reasons:
- Calendar shift (for example, 5/17/13 is a Friday, while 5/17/12 is a Thursday).
- Shifting of holidays that are set by the day of the week (e.g. Labor Day, Thanksgiving, Mother's Day).
- Shifts in the timing and length of shopping seasons (e.g. in 2012 the holiday shopping season had 32 calendar days and 21 workdays and started on Nov. 28, while this year it has 26 calendar days and 17 workdays and starts on Nov. 29).

These changes affect the overall shape and size of the seasonal curve, not just the behavior at or near the holiday.

Among the retail forecasters I've encountered, there are a few options for forecasting with shifting seasonality.

The first is to use only exact calendar-equivalent years to parcel out seasonality; for example, 2013 exactly matches 2002 in the placement of holidays. You then apply a 365-day ARIMA to the data. But in e-commerce (and overall), a lot has happened since 2002 that affects how seasonality looks.

The second is exponential smoothing with dummy variables. But consumer behavior doesn't change only on the holiday (or the days approaching it); it also depends on how far the holiday is from other shifting dates (and on the day of the week). So you either create a boatload of dummy variables for the distance to and from each holiday, which can over-specify your model (and make it hard to generalize to the year you want to forecast), or settle for a "general" seasonality that usually underperforms. You could also use a fractional polynomial to approximate the curve by days until (or between) one holiday and another, but this creates the usual problems with fracpolys, notably over-fitting and wildly strange results at certain points in and out of the sample.

The third option I've seen is to adjust the data from previous years to match the length of each holiday season. When the holiday shopping season is a week shorter than in previous years, you take out a week and add that revenue back across the remaining season; you do the reverse when the season is a week longer. After that's done, you can do a seasonal decomposition to parcel out the seasonality. The issue is that this involves a ton of subjectivity and assumes that shoppers will spread the lost week evenly, rather than "back-load" their buying as the holiday approaches more quickly. Fixing the "back load" would then add even more subjectivity about when shoppers start responding to the proximity of the holiday (such as Christmas).

So I'm interested in the experience of other forecasters (especially those with retail experience) with this issue of shifting seasonality by day. Looking forward to your thoughts. If you'd like to see exactly what I'm talking about, I can also add some dummy data to the original CrossValidated posting (at http://stats.stackexchange.com/questions/59309/best-practices-for-dealing-with-shifting-inconsistent-seasonality).
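One way to picture the "distance to and from each holiday" idea without one dummy per day is to encode the distances as a handful of numeric columns. The sketch below is only an illustration, not something taken from the question: the feature names are hypothetical, and it assumes the season opens on the day after US Thanksgiving and ends on Dec. 24, which may not match how a given retailer defines its season.

```python
from datetime import date, timedelta


def thanksgiving(year: int) -> date:
    """US Thanksgiving: the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    # weekday(): Monday=0 ... Thursday=3
    first_thursday = nov1 + timedelta(days=(3 - nov1.weekday()) % 7)
    return first_thursday + timedelta(weeks=3)


def holiday_features(day: date) -> dict:
    """Numeric calendar features for one day (hypothetical feature names)."""
    # Assumption: the season opens the day after Thanksgiving and runs to Dec. 24.
    season_start = thanksgiving(day.year) + timedelta(days=1)
    christmas = date(day.year, 12, 25)
    season_length = (christmas - season_start).days
    in_season = season_start <= day < christmas
    days_in = (day - season_start).days if in_season else 0
    return {
        "day_of_week": day.weekday(),  # handles the plain calendar shift
        "in_season": int(in_season),
        "season_length": season_length,  # differs from year to year
        "days_into_season": days_in,
        "days_to_christmas": (christmas - day).days if in_season else 0,
        # Relative position in the season, so short and long seasons line up.
        "season_progress": days_in / season_length if in_season else 0.0,
    }


# The same calendar date sits at a different point of the season in each year.
for year in (2012, 2013):
    print(year, holiday_features(date(year, 12, 10)))
```

Keeping both the absolute distance (days_to_christmas) and the relative position (season_progress) lets a model decide from the data whether shoppers back-load or spread their buying, rather than fixing that choice by hand.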
Answer:
My hunch is that you may be forcing the forecasting toolset (exponential smoothing and ARIMA) onto what may be better handled as a predictive modelling problem. I say this because what you are describing sounds very similar to the bike sharing Kaggle competition I am working on. As far as I know, there is still debate about the difference between 'prediction' and 'forecasting'. This case is tricky: you are putting a time series in front of me, but it also sounds like many of the drivers of the outcome may not be adequately captured within the forecasting paradigm. I would remodel the data as features (what day of the week it is, whether the day you are looking to forecast/predict falls in the holiday season, etc.), throw a GLM at it, and see how it does on a validation set. You can also capture seasonality and time-series structure to a large extent by adding lagged variables to the model. I wish you well and will be following this answer to learn as well.
Jason T Widjaja at Quora
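The workflow suggested in the answer (calendar features, lagged variables, a GLM, a validation set) might look roughly like the sketch below. It is not the answerer's code. It uses only the standard library, so the "GLM" is the simplest member of the family (Gaussian errors with an identity link, which is ordinary least squares); a Poisson or Gamma GLM from a statistics library may suit sales data better. The sales series is synthetic, and the holiday-season flag (Dec. 1 to Dec. 24) is a crude placeholder assumption.

```python
import random
from datetime import date, timedelta


def make_rows(sales, lags=(7, 364)):
    """Turn {date: sales} into (date, features, target) rows with lagged sales."""
    rows = []
    for day in sorted(sales):
        lagged = [sales.get(day - timedelta(days=k)) for k in lags]
        if None in lagged:
            continue  # not enough history for this day
        # Day-of-week dummies; Monday is the baseline absorbed by the intercept.
        dow = [1.0 if day.weekday() == d else 0.0 for d in range(1, 7)]
        # Placeholder holiday-season flag (assumption): Dec. 1 to Dec. 24.
        in_season = 1.0 if day.month == 12 and day.day < 25 else 0.0
        rows.append((day, [1.0] + dow + [in_season] + lagged, sales[day]))
    return rows


def fit_ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic daily sales: trend + weekly pattern + December lift + noise.
random.seed(0)
start = date(2010, 1, 1)
sales = {}
for n in range(3 * 365):
    day = start + timedelta(days=n)
    weekly = [0, 5, 10, 15, 30, 40, 20][day.weekday()]
    lift = 80 if day.month == 12 and day.day < 25 else 0
    sales[day] = 100 + 0.05 * n + weekly + lift + random.gauss(0, 5)

rows = make_rows(sales)
cutoff = date(2012, 7, 1)  # time-ordered split: validate only on later dates
train = [r for r in rows if r[0] < cutoff]
valid = [r for r in rows if r[0] >= cutoff]

beta = fit_ols([x for _, x, _ in train], [t for _, _, t in train])


def predict(x):
    return sum(b * v for b, v in zip(beta, x))


mae_model = sum(abs(predict(x) - t) for _, x, t in valid) / len(valid)
# Naive benchmark: same weekday one week earlier (the lag-7 column).
mae_naive = sum(abs(x[-2] - t) for _, x, t in valid) / len(valid)
print(f"validation MAE  model: {mae_model:.2f}  naive lag-7: {mae_naive:.2f}")
```

The split is by date rather than random, so the validation set mimics forecasting days the model has not seen. Comparing against a naive lag-7 benchmark is one simple way to check whether the extra features earn their keep.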