Introduction
The aim of this project is to forecast the daily lettuce demand of each of the four given restaurants of a fast-food restaurant chain from 06/16/2015 to 06/29/2015. The provided dataset consists of transaction information, such as purchases of specific menu items with their date and quantity, the ingredient lists and recipes needed for each menu item, as well as metadata on the four restaurants, including location and store type. A database diagram can be summarised as follows:
After combining the above information based on the tables’ relationships, a final dataset of the daily lettuce demand for each of the four restaurants (over the time horizon for which data is available) is created as follows:
Analysis Outline
The methodology followed to forecast the lettuce demand of each of the four restaurants can be summarised as follows:
a. Time Series Object
After creating the daily lettuce demand series for each of the four restaurants, the time-series object of each restaurant is created using the ts() function with a weekly frequency (frequency = 7), since daily data is provided. For the restaurants in NY (20974, 4904), the time series starts on the first day after the initial run of consecutive missing observations, which is omitted. These missing values at the very beginning of the series are dropped because including them would bias the estimation without improving the forecast power. A plot of the daily lettuce demand of each restaurant will be provided.
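A minimal R sketch of this step is shown below; the data frame lettuce_20974 and its columns are hypothetical placeholders for the cleaned daily demand series of one restaurant (store 20974).

    # Hypothetical data frame: one row per day with the total lettuce quantity
    # used by store 20974 (after dropping the initial run of missing days)
    # lettuce_20974 <- data.frame(date = ..., quantity = ...)

    # Daily data with a weekly pattern -> frequency = 7;
    # start = c(1, 1) labels the first observation as day 1 of week 1
    ts_20974 <- ts(lettuce_20974$quantity, frequency = 7, start = c(1, 1))

    # Time plot of the daily lettuce demand
    plot(ts_20974, xlab = "Week", ylab = "Lettuce demand",
         main = "Daily lettuce demand - store 20974")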
b. Split time series into Training and Test Set
After creating the time-series object for each store, it is split into a training set and a test set containing 80% and 20% of the observations respectively. The training set is used to fit each model and the test set to validate its out-of-sample performance, i.e. the accuracy of its forecasts. For each model, therefore, both the in-sample performance (how well the model fits the provided data) and the out-of-sample performance (how accurate the corresponding forecast is) are measured, and the model with the best accuracy and forecast power is selected. It should be noted that each model’s structural characteristics (trend, seasonal or non-seasonal, additive or multiplicative) are identified from the full time series before the split, because only a limited amount of data is provided. Although these characteristics should in theory be determined from the training set alone (when a sufficiently long series is available), cutting off 20% of such a short series can change the apparent structure (for example suggesting a linear trend that is not present when the whole series is examined). Since only the structural characteristics are taken from the full series and all proposed models are estimated on the training set, the in-sample and out-of-sample (test set) performance measures remain essentially unbiased. In any event, it is preferable to identify the actual components of the series and accept a small amount of bias than to forecast with a model that misses key components of the time series.
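A minimal sketch of the split in R, assuming the ts_20974 object created above:

    n       <- length(ts_20974)
    n_train <- floor(0.8 * n)

    # Keep the first 80% of observations for training, the rest for testing
    train_20974 <- window(ts_20974, end   = time(ts_20974)[n_train])
    test_20974  <- window(ts_20974, start = time(ts_20974)[n_train + 1])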
c. Seasonal Decomposition of Time Series by Loess
Each newly created time-series object is seasonally decomposed using the stl() function (Seasonal and Trend decomposition using Loess) in order to obtain its trend and seasonal components. This provides an indication of which components the candidate models, built at a later stage, need to capture in order to fit the time-series data.
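For illustration, a minimal R sketch of the decomposition of the training series defined above:

    # STL decomposition; s.window = "periodic" assumes a stable weekly pattern
    stl_20974 <- stl(train_20974, s.window = "periodic")
    plot(stl_20974)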
d. Holt-Winters Exponential Smoothing method – Exponential Smoothing State Space Model (ETS)
After identifying the required components of the systematic part of each store’s demand, the Holt-Winters exponential smoothing method is applied in order to create a predictive model. Using the HoltWinters() function, the optimal smoothing factors (alpha, beta and gamma) of the level, trend and seasonal components respectively are computed by minimising the sum of the squared one-step forecast errors of the model. In addition, an Exponential Smoothing State Space Model is created using the ets() function, which simultaneously estimates the smoothing factors (alpha, beta and gamma) and the initial states of the components by maximising the likelihood function.
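A minimal sketch of both models in R, fitted on the training series defined above:

    library(forecast)

    # Holt-Winters: alpha, beta, gamma chosen by minimising the in-sample SSE
    hw_20974 <- HoltWinters(train_20974)

    # ETS: smoothing parameters and initial states estimated jointly by maximum
    # likelihood; model = "ZZZ" lets ets() select error/trend/seasonal types
    ets_20974 <- ets(train_20974, model = "ZZZ")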
e. In-sample performance comparison of Holt-Winters – ETS. Selection of best model
The models created with the Holt-Winters and ETS exponential smoothing methods are then compared with each other on their in-sample performance on the training set. More specifically, accuracy measures such as the Root Mean Square Error (RMSE) are computed, indicating how well each model fits the training series.
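In R, the training-set accuracy measures (including the RMSE) of the two models fitted above can be obtained, for example, as:

    # The "Training set" rows contain the in-sample accuracy measures
    accuracy(forecast(hw_20974, h = length(test_20974)))
    accuracy(ets_20974)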
f. Evaluation of out-of-sample performance
The models with the best in-sample performance (Holt-Winters and ETS) are then compared to find the one with the best forecasting power. Since the accuracy of forecasts can only be determined by how well a model performs on new data that were not used when fitting it, all models are trained on the training set and their out-of-sample (test set) forecast accuracy is measured against the actual values of the test set. This also avoids potential over-fitting problems. Finally, the model with the best performance is kept as a candidate for the final forecast (to be compared with the best ARIMA model at a later stage).
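A minimal R sketch of the out-of-sample comparison, using the objects defined above:

    h <- length(test_20974)

    hw_fc  <- forecast(hw_20974,  h = h)
    ets_fc <- forecast(ets_20974, h = h)

    # The "Test set" rows give the out-of-sample accuracy on the held-out data
    accuracy(hw_fc,  test_20974)
    accuracy(ets_fc, test_20974)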
g. Forecast error evaluation
After selecting the best model so far, it is investigated for possible further improvement. If the predictive model cannot be improved upon, there should be no correlations between the forecast errors of the time series. More specifically, the forecast errors are examined for autocorrelation through ACF plots of the residuals as well as the Ljung–Box “portmanteau” test, which tests the null hypothesis that the forecast errors are independently distributed against the alternative that they exhibit serial correlation. In addition, the forecast errors are checked for a normal distribution with zero mean and constant variance. Constant variance is assessed with a time plot of the in-sample forecast errors, while normality with zero mean is assessed with a histogram of the forecast errors, overlaid with a normal curve that has mean zero and the same standard deviation as the distribution of the forecast errors.
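A minimal R sketch of these residual diagnostics, shown for the ETS model fitted above (the lag of 14 in the Ljung–Box test, i.e. two weekly cycles, is an illustrative choice):

    res <- residuals(ets_20974)

    # Autocorrelation of the in-sample forecast errors
    Acf(res)

    # Ljung-Box portmanteau test over two weekly cycles
    Box.test(res, lag = 14, type = "Ljung-Box")

    # Time plot (constant variance) and histogram with an overlaid normal
    # curve of mean zero and the residuals' standard deviation
    plot(res, ylab = "Forecast errors")
    hist(res, freq = FALSE, main = "Histogram of forecast errors")
    curve(dnorm(x, mean = 0, sd = sd(res, na.rm = TRUE)), add = TRUE)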
h. ARIMA models – In- and out-of-sample performance comparison. Selection of best model
The Box-Jenkins procedure is followed in order to fit Autoregressive Integrated Moving Average (ARIMA) models to the time series of each restaurant. In contrast to the exponential smoothing models examined previously, ARIMA models include an explicit statistical model for the irregular component of a time series, which allows for non-zero autocorrelations in that component. The stages followed in building the best ARIMA models can be summarised as follows (a minimal R sketch of the procedure is given after the list):
- Identification:
The time-series object of each restaurant is required to be stationary in order to fit an ARIMA model, meaning that its properties do not depend on the time at which the series is observed. The stationarity of each time series is first assessed from the ACF plot and then tested with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) and Augmented Dickey-Fuller (ADF) tests. If needed, the time series is either transformed by taking logarithms or differenced at the required order until it becomes stationary.
- Estimation:
As soon as a stationary time series is obtained, the best candidate models for the series are estimated. More specifically, for a non-seasonal time series, ARIMA(p,d,q) models are created, with the orders of the Moving Average (q) and Autoregressive (p) components suggested by the ACF and PACF plots respectively. For a seasonal ARIMA(p,d,q)(P,D,Q)[periodicity] model, the seasonal part is also taken into account and read off the seasonal lags of the same plots. Using the observed orders of each component, together with the auto.arima() function that searches for the best possible model, different ARIMA models are estimated by maximising the likelihood function and compared with each other on their in-sample performance as measured by the RMSE and BIC. The best ARIMA models are then compared on their out-of-sample performance, following the training/test set approach described above: as with the exponential smoothing models, all models are trained on the training set and their out-of-sample (test set) forecast accuracy is measured against the actual values of the test set. The model with the best performance is kept as a candidate for the final forecast.
- Verification:
A residual analysis is carried out to check that the residuals are random, with zero mean and constant variance, by plotting the residuals and their ACF and PACF and by performing the Ljung–Box test. The forecast error evaluation procedure described in step (g) is repeated at this stage as well.
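A minimal R sketch of the Box-Jenkins stages described above, using the training and test series defined earlier (the candidate ARIMA(0,1,1)(1,0,1)[7] order is purely illustrative):

    library(forecast)
    library(tseries)

    # Identification: check stationarity of the training series
    kpss.test(train_20974)   # H0: the series is level stationary
    adf.test(train_20974)    # H0: the series has a unit root
    ndiffs(train_20974)      # suggested order of differencing

    # Estimation: inspect ACF/PACF and let auto.arima() propose a model
    Acf(train_20974)
    Pacf(train_20974)
    auto_20974 <- auto.arima(train_20974, trace = TRUE, ic = "bic")

    # A further candidate can be fitted explicitly, e.g. an illustrative
    # ARIMA(0,1,1)(1,0,1)[7], and compared on in- and out-of-sample accuracy
    cand_20974 <- Arima(train_20974, order = c(0, 1, 1), seasonal = c(1, 0, 1))
    accuracy(forecast(auto_20974, h = length(test_20974)), test_20974)
    accuracy(forecast(cand_20974, h = length(test_20974)), test_20974)

    # Verification: residual time plot, ACF and Ljung-Box test in one call
    checkresiduals(auto_20974)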
i. Selection of final model – Forecast of lettuce demand
The best model, in terms of in-sample performance and forecast power, among the best exponential smoothing and ARIMA candidates is selected. It is then fitted (trained) on the full time series of the restaurant’s lettuce demand (including both the training and the test set) to enhance the power and accuracy of the final forecast. Finally, the forecast of the lettuce demand of each store from 06/16/2015 to 06/29/2015 is computed.
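A minimal R sketch of this final step, shown for a hypothetical case in which an ETS model with an "MNA" structure was selected (the commented Arima() line illustrates the equivalent pattern if an ARIMA model wins the comparison):

    # Refit the selected model on the full series (training + test)
    final_fit <- ets(ts_20974, model = "MNA")   # "MNA" is a hypothetical structure
    # final_fit <- Arima(ts_20974, order = c(0, 1, 1), seasonal = c(1, 0, 1))

    # 14-day forecast covering 06/16/2015 to 06/29/2015
    final_fc <- forecast(final_fit, h = 14)
    final_fc$mean   # daily point forecasts of lettuce demand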