Basic Forecasting Work-flow

Getting a forecast in GMDH Shell BF consists of the following steps:

  1. Preparing the data
  2. Connecting the data source
  3. Selecting the forecasting template
  4. Setting the forecast horizon
  5. Setting the post-process operations
  6. Using and evaluating the forecast

1. Preparing the data

This step is performed outside the program and entails choosing the data format and preparing the files. How to prepare data for the CSV/XLS/XLSX connection is explained in the Preparing your data section. Here we focus on preparing the input data file for the Order list connection.

Order list connection

This data format was designed for reading an Excel file that contains a list of orders. For example, you have daily sales data stored in an Excel file and want a monthly forecast for one year ahead. GMDH Shell BF handles this with a special Order list importer, which can aggregate data by a given period (day, month or year). The input data table must be in column format and include at least the Date, Item code and Order quantity columns. Optionally, an On hand quantity column can be included. An example of this format is shown below.


The column headers in your file do not have to match the names mentioned above. You can specify which column GMDH Shell should treat as Date, which as Item code, and so on.

You can download the example file to inspect the format directly.
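To illustrate what aggregating an order list by period means, here is a minimal Python sketch. It is not the GMDH Shell BF importer; the sample orders, the tuple layout and the function name `aggregate_by_month` are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order list: (Date, Item code, Order quantity).
orders = [
    (date(2017, 1, 3), "A100", 5),
    (date(2017, 1, 17), "A100", 7),
    (date(2017, 1, 20), "B200", 2),
    (date(2017, 2, 1), "A100", 4),
    (date(2017, 2, 9), "B200", 6),
]


def aggregate_by_month(rows):
    """Sum order quantities per (item code, year, month)."""
    totals = defaultdict(int)
    for order_date, item_code, quantity in rows:
        totals[(item_code, order_date.year, order_date.month)] += quantity
    return dict(totals)


monthly = aggregate_by_month(orders)
for key in sorted(monthly):
    print(key, monthly[key])
```

Each item then has one value per month, which is the kind of series a monthly forecast is built from.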

2. Connecting the data source

This step includes specifying the connection type and setting its options. The connection type depends on the format of the file being connected to the project. Besides reading the file, the connection creates a reference from the project to the data file, which is used to reload the data automatically when you open the project. So, even if the file has been changed outside GMDH Shell BF, the program always keeps the project up to date.

GMDH Shell BF has four connection types:

  • CSV/XLS/XLSX connection. Used to read data tables from text files, including the *.csv format, and from Excel files in the *.xls and *.xlsx formats.
  • ODBC/OLEDB connection. Used to load data tables from databases.
  • Sage Accounting inventory. Connection designed to read data from a Sage Simply Accounting software database.
  • Order list connection. Connection to a specially prepared Excel file containing a set of orders. The connection importer can aggregate data by time period when reading the data.

3. Selecting the forecasting template

A forecasting template is a set of GMDH Shell BF options chosen to solve a specific task. It includes the necessary data transformations, forecasting methodology settings, and settings for other optimization procedures. GMDH Shell BF has two predefined templates. Each of them stores best practice for forecasting various data patterns.

4. Setting the forecast horizon

The forecast horizon defines the number of observations you want to forecast in advance. Will the forecast be required for one month ahead, for six months, or for ten years? Keep in mind that the longer the forecast horizon, the lower the forecast accuracy.

5. Setting the post-process operations

Often you forecast things that are counted in whole units, for example cars, bottles or computers. In such cases it is reasonable to round forecasts to the nearest integer and to make them nonnegative.
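As a rough sketch of these two operations in Python (the function name `post_process` is hypothetical, and this is not how GMDH Shell BF implements them), rounding and clipping could look like this. It uses round-half-up via `math.floor(value + 0.5)` because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math


def post_process(forecasts):
    """Round each forecast to the nearest integer (halves round up)
    and replace negative results with zero."""
    return [max(0, math.floor(value + 0.5)) for value in forecasts]


raw_forecasts = [12.4, 0.6, -1.3, 7.5, 3.49]
print(post_process(raw_forecasts))  # prints [12, 1, 0, 8, 3]
```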

6. Using and evaluating the forecast

Once a model has been built, it is used to make forecasts. The performance of the model can only be properly evaluated after the data for the forecast period have become available. There are two basic types of forecast error: learning error and testing error. Learning error is measured over the historical data, so it shows how well the model fits the history. Testing error measures forecast performance against what actually happens. There are two ways to measure it:

  • Holdout approach. Here we don’t use all the data available to build the model. We generate a forecast and evaluate that forecast based on the part of the data that we did not use to build the model.
  • “Wait and see” approach. This approach tracks forecasts over time. Once a forecast has been generated, it is retained in the database. Next month, for example, when the actual sales are known, we can compare the previous forecast with what actually happened. This is done on an ongoing basis.
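The holdout approach can be sketched in a few lines of Python. The sales figures are made up, and `naive_forecast` (repeat the last known value) is only a stand-in model chosen for brevity; it is not the forecasting method used by GMDH Shell BF.

```python
def holdout_split(series, holdout_size):
    """Split a series into a training part and a held-out tail."""
    if not 0 < holdout_size < len(series):
        raise ValueError("holdout_size must be between 1 and len(series) - 1")
    return series[:-holdout_size], series[-holdout_size:]


def naive_forecast(history, horizon):
    """Stand-in model: repeat the last observed value."""
    return [history[-1]] * horizon


# Hypothetical monthly sales.
sales = [100, 110, 105, 120, 125, 130, 128, 135]

train, test = holdout_split(sales, 3)
forecast = naive_forecast(train, len(test))

# Testing error: forecast minus actual on data the model never saw.
testing_errors = [f - a for f, a in zip(forecast, test)]
print(testing_errors)  # prints [-5, -3, -10]
```

The model sees only `train`; the last three observations play the role of "what actually happens".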

Each error type described above can have two forms:

  • Percentage-based error. The difference between the model value and the actual value is calculated as a percentage. The result may be stated, for example, as: “The forecast is off by 50%”.
  • Unit-based error. This error answers the question: “By how many units is the forecast off?”. The error measurement is expressed in terms of units.

There are many error statistics (see Table below), but let us concentrate on the two most common:

  • Mean absolute percentage error (MAPE). Tells you the average error size expressed as a percentage. In other words, it answers the question: “What is the mean percentage deviation between the actual value and the forecast?”
  • Mean absolute error (MAE). Tells you the average error size expressed in terms of units. For example, MAE = 1000 means that, on average, the forecast is off by one thousand units.

Error form          | Error type: Mean                      | Error type: Root mean square
Absolute            | Mean absolute error (MAE)             | Root mean square error (RMSE)
Percentage (Range)  | Normalized mean absolute error (NMAE) | Normalized root mean square error (NRMSE)
Percentage (Target) | Mean absolute percentage error (MAPE) | Root mean square percentage error (RMSPE)
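
For reference, the following Python sketch computes MAE, RMSE and MAPE using their common textbook definitions. The exact formulas used inside GMDH Shell BF are not given here, so treat these as assumptions; the sample numbers are invented.

```python
import math


def mae(actual, forecast):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, relative to the actual values."""
    if any(a == 0 for a in actual):
        # Division by a zero actual value is undefined.
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


actual = [100, 200, 400]
forecast = [110, 180, 400]

print("MAE :", mae(actual, forecast))
print("RMSE:", round(rmse(actual, forecast), 2))
print("MAPE:", round(mape(actual, forecast), 2), "%")
```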

MAPE and MAE each have their pros and cons.

The advantage of the MAPE is that it is easy to interpret. If somebody tells us that the forecast is off by 20%, we do not need to know much about the data. People are used to working in percentages. The downside of the MAPE is that it is sensitive to the volume of the data. Because each error is divided by the actual value, low-volume data will have very high MAPEs, and a period with zero demand makes the MAPE undefined. If you have low-volume data or zero-demand periods in the history, the MAPE is of little value and should not be used.

The MAE is a good statistic, but it is tied to the volume of the data and cannot be used to compare series that are on different scales.
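
A small Python illustration of this trade-off, using invented numbers and textbook definitions of MAE and MAPE (an assumption, not GMDH Shell BF output): both series below are forecast with the same error of 2 units per period, yet the percentage error differs by orders of magnitude.

```python
def mae(actual, forecast):
    """Mean absolute error, in the units of the data."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical high-volume and low-volume items, each off by 2 units per period.
high_actual, high_forecast = [1000, 1000], [1002, 998]
low_actual, low_forecast = [4, 4], [6, 2]

print("high volume: MAE =", mae(high_actual, high_forecast),
      "MAPE =", round(mape(high_actual, high_forecast), 2))
print("low volume:  MAE =", mae(low_actual, low_forecast),
      "MAPE =", round(mape(low_actual, low_forecast), 2))
```

The MAE is 2 in both cases, which says little without knowing the scale of each series, while the MAPE is about 0.2% for the high-volume item and 50% for the low-volume one.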

workflow.txt · Last modified: 2017/06/02 09:31 (external edit)
