Templates


Time series forecasting template

The general template for time series forecasting is appropriate for multivariate and univariate time series analysis and forecasting.

First, select the Input and Target variables in your data and transform them if necessary.

This template performs the following actions:

  1. Holds out 1 observation to illustrate the performance of the model.
  2. Sets the interval of forecasting to 1 step ahead. For example, if you analyze monthly data, the forecasting interval is 1 month.
  3. Sets the data window used for model creation to the 30 latest observations.
  4. Shuffles dataset rows using two bins (odd/even reordering).
  5. Uses 2-fold validation to select the best models.
  6. Uses RMSE as the error measure during validation.
  7. Employs the Neural-type generator of candidate models.
  8. Uses linear GMDH neurons with 2 inputs and limits network depth to 2 layers.
  9. Keeps 300 top-ranked models (200 single-layer, 100 double-layer).
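
The data preparation steps above (hold-out, data window, odd/even bins, RMSE) can be illustrated with a minimal Python sketch. It is not the tool's own code: the function names are hypothetical, and it assumes that the hold-out is taken from the end of the series and that the two bins are simply the even-indexed and odd-indexed rows.

```python
import math

def prepare_series(series, holdout=1, window=30):
    """Split off the hold-out tail, then keep the latest `window` observations."""
    if holdout > 0:
        train, held_out = series[:-holdout], series[-holdout:]
    else:
        train, held_out = list(series), []
    return train[-window:], held_out

def odd_even_bins(rows):
    """Assumed meaning of odd/even reordering: two interleaved bins."""
    return rows[0::2], rows[1::2]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Illustrative data: 40 observations with values 1.0 .. 40.0
series = [float(i) for i in range(1, 41)]
window, held_out = prepare_series(series)
fold_a, fold_b = odd_even_bins(window)
```

In 2-fold validation, each bin would serve once for fitting and once for scoring a candidate model with `rmse`.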

You should find an appropriate length for the learning window (learning frame) and make sure that it allows the model to forecast the hold-out data accurately. After that, it is recommended to apply your model to new data as is, rather than turning off the hold-out reservation and building a new model to predict unseen data.

Classification template

The classification template handles both multi-class and two-class problems.

To classify data with more than two class labels, the template uses the one-vs-all approach. For a multi-class problem with numerical labels, the one-vs-all approach is not applied. If you want to use it with a numerical target variable, transform the target using the decomposition operator available in Data manager > Transformations > Miscellaneous > Decompose.
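
As an illustration of the one-vs-all idea, the following Python sketch splits a target column into one binary indicator column per class and then picks the class whose model gives the highest score. The function names are hypothetical, and the exact output of the Decompose operator is not documented here, so treat this only as an assumption about what such a decomposition looks like.

```python
def decompose(labels):
    """Turn one target column into a 0/1 indicator column per class label."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def pick_class(scores):
    """One-vs-all decision: choose the class whose model scored highest."""
    return max(scores, key=scores.get)

# Numerical labels 1, 2, 3 become three separate two-class targets.
indicators = decompose([1, 2, 3, 2])
winner = pick_class({1: 0.1, 2: 0.7, 3: 0.2})
```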

By default, all columns are used as model Inputs except the last one, which serves as the prediction target.

This template performs the following actions:

  1. Shuffles the dataset by odd/even row reordering.
  2. Uses 2-fold validation to select the best models.
  3. Uses the misclassification rate as the error measure during validation.
  4. Employs the Neural-type generator of candidate models.
  5. Uses quadratic GMDH neurons with 2 inputs and limits network depth to 5 layers.
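
Two of the items above can be sketched in Python. The quadratic neuron below uses the full second-order polynomial of two inputs that is commonly associated with GMDH; whether the template uses exactly these six terms is an assumption, and the function names are hypothetical.

```python
def quadratic_terms(x1, x2):
    """Terms of a two-input quadratic polynomial: 1, x1, x2, x1*x2, x1^2, x2^2."""
    return [1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2]

def neuron_output(weights, x1, x2):
    """Weighted sum of the quadratic terms (six weights expected)."""
    return sum(w * t for w, t in zip(weights, quadratic_terms(x1, x2)))

def misclassification_rate(actual, predicted):
    """Share of observations whose predicted label differs from the actual one."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

rate = misclassification_rate([0, 1, 1, 0], [0, 1, 0, 0])
```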

Regression template

The general regression template is appropriate for finding multivariate dependencies in numerical data. It suits tasks such as finding functions, ranking variables, and predicting the outcome for a set of input values.

First, select the Input and Target variables in your data and transform them if necessary.

This template performs the following actions:

  1. Holds out 10 observations to illustrate the performance of the model.
  2. Shuffles dataset rows using two bins (odd/even reordering).
  3. Uses 2-fold validation to select the best models.
  4. Uses RMSE as the error measure during validation.
  5. Employs the Neural-type generator of candidate models.
  6. Uses linear GMDH neurons with 2 inputs and limits network depth to 2 layers.
  7. Keeps 300 top-ranked models (200 single-layer, 100 double-layer).
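
The following Python sketch shows how a single layer of such a search might work: every pair of inputs gets a linear neuron y = w0 + w1*x_i + w2*x_j fitted by least squares, each neuron is scored by 2-fold RMSE on odd/even bins, and the best candidates are kept. It is a simplified illustration under these assumptions, not the tool's algorithm; all names and the sample data are hypothetical.

```python
import math
from itertools import combinations

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear_neuron(rows, target, i, j):
    """Least-squares weights for y = w0 + w1*x_i + w2*x_j (normal equations)."""
    design = [[1.0, row[i], row[j]] for row in rows]
    ata = [[sum(d[p] * d[q] for d in design) for q in range(3)] for p in range(3)]
    aty = [sum(d[p] * y for d, y in zip(design, target)) for p in range(3)]
    return solve(ata, aty)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def two_fold_rmse(rows, target, i, j):
    """Fit on one odd/even bin, score on the other, then swap and average."""
    folds = [(rows[0::2], target[0::2]), (rows[1::2], target[1::2])]
    errors = []
    for k in (0, 1):
        (train_x, train_y), (test_x, test_y) = folds[k], folds[1 - k]
        w = fit_linear_neuron(train_x, train_y, i, j)
        preds = [w[0] + w[1] * r[i] + w[2] * r[j] for r in test_x]
        errors.append(rmse(test_y, preds))
    return sum(errors) / 2

def rank_pairs(rows, target, keep=300):
    """Score every input pair and keep the `keep` lowest-error neurons."""
    pairs = combinations(range(len(rows[0])), 2)
    return sorted((two_fold_rmse(rows, target, i, j), (i, j)) for i, j in pairs)[:keep]

# Illustrative data: the target depends only on inputs 0 and 2.
rows = [[float(k), float(k * k % 7), float(k * 3 % 5)] for k in range(12)]
target = [2.0 + 3.0 * r[0] - r[2] for r in rows]
best_error, best_pair = rank_pairs(rows, target)[0]
```

A second layer would repeat the same step using the outputs of the kept first-layer neurons as its inputs.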

The template is optimized for speed rather than for prediction accuracy.

Curve fitting template

This template performs polynomial curve fitting with automatic selection of the polynomial function.

First, select the curves in your data and move them to Target variables. Leave the Input variables area blank. In this case the preprocessor module automatically uses a time counter as the input variable.

This template performs the following actions:

  1. Generates 21 univariate polynomial terms from x^10 to x^-10.
  2. Shuffles dataset rows using two bins (odd/even reordering).
  3. Uses leave-one-out cross-validation to select the best models.
  4. Uses RMSE as the error measure during validation.
  5. Employs combinatorial generation of all possible candidate models as subsets of generated polynomial terms.
  6. Returns the 20 best fitting functions in terms of the balance between model complexity and fitting accuracy.
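
To make the size of this search concrete, the Python sketch below generates the 21 power terms, the leave-one-out splits, and the candidate term subsets. It assumes that the exponents run over every integer from 10 down to -10 (including 0, the constant term) and that the time counter never takes the value 0, since negative powers are undefined there. The function names are hypothetical, and the subset size is capped in the example because the full search covers 2^21 - 1 non-empty subsets.

```python
from itertools import combinations

# Assumed exponent set: 10, 9, ..., 1, 0, -1, ..., -10 (21 values)
EXPONENTS = list(range(10, -11, -1))

def polynomial_terms(x):
    """Evaluate all 21 power terms at a non-zero x."""
    return [x ** e for e in EXPONENTS]

def leave_one_out(n):
    """Yield (training indices, validation index) for each of n observations."""
    for i in range(n):
        yield [k for k in range(n) if k != i], i

def candidate_subsets(max_terms):
    """Enumerate term subsets (as exponent tuples) up to a given size."""
    for size in range(1, max_terms + 1):
        yield from combinations(EXPONENTS, size)

terms_at_2 = polynomial_terms(2.0)
small_search = sum(1 for _ in candidate_subsets(2))  # 21 + 210 subsets
full_search = 2 ** len(EXPONENTS) - 1                # all non-empty subsets
```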