Solver

The Solver module produces predictive models for target variables.

Reorder rows

Reorder rows is used to give the training and testing samples uniform statistical characteristics and to make them equally informative. This technique works well for all problem types, including time series models.

Options	Description
`No`	Reordering is turned off.
`Odd/even`	Places all even instances after odd instances. Example: `1,2,3,4,5,6` → `1,3,5,2,4,6`
`Asc.+ Odd/even`	Ascending sorting of observations (by target values) prior to Odd/even reordering.
`Desc.+ Odd/even`	Descending sorting of observations (by target values) prior to Odd/even reordering.
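
The reordering options can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of GMDH Shell; the sketch only reproduces the behaviour described in the table above.

```python
def odd_even_reorder(rows):
    """Place rows at odd positions (1st, 3rd, ...) before rows at even positions."""
    return rows[0::2] + rows[1::2]


def sorted_odd_even_reorder(rows, target, descending=False):
    """Sort rows by target value, then apply the odd/even reordering."""
    order = sorted(range(len(rows)), key=lambda i: target[i], reverse=descending)
    return odd_even_reorder([rows[i] for i in order])


print(odd_even_reorder([1, 2, 3, 4, 5, 6]))  # [1, 3, 5, 2, 4, 6]
```

After sorting by target, neighbouring rows have similar target values, so sending them alternately to the two halves gives both halves a similar spread of targets.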

Validation strategy

Validation strategy selects the method used to validate candidate models and to choose the best ones.

Options	Description
`Training/testing`	Splits dataset into two parts, uses the training part to find model coefficients and uses the testing part to compare and select a set of the best models.
`Whole data testing`	Splits dataset, trains model using the training part, but uses both parts for testing.
`k-fold validation`	Splits dataset into k parts, trains a model k times using k-1 parts, each time measuring model performance using a new remaining part. Finally, the residuals obtained from all testing parts are combined and used for model comparison.
`k-fold training`	In contrast to k-fold validation, k-fold training uses k-1 parts to test the model and only 1 part to estimate coefficients. In other respects it is similar to k-fold validation. This is a rather extreme method, used to stop over-fitting when other methods do not work.
`Leave-one-out CV`	This is k-fold cross-validation with the number of folds equal to the number of observations in the dataset.
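
The k-fold options can be sketched in Python as follows. This is an illustration under stated assumptions, not the GMDH Shell implementation: the fold layout (contiguous blocks), the function names, and the `fit`/`predict` callbacks are all hypothetical.

```python
def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_residuals(y, k, fit, predict, swap_roles=False):
    """Collect hold-out residuals over all k folds.

    swap_roles=False: train on k-1 folds, test on the remaining one (k-fold validation).
    swap_roles=True:  train on one fold, test on the other k-1 (k-fold training).
    """
    residuals = []
    for fold in k_fold_indices(len(y), k):
        held = set(fold)
        rest = [i for i in range(len(y)) if i not in held]
        train, test = (fold, rest) if swap_roles else (rest, fold)
        model = fit([y[i] for i in train])
        residuals += [y[i] - predict(model, i) for i in test]
    return residuals


# Stand-in model for the demonstration: always predict the training mean.
fit_mean = lambda ys: sum(ys) / len(ys)
predict_mean = lambda model, row: model

y = [1.0, 2.0, 3.0, 4.0]
print(k_fold_residuals(y, 2, fit_mean, predict_mean))  # [-2.5, -1.5, 1.5, 2.5]
```

Calling `k_fold_residuals` with `k` equal to `len(y)` corresponds to Leave-one-out CV.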

Train/test ratio is used to split the dataset into training and testing parts. The ratio can be given either as percentages or as exact numbers of observations. For example, to split a dataset containing 200 observations using the ratio 4:1, we can apply the percentages 80:20 or the exact numbers of observations 160:40.
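
The arithmetic behind the ratio can be sketched as below. The function name and the rounding rule are assumptions made for illustration; this page does not state how GMDH Shell rounds fractional counts.

```python
def split_counts(n_rows, train_share, test_share):
    """Convert a train:test ratio into exact numbers of observations."""
    n_train = round(n_rows * train_share / (train_share + test_share))
    return n_train, n_rows - n_train


print(split_counts(200, 4, 1))    # (160, 40)
print(split_counts(200, 80, 20))  # (160, 40)
```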

Validation criterion

Defines the model selection criterion for both the core algorithm and variables ranking.

Options	Description
`RMSE`	Selects models with the lowest RMSE calculated for the testing sample.
`MAE`	Selects models with the lowest MAE calculated for the testing sample.
`RMSE⋅balance`	RMSE criterion penalized by the difference between training and testing sample RMSE.
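
For reference, the two basic criteria follow the standard definitions of root mean squared error and mean absolute error, sketched here in Python. The exact form of the balance penalty is not given on this page, so it is not reproduced.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired observations."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


print(mae([1, 2, 3], [1, 2, 6]))  # 1.0
```

RMSE squares the residuals before averaging, so it penalizes a few large errors more heavily than MAE does.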

Variables ranking

Variables ranking turns on preliminary ranking and reduction of variables.

Options	Description
`No`	Ranking is turned off.
`by error (independent)`	Ranking of variables according to their individual ability to predict testing data
`by usage (combinatorial)`	Ranking of variables according to their importance for Combinatorial Core algorithm with limited complexity (equal to 2). Importance is calculated as the number of times the variables appear in the set of best models.

Drop variables after rank n

Reduces the number of variables to n, i.e. keeps the n most important variables according to the selected ranking algorithm. Preliminary reduction of variables may reduce the quality of models, but it speeds up processing of high-dimensional datasets.
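
The idea of ranking by error and then dropping low-ranked variables can be sketched as follows. This is only an illustration: it assumes each variable is scored by a one-variable linear fit on the training rows and by RMSE on the testing rows, which may differ from what GMDH Shell does internally. All names are hypothetical.

```python
import math


def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return my - slope * mx, slope


def rank_by_error(columns, y, n_train):
    """columns: dict name -> values. Returns names ordered by testing RMSE, best first."""
    scores = {}
    for name, x in columns.items():
        b0, b1 = fit_line(x[:n_train], y[:n_train])
        errs = [(b0 + b1 * xv - yv) ** 2 for xv, yv in zip(x[n_train:], y[n_train:])]
        scores[name] = math.sqrt(sum(errs) / len(errs))
    return sorted(scores, key=scores.get)


def drop_after_rank(ranked, n):
    """Keep only the n best-ranked variables."""
    return ranked[:n]


columns = {
    "signal": [1, 2, 3, 4, 5, 6],
    "noise": [5, 1, 4, 2, 3, 9],
    "flat": [1, 1, 1, 1, 1, 1],
}
y = [2, 4, 6, 8, 10, 12]
ranked = rank_by_error(columns, y, n_train=4)
print(drop_after_rank(ranked, 1))  # ['signal']
```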

Core algorithm

You can select one of the available statistical learning algorithms. The description of the algorithms implemented in GMDH Shell can be found in Learning algorithms.

Options	Description
`Combinatorial`	Best subsets regression, combinatorial search
`Neural-type`	Polynomial neural networks of GMDH-type.

Combinatorial

Limit Complexity to n

No model may contain more than n terms.
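
A complexity-limited combinatorial search can be sketched as below. This is a simplified illustration of best subsets regression, not the GMDH Shell algorithm: it assumes ordinary least squares on the training rows and selection by squared error on the testing rows, and every function name is hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # singular system: caller skips this subset
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(cols, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def best_subset(terms, y, n_train, max_terms):
    """terms: dict name -> column. Tries every subset of at most max_terms terms."""
    best = None
    for size in range(1, max_terms + 1):
        for names in combinations(terms, size):
            w = fit_least_squares([terms[t][:n_train] for t in names], y[:n_train])
            if w is None:
                continue
            test_err = sum(
                (sum(wi * terms[t][r] for wi, t in zip(w, names)) - y[r]) ** 2
                for r in range(n_train, len(y))
            )
            if best is None or test_err < best[0]:
                best = (test_err, names, w)
    return best


terms = {"1": [1.0] * 6, "x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5]}
y = [5, 7, 9, 11, 13, 15]  # generated as 3 + 2*x1
err, names, weights = best_subset(terms, y, n_train=4, max_terms=2)
print(names)  # ('1', 'x1')
```

The number of subsets grows combinatorially with the number of terms, which is why limiting complexity to a small n keeps the search tractable.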

Additional variables

Expands the dataset with new artificial features. A higher-dimensional space frequently helps to improve Classification and Regression models. Be careful when expanding more than 20 initial variables, because the number of all possible pairs grows quickly.

Options	Description
`No`	No additional variables except constant term.
`x_i·x_j`	Adds all possible multiplied pairs.
`x_i·x_j, x_i²`	Adds all possible multiplied pairs and squares.
`x_i·x_j, x_i/x_j`	Adds all possible multiplied and divided pairs. Skips pairs that cause dividing by zero.
`Custom`	Uses terms of custom polynomial function as new variables, see Custom polynomial
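
The built-in expansion options can be sketched in Python as follows. This is an illustration under assumptions, not GMDH Shell code: the function name is hypothetical, and the sketch assumes one ratio x_i/x_j per unordered pair (i < j), skipped whenever the divisor column contains a zero.

```python
from itertools import combinations


def expand_columns(columns, squares=False, ratios=False):
    """columns: list of equal-length lists, one per initial variable.

    Returns the constant column, the initial variables, all pairwise
    products, and optionally squares or pairwise ratios.
    """
    n_rows = len(columns[0])
    out = [[1.0] * n_rows] + [list(c) for c in columns]
    pairs = list(combinations(range(len(columns)), 2))
    for i, j in pairs:
        out.append([a * b for a, b in zip(columns[i], columns[j])])
    if squares:
        out += [[v * v for v in c] for c in columns]
    if ratios:
        for i, j in pairs:
            if all(v != 0 for v in columns[j]):  # skip pairs that would divide by zero
                out.append([a / b for a, b in zip(columns[i], columns[j])])
    return out


cols = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
print(len(expand_columns(cols)))                # 7
print(len(expand_columns(cols, squares=True)))  # 10
print(len(expand_columns(cols, ratios=True)))   # 8
```

In the last call two of the three ratios are skipped because the third column contains a zero.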

Be aware that memory and time consumption grow quickly with the number of variables.

Resulting number of variables (including the constant term) for each option:

Number of initial variables	`No`	`x_i·x_j`	`x_i·x_j, x_i²`	`x_i·x_j, x_i/x_j`
2	3	4	6	5
3	4	7	10	10
5	6	16	21	26
10	11	56	66	101
20	21	211	231	401
50	51	1276	1326	2501
100	101	5051	5151	10001
200	201	20101	20301	40001
500	501	125251	125751	250001
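
The counts in the table are consistent with simple closed-form expressions: n initial variables plus a constant, n(n-1)/2 products, n squares, and n(n-1)/2 ratios. The sketch below reproduces the table rows; the function name is hypothetical and the formulas are inferred from the numbers above rather than taken from GMDH Shell.

```python
def expanded_counts(n):
    """Resulting number of variables for n initial variables, per option."""
    pairs = n * (n - 1) // 2   # unordered pairs of distinct variables
    base = n + 1               # initial variables plus the constant term
    return {
        "No": base,
        "x_i*x_j": base + pairs,
        "x_i*x_j, x_i^2": base + pairs + n,
        "x_i*x_j, x_i/x_j": base + 2 * pairs,
    }


print(list(expanded_counts(20).values()))  # [21, 211, 231, 401]
```

The quadratic growth in n is what drives the memory and time consumption noted above.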

Neural-type

Neuron inputs

The number of input variables allowed for a neuron. Two inputs per neuron is usually an efficient choice; with more inputs the computational task may become too complex.

Neuron function

Sets the type of the internal function for neurons. The neurons are active, i.e. each neuron can drop some of the function terms in order to increase overall predictive power of the model.

Options	Description
`a₀ + a₁·x_i + a₂·x_j`	Linear
`a₀ + a₁·x_i + a₂·x_j + a₃·x_i·x_j`	Polynomial
`a₀ + a₁·x_i + a₂·x_j + a₃·x_i·x_j + a₄·x_i² + a₅·x_j²`	Quadratic polynomial
`Custom`	Uses custom polynomial function defined by user, see Custom polynomial
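
The built-in neuron functions are nested: the linear and polynomial forms are the quadratic polynomial with some coefficients equal to zero. The sketch below evaluates a two-input neuron; the function name is hypothetical, and representing a dropped term by a zero coefficient is an assumption made for illustration.

```python
def quadratic_neuron(a, xi, xj):
    """Evaluate a0 + a1*xi + a2*xj + a3*xi*xj + a4*xi^2 + a5*xj^2.

    a holds the six coefficients a0..a5; a zero coefficient drops its term.
    """
    return (a[0] + a[1] * xi + a[2] * xj
            + a[3] * xi * xj + a[4] * xi ** 2 + a[5] * xj ** 2)


print(quadratic_neuron([1, 2, 3, 0, 0, 0], 2, 3))  # linear form: 14
print(quadratic_neuron([1, 2, 3, 4, 5, 6], 2, 3))  # full quadratic form: 112
```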

Max. number of layers

Sets the upper limit for the number of network layers created by the algorithm.

Initial layer width

Initial layer width defines how many neurons are added to the set of inputs at each new layer.

Set parallel threads manually

When turned on, this option allows manual control of the number of parallel processing threads. When turned off, the number of threads is equal to the number of logical processors, i.e. processor cores or hyper-threading cores, in your PC.
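
The same default rule can be expressed in Python for comparison. This is only an analogy: `default_thread_count` is a hypothetical helper, and `os.cpu_count()` is the Python standard-library call that reports the number of logical processors.

```python
import os


def default_thread_count(manual=None):
    """Use the manual setting if given, else the number of logical processors."""
    if manual is not None:
        return manual
    return os.cpu_count() or 1  # cpu_count() may return None on some platforms


print(default_thread_count(3))  # 3
```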

Custom polynomial

You can configure a Custom polynomial function to be used for generation of Additional variables or as a Neuron function. When the 'Custom' option is selected in the corresponding list of options, a dialog window called Custom polynomial is shown.

Max. power of a variable

Sets the upper limit for power of any variable in a polynomial term.

Min. power of a variable

Sets the lower limit for power of any variable in a polynomial term.

For example, if Max. power is 3 and Min. power is -2, then the following terms are included in the custom polynomial: x₁³, x₁², x₁, 1 (constant term), 1/x₁, 1/x₁². If Min. power is 1 or higher, the resulting polynomial will not include a constant term.

Max. total power in a term

Sets a limit for sum of absolute powers of all variables in a polynomial term.

For example, if Max total power is 3 then the following terms can be included: x₁*x₂², x₁*x₂^-2, x₁*x₂*x₃, …

Max. number of variables in a term

Sets the maximum number of variables in any polynomial term. For example, if Max. number of variables is 3, then the following terms can be included: x₁*x₂*x₃, (x₁²)*(x₂⁴)*(x₃^-1) …

Here are some configuration examples for two input variables:

Max. power = 2, Min. power = 0, Max total power = 2, Max number in a term = 2 results in:

y(x₁,x₂) = a₀ + a₁*x₁ + a₂*x₂ + a₃*x₁*x₂ + a₄*x₁² + a₅*x₂²

Max. power = 4, Min. power = 0, Max total power = 4, Max number in a term = 1 results in:

y(x₁,x₂) = a₀ + a₁*x₁ + a₂*x₁² + a₃*x₁³ + a₄*x₁⁴ + a₅*x₂ + a₆*x₂² + a₇*x₂³ + a₈*x₂⁴
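
The interaction of the four limits can be checked with a short enumeration sketch. This is one interpretation of the settings, not GMDH Shell code: the function name is hypothetical, and the sketch assumes that a variable may always be absent from a term (power 0), with the constant term excluded only when Min. power is 1 or higher.

```python
from itertools import product


def custom_terms(n_vars, max_power, min_power, max_total_power, max_vars_in_term):
    """Return exponent tuples (one exponent per variable) allowed by the four limits."""
    allowed = sorted(set(range(min_power, max_power + 1)) | {0})
    terms = []
    for exps in product(allowed, repeat=n_vars):
        used = sum(1 for e in exps if e != 0)
        if used == 0 and min_power > 0:
            continue  # no constant term when Min. power is 1 or higher
        if used > max_vars_in_term:
            continue  # too many variables in one term
        if sum(abs(e) for e in exps) > max_total_power:
            continue  # sum of absolute powers exceeds the limit
        terms.append(exps)
    return terms


print(len(custom_terms(2, 2, 0, 2, 2)))  # 6
print(len(custom_terms(2, 4, 0, 4, 1)))  # 9
```

The two printed counts match the six-term and nine-term polynomials shown in the configuration examples above.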

GMDH Shell Documentation
