User Tools

Site Tools


Preparing your data

For many users, the data format is an initial stumbling block on the way to success in processing their own data. This usually happens when a user misunderstands some of the details of the data format. If you read the material carefully, and examine the sample data files that are linked below, you will be up and running quickly.

Generally, GMDH Shell has two ways to read data:

  • CSV/XLS/XLSX connection.
  • ODBC/OLEDB connection.

CSV/XLS/XLSX connection

CSV/XLS/XLSX connection is the most frequently used way to import data into GMDH Shell.

Let us describe data requirements, general data file format and data file types supported by this connection. GMDH Shell works with historical time series, i.e. sequences of values of a variable at some time interval. Time series are usually expressed in terms of the calendar, and can be daily, weekly, monthly or quarterly. GMDH Shell can work with any type of time series.

Data length

GMDH Shell works by fitting a statistical model to your data and extrapolating it via the fitted model. Thus, your data must be long enough to provide reasonably stable estimates of the most important features of the data especially in time series forecasting. Very short or very noisy historic records could yield very simple models because the data are too short to support statistical estimates of important features like seasonality.

For more than four points but less than two years worth of data, the software can fit and forecast trends but not seasonality. If your data are in fact nonseasonal, your forecasts are likely to be adequate. Otherwise, the forecasts are likely to be poor – GMDH Shell cannot extract or forecast the seasonality and part of the seasonality may be mistaken for trending.

Seasonality can be estimated and forecasted from two to three years of data, but this amount of data is marginal, especially when your data are noisy or intermittent. In some cases, patterns in the noise are mistaken for seasonality, yielding inappropriate “seasonal” forecasts.

The Robust capture of seasonality requires three or more years of data. Four to seven years is even better, since there is more information from which the program can separate seasonality and trend from the noise.

Missing values

Each of the statistical forecasting methods requires an unbroken stream of historic values. Unfortunately, however, missing values are common in business data. You therefore need a well-formulated approach to missing data.

There are two parts of the approach. You must first decide how to mark missing values in the input data file. Then you must decide how these missing values are to be treated by the software.

GMDH Shell has predefined set of missing value marks (e.g. ‘N/A’, ‘NULL’) in the Data connection dialog where you can select one of them. You can also type-in your own mark. Moreover, the program has the option ‘consider text cells as missing’, which helps to omit text data when reading. Once the data have been read, missing values are depicted as blank cells in the Data Explorer tab.

GMDH Shell utilizes several approaches of how to treat missing data: replace them with mean or most frequently used value, or interpolate and so on. You can choose one of them, which will be applied to all the blank cells in the Data Explorer tab.

Header information

Header information describes the meaning of time series textually and consists of a set of stings or labels. It is recommended to provide input data table with the header information, but it is not obligatory. If there is no such information, GMDH Shell automatically assigns labels ‘x1’, ‘x2’, and so on to the variables imported. We strongly recommend you to add date/time variables (e.g. year, month) and their labels to the input data table, if you deal with time series forecasting.

GMDH Shell can also correctly (in historical order) read retrospective time series.

Data format

GMDH Shell reads column as well as row format. In the column format each variable occupies a single column in the data table. In the row format – a single row. Example of the data table in column format is shown on the figure below.

Input File types

GMDH Shell supports two file types: text file and Excel file (*.xls, *.xlsx). A special text file format *.csv that can be opened and created in Excel is also supported.

Text files

To read text file, GMDH Shell automatically recognizes delimiter used to separate values in the file. If the automatic option fails, you can explicitly type-in your delimiter. Tab-delimited text file in column format is shown below.

You can download it to see directly.

If the delimiter is used in label string, this string should be quoted to be read as one label. If the delimiter is space symbol, then several consecutive spaces are considered as one delimiter. Tab character is also considered as the delimiter in this case. However, if tab character is the delimiter, then the space symbol is not treated as a delimiter, it's just a space.

Excel files

GMDH Shell can load data tables from Excel files *.csv, *.xls, *.xsls. The data are assumed to reside in the topmost (first) spreadsheet in the workbook. GMDH Shell grabs data table from Excel spreadsheet starting from the first filled left top corner cell ending with the last filled right bottom corner cell. If the grabbed data table has blank cells in the Excel spreadsheet, they automatically recognized as having missing data in GMDH Shell.

The program can also read data from all sheets of the workbook simultaneously, if they have the same format in every sheet.

You can download an xlsx file to see directly.

ODBC/OLEDB connection

ODBC provides direct data communications between GMDH Shell and a wide variety of databases. This is accomplished through intermediary ODBC drivers that lie between GMDH Shell and your database. You must obtain the driver from the database manufacturer or a third party and install it according to the directions provided.

ODBC drivers are available for many database products including Access, Oracle, DB2, SQL Server, FoxPro, Paradox and Btrieve.

GMDH Shell reads data from structured tables or views defined in the database. To use this connection you should know how to write queries using SQL.

setting_data.txt · Last modified: 2017/06/02 09:31 (external edit)

Page Tools