import [2012/06/12 09:34]
Oleksiy [CSV/XLS/XLSX Import]
import [2017/06/02 05:31] (current)
====== Data connection ======

The **Data connection** module sets up connections to data files or databases. GMDH Shell reads the connected files or databases at the beginning of each forecasting run.

The following data connection options are available:
  * **CSV/XLS/XLSX connection** connects one or more Excel spreadsheets or delimited text files.
  * **ODBC/OLEDB connection** connects databases or third-party programs via the ODBC interface.
  * **Order list connection**, available only in the Business Forecasting package, extracts historical demand from a raw list of customer orders.
  * **Sage Accounting connection**, available only in the Business Forecasting package, extracts retail sales data from Sage 50 and other versions of Sage accounting software.
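
The order list connection turns individual customer orders into a demand history. As a rough illustration only, the Python sketch below sums order quantities per product and calendar month. The ''(product, order_date, quantity)'' record layout and the function name ''orders_to_demand'' are hypothetical and are not part of GMDH Shell.

```python
from collections import defaultdict
from datetime import date


def orders_to_demand(orders):
    """Sum ordered quantities per product and calendar month.

    `orders` holds (product, order_date, quantity) records; this layout is a
    hypothetical example, not GMDH Shell's actual order-list format.
    """
    demand = defaultdict(lambda: defaultdict(float))
    for product, order_date, quantity in orders:
        # Bucket each order by the (year, month) of its order date.
        demand[product][(order_date.year, order_date.month)] += quantity
    # One chronologically sorted series per product.
    return {product: sorted(series.items()) for product, series in demand.items()}


orders = [
    ("A-100", date(2017, 1, 5), 3),
    ("A-100", date(2017, 1, 20), 2),
    ("B-200", date(2017, 1, 9), 1),
    ("A-100", date(2017, 2, 3), 7),
]
print(orders_to_demand(orders))
# {'A-100': [((2017, 1), 5.0), ((2017, 2), 7.0)], 'B-200': [((2017, 1), 1.0)]}
```

Months without any orders do not appear in the result; before forecasting, such gaps would usually need to be filled with zero demand.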

===== CSV/XLS/XLSX connection =====

**File > New project > CSV/XLS/XLSX connection** connects data files: one or several Excel files or delimited ASCII files such as ''.csv'' or ''.txt''. At all stages you can keep the files open for editing, for example in Excel or Notepad. The following limitations apply:
  * The module expects each series of observations to be stored in a column. If the series are stored in rows, turn on the **Transpose data** option.
  * The module cannot read password-protected spreadsheets.

The right part of the connection dialog shows a preview of the preliminary import results, using the following color coding of data types:
  * Numeric cells - black.
  * Text cells (categorical data) - blue.
  * Date/time cells - green.
  * Missing values - light gray (if the cell is not empty).
  * Discarded columns - gray.

{{ :img:dialog_data_connection.png?width=543 |Import dialog}}
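
The color coding reflects a per-cell type classification. The Python sketch below shows one way such a classification could work. The date formats, the missing-value marks and the function name ''classify_cell'' are assumptions for illustration, not the rules GMDH Shell actually applies, and only dates (not times) are recognized.

```python
from datetime import datetime

# Assumed examples only; the marks and formats GMDH Shell recognizes may differ.
MISSING_MARKS = {"NA", "N/A", "?"}
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")

# Preview colors from the list above; empty cells are left uncolored.
PREVIEW_COLORS = {"numeric": "black", "text": "blue", "date": "green",
                  "missing": "light gray", "empty": None}


def classify_cell(raw):
    """Classify one raw cell as 'empty', 'missing', 'numeric', 'date' or 'text'."""
    value = raw.strip()
    if value == "":
        return "empty"
    if value in MISSING_MARKS:
        return "missing"
    try:
        float(value)
        return "numeric"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


for cell in ["12.5", "2017-06-02", "Kyiv", "N/A", ""]:
    kind = classify_cell(cell)
    print(repr(cell), kind, PREVIEW_COLORS[kind])
```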

== Column labels ==

**Column labels** are used as variable names. You can either instruct GMDH Shell to **Use 1st row** as labels or let it generate labels automatically: ''**x1, x2, x3...**''. Either variant can be combined with **Custom labels**, which replace selected labels with your own. For example, **Custom labels** ''**,,Date,**'' are interpreted as the labels ''**x1, x2, Date, x4**''.
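
The interplay between default labels and **Custom labels** can be read as a small merge rule: an empty entry in the comma-separated list keeps the existing label. The Python sketch below reproduces the ''**,,Date,**'' example under that reading. The function ''resolve_labels'' is hypothetical, and ignoring entries beyond the number of columns is an assumption about GMDH Shell's behavior.

```python
def resolve_labels(n_columns, header=None, custom=""):
    """Merge base column labels with a comma-separated Custom labels string.

    header: labels read from the 1st row, or None to generate x1, x2, ...
    custom: e.g. ",,Date,"; an empty entry keeps the base label.
    """
    if header is not None:
        base = list(header)
    else:
        base = [f"x{i}" for i in range(1, n_columns + 1)]
    # Entries beyond the number of columns are never read (assumption).
    overrides = custom.split(",") if custom else []
    labels = []
    for i, name in enumerate(base):
        override = overrides[i].strip() if i < len(overrides) else ""
        labels.append(override or name)
    return labels


print(resolve_labels(4, custom=",,Date,"))  # ['x1', 'x2', 'Date', 'x4']
```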

== Row labels (ID, timestamp) ==

**Row labels** are used to refer to observations, so they must be unique, for example dates and times or identifiers.

If your dataset stores timestamp elements (year, month, day, week and time) in separate columns, you can compose the timestamp from several columns with the option **Compose ID from several columns**. All timestamp elements must be in neighboring columns, and the option **Read row labels from column N** must point to the first of them. Supported combinations are: **Year + Month**, **Year + Week**, **Year + Quarter**, **Year + Month + Day**, **Date + Time**.
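
Composing a timestamp from neighboring columns amounts to reading a fixed number of cells starting at column N and checking that the result is unique. The Python sketch below does this for four of the supported combinations; **Date + Time** is omitted because its cell formats are not specified here. The function names, the 0-based column index, the use of the first day of each month or quarter, and the ISO week convention for **Year + Week** are all assumptions, not documented GMDH Shell behavior.

```python
from datetime import date


def compose_row_label(row, first, scheme):
    """Build a date from neighboring columns, starting at 0-based index `first`."""
    year = int(row[first])
    if scheme == "year+month":
        return date(year, int(row[first + 1]), 1)
    if scheme == "year+quarter":
        quarter = int(row[first + 1])
        return date(year, 3 * (quarter - 1) + 1, 1)  # first day of the quarter
    if scheme == "year+week":
        # Monday of the ISO week (date.fromisocalendar needs Python 3.8+).
        return date.fromisocalendar(year, int(row[first + 1]), 1)
    if scheme == "year+month+day":
        return date(year, int(row[first + 1]), int(row[first + 2]))
    raise ValueError(f"unsupported scheme: {scheme}")


def require_unique(labels):
    """Row labels must be unique; report the first duplicate found."""
    seen = set()
    for label in labels:
        if label in seen:
            raise ValueError(f"duplicate row label: {label}")
        seen.add(label)
    return labels


rows = [["2016", "4", "120"], ["2017", "1", "135"], ["2017", "2", "141"]]
labels = require_unique([compose_row_label(r, 0, "year+quarter") for r in rows])
print(labels)
```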

== Missing values ==

The import module detects missing values. It replaces the various kinds of missing values with regular NULL values, so that they can be handled properly at the [[Preprocess]] stage.

**Missing value mark** is used to type in a custom missing value mark or to select one of the standard marks.

**Consider text cells as missing** replaces all non-numeric cells with regular NULL values.
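
The two missing-value settings can be read as a per-cell rule. The Python sketch below applies that rule, representing NULL as ''None''. The default marks and the function name ''clean_cell'' are illustrative assumptions, not GMDH Shell's actual list of standard marks.

```python
def clean_cell(raw, missing_marks=("", "NA"), text_as_missing=False):
    """Return a float, a text category, or None (NULL) for one raw cell.

    missing_marks: cell values treated as missing (like Missing value mark).
    text_as_missing: like the 'Consider text cells as missing' option.
    """
    value = raw.strip()
    if value in missing_marks:
        return None
    try:
        return float(value)
    except ValueError:
        # Non-numeric cell: NULL if text counts as missing, otherwise a category.
        return None if text_as_missing else value


row = ["1.5", "NA", "-999", "n/a?", "4"]
print([clean_cell(c, missing_marks=("", "NA", "-999"), text_as_missing=True) for c in row])
# [1.5, None, None, None, 4.0]
```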

== Other settings ==

**Delimiter** sets the delimiter, such as comma, space, tab or any other character. This option applies only to delimited ASCII files (''.csv'', ''.txt'', etc.).
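
For delimited ASCII files, the **Delimiter** setting decides how each line is split into cells. Python's standard ''csv'' module illustrates the effect; this is only an analogy, and GMDH Shell's own parser (quoting rules, handling of repeated spaces) may behave differently. The helper name ''read_delimited'' is hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Split delimited text into rows of cells (quoted fields are honored)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


sample = "Date;Sales\n2017-01;12,5\n2017-02;13,1\n"
print(read_delimited(sample, delimiter=";"))
# [['Date', 'Sales'], ['2017-01', '12,5'], ['2017-02', '13,1']]
```

Reading the same sample with the default comma delimiter would split ''12,5'' into two cells, which is why the setting matters for files that use a decimal comma.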

**Import all files with the same extension** connects all files with the same extension in the current directory.
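
With **Import all files with the same extension** turned on, selecting a single file effectively selects a group of files. A Python sketch of that selection rule follows, assuming a case-insensitive extension match; ''same_extension'' and ''files_with_same_extension'' are hypothetical helpers.

```python
from pathlib import Path


def same_extension(selected, candidates):
    """Keep the candidate file names whose extension matches `selected`."""
    ext = Path(selected).suffix.lower()  # case-insensitive match is an assumption
    return sorted(c for c in candidates if Path(c).suffix.lower() == ext)


def files_with_same_extension(selected_file):
    """Scan the selected file's directory for files with the same extension."""
    folder = Path(selected_file).parent
    names = [p.name for p in folder.iterdir() if p.is_file()]
    return [folder / name for name in same_extension(selected_file, names)]


print(same_extension("sales.csv", ["b.csv", "a.CSV", "notes.txt", "c.xlsx"]))
# ['a.CSV', 'b.csv']
```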

**Import all sheets of workbook** connects all sheets of one or more ''.xls'' or ''.xlsx'' workbooks.

**Transpose tables, i.e. read columns from rows** supports data series stored in rows instead of columns.

**Reverse order of rows** supports data series where the most recent observations are at the top of the table and the oldest ones at the bottom.

**Import rows starting from** skips a number of rows at the top of the table, for example to skip header information.
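
The row-related options above reshape the raw table. The Python sketch below applies them in one assumed order (skip rows, then transpose, then reverse). The function name ''arrange_table'', the 1-based ''start_row'' and the processing order are assumptions rather than documented GMDH Shell behavior.

```python
def arrange_table(rows, start_row=1, transpose=False, reverse=False):
    """Skip leading rows, optionally transpose, optionally reverse row order.

    start_row is 1-based, so start_row=3 drops the first two rows.
    """
    table = [list(r) for r in rows[start_row - 1:]]
    if transpose:
        # Series stored in rows become columns (zip stops at the shortest row).
        table = [list(col) for col in zip(*table)]
    if reverse:
        # Newest-first data becomes oldest-first.
        table.reverse()
    return table


raw = [
    ["Exported 2017-06-02"],  # header information to skip
    ["30", "20", "10"],       # series 1, newest value first
    ["3", "2", "1"],          # series 2, newest value first
]
print(arrange_table(raw, start_row=2, transpose=True, reverse=True))
# [['10', '1'], ['20', '2'], ['30', '3']]
```

After these steps each row is one observation, oldest first, and each column is one series.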

===== ODBC/OLEDB connection =====

**File > New project > ODBC/OLEDB connection** connects various databases. Most database vendors provide at least a minimal ODBC driver for their database. Using this module requires some knowledge of SQL queries.
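
Since this module relies on SQL queries, it may help to see the kind of query that returns a forecasting-ready table: one row per period, with the timestamp in the first column. The Python sketch below uses the standard ''sqlite3'' module and an in-memory database purely as a stand-in for an ODBC data source; the ''sales'' table, its columns and the query are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real ODBC data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2017-01-05", 3), ("2017-01-20", 2), ("2017-02-03", 7)],
)

# Aggregate raw records into one row per month, timestamp first.
query = """
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount) AS total
    FROM sales
    GROUP BY month
    ORDER BY month
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)  # [('2017-01', 5.0), ('2017-02', 7.0)]
```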

===== Sage Accounting connection =====

**File > New project > Sage Accounting connection** extracts retail sales data from Sage 50 and other versions of Sage accounting software. The module is only available in the Business Forecasting version of GMDH Shell.

Last modified: 2017/06/02 05:31 (external edit)