concepts

Differences

This shows you the differences between two versions of the page.

concepts [2012/06/12 08:41]
Oleksiy [Project folders and Templates]
concepts [2017/06/02 09:31]
Line 1: Line 1:
-===== Concepts and Features ===== 
  
- 
-GMDH Shell is a Windows application. Its GUI consists of a host application and plug-ins that draw their panels inside the host window. Plug-in panels can be placed in two areas: the tabs and the sidebar.
- 
-GMDH Shell plug-ins are linked in a chain that can be executed by clicking the Start button or from the command line.
- 
-{{:img:pic_processing_chain.png |Concept}} 
- 
- 
-==== Project folders and Templates ==== 
- 
- 
- 
-When you click the Start button, GMDH Shell saves all modified settings (several files) to the folder where the dataset is located. At program start-up, plug-ins try to read their settings from this folder; if a configuration file is not found there, they fall back to the default settings located in the program installation directory.
- 
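The fallback described above can be sketched as follows. This is only an illustration of the lookup order (project folder first, installation directory second); the function name `resolve_settings_file` and the file names `solver.cfg` and `charts.cfg` are hypothetical and do not come from GMDH Shell.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_settings_file(name: str, project_dir: Path, install_dir: Path) -> Optional[Path]:
    """Return the project's copy of a settings file, else the default copy, else None."""
    for folder in (project_dir, install_dir):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None


# Demonstration with throwaway folders (all names below are made up).
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    install = Path(tmp) / "install"
    project.mkdir()
    install.mkdir()

    # The installation directory ships defaults for both files.
    (install / "solver.cfg").write_text("default")
    (install / "charts.cfg").write_text("default")
    # Only one of them has been modified and saved next to the dataset.
    (project / "solver.cfg").write_text("modified")

    solver_source = resolve_settings_file("solver.cfg", project, install).parent.name
    charts_source = resolve_settings_file("charts.cfg", project, install).parent.name
    missing = resolve_settings_file("absent.cfg", project, install)
```

Here the modified file is taken from the project folder, the untouched one from the installation directory, and a file present in neither place resolves to `None`.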
-A project folder is therefore a folder that contains data sources and settings. The settings inside a project folder apply only to datasets stored in that folder.
- 
-Task-specific project settings, called Templates, can be loaded via Menu > File > Load template.
- 
-==== Dataset ==== 
- 
- 
- 
-GMDH Shell can read a dataset from CSV (text) and XLS files composed of columns and rows. In the program GUI, columns and rows are usually referred to as variables and observations. For classification problems, they are called variables and instances.
- 
- 
- 
-GMDH Shell uses some of the dataset variables as model inputs and others (one or more variables) as prediction targets. Multivariate datasets consist of two or more variables, while univariate datasets consist of only one variable.
- 
-   
- 
-==== Problem types ==== 
- 
- 
- 
-To solve real-world predictive analytics problems, we should formulate them in terms of standard problem types. GMDH Shell can produce categorical and continuous value predictions, which allows it to solve classification and regression problems respectively. GMDH Shell also provides dedicated tools for Time series forecasting, which is a special type of continuous value prediction. Other popular tasks that GMDH Shell can help with are Feature ranking, Function finding and Curve fitting.
- 
- 
- 
-=== Time series forecasting === 
- 
- 
- 
-Time series are time-ordered datasets (univariate or multivariate). The most recent data in a time series is usually more important for model training than older observations. GMDH Shell has a special Time series preprocessor for proper management of ordered observations. The Time series preprocessor allows the core algorithms to learn from a window of the latest data. Another useful feature of the Time series preprocessor is the ability to launch iterative step-back simulations to evaluate the accuracy of a method.
- 
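A minimal sketch of the two ideas above, learning from a window of the latest data and stepping back through history to measure accuracy. It assumes that a step-back simulation hides the last k observations, forecasts the first hidden one, and compares it with the actual value; that is one common reading (rolling-origin evaluation), not a documented description of GMDH Shell. The forecaster is a deliberately trivial stand-in (the window mean), not a GMDH model, and all names are illustrative.

```python
from statistics import mean


def window_forecast(history, window):
    """Stand-in model: predict the next value as the mean of the latest `window` points."""
    return mean(history[-window:])


def step_back_errors(series, window, steps_back):
    """Absolute one-step-ahead errors obtained by stepping back through the series.

    For k = steps_back, ..., 1 the last k observations are hidden, the first
    hidden observation is forecast from the remaining data, and the absolute
    error against the actual value is recorded.
    """
    errors = []
    for k in range(steps_back, 0, -1):
        known = series[:-k]   # data available at this step
        actual = series[-k]   # first hidden observation
        errors.append(abs(window_forecast(known, window) - actual))
    return errors


series = [10, 12, 14, 16, 18, 20]
errors = step_back_errors(series, window=2, steps_back=3)
```

On this steadily rising series the window mean always lags the next value by the same amount, so every step-back error is 3.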
- 
- 
-=== Classification === 
- 
- 
- 
-Classification is the prediction of the category of an unknown instance. GMDH Shell has a special Classification & Regression preprocessor that supports two-class and multi-class classification. GMDH Shell requires all text data to be encoded with numbers. Target variables with more than two categories can either be encoded and decomposed into binary variables or simply encoded with numbers.
- 
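The two options for a multi-category target can be illustrated with a short sketch. The function names and the alphabetical code assignment are assumptions made for this example; the actual encoding scheme used by GMDH Shell is not specified here.

```python
def encode_with_numbers(labels):
    """Map each category to an integer code (codes assigned in alphabetical order)."""
    code_of = {category: i for i, category in enumerate(sorted(set(labels)))}
    return [code_of[label] for label in labels], code_of


def decompose_into_binary(labels):
    """Build one 0/1 indicator variable per category."""
    return {
        category: [1 if label == category else 0 for label in labels]
        for category in sorted(set(labels))
    }


labels = ["red", "green", "blue", "green"]
codes, code_of = encode_with_numbers(labels)
binary = decompose_into_binary(labels)
```

With plain numeric encoding the target stays a single variable (`codes`), whereas binary decomposition yields one two-class target per category (`binary["red"]`, `binary["green"]`, `binary["blue"]`).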
- 
- 
-=== Regression ===  
- 
- 
- 
-Regression is the prediction of continuous values. Unknown values of the target variable can reside at the end of the dataset or appear later, when a model is applied.
- 
- 
- 
-==== Results ==== 
- 
- 
- 
-As a result of processing, GMDH Shell returns a set of predictive models and their predictions. By default, the best model for the first target variable is shown in the visualization panels. Other models for the same target can be viewed using the [[Model browser]] panel. GMDH Shell also calculates the importance of each variable and the model performance on the known part of the modeled dataset.
- 
- 
- 
-==== Feature list ====
- 
- 
-== Solving modeling problems: ==
- 
-   * Multivariate time series forecasting  
- 
-   * Regression (continuous value prediction) 
- 
-   * Classification (prediction of a category) 
- 
-   * Ranking and selection of variables 
- 
-   * Polynomial curve fitting 
- 
-== Modeling simulation outputs the following results: == 
- 
-   * A set of models that can be exported to Excel 
- 
-   * Predictions 
- 
-   * Importance of input variables 
- 
-   * Analysis of out-of-sample model accuracy 
- 
-== Predictive modeling workflow: ==
- 
-   * Create a model 
- 
-   * Save the model  
- 
-   * Export the model's formula to Excel (deploy a model) 
- 
-   * Load a model from a save-file 
- 
-   * Apply the model to unknown instances within the analyzed file 
- 
-   * Apply the model to a new data-file (scoring) 
- 
-== Learning algorithms: == 
- 
-   * GMDH-type neural networks 
- 
-   * Combinatorial GMDH 
- 
-== Embedded data exploration: == 
- 
-   * File preview 
- 
-   * Descriptive statistics 
- 
-   * Line charts 
- 
-   * Bar charts 
- 
-   * Scatter plot 
- 
-   * Histogram 
- 
-   * Autocorrelation chart 
- 
-   * Pair-wise correlations with ranking 
- 
-   * Contour plot 
- 
-   * Heat map 
- 
-   * 3D surface 
- 
-== Data-file formats: == 
- 
-   * CSV (and any other text files with delimiters)  
- 
-   * XLSX 
- 
-   * XLS 
- 
-   * File sets with the same extension 
- 
-== Data pre-processing: == 
- 
-   * Visual handling of input and output (target) variables and data transformations 
- 
-   * Handling of missing values 
- 
-   * Converting categorical (text) data into numeric values (encoding and binary decomposition) 
- 
-   * Weighting of dataset rows (handling of imbalanced classification problems)  
- 
-   * Time series preprocessing (lags, differences, moving average, incremental weighting of dataset rows) 
- 
-   * Elementary functions (logarithmic transformation, normalization, etc.) 
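Three of the time series transformations named in the list above (lags, differences, moving average) can be sketched in a few lines. These are textbook definitions written for illustration; how GMDH Shell handles edge positions or names the resulting variables is not stated here.

```python
def lag(series, k):
    """Shift the series k steps into the past; the first k positions have no value."""
    if k <= 0:
        return list(series)
    return [None] * k + list(series[:-k])


def difference(series):
    """First differences: change from each observation to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def moving_average(series, window):
    """Mean of each run of `window` consecutive observations."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


series = [2, 4, 8, 6]
lag1 = lag(series, 1)
diffs = difference(series)
ma2 = moving_average(series, 2)
```

For the four-point series this gives a lagged copy `[None, 2, 4, 8]`, differences `[2, 4, -2]`, and a two-point moving average `[3.0, 6.0, 7.0]`.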
- 
-== Dynamic post-processing: ==
- 
-   * Average of top-ranked models 
- 
-   * Quantization of predictions 
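Both post-processing steps can be sketched under explicit assumptions: that "top-ranked" means lowest error, and that quantization means snapping each prediction to the nearest value from a set of allowed levels. Neither interpretation is confirmed by this page, and the model names, error values, and levels below are invented for the example.

```python
def average_top_models(predictions_by_model, error_by_model, top_n):
    """Average the predictions of the `top_n` models with the lowest error."""
    ranked = sorted(predictions_by_model, key=lambda name: error_by_model[name])
    chosen = ranked[:top_n]
    n_points = len(predictions_by_model[chosen[0]])
    return [
        sum(predictions_by_model[name][i] for name in chosen) / len(chosen)
        for i in range(n_points)
    ]


def quantize(predictions, levels):
    """Snap each prediction to the nearest allowed level."""
    return [min(levels, key=lambda level: abs(level - p)) for p in predictions]


predictions_by_model = {
    "model_a": [1.0, 2.0],
    "model_b": [1.4, 3.6],
    "model_c": [9.0, 9.0],
}
error_by_model = {"model_a": 0.1, "model_b": 0.2, "model_c": 0.9}

averaged = average_top_models(predictions_by_model, error_by_model, top_n=2)
quantized = quantize(averaged, levels=[0, 1, 2, 3])
```

The worst-ranked model is left out of the average, and the averaged predictions (about 1.2 and 2.8) are snapped to the levels 1 and 3.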
- 
-== Miscellaneous: ==
- 
-   * Background execution mode via the command line 
- 
-   * Dataset examples and project templates 
- 
-   * One-click result recalculation for dynamically updated data files 
- 
-   * Support for multi-core processors  
- 
-   * Support for clustered Linux systems (Enterprise edition) 
- 
- 
- 
- 
-~~UP~~ 
concepts.txt · Last modified: 2021/06/01 03:27 (external edit)