ChangeLog

Oct 29 2004, Release 3.25 Bug fix: loading a saved tree model now works. Thanks to our user Mario Cordon from Office Depot, who reported the bug!
Oct 13 2004, Release 3.24 The weight variable option is disabled because it behaves incorrectly. It will be re-enabled once it works correctly.
Sept 3 2004, Release 3.23 Weight variable option, first version. It has not been tested and may have bugs; please report any you find.
Aug 3 2004, Release 3.22 Variable ranking is shown in the XOutput console and the Control Center text area! Clicking a row in the lift curves table selects the corresponding terminal node, so interesting nodes can be found quickly.
July 29 2004, Release 3.21 Option to update to the latest version by selecting File -> Update Shih in the main application window. This is much more practical than downloading it from the website!
June 26 2004, Release 3.20 The sampling seed is now simply called seed. It is used not only to sample the test data but also to sample values for missing entries. For a fixed seed, missing values are always filled with the same values, so pressing the GOS button in the control panel several times for a variable with missing values always yields the same goodness of split (GOS). Thanks to Dr Rainer Block for suggesting this improvement.
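A minimal Python sketch of why a fixed seed makes the filled values, and therefore the GOS, reproducible. The fill rule used here (drawing each missing value from the observed values of the same variable) and the function name `fill_missing` are illustrative assumptions, not Shih's actual imputation method:

```python
import random
from typing import List, Optional

def fill_missing(values: List[Optional[float]], seed: int) -> List[float]:
    """Replace None entries with values drawn from the observed (non-missing) ones.

    Hypothetical fill rule for illustration only: sample from the empirical
    distribution of the observed values using a generator built from the seed.
    """
    rng = random.Random(seed)  # independent generator, fully determined by the seed
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]

column = [1.0, None, 3.0, None, 5.0, 2.0]

# Same seed -> same filled column on every call, so any split score
# computed from it is identical as well.
first = fill_missing(column, seed=42)
second = fill_missing(column, seed=42)
print(first == second)  # True
```

Because the generator is rebuilt from the seed on each call, repeated computations see identical data; choosing a different seed changes the filled values.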
June 23 2004, Release 3.19 The control center button that drew the goodness of split function over the splitting values has been removed. That button had to be pressed before a node could be split manually. Now the function is drawn automatically when you select a variable, and you can immediately split the node manually, which is more straightforward and user friendly. For classification trees, the bar distribution of the target variable is drawn when no node is selected; in that case the distribution of the root node is shown. Also for classification trees, a text area has been added that prints the costs of the model and gives a more compact display of the information. Its contents can be copied to the clipboard, as has been possible for regression trees since version 3.17.
June 19 2004, Release 3.18 We thank Dr Rainer Block for suggesting the implementation of LAD regression and the use of statistical indicators for regression trees. Added R squared and Pearson correlation statistics computed on deviations from the median. Added the Akaike Information Criterion (AIC) statistic for regression trees. The AIC captures the trade-off between the sum of squared errors and the complexity of a model, penalizing models with many nodes, i.e. complex models. It is a statistic to look at to avoid underfitting and overfitting the data: the smaller the AIC, the better. Performance tuning for large databases in the GUI: trees built with tens of thousands of examples can be moved faster, making them feel very light. 15% speed gain from construction to display time for classification trees.
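To illustrate the AIC trade-off, the sketch below uses the common least-squares form AIC = n ln(SSE/n) + 2k, taking the number of terminal nodes as k. The changelog does not state the exact formula Shih uses, so both this form and the choice of k are assumptions, and `aic_regression_tree` is a hypothetical name:

```python
import math

def aic_regression_tree(sse: float, n: int, terminal_nodes: int) -> float:
    """AIC under a Gaussian error assumption, dropping additive constants.

    sse: sum of squared errors of the tree over n cases.
    terminal_nodes: used here as the parameter count k (one mean per leaf);
    this choice of k is an assumption for illustration.
    """
    if sse <= 0 or n <= 0:
        raise ValueError("sse and n must be positive")
    return n * math.log(sse / n) + 2 * terminal_nodes

# A larger tree must reduce the SSE enough to pay for its extra leaves.
small = aic_regression_tree(sse=120.0, n=100, terminal_nodes=4)
large = aic_regression_tree(sse=118.0, n=100, terminal_nodes=12)
print(small < large)  # True: eight extra leaves barely reduce the error
```

The first term rewards a lower error and the second penalizes every additional leaf, which is why the smaller AIC is preferred.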
June 18 2004, Release 3.17 Performance tuning for regression trees generated from big databases: they can be moved quickly and easily in the TreePanel and the zoomed view. A text area shows statistics for regression trees, allowing cut and paste operations and a more compact display of information.
June 10 2004, Release 3.16 Added Least Absolute Deviation (LAD) regression in addition to Ordinary Least Squares (OLS) regression. This is an alternative algorithm for predicting continuous variables. LAD is a special case of quantile regression: the case of predicting the 0.5 quantile, which is the median. LAD selects splits that maximize the decrease of the deviation from the median, i.e. the decrease of the sum of absolute errors. OLS, on the other hand, predicts the mean and selects splits that maximize the decrease of variance, i.e. the decrease of the sum of squared errors. Depending on the data, LAD is roughly as fast as OLS. Its main benefit is that it is robust to outliers, and it can be applied to the same data used by OLS, so it is worth trying both alternatives. Experiments show that it works better than OLS in several cases and worse in others. The models generated are usually simpler, with fewer nodes, and the pruning sequence is shorter. Empirical evidence indicates that the models generalize better, validating well on the test data.
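The two split criteria described above can be sketched in Python as follows. The function names are hypothetical and the code is a plain illustration of the textbook criteria, not Shih's implementation; split candidates are positions in a list of target values already ordered by the splitting variable:

```python
from statistics import mean, median
from typing import Callable, List

def sse(ys: List[float]) -> float:
    """OLS node impurity: sum of squared errors around the node mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def sae(ys: List[float]) -> float:
    """LAD node impurity: sum of absolute errors around the node median."""
    m = median(ys)
    return sum(abs(y - m) for y in ys)

def split_gain(ys: List[float], cut: int,
               impurity: Callable[[List[float]], float]) -> float:
    """Decrease in impurity when ys is split into ys[:cut] and ys[cut:]."""
    return impurity(ys) - impurity(ys[:cut]) - impurity(ys[cut:])

def best_cut(ys: List[float], impurity: Callable[[List[float]], float]) -> int:
    """Split position with the largest impurity decrease."""
    return max(range(1, len(ys)), key=lambda c: split_gain(ys, c, impurity))

clean = [1.0, 2.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(best_cut(clean, sse), best_cut(clean, sae))  # 4 4

# Leaf predictions: OLS predicts the mean, LAD predicts the median.
node = [2.0, 3.0, 3.0, 4.0, 100.0]  # one outlier
print(mean(node), median(node))     # 22.4 3.0
```

On the clean data both criteria choose the same split, while in the node with an outlier the LAD prediction (the median) stays with the bulk of the values and the OLS prediction (the mean) is pulled toward the outlier.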
May 26 2004, Release 3.15 Speed boost for large databases, i.e. above 50,000 rows: 20% faster during the main tree construction phase.
May 18 2004, Release 3.14 The node-level distribution of categorical variables is now implemented for regression trees. It is shown as a horizontal bar under the node (an additional bar is displayed for the test data). Each color in the bar represents a value of the displayed variable. To select or change the categorical variable to display, choose the menu Tree->Distributions in the tree window to open the variable selector dialog. The dialog also shows the exact percentage of each value of the selected variable in the selected node. If no node is selected, the dialog shows the distribution over the whole data, which is also the distribution of the root node.
May 9 2004, Release 3.12 Two new model statistics for evaluating regression trees: the R-squared, i.e. the proportion of the variance of the dependent variable explained by the model, and Pearson's correlation between the observed and the predicted values.
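A minimal Python sketch of the two statistics, using their standard definitions (R-squared as 1 - SSE/SST). The function names and the toy two-leaf tree are illustrative assumptions:

```python
import math
from typing import List

def r_squared(observed: List[float], predicted: List[float]) -> float:
    """Proportion of the variance of the observed values explained by the
    predictions: 1 - SSE / SST."""
    m = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - m) ** 2 for o in observed)
    return 1.0 - sse / sst

def pearson(observed: List[float], predicted: List[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)

# A regression tree predicts one constant per terminal node.
observed = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
predicted = [2.0, 2.0, 2.0, 11.0, 11.0, 11.0]  # two leaves with means 2 and 11
print(r_squared(observed, predicted), pearson(observed, predicted))
```

For an OLS tree evaluated on the data it was fitted to, the leaf predictions are node means and R-squared equals the square of the Pearson correlation; on test data, or for LAD trees that predict medians, this relation need not hold.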
May 7 2004, Release 3.11 Added new radio buttons in the model parameters window to choose whether a loaded model is used entirely to construct a tree (use model structure) or only its loaded parameters are reused (create new tree). This gives a lot of flexibility: for example, you can build a regression tree, open its saved model file, change the tree type to Gini in the parameters, and see how the tree built with regression performs as a classifier! This sometimes performs better than building the tree directly as a classification tree! Of course, many other types of experiments can be done following the same idea.
May 5 2004, Release 3.1 Added xml, txt, csv and dat filters to the open file dialogs. This limits possible mistakes, for example trying to load an Excel file instead of a tree data file. Added radio buttons to the Preferences menu of the main window.
May 3 2004, Release 3.0 A tree can be loaded from an XML model file! In the main window, select the menu File -> Tree -> Build from XML model.
April 29 2004, Release 2.91 Added the GTK look and feel. To use it, select Preferences -> Look & Feel -> GTK in the main window menu.
April 27 2004, Release 2.9 - 1) For classification trees, horizontal bars under the nodes show the distribution of the classification variable by default. The default can be changed to display a user-selected categorical variable by selecting Tree->Distributions in the Tree window menu. 2) Verbose information is now printed by default in the XOutput console. 3) When the Variables tab in the Model Parameters Window is selected, the cursor changes to a clock while the data is being processed.
April 7 2004, Release 2.8 - 1) New functionality for drawing lift curves and sorting terminal nodes by different criteria. 2) A saved XML/PMML model can be partially reloaded: the variable selections, the misclassification costs, and the train and test files used to create the model are saved and can be recovered, so they do not have to be redefined when you want to build the same tree. 3) The number of rows displayed in the variable panel of the model parameters window, where the target and used variables are defined, can be customized. 4) Input train and test files can have empty lines between examples or trailing empty lines at the end of the file.
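A lift table built from terminal nodes can be sketched as follows, sorting the nodes by their rate of the class of interest (one of several possible sorting criteria). The input format `(node_id, n_cases, n_target)` and the function name `lift_table` are hypothetical, not Shih's internal representation:

```python
from typing import List, Tuple

def lift_table(nodes: List[Tuple[str, int, int]]) -> List[Tuple[str, float, float, float]]:
    """Cumulative lift table from terminal nodes.

    nodes: (node_id, n_cases, n_target) per terminal node, where n_target
    counts the cases of the class of interest (hypothetical input format).
    Returns (node_id, cumulative % of cases, cumulative % of target cases, lift),
    with nodes sorted by decreasing target rate.
    """
    total_cases = sum(n for _, n, _ in nodes)
    total_target = sum(t for _, _, t in nodes)
    base_rate = total_target / total_cases
    ordered = sorted(nodes, key=lambda node: node[2] / node[1], reverse=True)
    rows, cum_cases, cum_target = [], 0, 0
    for node_id, n, t in ordered:
        cum_cases += n
        cum_target += t
        lift = (cum_target / cum_cases) / base_rate  # cumulative rate vs overall rate
        rows.append((node_id, 100.0 * cum_cases / total_cases,
                     100.0 * cum_target / total_target, lift))
    return rows

leaves = [("A", 100, 10), ("B", 50, 40), ("C", 250, 50)]
for row in lift_table(leaves):
    print(row)
```

Here node B, with 80% of its cases in the target class against an overall rate of 25%, comes first with a lift of 3.2; the lift falls to 1.0 once all nodes are included.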
Mar 27 2004, Release 2.74 - New about box.
Mar 25 2004, Release 2.73 - Changes in the GUI are necessary in order to have a native Windows version of the program. All internal windows have to be modified: their menus will be removed and replaced by popup menus accessible by right-clicking inside the window. The first window modified is the model parameters window, whose menu was replaced by a popup menu. The train and test files can be accessed via the popup menu or via two new file open buttons. Finally, the manual was changed: the online version of the manual can only be instantiated once, and its links now work!
Mar 24 2004, Release 2.72 - Print node information (error, number of cases, etc.) for regression trees by right-clicking on the node and selecting Info to output.
Mar 23 2004, Release 2.71 - Access the manual by selecting Help -> Online manual in the main window menu.
Mar 22 2004, Release 2.7 - Save the tree as a JPG image file, which you can then insert into your favorite application, by selecting Tree -> Save image (jpg) in the Tree Window menu.
Mar 18 2004, Release 2.64 - Added a Preferences menu in the model parameters window to place the tabs on top or on the bottom. Handy if you have a low screen resolution and cannot see the tabs at their default position on the bottom.
Mar 17 2004, Release 2.63 - Categorical variables whose values contain the quotation mark, i.e. the " character, can now be used in PMML models without having to preprocess the data.
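Category values end up in XML attributes of the PMML file, so a literal " must be escaped as &quot; for the file to stay well formed. The Python sketch below uses the standard `xml.etree.ElementTree` module to show the effect on a PMML-style `SimplePredicate` element; it only illustrates the escaping requirement and is not how Shih itself writes PMML. The helper name `simple_predicate` is hypothetical:

```python
import xml.etree.ElementTree as ET

def simple_predicate(field: str, value: str) -> str:
    """Serialize a PMML-style SimplePredicate element.

    ElementTree escapes the double quote in attribute values as &quot;,
    so the output remains well-formed XML.
    """
    elem = ET.Element("SimplePredicate", field=field, operator="equal", value=value)
    return ET.tostring(elem, encoding="unicode")

xml_text = simple_predicate("size", '12" pipe')
print(xml_text)
# Round trip: the parser restores the original category, quote included.
print(ET.fromstring(xml_text).get("value") == '12" pipe')  # True
```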
Mar 2 2004, Release 2.62 - Blank character treated as missing value.
Dec 18 2003, Release 2.61 - More information in prediction output for regression trees.
Nov 04 2003, Release 2.6 - Association rules can be filtered simultaneously by several metrics.
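Filtering by several metrics at once means a rule is kept only if it passes every threshold. The sketch below assumes the common metrics support, confidence and lift; the changelog does not list which metrics Shih offers, and the `Rule` type and `filter_rules` function are hypothetical:

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float     # fraction of transactions containing antecedent and consequent
    confidence: float  # support(antecedent and consequent) / support(antecedent)
    lift: float        # confidence / support(consequent)

def filter_rules(rules: List[Rule], min_support: float,
                 min_confidence: float, min_lift: float) -> List[Rule]:
    """Keep only the rules that pass every threshold simultaneously."""
    return [r for r in rules
            if r.support >= min_support
            and r.confidence >= min_confidence
            and r.lift >= min_lift]

# Illustrative values, consistent with support(bread) = 0.25,
# support(butter) = 0.4, support(milk) = 0.5 and support(jam) = 0.025.
rules = [
    Rule(frozenset({"bread"}), frozenset({"butter"}), 0.20, 0.80, 2.0),
    Rule(frozenset({"milk"}), frozenset({"bread"}), 0.10, 0.20, 0.8),
    Rule(frozenset({"jam"}), frozenset({"butter"}), 0.02, 0.80, 2.0),
]
kept = filter_rules(rules, min_support=0.05, min_confidence=0.5, min_lift=1.0)
print([sorted(r.antecedent) for r in kept])  # [['bread']]
```

The jam rule has high confidence and lift but is dropped for its low support, which a single-metric filter on confidence would have missed.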
Oct 29 2003, Release 2.57 - Additional options to generate association rules.
Oct 02 2003, Release 2.56 - Optimized missing-value estimation gives improved results. Missing values are now handled for Regression Trees. Yates-corrected p-values for tree rules and association rules.
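A rule can be summarized as a 2x2 contingency table (rule condition true or false, class present or absent), and its p-value computed from the chi-square statistic with one degree of freedom. The Python sketch below shows the textbook Yates continuity correction; the table layout and the name `chi2_p_value` are assumptions, not Shih's code:

```python
import math
from typing import Tuple

Table = Tuple[Tuple[int, int], Tuple[int, int]]

def chi2_p_value(table: Table, yates: bool = True) -> Tuple[float, float]:
    """Chi-square statistic and p-value for a 2x2 contingency table.

    Rows: rule condition true / false; columns: class present / absent.
    With yates=True, |O - E| is reduced by 0.5 in every cell (Yates'
    continuity correction). For 1 degree of freedom the upper-tail
    probability of the chi-square distribution is erfc(sqrt(chi2 / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(0.0, diff - 0.5)
            chi2 += diff * diff / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

stat, p = chi2_p_value(((30, 10), (20, 40)))
print(round(stat, 3), p < 0.001)
```

The correction lowers the statistic and raises the p-value, making the test more conservative for small counts.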
Sep 30 2003, Release 2.55 - Statistical significance filter for association rules, eliminating rules that are likely to have arisen by chance. German version (select Preferences -> Language).
Sep 18 2003, Release 2.53 - Chi-square p-values for rules, giving the probability of observing an association at least as strong by chance alone. After building a classification tree, select Tree -> Save Rules -> IF .. THEN to see them in the console or in a file.
Sep 17 2003, Release 2.51 - Output of mean, std dev and count of values for tree variables.
Aug 30 2003, Release 2.5 - Association rules (apriori)!
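The core of the Apriori algorithm is a level-wise search for frequent itemsets, from which rules are then derived. A minimal Python sketch of that textbook search follows; it is not Shih's implementation, and the function name `apriori` and the toy transactions are illustrative:

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions that
    contain it) is at least min_support, using level-wise Apriori search."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent: Dict[FrozenSet[str], float] = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "milk"},
           {"butter", "milk"}, {"bread", "butter"}]
for itemset, s in sorted(apriori(baskets, 0.6).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), s)
```

The prune step relies on the Apriori property: an itemset can only be frequent if all of its subsets are frequent, which keeps the candidate sets small.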
Aug 20 2003, Release 2.4 - Dutch, French and Spanish versions, selectable in Preferences!
July 20 2003, Release 2.36 - Missing values are estimated dynamically by local density estimation (conditioned on node and class)!
July 9 2003, Release 2.35 - New editable output window to document the construction of the models!
July 6 2003, Release 2.3 - Access popular SQL databases such as Oracle, MySQL, SAP DB or MS Access via JDBC.
Jan 17 2003, Release 2.2 - Classify/predict new cases from the GUI! (see Readme.txt)
Jan 7 2003, Release 2.1 - Saving the model as a PMML/XML file + single/batch scoring of new cases!
Nov 23 2002, Release 2.0 - Speed boost: faster than the fastest tree tool!
Oct 12 2002, Release 1.9 - Color visualization + GUI enhancement!
Oct 8 2002, Release 1.82 - Regression look-ahead algorithm! Great results.
Oct 3 2002, Release 1.81 - User interface additions.
Sept 30 2002, Release 1.80 - Readable compact rules + speed boost.
Sept 22 2002, Release 1.79 - Added the Metouia L&F + model color parameters.
Sept 21 2002, Release 1.78 - Added the Kunststoff Look and Feel.