ChangeLog
Oct 29 2004, Release 3.25 Bug fix. Loading a saved tree model now works.
Thanks to our user Mario Cordon from Office Depot, who reported the bug!
Oct 13 2004, Release 3.24 Weight variable option is disabled due to incorrect behavior. It will be enabled once it works correctly.
Sept 3 2004, Release 3.23 Weight variable option. First version. It has not been tested and might have bugs; please report any you find.
Aug 3 2004, Release 3.22 Variable ranking is shown in XOutput and in the Control Center text area!
Selecting terminal nodes by clicking on a row of the lift curves table makes it possible to find interesting nodes quickly.
July 29 2004, Release 3.21 Option to update to the latest version by selecting File -> Update Shih
in the main application window. Much more practical than having to download from the website!
June 26 2004, Release 3.20 The sampling seed is now simply called seed. It is used not only to sample the test data
but also to sample missing values. For a fixed seed the missing values are always filled with the same values, so
the same goodness of split (GOS) is obtained each time you press the GOS button in the control panel
for a variable that has missing values. Thanks to Dr Rainer Block for suggesting this improvement.
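The effect of a fixed seed can be illustrated with a minimal Python sketch. It is not the program's code: the function name fill_missing is hypothetical, and filling by drawing from the observed values of the column is an assumption made only to keep the example short.

```python
import random

def fill_missing(values, seed):
    """Replace None entries by sampling from the observed values (assumed scheme)."""
    rng = random.Random(seed)  # private generator: the seed alone determines the draws
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]

column = [3.1, None, 4.7, 5.0, None, 2.2]
first = fill_missing(column, seed=42)
second = fill_missing(column, seed=42)
print(first == second)  # identical fills, so any statistic computed on them is identical too
```

Because both calls start from the same seed, a goodness-of-split value computed on the filled column would not change between repeated evaluations.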
June 23 2004, Release 3.19 The button in the control center used to
draw the goodness of split function in terms of the splitting values has been removed. This button had to be pressed
before a node could be split manually. Now the function is drawn automatically when you select a variable, and you can
immediately split the node manually. This is more straightforward and user friendly than first using the button to draw
the function in order to be able to split manually. For classification trees, the bar distribution of the target
variable is also drawn when no node is selected; in that case the distribution of the root node is drawn. Also for
classification trees, a text area has been added to print the costs of the model and to give a more compact display
of the information. This allows copying the information to the clipboard; the same has been available for regression trees since
version 3.17.
June 19 2004, Release 3.18 We thank Dr Rainer Block for suggesting the implementation of LAD regression and
the use of statistical indicators for regression trees. Added R squared and Pearson correlation statistics for
deviations from the median. Added the Akaike Information Criterion (AIC) statistic for regression trees.
The AIC statistic takes into account the trade-off between the sum of squared errors and the complexity of a model,
penalizing models with many nodes, i.e. complex models. It is a statistic to look at to avoid both underfitting
and overfitting the data. The smaller the AIC, the better. Performance tuning for large databases in the GUI.
Trees built with tens of thousands of examples can be moved faster, giving the impression that they are very light.
15% speed gain from the construction to the display time of classification trees.
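The trade-off measured by the AIC can be sketched in Python as follows. The exact formula used by the program is not stated here; the sketch assumes the common least-squares form n * ln(SSE / n) + 2k, with k taken as the number of nodes. The function name and the candidate trees are made up for illustration.

```python
import math

def aic_least_squares(sse, n_cases, n_nodes):
    """AIC in its usual least-squares form (assumed, not taken from the program)."""
    return n_cases * math.log(sse / n_cases) + 2 * n_nodes

n = 1000  # number of training cases (illustrative)
# Hypothetical pruning candidates: name -> (sum of squared errors, number of nodes)
candidates = {
    "small tree": (1200.0, 5),
    "medium tree": (900.0, 11),
    "large tree": (880.0, 41),
}
scores = {name: aic_least_squares(sse, n, k) for name, (sse, k) in candidates.items()}
best = min(scores, key=scores.get)  # the smaller the AIC, the better
print(best)
```

In this made-up example the large tree lowers the error only slightly while adding 30 nodes, so its penalty outweighs the gain and the medium tree gets the smallest AIC.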
June 18 2004, Release 3.17 Performance tuning for regression trees generated with big databases.
They can be moved fast and easily in the TreePanel and the zoomed view. A text area shows statistics for
regression trees, allowing cut and paste operations and a more compact display of information.
June 10 2004, Release 3.16 Added Least Absolute Deviation (LAD) regression
in addition to Ordinary Least Squares (OLS) regression.
This is an alternative algorithm for predicting continuous
variables.
LAD is a special case of quantile regression, the case of predicting
the 50% quantile, which is the median. LAD selects splits that maximize the
decrease of deviation from the median, i.e. the decrease of the sum of
absolute errors.
OLS, on the other hand, predicts the mean and selects splits that maximize
the decrease of variance, i.e. the decrease of the sum of squared errors.
LAD is roughly as fast as OLS, depending on the data.
Its main benefit is that it is robust to outliers, and it can be applied
to the same data used by OLS. It is therefore worth trying both alternatives.
Experiments show that it works
better than OLS in several cases and worse in others.
The models generated are usually
simpler, with fewer nodes, and the pruning sequence is shorter.
Empirical evidence indicates that the models generalize
better, validating well on the testing data.
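The difference between the two split criteria can be sketched in a few lines of Python. This is a simplified illustration for a single numeric predictor, not the program's implementation; the function names and the toy data are assumptions.

```python
from statistics import mean, median

def sum_squared_errors(ys):
    """OLS impurity: squared deviations from the mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def sum_absolute_errors(ys):
    """LAD impurity: absolute deviations from the median."""
    m = median(ys)
    return sum(abs(y - m) for y in ys)

def best_split(xs, ys, impurity):
    """Return the threshold on x that maximizes the decrease of impurity."""
    pairs = sorted(zip(xs, ys))
    parent = impurity([y for _, y in pairs])
    best_gain, best_threshold = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = parent - impurity(left) - impurity(right)
        if gain > best_gain:
            best_gain = gain
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_threshold, best_gain

xs = list(range(1, 11))
ys = [2, 1, 3, 2, 2, 9, 8, 10, 9, 30]  # two groups plus one outlier (30)
ols_threshold, _ = best_split(xs, ys, sum_squared_errors)
lad_threshold, _ = best_split(xs, ys, sum_absolute_errors)
print("OLS split at", ols_threshold)
print("LAD split at", lad_threshold)
```

On this toy data OLS spends its first split at 9.5 to isolate the outlier, while LAD splits at 5.5 and separates the two groups, which illustrates the robustness to outliers mentioned above.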
May 26 2004, Release 3.15 Speed boost for large databases, i.e. above 50,000
rows. 20% faster during the main tree construction phase.
May 18 2004, Release 3.14 Distribution at node level of categorical
variables is implemented for regression trees. It is represented as a horizontal
bar under the node (an additional bar is displayed for the testing data). Each
color in the bar represents a value of the displayed variable. To select
or change the categorical variable to display, select the menu
Tree->Distributions in the tree window to open the variable selector dialog.
The variable selector dialog also shows the exact numerical percentages of
each value for the selected variable and the selected node.
Note that if no node is selected, the distribution
shown in the dialog defaults to the whole data, which also corresponds to the
root node.
May 9 2004, Release 3.12 Two new model statistics incorporated for the
evaluation of regression trees: the R-squared, i.e. the proportion
of the variance of the dependent variable explained by the model, and
Pearson's correlation between the observed and the predicted values.
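Both statistics follow directly from their textbook definitions, as the following Python sketch shows. The function names and the four-case example are illustrative only and are not taken from the program.

```python
import math

def r_squared(observed, predicted):
    """Proportion of the variance of the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def pearson(observed, predicted):
    """Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov / math.sqrt(var_obs * var_pred)

# A two-leaf regression tree predicting the mean of each leaf (toy data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(observed, predicted)
r = pearson(observed, predicted)
print(round(r2, 4), round(r, 4))
```

When every prediction is the mean of its leaf on the same data, as in this example, the squared Pearson correlation equals the R-squared; on test data the two can differ.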
May 7 2004, Release 3.11 Added new radio buttons in the model parameters
window to select whether a loaded model is used in its entirety to construct a tree,
by selecting use model structure, or whether only the loaded parameters are used, by selecting
create new tree. This gives a high level of flexibility: you can build,
for example, a regression tree and, when you open its saved model file,
change the type of tree in the parameters
to Gini and see how the tree built with regression performs as a classifier!
This can sometimes perform better than building the tree directly as a
classification tree!
Of course many other types of experiments can be done following the same idea.
May 5 2004, Release 3.1 Added the filters xml, txt, cvs, and dat to the open-file dialogs.
This limits possible mistakes, for example someone trying to load an Excel file instead
of a tree data file. Added radio buttons in menu Preferences from the main window
May 3 2004, Release 3.0 A tree can be loaded from the XML model file! Select
File -> Tree -> Build from XML model in the main window menu.
April 29 2004, Release 2.91 Added the GTK look and feel. Use it by selecting
on the main window the menu Preferences -> Look & Feel -> GTK
April 27 2004, Release 2.9 - 1) For classification trees, under the nodes,
horizontal bars show by default the distribution of the classification variable.
The default can be changed to display a user selected categorical variable by
selecting on the Tree window the menu Tree->Distributions.
2) Verbose information is now printed by default in the XOutput console.
3) When selecting the Variables tab in the Model Parameters Window the cursor changes
to a clock while the data is being processed.
April 7 2004, Release 2.8 - 1) New functionality for drawing lift curves and sorting terminal nodes by different
criteria.
2) A saved XML/PMML model can be partially reloaded. The variable selections, the misclassification
costs, and the train and test files used to create the model are saved and can be recovered without having to
redefine them when you want to build the same tree. 3) The variable panel of the model parameters window,
where the target and used variables are defined, can be customized to change the number of rows displayed.
4) Input train and test files can have empty lines between examples or trailing empty lines at the end of
the file.
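As an illustration of item 1, a cumulative lift table can be derived from the terminal nodes of a classification tree roughly as in the Python sketch below. The sort criterion (target rate of the node), the function name, and the node counts are assumptions; the program offers several criteria that are not detailed here.

```python
def cumulative_lift(nodes):
    """nodes: list of (n_cases, n_targets), one pair per terminal node.

    Returns rows of (share of cases, share of targets, cumulative lift),
    with the nodes ranked by decreasing target rate.
    """
    total_cases = sum(cases for cases, _ in nodes)
    total_targets = sum(targets for _, targets in nodes)
    base_rate = total_targets / total_cases
    ranked = sorted(nodes, key=lambda node: node[1] / node[0], reverse=True)
    rows, cases_so_far, targets_so_far = [], 0, 0
    for n_cases, n_targets in ranked:
        cases_so_far += n_cases
        targets_so_far += n_targets
        lift = (targets_so_far / cases_so_far) / base_rate
        rows.append((cases_so_far / total_cases, targets_so_far / total_targets, lift))
    return rows

# Three hypothetical terminal nodes: (cases, cases of the target class)
rows = cumulative_lift([(100, 10), (50, 40), (50, 10)])
for share_cases, share_targets, lift in rows:
    print(f"{share_cases:.2f} {share_targets:.2f} {lift:.2f}")
```

The first rows of such a table correspond to the nodes with the highest concentration of the target class, and the cumulative lift always ends at 1 once all nodes are included.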
Mar 27 2004, Release 2.74 - New about box.
Mar 25 2004, Release 2.73 - In an effort to have a native Windows version of the program, changes in the GUI are necessary.
All internal windows have to be modified. Their menus will be removed and replaced by popup menus accessible by
right-clicking inside the window. The first window to be modified is the model parameters window.
Its menu was replaced by a popup menu. The train and test files can be accessed via the popup menu or via two
new file open buttons. Finally, the last change concerns the manual: the online version of the manual can only
be instantiated once, and the links now work!
Mar 24 2004, Release 2.72 - Print node information (error, num cases, etc) for regression trees by right clicking on the node
and selecting Info to output.
Mar 23 2004, Release 2.71 - Access the manual by selecting Help -> Online manual in
the main window menu.
Mar 22 2004, Release 2.7 - Copy the tree to a jpg image file and then to your favorite application,
by selecting Tree -> Save image (jpg) in the Tree Window menu.
Mar 18 2004, Release 2.64 - Added a Preferences menu in the model parameters window to place
the tabs at the top or at the bottom. Handy if you have a low screen resolution and cannot see
the default tabs at the bottom.
Mar 17 2004, Release 2.63 - Categorical variables that contain the quotation mark, i.e. the " character, can now be used
in the PMML models without having to preprocess the data.
Mar 2 2004, Release 2.62 - Blank character treated as missing value.
Dec 18 2003, Release 2.61 - More information in prediction output for regression trees.
Nov 04 2003, Release 2.6 - Association rules filtered simultaneously by several metrics.
Oct 29 2003, Release 2.57 - Additional options to generate association rules.
Oct 02 2003, Release 2.56 - Optimized missing value estimation gives improved results. Missing values implemented for regression trees.
Yates-corrected p-values for tree rules and association rules.
Sep 30 2003, Release 2.55 - Statistical significance filter for association rules eliminating rules that were generated by chance. German version (Select Preferences -> Language ).
Sep 18 2003, Release 2.53 - p-values (chi square) giving the probability that rules were generated by chance. After building a classification tree select Tree -> Save Rules -> IF .. THEN to see them in the console or in a file.
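How a chi-square p-value for a single rule can be obtained is sketched below in Python, with the Yates continuity correction from release 2.56 as an option. How the program builds its contingency table is not described here; the sketch assumes a 2x2 table (rule matched or not, target class or not), and all names and counts are illustrative.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic of the 2x2 table [[a, b], [c, d]].

    a: rule matched, target class     b: rule matched, other class
    c: rule not matched, target class d: rule not matched, other class
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)  # Yates continuity correction
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_1df(chi2):
    """Upper tail probability of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

plain = chi_square_2x2(30, 10, 20, 40)
corrected = chi_square_2x2(30, 10, 20, 40, yates=True)
print(p_value_1df(plain), p_value_1df(corrected))
```

A small p-value means the association between the rule and the class is unlikely to be due to chance. The corrected statistic is never larger than the uncorrected one, so the Yates-corrected p-value is the more conservative of the two.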
Sep 17 2003, Release 2.51 - Output of mean, std dev and count of values for tree variables.
Aug 30 2003, Release 2.5 - Association rules (apriori)!
Aug 20 2003, Release 2.4 - Dutch, French, Spanish versions in preferences !
July 20 2003, Release 2.36 - Missing values estimated dynamically by local density estimation (conditioned to node and class) !
July 9 2003, Release 2.35 - New editable output window to document the construction of the models !
July 6 2003, Release 2.3 - With JDBC, access popular SQL databases such as Oracle, MySQL, SAP DB or MS Access.
Jan 17 2003, Release 2.2 - Classify/predict new cases from GUI! (see Readme.txt )
Jan 7 2003, Release 2.1 - Saving model as
PMML/XML file + single/batch scoring of new cases !
Nov 23 2002, Release 2.0 - Speed boost: faster than the fastest tree tool!
Oct 12 2002, Release 1.9 - Color visualization + GUI enhancement!
Oct 8 2002, Release 1.82 - Regression look ahead algo! Great results.
Oct 3 2002, Release 1.81 - User interface additions.
Sept 30 2002, Release 1.80 - Readable compact rules + speed boost.
Sept 22 2002, Release 1.79 - added the Metouia L&F + model color parameters.
Sept 21 2002, Release 1.78 - added the Kunststoff Look and Feel.
