Predicting income
This page illustrates the type of decision rules produced by a tree model induced from the training data. Shih Data Miner
lets you modify the rules manually and see the impact on the model. This is useful for adjusting
them to the company's policies.
The goal is to build a model that classifies people into two groups: those who earn more than 50 thousand
dollars and those who do not. The study was needed by the US tax office to identify the
people with a high probability of earning above that amount, so that it could check beforehand whether they were paying
their taxes.
The following rules are the most important ones for distinguishing between the two types of income.
Note that 23.5% of the people in the training data earned more than 50K; therefore a node
in which more than 23.5% of the people earn above 50K is considered a >50K node. The rules below are listed in decreasing order
of the probability of income above 50K.
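The cut-off and the ordering described above can be sketched in Python. This is only an illustration, not part of Shih Data Miner: the helper names `prob_gt_50k` and `exceeds_base_rate` are hypothetical, and the three sample entries are copied from rules 19, 15 and 7 in the list.

```python
# Share of >50K records in the training data, as stated in the text (percent).
BASE_RATE_GT_50K = 23.5

def prob_gt_50k(label, probability):
    """Return a node's probability (in percent) of income >50K.

    Rules printed with the label '<=50K' report the probability of that
    class, so the >50K probability is the complement.
    """
    return probability if label == ">50K" else 100.0 - probability

def exceeds_base_rate(label, probability):
    """True when the node holds a larger share of >50K people than the training data."""
    return prob_gt_50k(label, probability) > BASE_RATE_GT_50K

# (rule number, printed label, printed probability), taken from the rule list.
rules = [
    (19, "<=50K", 97.81643),
    (15, ">50K", 97.64706),
    (7, ">50K", 52.499996),
]

# Sort from the highest to the lowest probability of income >50K.
ordered = sorted(rules, key=lambda r: prob_gt_50k(r[1], r[2]), reverse=True)
order = [number for number, _, _ in ordered]
print(order)
```

Sorting on the complement for the <=50K rules is what places rule 19 last even though its printed percentage is the largest of the three.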
[RULE: 15, Income --> >50K.]
IF capital-gain > 5095.5
AND education-num <= 9.5
AND relationship IN (' Wife',' Husband')
THEN Income = >50K. with probability : 97.64706%
[RULE: 24, Income --> >50K.]
IF capital-gain > 7055.5
AND relationship IN (' Other-relative',' Unmarried',' Not-in-family',' Own-child')
THEN Income = >50K. with probability : 96.33028%
[RULE: 14, Income --> >50K.]
IF capital-loss > 1794.0
AND occupation IN (' Armed-Forces',' Handlers-cleaners',' Transport-moving',' Priv-house-serv',' Sales',' Adm-clerical',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND age > 26.5
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = >50K. with probability : 72.22223%
[RULE: 18, Income --> >50K.]
IF age > 27.5
AND education-num > 9.5
AND relationship IN (' Wife',' Husband')
THEN Income = >50K. with probability : 61.43148%
[RULE: 7, Income --> >50K.]
IF occupation IN (' Tech-support',' Exec-managerial')
AND age > 26.5
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = >50K. with probability : 52.499996%
[RULE: 22, Income --> <=50K.]
IF hours-per-week > 40.5
AND age > 33.5
AND education-num > 12.5
AND capital-gain <= 4668.5
AND relationship IN (' Other-relative',' Unmarried',' Not-in-family',' Own-child')
THEN Income = <=50K. with probability : 62.15139%
[RULE: 13, Income --> <=50K.]
IF workclass IN (' Never-worked',' Without-pay',' Self-emp-inc',' State-gov',' Federal-gov',' Self-emp-not-inc',' Local-gov',' Private')
AND native-country IN (' France',' Outlying-US(Guam-USVI-etc)',' Trinadad&Tobago',' Greece',' Hong',' Hungary',' Yugoslavia',' Ecuador',' Jamaica',' Scotland',' Iran',' Honduras',' Nicaragua',' China',' Canada',' Italy',' Taiwan',' Cuba',' England',' Laos',' Poland',' Cambodia',' India',' Japan',' Columbia',' South',' Vietnam',' Puerto-Rico',' El-Salvador',' Haiti',' Thailand',' Philippines',' Germany',' Ireland',' Dominican-Republic',' Guatemala',' Peru',' ?',' United-States')
AND age > 35.5
AND capital-loss <= 1794.0
AND occupation IN (' Armed-Forces',' Handlers-cleaners',' Transport-moving',' Priv-house-serv',' Sales',' Adm-clerical',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 64.29549%
[RULE: 9, Income --> <=50K.]
IF occupation IN (' Armed-Forces',' Handlers-cleaners',' Priv-house-serv',' Sales',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND fnlwgt <= 185452.0
AND 26.5 < age <= 35.5
AND capital-loss <= 1794.0
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 66.666664%
[RULE: 3, Income --> <=50K.]
IF age > 41.5
AND hours-per-week > 40.5
AND education-num <= 8.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 69.60784%
[RULE: 23, Income --> <=50K.]
IF 4668.5 < capital-gain <= 7055.5
AND relationship IN (' Other-relative',' Unmarried',' Not-in-family',' Own-child')
THEN Income = <=50K. with probability : 72.34042%
[RULE: 17, Income --> <=50K.]
IF education IN (' Preschool',' 1st-4th',' 12th',' 9th',' Assoc-voc',' 5th-6th',' Masters',' Bachelors',' 7th-8th',' Prof-school',' 10th',' Assoc-acdm',' HS-grad',' 11th')
AND age <= 27.5
AND education-num > 9.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 73.74999%
[RULE: 10, Income --> <=50K.]
IF fnlwgt > 185452.0
AND 26.5 < age <= 35.5
AND capital-loss <= 1794.0
AND occupation IN (' Armed-Forces',' Handlers-cleaners',' Transport-moving',' Priv-house-serv',' Sales',' Adm-clerical',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 84.210526%
[RULE: 2, Income --> <=50K.]
IF age <= 41.5
AND hours-per-week > 40.5
AND education-num <= 8.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 84.507034%
[RULE: 8, Income --> <=50K.]
IF occupation IN (' Transport-moving',' Adm-clerical')
AND fnlwgt <= 185452.0
AND 26.5 < age <= 35.5
AND capital-loss <= 1794.0
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 85.0%
[RULE: 12, Income --> <=50K.]
IF workclass IN (' ?')
AND native-country IN (' France',' Outlying-US(Guam-USVI-etc)',' Trinadad&Tobago',' Greece',' Hong',' Hungary',' Yugoslavia',' Ecuador',' Jamaica',' Scotland',' Iran',' Honduras',' Nicaragua',' China',' Canada',' Italy',' Taiwan',' Cuba',' England',' Laos',' Poland',' Cambodia',' India',' Japan',' Columbia',' South',' Vietnam',' Puerto-Rico',' El-Salvador',' Haiti',' Thailand',' Philippines',' Germany',' Ireland',' Dominican-Republic',' Guatemala',' Peru',' ?',' United-States')
AND age > 35.5
AND capital-loss <= 1794.0
AND occupation IN (' Armed-Forces',' Handlers-cleaners',' Transport-moving',' Priv-house-serv',' Sales',' Adm-clerical',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 86.206894%
[RULE: 21, Income --> <=50K.]
IF hours-per-week <= 40.5
AND age > 33.5
AND education-num > 12.5
AND capital-gain <= 4668.5
AND relationship IN (' Other-relative',' Unmarried',' Not-in-family',' Own-child')
THEN Income = <=50K. with probability : 86.27002%
[RULE: 5, Income --> <=50K.]
IF hours-per-week <= 36.5
AND occupation IN (' Armed-Forces',' Handlers-cleaners',' Transport-moving',' Priv-house-serv',' Sales',' Tech-support',' Exec-managerial',' Adm-clerical',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 86.36364%
[RULE: 4, Income --> <=50K.]
IF occupation IN (' Other-service',' Farming-fishing')
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 89.44444%
[RULE: 6, Income --> <=50K.]
IF age <= 26.5
AND hours-per-week > 36.5
AND occupation IN (' Armed-Forces',' Handlers-cleaners',' Transport-moving',' Priv-house-serv',' Sales',' Tech-support',' Exec-managerial',' Adm-clerical',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 90.10989%
[RULE: 16, Income --> <=50K.]
IF education IN (' Doctorate',' Some-college')
AND age <= 27.5
AND education-num > 9.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 92.473114%
[RULE: 1, Income --> <=50K.]
IF hours-per-week <= 40.5
AND education-num <= 8.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 93.647545%
[RULE: 20, Income --> <=50K.]
IF age <= 33.5
AND education-num > 12.5
AND capital-gain <= 4668.5
AND relationship IN (' Other-relative',' Unmarried',' Not-in-family',' Own-child')
THEN Income = <=50K. with probability : 94.09091%
[RULE: 19, Income --> <=50K.]
IF education-num <= 12.5
AND capital-gain <= 4668.5
AND relationship IN (' Other-relative',' Unmarried',' Not-in-family',' Own-child')
THEN Income = <=50K. with probability : 97.81643%
[RULE: 11, Income --> <=50K.]
IF native-country IN (' Portugal',' Mexico')
AND age > 35.5
AND capital-loss <= 1794.0
AND occupation IN (' Armed-Forces',' Handlers-cleaners',' Transport-moving',' Priv-house-serv',' Sales',' Adm-clerical',' Craft-repair',' Prof-specialty',' ?',' Protective-serv',' Machine-op-inspct')
AND hours-per-week > 36.5
AND 8.5 < education-num <= 9.5
AND capital-gain <= 5095.5
AND relationship IN (' Wife',' Husband')
THEN Income = <=50K. with probability : 100.0%
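Rules of this form translate directly into predicates over a record. The following Python sketch encodes rules 15, 24 and 23 from the list above and returns the first rule that matches. It is a minimal illustration, not the way Shih Data Miner evaluates its models: the function names, the dictionary-based record layout and the sample records are assumptions, while the thresholds, category values (including their leading spaces) and probabilities are copied from the rules.

```python
# Category sets copied from the rules; values keep their leading space.
MARRIED = (" Wife", " Husband")
NOT_MARRIED = (" Other-relative", " Unmarried", " Not-in-family", " Own-child")

def rule_15(r):
    return (r["capital-gain"] > 5095.5
            and r["education-num"] <= 9.5
            and r["relationship"] in MARRIED)

def rule_24(r):
    return r["capital-gain"] > 7055.5 and r["relationship"] in NOT_MARRIED

def rule_23(r):
    return (4668.5 < r["capital-gain"] <= 7055.5
            and r["relationship"] in NOT_MARRIED)

# (rule number, predicate, predicted income, probability in percent)
RULES = [
    (15, rule_15, ">50K", 97.64706),
    (24, rule_24, ">50K", 96.33028),
    (23, rule_23, "<=50K", 72.34042),
]

def classify(record):
    """Return (rule number, income label, probability) of the first matching rule.

    Returns None when none of the encoded rules applies to the record.
    """
    for number, predicate, label, probability in RULES:
        if predicate(record):
            return number, label, probability
    return None

# Hypothetical records, invented for this example.
a = classify({"capital-gain": 8000.0, "education-num": 9.0, "relationship": " Husband"})
b = classify({"capital-gain": 6000.0, "education-num": 13.0, "relationship": " Unmarried"})
c = classify({"capital-gain": 0.0, "education-num": 10.0, "relationship": " Own-child"})
print(a, b, c)
```

Because each rule describes a different leaf of the same tree, the conditions do not overlap, so the order in which the encoded rules are tested does not change the result.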
