11/02/2016

Chapter 8 (8.1, 8.2)

Today, I read Chapter 8 (8.1, 8.2).

Summary
Once the tree has been finalized, we begin to assess the relative importance of the predictors to the outcome.
If SSE is the optimization criterion, then the reduction in SSE over the training set is aggregated for each predictor.
The model tends to rely more on continuous predictors than on binary ones.
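A minimal sketch of this idea in Python (my own illustration; the chapter does not use scikit-learn): fit a CART-style regression tree and read off the impurity-based importances, which aggregate the squared-error reduction attributed to each predictor. The bias toward predictors with many possible split points can show up as nonzero importance even for a continuous noise variable.

```python
# Illustrative sketch only (scikit-learn's CART regressor, not the book's R code):
# aggregate the SSE reduction attributed to each predictor over all of its splits.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
x_cont = rng.normal(size=n)            # informative continuous predictor
x_bin = rng.integers(0, 2, size=n)     # informative binary predictor
x_noise = rng.normal(size=n)           # uninformative continuous predictor
y = 2 * x_cont + 2 * x_bin + rng.normal(scale=0.5, size=n)

X = np.column_stack([x_cont, x_bin, x_noise])
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# feature_importances_ is the normalized total reduction of the squared-error
# criterion credited to each predictor across the whole tree.
for name, imp in zip(["continuous", "binary", "noise"], tree.feature_importances_):
    print(f"{name:>10s}: {imp:.3f}")
```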
Unbiased Regression Tree Techniques:
GUIDE (generalized, unbiased, interaction detection and estimation): decouples the process of selecting the split variable from the process of selecting the split value.
Conditional Inference Trees: statistical hypothesis tests are used to do an exhaustive search across the predictors and their possible split points.
For a candidate split, a statistical test is used to evaluate the difference between the means of the two groups created by the split and a p-value can be computed for the test.
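As a rough illustration of that split test (my own sketch; the actual conditional inference framework uses permutation tests with multiplicity adjustments, not a plain t-test), a p-value can be computed for every candidate split of a predictor:

```python
# Simplified sketch: for each candidate split point of a predictor, test the
# difference in outcome means between the two resulting groups.
# (A Welch two-sample t-test stands in for the permutation tests used by
# conditional inference trees.)
import numpy as np
from scipy import stats

def split_p_values(x, y, min_samples=5):
    """Return (split_point, p_value) for candidate splits of predictor x."""
    results = []
    for split in np.unique(x):
        left, right = y[x <= split], y[x > split]
        if len(left) < min_samples or len(right) < min_samples:
            continue                      # skip splits that leave a tiny group
        _, p = stats.ttest_ind(left, right, equal_var=False)
        results.append((split, p))
    return results

rng = np.random.default_rng(1)
x = rng.uniform(size=200)
y = np.where(x > 0.5, 1.0, 0.0) + rng.normal(scale=0.3, size=200)

best_split, best_p = min(split_p_values(x, y), key=lambda t: t[1])
print(f"best split at x <= {best_split:.3f}, p-value {best_p:.2e}")
```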

Regression Model Trees
M5
The splitting criterion is different.
The terminal nodes predict the outcome using a linear model (as opposed to a single average value).
When a sample is predicted, it is often a combination of the predictions from different models along the same path through the tree.
The main implementation of this technique is a “rational reconstruction” of this model called M5.
Split criterion:
reduction = SD(S) - \sum_{i=1}^{P} \frac{n_i}{n} \times SD(S_i)

n_i is the number of samples in partition i, n is the total number of samples, and SD(·) is the sample standard deviation of the outcome in the corresponding set.
The split that is associated with the largest reduction in error is chosen and a linear model is created within the partitions using the split variable in the model.
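A small sketch of this criterion in Python (the function name is mine), evaluating the reduction in standard deviation produced by one candidate split:

```python
# Sketch of the M5 split criterion above: the reduction in the standard
# deviation of the outcome produced by partitioning the data.
import numpy as np

def sd_reduction(y, partitions):
    """reduction = SD(S) - sum_i (n_i / n) * SD(S_i), for a list of partitions of y."""
    n = len(y)
    weighted_sd = sum(len(part) / n * np.std(part) for part in partitions)
    return np.std(y) - weighted_sd

rng = np.random.default_rng(2)
x = rng.uniform(size=300)
y = 3 * x + rng.normal(scale=0.2, size=300)

# Evaluate a single candidate split at x <= 0.5 (two partitions).
reduction = sd_reduction(y, [y[x <= 0.5], y[x > 0.5]])
print(f"SD reduction for split at 0.5: {reduction:.3f}")
```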
Once the complete set of linear models has been created, each undergoes a simplification procedure to potentially drop some of the terms.
Adjusted Error Rate = \frac{n^* + p}{n^* - p} \sum_{i=1}^{n^*} \left| y_i - \hat{y}_i \right|

n^* is the number of training set data points that were used to build the model and p is the number of parameters.
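A minimal sketch of the adjusted error rate (my own function name), which penalizes the sum of absolute residuals by the number of parameters:

```python
# Sketch of the adjusted error rate above; names are my own.
import numpy as np

def adjusted_error_rate(y_true, y_pred, p):
    """(n* + p) / (n* - p) * sum of absolute residuals, penalizing model size."""
    n_star = len(y_true)
    return (n_star + p) / (n_star - p) * np.sum(np.abs(y_true - y_pred))

# The penalty grows with the number of parameters p, so dropping a term can
# lower the adjusted error even if the raw error rises slightly.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(adjusted_error_rate(y_true, y_pred, p=2))  # ~1.8
```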
Model trees also incorporate a type of smoothing to decrease the potential for over-fitting. The technique is based on the “recursive shrinking” methodology.
The two predictions are combined using
\hat{y}_{(p)} = \frac{n_{(k)} \hat{y}_{(k)} + c \hat{y}_{(p)}}{n_{(k)} + c}
(the equation on page 185 may be wrong)

\hat{y}_{(k)} is the prediction from the child node, n_{(k)} is the number of training set data points in the child node, \hat{y}_{(p)} is the prediction from the parent node, and c is a constant with a default value of 15.
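A small sketch of this smoothing step (my own function name), showing how small child nodes get pulled toward the parent prediction:

```python
# Sketch of the smoothing equation above: combine the child-node prediction
# with the parent-node prediction, weighted by the child's sample size.
def smoothed_prediction(y_child, n_child, y_parent, c=15.0):
    """y_(p) = (n_(k) * y_(k) + c * y_(p)) / (n_(k) + c), with default c = 15."""
    return (n_child * y_child + c * y_parent) / (n_child + c)

# A child node built from only 10 samples is pulled noticeably toward the parent;
# a child with many samples dominates the combination.
print(smoothed_prediction(y_child=5.0, n_child=10, y_parent=3.0))    # 3.8
print(smoothed_prediction(y_child=5.0, n_child=1000, y_parent=3.0))  # ~4.97
```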
Pruning & Smoothing
Smoothing the models has the effect of mitigating collinearity issues.

Tomorrow, I will continue to read Chapter 8.
