Summary
MARS: Multivariate Adaptive Regression Splines
GCV: generalized cross-validation
There are two tuning parameters associated with the MARS model: the degree of the
features that are added to the model and the number of retained terms. The
latter parameter can be automatically determined using the default pruning
procedure (based on GCV), set by the user, or determined using an external
resampling technique.
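To make this concrete, here is a minimal sketch of tuning both parameters with external resampling, assuming the open-source py-earth package (pyearth), whose Earth estimator follows the scikit-learn API; the data are synthetic and purely illustrative.

```python
import numpy as np
from pyearth import Earth
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 5))
# Synthetic outcome driven by a hinge in x0 and a curve in x1.
y = 2 * np.maximum(0, X[:, 0] - 0.5) + X[:, 1] ** 2 \
    + rng.normal(scale=0.1, size=200)

# max_degree sets the degree of the features (1 = additive,
# 2 = two-way interactions); max_terms caps the retained terms.
# By default, a backward pass also prunes terms using GCV.
grid = GridSearchCV(
    Earth(),
    param_grid={"max_degree": [1, 2], "max_terms": [10, 20, 30]},
    cv=5,
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)
```

Leaving max_terms unset and relying on the GCV-based backward pass is the default behavior; the grid search above is the external-resampling alternative mentioned in the text.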
There are several advantages to using MARS. First, the model automatically conducts feature selection; the model equation
is independent of predictor variables that are not involved with any of the
final model features. This point cannot be overstated. Given the large number of
predictors seen in many problem domains, MARS potentially thins the predictor
set using the same algorithm that builds the model. In this way, the feature
selection routine has a direct connection to functional performance. The
second advantage is interpretability.
Each hinge feature is responsible for modeling a specific region in the
predictor space using a (piecewise) linear model. When the MARS model is
additive, the contribution of each predictor can be isolated without the need
to consider the others. This can be used to provide clear interpretations of how each predictor relates to the outcome. For nonadditive
models, the interpretive power of the model is not substantially reduced. Finally,
the MARS model requires very little pre-processing of the data; data
transformations and the filtering of predictors are not needed.
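Both of the first two advantages can be seen directly in a fitted model's printed equation. A minimal sketch, again assuming py-earth, with hypothetical synthetic data in which only the first two predictors carry signal:

```python
import numpy as np
from pyearth import Earth

rng = np.random.RandomState(1)
X = rng.uniform(size=(300, 6))
# Only x0 and x1 drive the response; x2..x5 are pure noise.
y = (3 * np.maximum(0, X[:, 0] - 0.4)
     - 2 * np.maximum(0, 0.6 - X[:, 1])
     + rng.normal(scale=0.05, size=300))

# An additive (degree-1) fit: each retained term is a hinge
# h(x - t) = max(0, x - t) in a single predictor.
model = Earth(max_degree=1)
model.fit(X, y)
print(model.summary())
```

In a typical run, the summary lists hinge terms only in x0 and x1, so the noise predictors are selected out; and because the model is additive, each predictor's contribution can be read off (or plotted) in isolation.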
Another way to understand how the predictors affect the model is
to quantify their importance. For MARS, one technique for doing
this is to track the reduction in the root mean squared error (as measured using
the GCV statistic) that occurs when adding a particular feature to the model.
Plotting these reductions for each predictor gives an importance ranking that makes the predictors easy to compare.
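py-earth exposes this kind of GCV-based importance through its feature_importance_type option; a minimal sketch on the same sort of synthetic data:

```python
import numpy as np
from pyearth import Earth

rng = np.random.RandomState(2)
X = rng.uniform(size=(300, 4))
y = 4 * np.maximum(0, X[:, 0] - 0.5) + X[:, 1] \
    + rng.normal(scale=0.1, size=300)

# Record each predictor's share of the GCV reduction during fitting.
model = Earth(max_degree=1, feature_importance_type="gcv")
model.fit(X, y)
for j, imp in enumerate(model.feature_importances_):
    print(f"x{j}: {imp:.3f}")
```

Here x0 and x1 would be expected to dominate, with the remaining predictors scoring near zero.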
Tomorrow, I will continue to read the book.