9/26/2016

Applied Predictive Modeling 2&3

Today, I read the book “Applied Predictive Modeling”. I covered Chapter 2 and part of Chapter 3, which introduce data pre-processing.

Summary
1.4 Data Sets
Part I: General Strategies
An alternative approach to quantifying how well a model performs is to use resampling, instead of simply re-predicting the same data used to build the model. A sketch of the idea follows below.
Resampling techniques are discussed in Chapter 4.
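A minimal sketch of resampling in Python with scikit-learn (the book's own examples use R and caret; the data set here is synthetic):

```python
# Sketch: estimate performance by resampling instead of re-predicting
# the training set. Synthetic data; the book's own examples use R/caret.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = LinearRegression()
# Apparent performance: fit and score on the same data (optimistic).
apparent_r2 = model.fit(X, y).score(X, y)
# Resampled performance: 10-fold cross-validation gives a more honest estimate.
cv_r2 = cross_val_score(model, X, y, cv=10, scoring="r2").mean()
print(f"apparent R^2: {apparent_r2:.3f}, 10-fold CV R^2: {cv_r2:.3f}")
```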
Feature selection is discussed in Chapter 19.
Both a multivariate adaptive regression splines (MARS) model and a quadratic regression model can be appropriate for a one-predictor prediction problem.
Key themes of model building: Data Splitting, Predictor Data, Estimating Performance, Evaluating Several Models, Model Selection
Which feature engineering and data pre-processing methods are appropriate depends on the model being used and on the true relationship between the predictors and the outcome.

Data Transformations for Individual Predictors
Centering and Scaling
Center: the average predictor value is subtracted, so each predictor has a zero mean
Scale: each predictor value is divided by that predictor’s standard deviation
The only real downside to these transformations is a loss of interpretability of the individual values, since the data are no longer in the original units. A sketch follows below.
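A minimal sketch of centering and scaling, assuming Python with scikit-learn (the book does the equivalent in R):

```python
# Sketch: center each predictor to zero mean and scale it to unit variance.
# Assumes scikit-learn; the book does the equivalent in R.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

scaler = StandardScaler()            # learns each column's mean and std dev
X_scaled = scaler.fit_transform(X)   # (x - mean) / std, column by column

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```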
Resolve Skewness
A right-skewed distribution has a greater number of points on the left side of the distribution (smaller values) than on the right side (larger values).
The formula for the sample skewness statistic is
\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)\, v^{3/2}}, \quad \text{where } v = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}
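As a worked check, a direct NumPy implementation of this formula (sample_skewness is a hypothetical helper name, not from the book):

```python
# Sketch: the sample skewness statistic, implemented exactly as defined above.
# sample_skewness is a hypothetical helper name, not from the book.
import numpy as np

def sample_skewness(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    v = np.sum((x - x.mean()) ** 2) / (n - 1)                # sample variance
    return np.sum((x - x.mean()) ** 3) / ((n - 1) * v ** 1.5)

right_skewed = np.array([1, 1, 2, 2, 3, 4, 10, 25], dtype=float)
print(sample_skewness(right_skewed))  # positive value indicates right skew
```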
empirical transformation (Box-Cox): x^* = \begin{cases} \dfrac{x^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(x) & \text{if } \lambda = 0 \end{cases}
Maximum likelihood estimation is used to determine the transformation parameter λ (details are in Box and Cox’s paper “An Analysis of Transformations”).
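A minimal sketch assuming SciPy, whose stats.boxcox chooses λ by maximum likelihood and requires strictly positive data:

```python
# Sketch: Box-Cox transformation with lambda chosen by maximum likelihood.
# Assumes SciPy; the data must be strictly positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # right-skewed, positive

x_transformed, lam = stats.boxcox(x)  # lam is the maximum likelihood estimate
print(f"estimated lambda: {lam:.3f}")
print(f"skewness before: {stats.skew(x):.3f}, after: {stats.skew(x_transformed):.3f}")
```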

Data Transformations for Multiple Predictors
Tree-based classification models create splits of the training data, so outliers do not usually have an exceptional influence on the model.
Spatial Sign data transformation:
x_{ij}^* = \frac{x_{ij}}{\sqrt{\sum_{j=1}^{P} x_{ij}^2}}
It is important to center and scale the predictor data prior to using this transformation.
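A minimal sketch of the spatial sign transformation in Python: center and scale first, then divide each sample (row) by its Euclidean norm so all samples lie on the unit sphere. scikit-learn's Normalizer handles the row-wise division; in R, caret's preProcess offers this as "spatialSign".

```python
# Sketch: spatial sign transform -- after centering and scaling, divide each
# sample (row) by its Euclidean norm so all samples lie on the unit sphere.
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[0] = [10.0, -12.0, 9.0]  # an outlying sample

X_scaled = StandardScaler().fit_transform(X)            # center and scale first
X_sign = Normalizer(norm="l2").fit_transform(X_scaled)  # x_ij / sqrt(sum_j x_ij^2)

print(np.linalg.norm(X_sign, axis=1)[:5])  # every row now has norm 1
```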

Principal component analysis (PCA) is a commonly used data reduction technique.
\mathrm{PC}_j = (\alpha_{j1} \times \text{Predictor 1}) + (\alpha_{j2} \times \text{Predictor 2}) + \cdots + (\alpha_{jP} \times \text{Predictor } P)
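A minimal sketch of PCA as a data reduction step, assuming scikit-learn (predictors are centered and scaled first, which the book recommends before PCA; the data here is synthetic):

```python
# Sketch: PCA after centering/scaling; each PC_j is a linear combination of
# the predictors with loadings alpha_j1 ... alpha_jP.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=100)  # make two predictors correlated

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)    # PC_1 and PC_2 for each sample

print(pca.components_)                  # rows are the loadings alpha_jk
print(pca.explained_variance_ratio_)    # variance captured by each component
```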

Tomorrow, I will read more of the book.
