10/05/2016

Chapter 4 (4.5-4.8)


Tuning parameters
It may be a good idea to favor simpler models over more complex ones; choosing tuning parameters based solely on the numerically optimal value can lead to models that are overly complicated.
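One way to favor simplicity is a tolerance-style rule: accept any tuning-parameter value whose resampled performance is within a small margin of the best, then take the simplest such value. The grid and accuracy numbers below are hypothetical, just to illustrate the idea:

```python
import numpy as np

# Hypothetical tuning-parameter grid (e.g. cost values for an SVM)
# and the cross-validated accuracy estimated at each value.
params = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0])
cv_accuracy = np.array([0.70, 0.74, 0.76, 0.77, 0.775, 0.776])

best = cv_accuracy.max()
tolerance = 0.01  # accept anything within 1% of the best accuracy

# Among acceptable models, pick the smallest (simplest) parameter value.
candidates = params[cv_accuracy >= best - tolerance]
simplest = candidates.min()
print(simplest)  # → 2.0
```

Here the numerically best value is 8.0, but 2.0 performs nearly as well and gives a simpler model.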
In terms of computation time, 10-fold cross-validation is the fastest. Repeated cross-validation, the bootstrap, and repeated training-test splits fit the same number of models as each other and took about 5-fold more time to finish. LOOCV, which fits as many models as there are samples in the training set, took 86-fold longer and should only be considered when the number of samples is very small.
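The cost differences come from how many models each scheme fits. As a sketch, counting the resampling iterations for a hypothetical training set of 100 samples (scheme choices here are illustrative, using scikit-learn's splitters):

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, RepeatedKFold

n_samples = 100  # hypothetical training-set size
X = np.zeros((n_samples, 1))

schemes = {
    "10-fold CV": KFold(n_splits=10),
    "5x repeated 10-fold CV": RepeatedKFold(n_splits=10, n_repeats=5),
    "LOOCV": LeaveOneOut(),
}

# Each resampling iteration means one model fit.
counts = {name: sum(1 for _ in cv.split(X)) for name, cv in schemes.items()}
for name, n_fits in counts.items():
    print(name, n_fits)
# → 10-fold CV 10, 5x repeated 10-fold CV 50, LOOCV 100
```

LOOCV's cost grows linearly with the sample size, which is why it only stays practical for very small training sets.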
Data splitting
If the sample size is small: repeated 10-fold cross-validation.
Reasons: the bias and variance properties are good and, given the sample size, the computational costs are not large.
If the goal is to choose between models, a strong case can be made for using one of the bootstrap procedures since they have very low variance.
For large sample sizes, simple 10-fold cross-validation should provide acceptable variance, low bias, and relatively quick computation.
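For the small-sample case, repeated 10-fold cross-validation might look like the following sketch (the dataset and model are hypothetical stand-ins, using scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Hypothetical small classification dataset.
X, y = make_classification(n_samples=120, n_features=5, random_state=1)

# 10-fold CV repeated 5 times gives 50 resampled performance estimates,
# which reduces the variance of the overall estimate.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(len(scores))  # → 50
print(scores.mean())
```

Averaging over the repeats is what buys the lower variance relative to a single 10-fold run.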
Choosing between models
MARS: multivariate adaptive regression splines
SVM: support vector machine
Sensitivity (true-positive rate) vs. specificity (true-negative rate)
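Both quantities fall out of a 2x2 confusion matrix; the counts below are made up for illustration:

```python
# Hypothetical confusion-matrix counts for a two-class problem.
tp, fn = 40, 10   # event samples: correctly / incorrectly predicted
tn, fp = 45, 5    # non-event samples: correctly / incorrectly predicted

sensitivity = tp / (tp + fn)   # fraction of events that were detected
specificity = tn / (tn + fp)   # fraction of non-events correctly rejected
print(sensitivity, specificity)  # → 0.8 0.9
```

Raising the classification threshold typically trades one for the other, which is why the two are reported as a pair.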


Tomorrow, I will start computing in Chapter 4.
