Tuning parameters
It may be a good idea to favor simpler models over more complex ones, since choosing tuning parameters based solely on the numerically optimal value can lead to models that are overly complicated.
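One common way to act on this idea is to accept the simplest model whose resampled error is within some tolerance of the best error. A minimal sketch in pure Python; the function name, the `(complexity, cv_error)` input shape, and the 5% tolerance are all illustrative assumptions, not anything prescribed above.

```python
# Sketch: prefer the simplest tuning setting whose cross-validated
# error is "close enough" to the numerically optimal one.
# The 5% relative tolerance is an arbitrary illustrative choice.

def pick_simplest(results, tolerance=0.05):
    """results: list of (complexity, cv_error) tuples.
    Return the lowest complexity whose CV error is within
    `tolerance` (relative) of the best observed error."""
    best = min(err for _, err in results)
    cutoff = best * (1 + tolerance)
    eligible = [c for c, err in results if err <= cutoff]
    return min(eligible)

# Made-up example: error keeps improving slightly with complexity,
# but the simplest acceptable model is complexity 2, not the
# numerically optimal setting at complexity 5.
results = [(1, 0.30), (2, 0.21), (3, 0.205), (5, 0.20)]
print(pick_simplest(results))  # 2
```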
Of the resampling methods, plain 10-fold cross-validation is the fastest. Repeated cross-validation, the bootstrap, and repeated training/test splits fit the same number of models as one another and took about 5-fold more time to finish. LOOCV, which fits as many models as there are samples in the training set, took 86-fold longer and should only be considered when the number of samples is very small.
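The relative costs above follow directly from how many models each scheme fits. A back-of-the-envelope sketch; the repeat count, bootstrap resample count, and the training-set size of 860 (chosen so LOOCV comes out at 86x a single 10-fold run) are illustrative assumptions.

```python
# Count model fits per resampling scheme. Parameter defaults are
# illustrative assumptions, not values from the notes above.

def model_fits(n_samples, k=10, repeats=5, boot_resamples=50):
    return {
        "10-fold CV": k,                     # one fit per fold
        "repeated 10-fold CV": k * repeats,  # k fits per repeat
        "bootstrap": boot_resamples,         # one fit per resample
        "LOOCV": n_samples,                  # one fit per sample
    }

fits = model_fits(n_samples=860)
for name, count in fits.items():
    ratio = count / fits["10-fold CV"]
    print(f"{name}: {count} fits ({ratio:.0f}x plain 10-fold CV)")
```

Wall-clock time is not exactly proportional to fit counts (fold sizes differ across schemes), but the fit count is the dominant factor.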
Data splitting
If the sample size is small: use repeated 10-fold cross-validation. Reasons: the bias and variance properties are good and, given the sample size, the computational costs are not large.
If the goal is to choose between models, a strong case can be made for one of the bootstrap procedures, since they have very low variance.
For large sample sizes, simple 10-fold cross-validation should provide acceptable variance, low bias, and relatively quick computation.
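The repeated 10-fold scheme recommended above can be sketched in a few lines of pure Python: shuffle the indices before each repeat, then carve them into k disjoint folds. This is a minimal illustration (library implementations exist, e.g. scikit-learn's RepeatedKFold); the function name and defaults are my own.

```python
# Minimal repeated k-fold splitter: k folds per repeat, reshuffling
# the indices before each repeat. Sketch only, not a library API.
import random

def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs, k per repeat."""
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]  # every k-th shuffled index
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield train, test

splits = list(repeated_kfold(n=50, k=10, repeats=3))
print(len(splits))  # 30 resamples in total
```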
Choosing between models
MARS: Multivariate adaptive regression splines
SVM: Support Vector Machine model
Sensitivity vs. Specificity
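For reference, sensitivity and specificity are just ratios read off a 2x2 confusion matrix: sensitivity is the true positive rate, specificity the true negative rate. A small sketch with made-up counts.

```python
# Sensitivity and specificity from confusion-matrix counts.
# The example counts below are made-up illustrative numbers.

def sensitivity(tp, fn):
    """True positive rate: fraction of actual positives detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of actual negatives cleared."""
    return tn / (tn + fp)

# Example: 80 true positives, 20 false negatives,
#          90 true negatives, 10 false positives.
print(sensitivity(tp=80, fn=20))  # 0.8
print(specificity(tn=90, fp=10))  # 0.9
```

Raising the classification threshold typically trades sensitivity for specificity, which is why the two are usually reported together.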
Tomorrow I will start on the computing in Chapter 4.