Today, I finished exercises in Chapter 4.
4.3
library(AppliedPredictiveModeling)
data(ChemicalManufacturingProcess)
#objective: find the number of PLS components that yields the optimal R2 value
#7 provides the most parsimonious model using the "one-standard-error" method
exercise4.3
#2 is the best if tolerance should be less than 10%
colnames(exercise4.3)=c("Components", "Mean", "Tolerance, %", "Std.Error")
exercise4.3
names(exercise4.3)[names(exercise4.3)=="Tolerance, %"]="Tolerance"
exercise4.3
names(exercise4.3)[names(exercise4.3)=="Tolerance"]="Tolerance, %"
exercise4.3
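To make the two selection rules above concrete, here is a minimal sketch. The data frame `resampled` and every number in it are made up for illustration; they are not the values from the exercise table, and the column names are my own assumption.

```r
# Hypothetical resampling results (illustrative numbers only)
resampled <- data.frame(
  Components = 1:5,
  Mean       = c(0.40, 0.50, 0.53, 0.55, 0.54),  # mean resampled R2
  Std.Error  = c(0.03, 0.03, 0.03, 0.03, 0.03)
)

# Numerically best model: the highest mean R2
best <- which.max(resampled$Mean)

# One-standard-error rule: the simplest model whose mean R2 is within
# one standard error of the best model's mean R2
threshold <- resampled$Mean[best] - resampled$Std.Error[best]
one_se_choice <- min(resampled$Components[resampled$Mean >= threshold])

# Tolerance: percent change in R2 relative to the best model
resampled$Tolerance <- (resampled$Mean - resampled$Mean[best]) /
  resampled$Mean[best] * 100

# Simplest model that loses no more than 10% of the best R2
tol_choice <- min(resampled$Components[resampled$Tolerance >= -10])

one_se_choice  # 3 with these made-up numbers
tol_choice     # 2 with these made-up numbers
```

Both rules trade a small loss in R2 for fewer components; the tolerance rule is looser here, so it picks a simpler model.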
#select Random Forests model that optimizes R2
#select SVM model with combined consideration of R2, prediction time and model complexity
4.4
library(caret)
data(oil)
oilType
table(oilType)
#completely random sample of 60 oils
sam1=sample(oilType, 60, replace = FALSE)
sam1
table(sam1)
set.seed(1)
sam2=createDataPartition(oilType, p=0.6, list=FALSE)
sam2
sam3=oilType[sam2]
sam3
table(sam3)
set.seed(1028)
sam2=createDataPartition(oilType, p=0.6, list=FALSE)
sam3=oilType[sam2]
table(sam3)
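A quick way to see what the stratified split buys is to put the class proportions side by side. This is only a sketch, assuming caret is installed; the names `random_idx`, `strat_idx`, and `comparison` are mine.

```r
library(caret)
data(oil)  # loads oilType (a factor) and fattyAcids

set.seed(1)
# Completely random sample of 60 positions
random_idx <- sample(seq_along(oilType), 60)
# Stratified sample: about 60% drawn within each oil type
strat_idx <- createDataPartition(oilType, p = 0.6, list = FALSE)

# Class proportions in the full data and in each sample
comparison <- round(cbind(
  full       = prop.table(table(oilType)),
  random     = prop.table(table(oilType[random_idx])),
  stratified = prop.table(table(oilType[strat_idx]))
), 3)
comparison
```

The stratified column should track the full-data column closely, while the random column can drift, especially for the rare oil types.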
#obtain a confidence interval for the overall accuracy
binom.test(16, 20)
binom.test(15, 20)
#try different sample sizes and accuracy rates to understand the trade-off
#between the uncertainty in the results, the model performance and the test set size
#p-value: how extreme the observation is under the null hypothesis (binom.test defaults to p = 0.5)
#confidence interval: a range of plausible values for the true accuracy (probability of success)
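One way to explore the trade-off described above is to run binom.test over a small grid. The sample sizes and accuracy rates below are arbitrary choices of mine, not values from the exercise.

```r
# Hypothetical grid of test set sizes and accuracy rates
grid <- expand.grid(n = c(20, 40, 100, 200), accuracy = c(0.75, 0.80, 0.90))
grid$correct <- round(grid$n * grid$accuracy)

# Exact binomial 95% confidence interval for each combination
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
               grid$correct, grid$n))
grid$lower <- ci[, 1]
grid$upper <- ci[, 2]
grid$width <- grid$upper - grid$lower

grid
```

For a fixed accuracy rate, the interval narrows as the test set grows, so a small test set leaves a lot of uncertainty about the true accuracy.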
Tomorrow, I will start reading Chapter 5.