12/02/2016

Review of Codes

Today I reviewed the code from Chapters 3 to 5 of the book.

I had forgotten some of the code since first reading the book, so I decided to spend some time reviewing it.

Summary

#Coefficient of determination (R^2): the proportion of the information in the data that is explained by the model.
#It is a measure of correlation, not accuracy, and it depends on the variation in the outcome: for the same RMSE, an outcome with larger variance gives a larger R^2.
observed=c(0.22, 0.83, -0.12, 0.89, -0.23, -1.30, -0.15, -1.4, 0.62, 0.99, -0.18, 0.32, 0.34,
           -0.3, 0.04, -0.87, 0.55, -1.3, -1.15, 0.2)
predicted=c(0.24, 0.78, -0.66, 0.53, 0.7, -0.75, -0.41, -0.43, 0.49, 0.79, -1.19, 0.06, 0.75,
           -0.07, 0.43, -0.42, -0.25, -0.64, -1.26, -0.07)
#in practice, the vector of predictions would be produced by the model function
residualvalues=observed-predicted
residualvalues
summary(residualvalues)
axisrange=extendrange(c(observed, predicted))
plot(observed, predicted, ylim=axisrange, xlim=axisrange)
abline(0, 1, col="darkgrey", lty=2)
plot(predicted, residualvalues, ylab = "residual")
abline(h=0, col="darkgrey", lty=2)
library(caret)
R2(predicted, observed)
RMSE(predicted, observed)
cor(predicted, observed)
cor(predicted, observed, method = "spearman")
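
To check the note that the same RMSE with a larger outcome variance gives a larger R^2, here is a minimal sketch in base R. The data are simulated and the helper names rmse and r2 are my own; r2 is the squared correlation, which I believe matches the default of caret's R2 function.

```r
# Helper functions (hypothetical names, not from the book)
rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))
r2 <- function(pred, obs) cor(pred, obs)^2  # squared correlation

set.seed(1)
noise <- rnorm(100, sd = 0.5)      # the same errors are used for both outcomes
pred_narrow <- rnorm(100, sd = 1)  # predictions with a small spread
pred_wide <- 5 * pred_narrow       # the same predictions stretched to a larger spread
obs_narrow <- pred_narrow + noise
obs_wide <- pred_wide + noise

# The residuals are the same noise in both cases, so the RMSE matches
rmse_narrow <- rmse(pred_narrow, obs_narrow)
rmse_wide <- rmse(pred_wide, obs_wide)

# The outcome with the larger variance gets the larger R^2
r2_narrow <- r2(pred_narrow, obs_narrow)
r2_wide <- r2(pred_wide, obs_wide)

c(rmse_narrow = rmse_narrow, rmse_wide = rmse_wide)
c(r2_narrow = r2_narrow, r2_wide = r2_wide)
```

The model errors are no better in the second case; only the spread of the outcome changed, which is why R^2 should be read together with RMSE.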

library(AppliedPredictiveModeling)
data(twoClassData)
str(predictors)
str(classes)
set.seed(1)
library(caret)
trainingrows=createDataPartition(classes, p=0.8, list=FALSE)
head(trainingrows)
trainpredictors=predictors[trainingrows, ]
trainclasses=classes[trainingrows]
testpredictors=predictors[-trainingrows, ]
testclasses=classes[-trainingrows]
trainpredictors
trainclasses
testpredictors
testclasses
#maxdissim: the caret function for maximum dissimilarity sampling, another way to split the data
#resampling
set.seed(1)
repeatedsplits=createDataPartition(trainclasses, p=0.8, times=3)
str(repeatedsplits)
#to create indicators for 10-fold cross-validation
set.seed(1)
cvsplits=createFolds(trainclasses, k=10, returnTrain = TRUE)
str(cvsplits)
fold1=cvsplits[[1]]
cvpredictors1=trainpredictors[fold1,]
cvclasses1=trainclasses[fold1]
nrow(trainpredictors)

nrow(cvpredictors1)
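
With returnTrain = TRUE, each fold holds the row numbers kept for training, so the held-out rows are whatever is left. The sketch below shows that idea in base R on a small made-up factor. The names y, fold_id, train_idx and holdout_idx are hypothetical, and the random assignment here is not stratified by class, unlike what I understand createFolds to do.

```r
set.seed(1)
n <- 20
y <- factor(rep(c("Class1", "Class2"), each = 10))  # made-up outcome
k <- 5

# Randomly assign every row to one of k folds of equal size
fold_id <- sample(rep(seq_len(k), length.out = n))

# Training rows for fold i are all rows outside fold i
train_idx <- lapply(seq_len(k), function(i) which(fold_id != i))

# Held-out rows are the rows not used for training in that fold
holdout_idx <- lapply(train_idx, function(idx) setdiff(seq_len(n), idx))

# Training and held-out outcome for the first fold
y_train1 <- y[train_idx[[1]]]
y_holdout1 <- y[holdout_idx[[1]]]
length(y_train1)
length(y_holdout1)
```

Every row is held out exactly once across the k folds, so each observation gets one out-of-sample prediction.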

#the e1071 package contains the tune function
#the ipred package contains the errorest function
#the caret package contains the train function
library(caret)
library(AppliedPredictiveModeling)
data("GermanCredit")
set.seed(100)
inTrain=createDataPartition(GermanCredit$Class, p = 0.8)[[1]]
inTrain
GermanCreditTrain=GermanCredit[ inTrain, ]
GermanCreditTest=GermanCredit[-inTrain, ]
head(GermanCreditTrain)
head(GermanCreditTest)
set.seed(1056)
svmfit=train(Class~ .,data=GermanCreditTrain, method="svmRadial", preProc=c("center", "scale"), 
             tuneLength=10, 
             trControl=trainControl(method="repeatedcv", repeats=5, classProbs=TRUE))
svmfit
#method: the type of classification or regression model to fit (here "svmRadial")
#trControl: the resampling method, set through trainControl
#tuneLength: an integer denoting the amount of granularity in the tuning parameter grid
plot(svmfit, scales=list(x=list(log=2)))
predictedclasses=predict(svmfit, GermanCreditTest)
str(predictedclasses)
predictedprobs=predict(svmfit, newdata=GermanCreditTest, type="prob")
head(predictedprobs)
#between-model comparisons
#set.seed(1056)
#logisticreg=train(Class~ ., data=GermanCreditTrain, method="glm",
#            trControl=trainControl(method="repeatedcv", repeats=5))
#logisticreg
#resamp=resamples(list(SVM=svmfit, Logistic=logisticreg))
#summary(resamp)
#modeldifferences=diff(resamp)
#summary(modeldifferences)
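
As far as I understand, the between-model comparison works on paired resamples: both models are evaluated on the same resampled data sets, and the per-resample differences are tested. The sketch below imitates that with a paired t-test in base R. The accuracy values are simulated, not results from the German credit models, and the names svm_acc and logit_acc are hypothetical.

```r
set.seed(1056)

# Made-up accuracies for two models on the same 50 resamples
svm_acc <- 0.75 + rnorm(50, sd = 0.03)
logit_acc <- svm_acc - 0.005 + rnorm(50, sd = 0.02)

# Per-resample differences; pairing removes the resample-to-resample noise
acc_diff <- svm_acc - logit_acc
summary(acc_diff)

# Paired t-test of the hypothesis that the mean difference is zero
paired_test <- t.test(svm_acc, logit_acc, paired = TRUE)
paired_test$estimate
paired_test$p.value
```

The pairing only makes sense when both models see identical resamples, which is why the same seed is set before each call to train.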

#Kappa (in the train output): agreement between predicted and observed classes, adjusted for the agreement expected by chance

Next week, I will continue to read the book.
