11/08/2016

Chapter 8 Computing

Today, I worked through the Computing section of Chapter 8.

Summary
# the R packages used in this section are caret, Cubist, gbm, ipred, party, partykit,
# randomForest, rpart, RWeka
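
# The objects trainData, solTrainXtrans and solTrainY are used below but never created
# in these notes. A minimal sketch of one way to get them, assuming (not stated above)
# that they come from the solubility data in the AppliedPredictiveModeling package:
```r
library(AppliedPredictiveModeling)
data(solubility)   # loads solTrainXtrans, solTrainY and related objects

# assumption: trainData is the transformed predictors plus the outcome in a column y
trainData <- solTrainXtrans
trainData$y <- solTrainY
```
# With this setup, y~. in the formula calls uses every other column as a predictor.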

# 1. Single Trees

# formula method
library(rpart)
rparttree=rpart(y~., data=trainData)
library(party)
ctreetree=ctree(y~., data=trainData)

library(caret)
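set.seed(100)
# method="rpart2" tunes over the maximum tree depth
rparttune1=train(solTrainXtrans, solTrainY, method="rpart2", tuneLength=12, trControl=trainControl(method="cv"))
rparttune1
set.seed(100)
# method="rpart" tunes over the complexity parameter (cp)
rparttune2=train(solTrainXtrans, solTrainY, method="rpart", tuneLength=10, trControl=trainControl(method="cv"))
rparttune2
# help pages for the rpart and ctree control options
?rpart.control
?ctree_control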
plot(rparttune1)
plot(rparttune2)
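
# The two settings that matter most in rpart.control are maxdepth and cp. A small sketch
# of what they do to a single tree, using the built-in mtcars data as a stand-in for the
# chapter data (the data set and the helper n_leaves are illustrative assumptions):
```r
library(rpart)

# maxdepth = 1 allows at most one split; cp = 0 turns off the complexity threshold
stump <- rpart(mpg ~ ., data = mtcars,
               control = rpart.control(maxdepth = 1, cp = 0))

# a deeper tree: up to three levels of splits, and smaller nodes may be split
deeper <- rpart(mpg ~ ., data = mtcars,
                control = rpart.control(maxdepth = 3, minsplit = 5, cp = 0))

# a large cp keeps only splits that reduce the relative error by at least that amount
strict <- rpart(mpg ~ ., data = mtcars,
                control = rpart.control(cp = 0.5))

# hypothetical helper: count the terminal nodes of a fitted rpart tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

n_leaves(stump)
n_leaves(deeper)
n_leaves(strict)
```
# Raising maxdepth or lowering cp can only allow more splits, never fewer.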

library(partykit)
# convert the rpart object to a party object
rparttree2=as.party(rparttree)
plot(rparttree2)

# 2. Model Trees
library(RWeka)

# formula method
m5tree=M5P(y~., data=trainData)
m5rules=M5Rules(y~., data=trainData)

set.seed(100)
m5tune=train(solTrainXtrans, solTrainY, method="M5", trControl = trainControl(method = "cv"),
# M=10 sets the minimum number of samples needed to split the data further to 10
             control= Weka_control(M=10))
plot(m5tune)
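
# With method="M5", caret tunes three yes/no options: pruned, smoothed and rules.
# A sketch of that grid written out by hand (m5grid is a hypothetical name):
```r
# every combination of the three M5 tuning options
m5grid <- expand.grid(pruned   = c("Yes", "No"),
                      smoothed = c("Yes", "No"),
                      rules    = c("Yes", "No"))

nrow(m5grid)   # 8 combinations
m5grid
```
# A subset of these rows could be passed to train() through its tuneGrid argument
# to limit the search.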

# 3. Bagged Trees
library(ipred)
# bagging uses the formula interface and ipredbagg has the non-formula interface
baggedtree=ipredbagg(solTrainY, solTrainXtrans)
baggedtree=bagging(y~., data = trainData)
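# note: as.party() is meant for single trees (e.g. rpart objects); as far as I know
# it has no method for a bagged ensemble, so the call below is left commented out
# baggedtree2=as.party(baggedtree)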

# cforest does bagging when mtry is equal to the number of predictors
# (the number of columns minus 1 for the outcome)
bagctr1=cforest_control(mtry=ncol(trainData)-1)
baggedtree=cforest(y~., data=trainData, controls = bagctr1)
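
# What bagging does can be sketched by hand: fit one tree per bootstrap sample and
# average the predictions. This is only an illustration with rpart and the built-in
# mtcars data (both assumptions), not how ipred implements it internally:
```r
library(rpart)

set.seed(100)
n_trees <- 25               # number of bootstrap trees (illustrative choice)
n <- nrow(mtcars)

# one column of predictions per bootstrap tree
preds <- sapply(seq_len(n_trees), function(b) {
  idx <- sample(n, replace = TRUE)             # bootstrap sample of the rows
  fit <- rpart(mpg ~ ., data = mtcars[idx, ])  # tree fitted on that sample
  predict(fit, newdata = mtcars)               # predict on the original rows
})

# the bagged prediction is the average over the trees
bagged_pred <- rowMeans(preds)
head(bagged_pred)
```
# Each row of preds holds one car's predictions from all 25 trees, so rowMeans gives
# one bagged prediction per car.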

# 4. Random Forest
library(randomForest)
rfmodel=randomForest(solTrainXtrans, solTrainY)
rfmodel=randomForest(y~., data=trainData)
plot(rfmodel)
# the default for mtry in regression is the number of predictors divided by 3
# the number of trees is set with ntree (not ntrees)
rfmodel2=randomForest(solTrainXtrans, solTrainY, importance = TRUE, ntree=1000)
plot(rfmodel2)
importance(rfmodel2)
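
# The mtry defaults are simple arithmetic on the number of predictors p. A sketch of
# the rule as I understand it from the randomForest help page (default_mtry is a
# hypothetical helper, not a function from the package):
```r
# regression: p/3 rounded down, but at least 1; classification: sqrt(p) rounded down
default_mtry <- function(p, regression = TRUE) {
  if (regression) max(floor(p / 3), 1) else floor(sqrt(p))
}

default_mtry(10)                      # 3
default_mtry(2)                       # 1
default_mtry(16, regression = FALSE)  # 4
```
# Passing mtry explicitly to randomForest() overrides this default.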


Tomorrow, I will continue with the Computing section of Chapter 8.
