Today, I read Chapter 9.
Summary
A Summary of Solubility Models
With the exception of poorly performing models, there is a fairly high correlation between the results derived from resampling and the test set (0.9 for the RMSE and 0.88 for R²).
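To make the comparison concrete, here is a minimal sketch of how such a correlation could be computed, assuming one resampled RMSE and one test-set RMSE per model. The function name `pearson` and the numbers in the two lists are placeholders I made up for illustration; they are not the values from the book.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder values only (NOT the book's results): one entry per model.
resampled_rmse = [0.60, 0.70, 0.72, 0.90]
test_rmse = [0.62, 0.69, 0.75, 0.88]

r = pearson(resampled_rmse, test_rmse)
print(f"correlation between resampled and test-set RMSE: {r:.2f}")
```

A value near 1 would mean that models ranked well by resampling also tend to rank well on the test set, which is the pattern the chapter reports.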
There was a “pack” of models that showed better results, including model trees, linear regression, penalized linear models, MARS, and neural networks.
The group of high-performance models includes support vector machines (SVMs), boosted trees, random forests, and Cubist.
There are very few statistically significant differences among the high-performance models. Given this, any of these models would be a reasonable choice.
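One common way to check whether two models really differ is a paired comparison of their resampled results, since both models are evaluated on the same folds. The sketch below computes a paired t statistic under that assumption. My notes do not record which test the book used, so treat this as an illustration only; the function name `paired_t_statistic` and the per-fold RMSE values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1

# Placeholder per-fold RMSE values for two hypothetical models
# (NOT results from the book).
model_a_rmse = [0.61, 0.64, 0.59, 0.66, 0.62]
model_b_rmse = [0.60, 0.65, 0.60, 0.64, 0.63]

t, df = paired_t_statistic(model_a_rmse, model_b_rmse)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A t statistic close to zero relative to the t distribution with the given degrees of freedom would suggest no significant difference between the two models, which matches the conclusion that any of the top models is a reasonable choice.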
Next week, I will read Chapter 10, which is the last chapter of Part 2 of the book. After finishing Chapter 10, I think it will be time to apply these techniques to real data.
so when you are done with chapter 10... we will restart from the beginning of the book and apply all techniques to the data.
install new version of IP...discuss with yifu and hao about installing IP .... then we will open the data set
OK