10/17/2016

Chapter 6 (6.1 6.2 6.3)

Today, I read some parts of Chapter 6.

Summary
QSAR: quantitative structure-activity relationship modeling
NIPALS: nonlinear iterative partial least squares algorithm

The objective of ordinary least squares linear regression is to find the plane that minimizes the sum-of-squared errors (SSE) between the observed and predicted response: SSE = Σ (y_i − ŷ_i)², summed over the n training observations.
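A minimal sketch of this in Python (my own toy example, not from the book), fitting OLS coefficients with numpy and computing the SSE; the data matrix X and response y are made up for illustration:

```python
import numpy as np

# Hypothetical toy data: 50 observations, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)

# Add an intercept column and solve for the coefficients that minimize SSE.
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_hat = X1 @ beta
sse = np.sum((y - y_hat) ** 2)
print("coefficients:", beta, "SSE:", sse)
```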
In many situations the relationships among predictors are complex and involve many of them. In these cases, manually removing specific predictors may not be feasible, and models that can tolerate collinearity may be more useful.


A Box-Cox transformation can be applied to the continuous predictors in order to remove skewness.
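A quick sketch with scipy (my own example, assuming a strictly positive, right-skewed predictor, since Box-Cox requires positive values):

```python
import numpy as np
from scipy import stats

# Hypothetical right-skewed, strictly positive predictor (Box-Cox requires x > 0).
rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# boxcox estimates the lambda that makes the transformed data most normal-looking
# and returns the transformed values along with that lambda.
x_transformed, lam = stats.boxcox(x)
print("estimated lambda:", lam)
print("skewness before:", stats.skew(x), "after:", stats.skew(x_transformed))
```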
If the correlation among predictors is high, then the ordinary least squares solution for multiple linear regression will have high variability and will become unstable.
If the number of predictors is greater than the number of observations, ordinary least squares in its usual form will be unable to find a unique set of regression coefficients that minimizes the SSE.
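A small numeric illustration of the collinearity problem (my own, not from the book): with a near-duplicate predictor the design matrix is nearly rank-deficient, and the individual coefficient estimates swing wildly across resamples even though their sum stays stable:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2])
print("condition number of X:", np.linalg.cond(X))

# Refit on bootstrap resamples: the two coefficients vary a lot,
# but their sum stays close to 3.
for _ in range(3):
    idx = rng.integers(0, n, size=n)
    b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    print("coefficients:", b, "sum:", b.sum())
```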
Pre-processing predictors via PCA (dimension reduction) prior to performing regression is known as principal component regression (PCR). It has been widely applied in the context of problems with inherently highly correlated predictors or problems with more predictors than observations.
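Here is a minimal PCR sketch using scikit-learn (my own Python version; the book works in R, and the data here is simulated to have correlated predictors):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data whose predictors share a low-dimensional latent structure.
rng = np.random.default_rng(3)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(100, 10))
y = latent[:, 0] + rng.normal(scale=0.2, size=100)

# PCR: center/scale, project onto the leading principal components,
# then regress the response on those component scores.
pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
pcr.fit(X, y)
print("R^2 on training data:", pcr.score(X, y))
```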
The author recommends using PLS when there are correlated predictors and a linear regression-type solution is desired.
PCA vs. PLS
While PCA linear combinations are chosen to maximally summarize predictor space variability, the PLS linear combinations of predictors are chosen to maximally summarize covariance with the response.
Prior to performing PLS, the predictors should be centered and scaled, especially if the predictors are on scales of differing magnitude.
PLS has one tuning parameter: the number of components to retain. Resampling techniques can be used to determine the optimal number of components.
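A short PLS sketch along these lines (again my own Python version with simulated data, not the book's R code): center and scale the predictors, then choose the number of components by cross-validated RMSE:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical correlated-predictor data set.
rng = np.random.default_rng(4)
latent = rng.normal(size=(120, 3))
X = latent @ rng.normal(size=(3, 15)) + 0.1 * rng.normal(size=(120, 15))
y = latent @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=120)

# Center and scale the predictors, then tune the single PLS parameter
# (number of components) by cross-validation.
pipe = make_pipeline(StandardScaler(), PLSRegression())
grid = GridSearchCV(
    pipe,
    param_grid={"plsregression__n_components": range(1, 11)},
    scoring="neg_root_mean_squared_error",
    cv=10,
)
grid.fit(X, y)
print("best number of components:", grid.best_params_)
```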

Tomorrow, I will continue to read Chapter 6.
