Summary
QSAR: quantitative structure-activity relationship modeling
NIPALS: nonlinear iterative partial least squares algorithm
The objective of ordinary least squares linear regression is to find the plane that minimizes the sum-of-squared errors (SSE) between the observed and predicted response:
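$$\mathrm{SSE} = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the observed response and $\hat{y}_i$ is the predicted response for the i-th sample.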
On many occasions, relationships among predictors are complex and can involve many predictors at once. In these cases, manually removing specific predictors may not be possible, and models that can tolerate collinearity may be more useful.
A Box-Cox transformation can be applied to the continuous predictors in order to remove skewness.
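As a quick illustration (my own sketch, not from the book), here is how skewed predictors can be Box-Cox transformed with scikit-learn's PowerTransformer; the toy data are made up, and note that Box-Cox requires strictly positive values:

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# Illustrative right-skewed predictors; Box-Cox needs strictly positive data.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 3))

pt = PowerTransformer(method="box-cox", standardize=True)
X_trans = pt.fit_transform(X)

print("estimated lambdas:", pt.lambdas_)  # one lambda per predictor column
```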
If the correlation among predictors is high, then the ordinary least squares solution for multiple linear regression will have high variability and will become unstable.
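A small simulation makes this concrete. The data below are made up for illustration: x2 is nearly a copy of x1, and refitting OLS on bootstrap resamples shows how variable the coefficients become:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)      # nearly a copy of x1 -> high correlation
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)  # true model uses only x1

coefs = []
for _ in range(500):                          # bootstrap resampling of rows
    idx = rng.integers(0, n, size=n)
    X = np.column_stack([x1[idx], x2[idx]])
    coefs.append(np.linalg.lstsq(X, y[idx], rcond=None)[0])
coefs = np.array(coefs)

# Large standard deviations relative to the true coefficient (2.0) reflect
# the unstable, high-variance OLS solutions described above.
print("coef std dev across bootstraps:", coefs.std(axis=0))
```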
If the number of predictors is greater than the number of observations, ordinary least squares in its usual form will be unable to find a unique set of regression coefficients that minimizes the SSE.
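This is easy to verify numerically (a toy check, not from the book): with more predictors than observations, the matrix X'X that OLS must invert is rank-deficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 10                          # more predictors than observations
X = rng.normal(size=(n, p))

gram = X.T @ X                        # the p x p matrix OLS must invert
print("rank of X'X:", np.linalg.matrix_rank(gram), "out of", p)
# rank < p, so X'X is not invertible and infinitely many coefficient
# vectors achieve the minimum SSE.
```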
Pre-processing predictors via PCA (dimension reduction) prior to performing regression is known as principal component regression (PCR). It has been widely applied in the context of problems with inherently highly correlated predictors or problems with more predictors than observations.
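A minimal PCR sketch using scikit-learn (the synthetic data and the choice of five components are arbitrary, for illustration only):

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

# Scale, reduce to principal components, then regress on the components.
pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
pcr.fit(X, y)
print("training R^2:", pcr.score(X, y))
```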
The author recommends using PLS when there are correlated predictors and a linear regression-type solution is desired.
[Figure: PCA vs. PLS]
While the PCA linear combinations are chosen to maximally summarize predictor space variability, the PLS linear combinations of predictors are chosen to maximally summarize covariance with the response.
Prior to performing PLS, the predictors should be centered and scaled, especially if they are on scales of differing magnitude.
PLS has one tuning parameter: the number of components to retain. Resampling techniques can be used to determine the optimal number of components.
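For example, here is a sketch of picking the number of components by cross-validation with scikit-learn (synthetic data; the grid of 1-10 components is an arbitrary choice). The pipeline centers and scales the predictors first, as noted above:

```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

pipe = make_pipeline(StandardScaler(), PLSRegression())
grid = GridSearchCV(
    pipe,
    param_grid={"plsregression__n_components": range(1, 11)},
    cv=5,  # 5-fold cross-validation as the resampling technique
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)
print("best number of components:", grid.best_params_)
```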
Tomorrow I will continue reading Chapter 6 and try to finish all the computing.