Index

A

Actual by Predicted Plot report 89, 95, 162, 168–169, 171–172, 197, 213

actual values, compared with predicted values 96–99, 170–172, 179–180

AGPT, transforming 148–150

analysis, performing 79–81, 156–158, 175–176

Andersson, M. 222

applied statistics 1

Arlot, S. 67

assessing models using test set 133–136

B

Baking Bread That People Like example

about 183–184

combined model 215–219

data 184–186

first stage model 187–202

second stage model 202–215

Belsley, D.A. 22

Bias-Variance Tradeoff in PLS

about 250, 261

examples 250–253

motivation 254–255

results and discussion 257–261

simulation study 255–257

BigClassCVDemo.jmp 67–68

bivariate distributions 205, 207

Blue Ridge Ecoregion, PLS models for 173–180

bradykinin-potentiating activity 76, 103

Bread.jmp 185–186, 203, 215–219

C

Cars example 11–15

CarsSmall.jmp 11–15

Cauchy-Schwarz Inequality 233

Celisse, A. 67

centering

data 3–4

example of 28–31

in PLS 37–38

Chackrapani, C. 184

Chaloud, D.J. 140, 142, 155, 167, 179, 181, 182

Chatfield, C. 2

Chong, I.G. 267

choosing number of factors 65–71

Coefficient Plots 131–132, 181, 212

column vector 12–15

combined model, for Baking Bread That People Like example 215–219

Compare_NIPALS_VIP_and_VIPStar.jsl script 264

ComparePLS1andPLS2.jsl script 262–263

comparing

actual values to predicted values 96–99, 170–172, 179–180

residuals 99–101

stage one models for Baking Bread That People Like example 200–202

variable selection error rates 273–280

VIP values 270–273

confidence ellipses

Scatterplot Matrix 188

X Score Scatterplot Matrix 87

correlation

effect of, among predictors 18–23

for Ys and Xs in PLS 38

with factors, loadings as measures of 235–236, 243

structure of Xs 264–266

Correlations report 189

covariance 32, 232–233

creating

formula 148

plots of individual spectra 111–112

stacked data tables 109–111

subsets 173–174

test set indicator columns 107–108

cross validation 18, 65–67, 246–248

See also k-fold cross validation

See also leave-one-out cross validation

D

data

Baking Bread That People Like example 184–186

centering 3–4

contextual nature of 2

diversity of 2

imputing missing 146–147

initial visualization 77–79

performance on 96–104

Predicting Biological Activity example 76–79

Predicting Octane Rating of Gasoline example 106–108

reviewing 174–175

scaling 3–4

transforming 3–4

viewing 108–116

Water Quality in Savannah River Basin example 140–141

data filter 46–48, 57, 113–116, 124–125, 257

de Jong, S. 72, 237

deflation 225

design matrix 14

diagnostics

Predicting Biological Activity example 95–96

Predicting Octane Rating of Gasoline example 121–125

pruned PLS model for Savannah River basin 168–169

Diagnostics Plots 88–89, 122, 160–162, 193–194, 210–211

differences, by ecoregion 150–155

Dijkstra, T.K. 71

dimensionality reduction 34–36

DimensionalityReduction.jsl script 31–36

dimensions, of matrices 15

Distance Plots 96, 123

distances, to X and Y models 242

distributions, Water Quality in Savannah River Basin example 147–148

Draper, N.R. 15

E

eigenvalues 26–27, 222–224, 233

eigenvectors 27, 223–224

Eriksson, L. 4, 28, 89, 96, 163

examples

See also specific examples

Bias-Variance Tradeoff in PLS 250–253

Cars 11–15

centering 28–31

of PLS analysis 4–10

prediction 59–64

scaling 28–31

scores and loadings 54–55

excluding test set 116–117

expected value 19

exploratory data analysis, in multivariate studies 31–34

extracting factors 50–51

F

factor loadings 52, 224

factors

choosing number of 65–71

determining number of, using cross validation 246–248

extracting 50–51

3-D scatterplots for one 46

"fat matrices" 106

Fidell, L.S. 48

fingerprint 42

first principal component 26, 36

Fit Model Launch Window 5–6

Fit Model Platform 15

fitting PLS models 117–118, 120, 136–137, 190–195, 207–208

formula editor window 217

Friedman, J. 23, 39, 54, 250

G

Gasoline.jmp 106–116

generative Gaussian process 252

H

Hastie, T. 23, 39, 54, 250

Hellberg, S. 76, 77

Heuver, G. 76, 77, 79, 103

holdout cross validation

method 66

set 247

Hoskuldsson, A. 232

I

identity matrix 16

imputing missing data 146–147

initial data visualization

Baking Bread That People Like example 77–79

Water Quality in Savannah River Basin example 144–145

initial reports 118–120

inner relation regression 52

inputs 1

inverse of a square matrix 16

J

JMP customizations for SIMPLS 241

JMP Pro

Fit Model launch 5–6

KFold validation 7

Validation Methods 69–70

Johansson, E. 4, 28, 89, 96, 163

Jun, C.H. 267

K

Kalivas, J.H. 106

Kettaneh-Wold, N. 4, 28, 89, 96, 163

k-fold cross validation 66, 67–68

KFold Cross Validation report 119, 167–168, 208, 209

Kourti, T. 126

L

leave-one-out cross validation 66–67, 190–192, 208–209

Leave-One-Out report 190–192, 208–209

left singular vectors 222

linked subset 174

loading matrix 52, 83–85

loading plots 27, 85–86, 127–128

loadings

as measures of correlation with factors 235–236, 243

PLS 50–59

properties of 232

LoWarp.jmp 28–31

lurking variables 103

M

MacGregor, J.F. 126

Make Model Using Selection 92, 93, 130, 164, 166, 178, 212

Make Model Using VIP 92, 130, 136, 195

Mason, R.L. 126

Mateos-Aparicio, G. 71

matrices

dimensions of 15

"fat matrices" 106

identity 16

loading 52, 83–85

scatterplot 83–87, 125–127

singular value decomposition of 222–223

matrix algebra 222

maximization of covariance 232–233

mean squared prediction error (MSPE) 257–261

Microsoft Research 2

missing response values, Water Quality in Savannah River Basin example 145–146

Missing Value Imputation report 147, 159–160

MLR

See multiple linear regression (MLR)

Model Coefficients report 193, 210

Model Comparison Summary report 119, 158–159, 191–192, 208, 209

model fitting

for Baking Bread That People Like example 195–197, 197–199, 212–215

PLS model for Blue Ridge Ecoregion 178–180

pruned PLS model for Savannah River Basin example 166–168

modeling 1–2

models

assessing using test set 133–136

fitting 117–118, 120, 136–137, 207–208

in terms of X scores 52, 228–229, 241

testing 9–10

for X and Y 52–53, 228–229, 241

MSPE (mean squared prediction error) 257–261

multicollinearity 18–23

Multicollinearity.jsl script 18–23, 263–264

multiple linear regression (MLR)

Cars example 11–15

effect of correlation among predictors 18–23

estimating coefficients 15–16

overfitting 16–18

underfitting 16–18

multivariate studies, exploratory data analysis in 31–34

multivariate technique, PLS as a 38–39

N

Nash, M.S. 140, 142, 155, 167, 179, 181, 182

NIPALS algorithm

about 71–72, 226–228

computational results 228–231

extracting factors 50

models in terms of X scores 52

models in terms of Xs 53

notation 225–226

one-factor model 60–63

properties of 231–237

two-factor model 63–64

NIPALS Fit report 159–160, 176–178

NIPALS Fit with 1 Factors report 158–163, 191–192, 193, 196

NIPALS Fit with 2 Factors report 208, 209, 247–248

NIPALS Fit with 3 Factors report 7–8, 120

noise 14

Nomikos, P. 126

nonlinear iterative partial least squares algorithm

See NIPALS algorithm

notation

for NIPALS algorithm 225–226

for SIMPLS algorithm 238–240

number of factors 246–248

O

O'Mahony, M. 184

opening formula editor window 217

optimization criterion, SIMPLS algorithm 237

outputs 1

overfitting 16–18

P

parameters 15

partial least squares (PLS)

See also variable selection

about 1–2

algorithms 224–225

analysis example 4–10

as a multivariate technique 38–39

centering in 37–38

compared with PCA 49–50

how it works 45–49

loadings 50–59

models 155–181, 173–180

models for Blue Ridge Ecoregion 173–180

models for predicting biological activity 79–96

models for predicting octane ratings of gasoline 116–138

overview 72–73

reasons for using 39–45

report 44, 81–82, 158–159, 191–192, 208–212

scaling in 37–38

scores 50–59

in today's world 2–3

variable reduction in 89–90

Partial Least Squares Model Launch window 7

Partial Least Squares report 158–159, 191–192, 208–212

PCA

See principal components analysis (PCA)

PCA platform 27

PCR (Principal Components Regression) 39, 223–224

Penta.jmp 76–77

Percent Variation Explained for X Effects 230, 242

Percent Variation Explained for Y Responses 230, 242

Percent Variation Explained report 121, 137, 192, 209

Pérez-Enciso, M. 89

performing analysis 79–81, 156–158, 175–176

plots

construction for individual spectra 111–112

diagnostics 88–89, 122

loading 27, 85–86, 127–128

variable importance 90–93

PLS

See partial least squares (PLS)

PLS platform 69–71

PLS procedure 77

PLS Report 81–82

PLS1 models 222

PLS2 models 222

PLSGeometry.jsl script 45–49

PLS_PCA.jsl script 49–50

PLSScoresAndLoadings.jmp 54–55

PLSvsTrueModel.jmp 59–60

PolyRegr.jsl script 16–18

PolyRegr2.jsl script 250–253

Predicted Residual Sum of Squares (PRESS) statistic 246–248

predicted values, compared with actual values 96–99, 170–172, 179–180

Predicting Biological Activity example

about 75–76

data 76–79

first PLS model 79–93

performance on data from second study 96–104

pruned PLS model 93–96

Predicting Octane Rating of Gasoline example

about 106

data 106–108

first PLS model 116–120

pruned model 136–138

second PLS model 120–136

viewing data 108–116

prediction

example using simulation 59–64

formulas, saving 8–9, 60–64, 169–170

Prediction Profiler 201–202, 214–215, 218–219

predictors, effect of correlation among 18–23, 59–64

PRESS (Predicted Residual Sum of Squares) statistic 246–248

principal components 224

principal components analysis (PCA)

about 25–27, 223–224

compared with PLS 49–50

dimensionality reduction via 34–36

Principal Components Regression (PCR) 39, 223–224

Profiler

comparing via the 201–202

viewing 213–215, 218–219

projection method 48

projection to latent structures 48

properties

of loadings 232

of NIPALS algorithm 231–237

of scores 232

of SIMPLS algorithm 237–238

shared by NIPALS and SIMPLS 53–54

R

regression

inner relation in PLS 52

stepwise 263

regression coefficients 15, 130, 234–235

regression parameters 12

regularization techniques 23

reports

Actual by Predicted Plot 89, 95, 162, 168–169, 171–172, 197, 213

Coefficient Plots 131–132, 181, 212

Diagnostics Plots 88–89, 122, 160–162, 193–194, 210–211

Distance Plots 96, 123

initial 118–120

KFold Cross Validation 119, 167–168, 208, 209

Leave-One-Out 190–192, 208–209

Loading Plots 83–86, 127–128

Missing Value Imputation 147, 159–160

Model Coefficients 193, 210

Model Comparison Summary 119, 158–159, 191–192, 208, 209

NIPALS Fit with 1 Factors 158–163, 191–192, 193, 196

NIPALS Fit with 2 Factors 208, 209, 247–248

NIPALS Fit with 3 Factors 7–8, 120

Partial Least Squares (PLS) 44, 81–82, 158–159, 191–192, 208–212

Percent Variation Explained 121, 137, 192, 209

Profiler 201–202, 213–215, 218–219

Residual by Predicted Plot 89, 95, 122, 168, 197

Score Scatterplot Matrices 86–87, 125–127

SIMPLS Fit with 2 Factors 82–83

Stepwise Regression Control 198–199

T Square Plot 123, 160–161

Variable Importance Plot 44–45, 90–91, 129, 131–132, 165

VIP vs Coefficients Plots 91–93, 130–132, 136, 163–166, 177–178, 194–195, 212

X-Y Scores Plots 82–83, 120–121, 159, 192, 196, 209

Residual by Predicted Plot report 89, 95, 122, 168, 197

residuals

about 14–15, 34

comparing 99–101

right singular vectors 223

RMSE (Root Mean Square Error) 17–18, 67–68

Root Mean PRESS (Predicted Residual Sum of Squares) statistic 69, 119–120, 167, 192, 209, 246–249

Root Mean Square Error (RMSE) 17–18, 67–68

Rose, David 184

S

SAS/STAT 9.3 User's Guide 77, 248

saving prediction formulas 8–9, 96, 133, 169–170, 248

scaling

data 3–4

example of 28–31

in PLS 37–38

scatterplot matrices

loading matrix 83–86

scoring 86–87, 125–127

Score Scatterplot Matrices report 86–87, 125–127

score vectors 224

scores

PLS (partial least squares) 50–59

properties of 232

second principal component 26

Sensory Evaluation of Food: Statistical Methods and Procedures (O'Mahony) 184

SIMPLS algorithm

about 71–72, 237, 240–246

extracting factors 50

fits 64

implications for 237–238

models in terms of Xs 53

notation 238–240

optimization criterion 237

SIMPLS Fit report 82–83

simulation studies

about 249–250

Bias-Variance Tradeoff in PLS 250–261

overfitting 16–18

underfitting 16–18

using PLS for variable selection 263–280

Utility Script to Compare PLS1 and PLS2 261–263

singular value decomposition of a matrix 222–223

singular values 222–223

Sjöström, M. 76, 77

Smith, H. 15

Solubility.jmp 25–27, 34

Spearheads.jmp 4–5, 66

spectra

combined 113–116

constructing plots of individual 111–112

individual 112–113

spectral decomposition, relationship to singular value decomposition 223

SpectralData.jsl script 40–45

SS(YModel) 242

stacked data tables, creating 109–111

stage one MLR model, for Baking Bread That People Like example 197–200

stage one pruned model, for Baking Bread That People Like example 195–197

stage two MLR model, for Baking Bread That People Like example 212–215

stage two PLS model, for Baking Bread That People Like example 207–208

Standardize X option 246

statistical models 1–2

Statistically Inspired Modification of the PLS Method

See SIMPLS algorithm

Statistics in Market Research (Chackrapani) 184

stepwise regression 189, 263

Stepwise Regression Control report 198–199

stratified sample, creating 155–156

subsets, creating 173–174

sum of squares

for contribution of factor f to X model 230

for factor f to Y model 229

for X 242

for Y 242

T

T Square Plot 123, 160–161

Tabachnick, B.G. 48

Tenenhaus, M. 89

test set

about 5

assessing models using 133–136

creating indicator columns 107–108

creating stratified sample 155–156

excluding 116–117

testing models 9–10

3-D scatterplots, for one factor 46

Tibshirani, R. 23, 39, 54, 250

Tobias, R.D. 38

Tracy, N.D. 126

training set 5, 65

transforming

creating a column formula 148–149

data 3–4

through a launch window 148–150

weights 236–237

transpose 16

Trygg, J. 4, 28, 89, 96, 163

U

Ufkes, J.G.R. 76, 77, 79, 103

underfitting 16–18

univariate distributions 204

Utility Script to Compare PLS1 and PLS2 261–263

V

validation

k-fold cross validation 66, 67–68, 119, 167–168, 208, 209

leave-one-out cross validation 66–67, 190–192, 208–209

in PLS platform 69–71, 246–249

validation set 65

van den Wollenberg, A.L. 39

van der Meer, C. 76, 77, 79, 103

van der Voet, H. 69, 119

van der Voet tests 69–70, 137, 167–168, 248

Variable Importance for the Projection (VIP) statistic

See VIP (Variable Importance for the Projection) statistic

Variable Importance Plot report 44–45, 90–91, 129, 131–132, 165

variable selection

about 64, 189, 263–264, 280

comparing error rates in simulation study 273–280

computation of result measures for simulation study 268–270

results of simulation study 270–280

simulation 267–268

structure of simulation study 264–267

variables

comparing selection error rates 273–280

lurking 103

reduction in PLS 89–90

relationships between 187–188

visualizing two at a time 152–154

variance, bias toward X directions with high variance 234

viewing

data 108–116

Profiler 201–202, 213–215, 218–219

VIPs for spectral data 131

VIP (Variable Importance for the Projection) statistic

about 129–133

comparing values 270–273

for ith predictor 230–231, 243–244

variable reduction in PLS 89

viewing for spectral data 131

VIP vs Coefficients Plots report 91–93, 130–132, 136, 163–166, 177–178, 194–195, 212

VIP* 231, 244–245, 268–273, 280

Visser, B.J. 76, 77, 79, 103

visualizing

data 77–79

two variables at a time 152–154

Ys and Xs 202–207

W

Water Quality in Savannah River Basin example

about 140–141

data 141–155

defined 140

first PLS model 155–166

pruned PLS model 166–172

WaterQuality2.jmp 155–156

WaterQuality2_Train.jmp 156–158

WaterQuality_BlueRidge.jmp 174–175

WaterQuality.jmp 140–141

WaterQuality_PRESSCalc.jmp 247–248

weights, transforming 236–237

Wikström, C. 28

Wold, H. 71

Wold, S. 4, 28, 71, 76, 77, 89, 96, 129, 163, 231

Wynne, H.J. 76, 77, 103

X

X

Active, in simulation 267

correlation structure of, in simulation 264–266

models for 52–53, 241

models in terms of scores 50–52

properties of weights 232

sums of squares for 242

X-Y Scores Plots report 82–83, 120–121, 159, 192, 196, 209

Y

Y

models for 52–53, 241

sums of squares for 242

Young, J.C. 126

Symbols

* matrix multiplication 12–13

β column vector of regression parameters 12–13, 15–16

ε column vector of errors 12–13, 15–16

Σ correlation matrix 38–39
