A
alpha (α) 194–196
Alternative Hypothesis 175
alternative models, evaluating 315–319
analysis
See also specific types
capability 398–400
with Fit Y by X 100–101
with multivariate platform 98–99
trend 350–352
analysis of variance (ANOVA)
about 229
applications 245–248
assumptions about 229–230
conducting 287–291
interpreting regression results 257
one-way 230–237
satisfaction of conditions 237–238
two-way 238–245
analysis platform, using 11–14
analytics frameworks, applying 89–91
ANOVA
See analysis of variance (ANOVA)
applications
analysis of variance (ANOVA) 245–248
chi-square tests 213–215
data 37–38, 56–59
discrete distributions 118–121
experimental design 379–384
forecasting techniques 355–358
inference 184–187, 201–204, 226–228
linear regression analysis 261–265
multiple regression 319–322
normal model 138–141
probability 118–121
quality improvement 402–405
regression analysis 338–340
residuals analysis 280–283
residuals estimation 280–283
sampling and sampling distributions 159–162
variables 82–85
applying analytics frameworks 89–91
ARIMA (AutoRegressive Integrated Moving Average) models 352–355
assigning probability values 108
assumptions
about analysis of variance (ANOVA) 229–230
evaluating 240–241
asterisk (*) 178–179
autocorrelation 342–343
AutoRegressive Integrated Moving Average (ARIMA) models 352–355
autoregressive models 352–355
axes, customizing in histograms 50–52
B
bars, customizing in histograms 50–52
beta (β) 194–196
“Big Data” 28
binomial distribution 113–115
bivariate data 61–62
bivariate inference
about 285
life expectancy by GDP per capita 291–293
life expectancy by income group 287–291
research context 285
blocking 369–371
blocks 361–362
bootstrap confidence intervals 181–182
bootstrapping 197
Box, George 352, 360–361
box plot 46
box-and-whiskers plot 46
bubble plots 78–81
“by-hand” method 177
C
capability analysis 398–400
cases 24
casewise data 180
categorical 3
categorical regression models 323
See also regression analysis
categorical variables
distributions of 41–45
inference for 173–187
inference for two 208
one continuous variable and one 71–74
sample observations of 168–169
two 63–71
center of distributions 46, 47
Central Limit Theorem (CLT) 154–156, 218–224
central tendency, of distributions 46, 47
characterization 361
chart variability 395–397
checking data for suitability of normal model 133–137
chi-square distribution 177, 205
chi-square tests
about 205
applications 213–215
contingency tables 209–210
goodness-of-fit test 205–208
of independence 211–213
inference for two categorical variables 208
Classical method, of assigning probabilities 108
CLT (Central Limit Theorem) 154–156, 219, 231
clustering 157–159
coefficients 257
collinearity
about 309–315
dealing with 314–315
example 309–314
column properties 7
Column Switcher 97
columns, of data tables 2
comma (,), with Normal Distribution function 130
comparing
two means with JMP 217–224
two variances with JMP 224–226
complement of an event 107
complex sampling 157–159
conditional probability 107–108
conditional values 210
conducting
analysis of variance (ANOVA) 287–291
significance testing with JMP 174–179, 190–196
confidence band 278
confidence intervals
about 179
bootstrap 181–182
estimating 173
for parameters 276–277
for Y|X 278–279
confidence limits 55
constant variance 273
contingency tables
about 209–210
displaying covariation in categorical variables 68–71
probability and 109–111
continuous columns
continuous data
fitting lines to bivariate 249–253
probability and 123
using Distribution platform for 46–52
continuous variables
inference for single 189–204
one categorical variable and 71–74
sample observations of 169–171
two 74–81
two-sample inference for 217–228
control charts
about 386–395
for individual observations 387–389
for means 389–392
for proportions 392–395
control limits 387, 390
correlation 77–78
covariation
one continuous, one categorical variable 71–74
two categorical variables 63–71
two continuous variables 74–81
creating
data tables 5–8, 36
pseudo-random normal data 137–138
cross-section 24
cross-sectional data 90
cross-sectional sampling 28–29
crosstabulation 68–71
CTRL key 134
cumulative probabilities 115, 129–132
curvature 270
curvilinear regression models 323
See also regression analysis
curvilinear relationships 330–337
customizing histograms 50–52
cycle pattern 341
D
data
See also continuous data
applications 37–38, 56–59
bivariate 61–62
casewise 180
checking for suitability of normal model 133–137
cross-sectional 90
experimental 29–32
longitudinal 90
matched pairs of 199–200
observational 33, 90
panel 25
populations 23–25
processes 23–25
raw case data 36–37
representativeness 25–28
samples and sampling 23–29
study design 29–36
summary 36–37, 182–183
survey 33–36
time-series 90
types of 2–3, 90–91
data analysis
goals of 1–2
role of probability in 105–106
data dictionary 33
Data Filter tool 42–43
Data Grid area 7
data management
See data
data sources
See data
data tables
about 2
creating 5–8, 36
degrees of freedom (DF) 207
density functions 124–126, 163–164
description 1–2
descriptive statistics
about 87
analysis with Fit Y by X 100–101
analysis with multivariate platform 98–99
applying analytics frameworks 89–91
data source and structure 90
exploring relationship with Graph Builder 95–98
interpretation 101
observational units 90
preparation for analysis 92
questions for analysis 88–89
univariate descriptions 92–94
variables and data types 90–91
visualizing multiple relationships 101–103
World Development Indicators (WDI) 87–88
detecting patterns 341–344
DF (degrees of freedom) 207
dichotomous dependent variables 327–330
dichotomous independent variables 323–327
disclosure button 8
discrete distributions
about 105
applications 118–121
as models of real processes 117–118
discrete random variables
about 111
three common 112–115
dispersion, of distributions 46, 47
Distribution command 166
Distribution platform, for continuous data 46–52
“distribution-free” methods 213
distributions
See also discrete distributions
binomial 113–115
of categorical variables 41–45
center of 46, 47
central tendency of 46, 47
chi-square 177, 205
dispersion of 46, 47
Hypergeometric 171
integer 112–113
non-normal 222–224
normal 164–165, 218–224
Poisson 115
probability 111, 163–164
of quantitative variables 45–46
theoretical discrete 111
of variables 40–41
dummy variables 323–327
Dunnett’s method 235–236
E
effect likelihood ratio tests 330
equal variances, compared with unequal variances 221–222
error 183
estimating
confidence intervals 173
population means with JMP 197–199
population proportions with JMP 179–183
evaluating
alternative models 315–319
assumptions 240–241
events
probability of 107
rules for two 107–108
excluded rows 14
expected frequency 208
experimental data 29–32
experimental design
about 359
applications 379–384
blocks and blocking 361–362, 369–371
factorial designs 362–369
factors 361–362
fractional designs 371–375
goals of 360–361
multi-factor experiments 362–369
randomization 361–362
reasons for experimenting 360
response surface designs 375–379
experimental runs 361
exporting JMP results to word-processor documents 16–20
extraordinary sampling variability 167–171
F
factor profiles 242
factorial analysis 234–237
factorial designs 362–369
factors 361–362
Fit Model platform, residuals analysis in 304–306
Fit Y by X, analysis with 100–101
fitted line 77
fitting 11
five-number summary 54
fly ash 360
forecasting techniques
about 341
applications 355–358
autoregressive models 352–355
detecting patterns 341–344
smoothing methods 344–350
trend analysis 350–352
fractional designs 371–375
frequency of values 46
full factorial experimental design 364–369
G
Gaussian density function 126
generalization, simulation to 151–152
goals
of data analysis 1–2
of experimental design 360–361
golden mean 258
goodness-of-fit test 205–208
Gosset, William 199
Grabber 50–51
Graph Builder
about 8–11
exploring categorical data with 44–45
exploring data with 76
exploring relationships with 95–98
using 52–53
graphs, linked 50
H
Hand tool 50–51
Haydn, Franz Joseph 258
Help tool 240
heterogeneity of variance 273
heteroscedasticity 273, 304–306
hidden rows 14
histograms 46, 50–52
Holt, Charles 348
Holt’s Method 348–349
homogeneity 225
homogeneity of variance 273
homoscedasticity 273
Hypergeometric distribution 171
hypothesis testing 173
I
IIP (Index of Industrial Production) 342
independence
about 274–276
chi-square tests of 211–213
independent events 108
Index of Industrial Production (IIP) 342
indicator variables 323–327
individual observations, charts for 387–389
inference
See also bivariate inference; linear regression analysis; univariate inference
about 2, 189, 217
applications 184–187, 201–204, 226–228
comparing two means with JMP 217–224
comparing two variances with JMP 224–226
conditional status of statistical 174
conditions for 189–190, 217
conducting significance testing with JMP 174–179, 190–196
confidence interval estimation 173, 179
estimating population means with JMP 197–199
estimating population proportions with JMP 179–183
matched pairs 199–200
satisfying conditions 197
for single categorical variable 173–187
for single continuous variable 189–204
for two categorical variables 208
two-sample 217–228
influential observations 270–272
integer distribution 112–113
interaction effect 239, 241–245
interpretation 101
interpreting regression results 256–261
interquartile range (IQR) 55
inverse cumulative problems, solving 132–133
IQR (interquartile range) 55
irregular pattern 341
J
Jenkins, Gwilym 352
“jitters” 9
JMP
See also specific topics
comparing two means with 217–224
comparing two variances with 224–226
conducting significance testing with 174–179, 190–196
estimating population means with 197–199
estimating population proportions with 179–183
exporting results to word-processor documents 16–20
leaving 21
selecting simple random samples with 145–148
simulating random variation with 116–117
starting 3–4
JMP Scripting Language (JSL) 148
joint probability 107
joint relative frequency 210
joint-frequency table 68–71
JSL (JMP Scripting Language) 148
K
Kruskal-Wallis Test 224
L
label property 7
labeled rows 14
Lack of Fit 257
least squares estimation, conditions for 267–268
leaving JMP 21
linear exponential smoothing (Holt’s Method) 348–349
linear regression analysis
about 249
applications 261–265
assumptions of 255–256
fitting lines to bivariate continuous data 249–253
interpreting regression results 256–261
simple regression model 253–255
linearity 254–255, 269–270
linked graphs/tables 50
logarithmic growth 291
logarithmic models 334–337
longitudinal data 90
longitudinal sampling 28–29
lower fences 55
M
Mann-Whitney U Test 224
margin of error 180
matched pairs 199–200
MDGs (Millennium Development Goals) 88
means
comparing two with JMP 217–224
control charts for 389–392
metadata 5
Millennium Development Goals (MDGs) 88
millimeters of mercury (mmHg) 141
missing data 62, 67
mmHg (millimeters of mercury) 141
model specification 323
modeling types 2–3
modifying analysis 67
Mozart, Wolfgang Amadeus 258
multicollinearity 309
multi-factor experiments 362–369
multiple regression
about 295
applications 319–322
collinearity 309–314
evaluating alternative models 315–319
fitting a model 298–302
model 295–296, 302–304
residuals analysis in Fit Model platform 304–306
visualizing 296–298
multivariate platform, analysis with 98–99
mutually exclusive events 107
N
National Health and Nutrition Examination Survey (NHANES) 33
nominal columns 3
non-linear regression models 323
See also regression analysis
non-linear relationships 330–337
non-normal distributions, comparing two means with JMP 222–224
nonparametric equivalent test 237–238
nonparametric methods 213
non-parametric test 197
non-random sampling 28
normal density function 126
normal distributions 164–165, 218–224
normal model
about 123, 127
applications 138–141
checking data for suitability of 133–137
continuous data and probability 123
density functions 124–126
generating pseudo-random normal data 137–138
normal calculations 128–133
Normal Probability Plot (NPP) 133–137
Normal Quantile function 132
Normal Quantile Plots 133–137
normality 272–273, 294
NPP (Normal Probability Plot) 133–137
null hypothesis 175–176
O
observational data 33, 90
observational units 24, 90
observations 2
one-way analysis of variance (ANOVA) 230–237
optimization 361
ordinal columns 3
ordinary least squares estimation (OLS) 265n1
ordinary sampling variability 167–171
outlier box plots 55
overlap marks 232
P
panel data 25
panel studies 29
panning axes 52
parameter estimates 257–258, 330
parameters, confidence intervals for 276–277
Pareto charts 400–402
partition platform 306–309
patterns, detecting 341–344
percentiles 53–55
Pipeline and Hazardous Materials Safety Administration (PHMSA) 117–118
Poisson distribution 115
polynomial functions 331
population means, estimating with JMP 197–199
population proportions, estimating with JMP 179–183
populations 2, 23–25
post-stratification weights 157
power of a test 194–196
predictability, of risks 25
prediction bands 279
prediction intervals, for Y|X 279
Prediction Variance Profile Plot 376
primitives 103
probability and probabilistic sampling
about 105, 163
applications 118–121
assigning values 108
contingency tables and 109–111
continuous data and 123
cumulative probabilities 115, 129–132
events, probability of 107
extraordinary sampling variability 167–171
normal distributions 164–165
ordinary sampling variability 167–171
probability distributions and density functions 163–164
role of in data analysis 105–106
t distributions 164–165
usefulness of theoretical models 166–167
probability distributions 111, 163–164
probability of an event (Pr(A)) 107
probability theory 105–108
process capability 398
processes
about 23–25
in quality improvement 385–386
proportions, charts for 392–395
pseudo-random normal data, generating 137–138
p-value 178–179, 183, 192–194
Q
quadratic models 331–334
quality improvement
about 385
applications 402–405
capability analysis 398–400
control charts 386–395
Pareto charts 400–402
processes 385–386
variation in 385–386
quantile 54
quantitative 3
quantitative variables, distributions of 45–46
R
random error 255
Random function 137–138
random variation, simulating with JMP 116–117
randomization 24, 361–362
Rasmussen, Marianne 106–107, 110
raw case data 36–37
red triangles 6, 100, 134
regression analysis
See also multiple regression
applications 338–340
curvilinear relationships 330–337
dichotomous dependent variable 327–330
dichotomous independent variables 323–327
interpreting results 257
non-linear relationships 330–337
regression tree approach 306–309
relationships
curvilinear 330–337
exploring with Graph Builder 95–98
non-linear 330–337
visualizing multiple 101–103
Relative Frequency method, of assigning probabilities 108
re-launching analysis 67
representativeness, of data 25–28
residuals, normality in 294
residuals analysis
about 267, 268–269
applications 280–283
conditions for least squares estimation 267–268
constant variance 273
curvature 270
in Fit Model platform 304–306
independence 274–276
influential observations 270–272
linearity 269–270
normality 272–273
residuals estimation
about 267, 276
applications 280–283
conditions for least squares estimation 267–268
confidence intervals for parameters 276–277
confidence intervals for Y|X 278–279
prediction intervals for Y|X 279
response combinations, to bivariate data 62
response surface 362
response surface designs 375–379
row states 14–16
Rsquare (r²) 77
Run Chart 386
Rydén, Jesper 258
S
sales lift 376
sample mean, sampling distribution of 152–154
sample proportion, sampling distribution of 148–151
sampling and sampling distributions
about 23–25, 143, 167–168
applications 159–162
Central Limit Theorem (CLT) 154–156
clustering 157–159
complex sampling 157–159
cross-sectional sampling 28–29
defined 2
methods of sampling 144–145
non-random 28
reasons for sampling 143–144
of sample mean 152–154
simple random sampling (SRS) 25–28, 144–145, 145–148
from simulation to generalization 151–152
stratification 157–159
time series sampling 24–25, 28–29
using JMP to select simple random samples 145–148
variability across samples 148–159
sampling error 25
sampling frame 25, 145
sampling variability, ordinary and extraordinary 167–171
sampling weights, comparing two means with JMP 221
saving 20–21
scatterplot 75–76, 78–81
screening 361
script 148
season pattern 341
selected rows 14
session script, saving 21
shadowgrams 51, 126
shape, of distributions 46–47
Shewhart, Walter 405n1
Shewhart Charts
See control charts
shortest half bracket 55
sidereal period of orbit 331
significance testing
about 173
conducting with JMP 174–179, 190–196
simple exponential smoothing 346–348
Simple Moving Average 344–345
simple random sampling (SRS) 25–28, 144–145, 145–148
simple regression model 253–255
simulating
to generalization 151–152
random variation with JMP 116–117
smoothing methods
about 344
linear exponential smoothing (Holt’s Method) 348–349
simple exponential smoothing 346–348
Simple Moving Average 344–345
Winters’ Method 349–350
solving
cumulative probability problems 129–132
inverse cumulative problems 132–133
split plot experiment 199
SRS (simple random sampling) 25–28, 144–145, 145–148
standard deviation 54–55
standard error 155
Standard Normal Distribution 127
starting JMP 3–4
statistics
See descriptive statistics
stratification 157–159
study design 29–36
Subjective method, of assigning probabilities 108
summary data 36–37, 182–183
Summary of Fit 256–257
summary statistics, for single variables 53–55
survey data 33–36
T
t distributions 155–156, 164–165
Table variable note 6
tables, linked 50
See also data tables
tails, in continuous distributions 129
Test Means command 197
testing, for slopes other than zero 258–261
theoretical discrete distribution 111
time series sampling 24–25, 28–29
time-series data 90
transforming the variable 291
treatment effect 229
trend analysis 350–352
trend pattern 341
t-tests 257–258
Tukey’s HSD (Honestly Significant Difference) 235–236, 237–238
two-sample inference, for continuous variables 217–228
two-way analysis of variance 238–245
two-way table 68–71
Type I error 183
Type II error 183–184
U
unequal variances, compared with equal variances 221–222
uniform scaling option 50
union of two events 107
univariate descriptions 92–94
univariate inference
about 285
life expectancy by GDP per capita 291–293
life expectancy by income group 287–291
research context 285
unusual observations, of distributions 46, 47–50
upper fences 55
V
values
assigning probability 108
frequency of 46
variability, across samples 148–159
variability charts 395–397
variables
See also bivariate data; categorical variables; continuous variables
about 39
applications 82–85
defined 2
descriptive statistics 87
dichotomous dependent 327–330
dichotomous independent 323–327
distributions of 40–41
dummy 323–327
indicator 323–327
quantitative 45–46
summary statistics for single 53–55
transforming 291
types of 40–41
variance
heterogeneity of 273
homogeneity of 273
variances, comparing two with JMP 224–226
variation, in quality improvement 385–386
visualizing
multiple regression 296–298
multiple relationships 101–103
W
WDI (World Development Indicators) 87–88
weighting 157
Welch’s test 233
whiskers 55
whole model test 330
Wilcoxon Signed Rank Test 197
Wilson Estimator 180
Winters, Peter 349
Winters’ Method 349–350
word-processor documents, exporting JMP results to 16–20
World Development Indicators (WDI) 87–88
Y
Y-hat 278
Y|X
confidence intervals for 278–279
prediction intervals for 279
Z
z-scores 127