Glossary

Statistical Symbols

df degrees of freedom

k number of factors in design

i individual datum

n number of observations in sample

p fraction of design (e.g., 2^(k−p)) or probability value (e.g., Prob > F)

PI Prediction Interval

r sample correlation coefficient

R² coefficient of determination

s sample standard deviation

s² sample variance

t t-value

X independent variable

Y observed response value

Z uncontrolled variable

* multiplication symbol

— (bar) average (e.g., Y¯ )

^ (hat) predicted (e.g., Y^ )

α (alpha) Type I error rate

β (beta) coefficient or Type II error rate

Δ (delta) difference (e.g., ΔY)

σ (sigma) population (true) standard deviation

Σ (capital sigma) mathematical operator to take the sum of a number series

μ (mu) population (true) mean

Terms

Actual value

The observed value of the response from the experiment. The physical levels of the factors in their units of measure (as opposed to their coded levels, such as −1 or +1).

Adjusted R-squared

R-squared adjusted for the number of terms in the model relative to the number of points in the design. An estimate of the fraction of overall variation in the data accounted for by the model.

Alias

Other term(s) that is (are) correlated with a given coefficient. The resulting predictive model is then said to be aliased. (Also called confounding.)

Analysis of variance (ANOVA)

A statistical method, based on the F-test, that assesses the significance of experimental results. It involves subdividing the total variation of a set of data into component parts.

Antagonism

An undesirable interaction of two factors where the combination produces a response that is not as good as what would be expected from either one alone. The same concept can be applied to higher-order interactions.

Average

See Mean.

Axial points

Design points that fall on the spatial coordinate axes emanating from the overall center point (or centroid in mixture space), often used as a label for star points in a central composite design.

Balanced design

Designs in which low and high levels of any factor or interaction occur in equal numbers.

Bias

A systematic error in estimation of a population parameter.

Block

A group of trials based on a common factor. Blocking is advantageous when there is a known factor that may influence the experimental result, but the effect is not of interest. For example, if all experiments cannot be conducted in one day or within one batch of raw material, the experimental points can be divided in such a way that the blocked effect is eliminated before computation of the model. Removal of the block effect reduces the noise in the experiment and improves the sensitivity to effects.

Case statistics

Diagnostic statistics calculated for each case, that is, each design point in the design after the model has been selected.

Categoric variable

Factors whose levels fall into discrete classes, such as metal versus plastic material. (Also called class or qualitative variable.)

Cell

The blank field to be filled with a response resulting from a given set of input factor levels.

Center point

An experiment with all numerical factor levels set at their midpoint value.

Central composite design (CCD)

A design for response surface methods (RSM) that is composed of a core two-level factorial plus axial points and center points.

Central limit theorem

In its simplest form, this theorem states that the distribution of averages becomes approximately normal as the sample size (n) increases, even when the individual data are not normally distributed. Furthermore, the variance of the averages is reduced by a factor of n when compared with the variance of individual data.
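As an illustrative check of both claims (the variable names here are my own, not from the text), a short simulation averages samples drawn from a decidedly non-normal uniform distribution and compares the variance of those averages with the individual variance divided by n:

```python
import numpy as np

# Draw many samples of size n from a (non-normal) uniform distribution
# and compare the variance of the sample means with var / n.
rng = np.random.default_rng(0)

n = 25               # sample size
n_samples = 20_000   # number of repeated samplings

samples = rng.uniform(0.0, 1.0, size=(n_samples, n))
means = samples.mean(axis=1)

var_individuals = 1.0 / 12.0   # known variance of Uniform(0, 1)
var_of_means = means.var()     # observed variance of the averages

print(var_of_means, var_individuals / n)
```

The two printed numbers agree closely: averaging reduced the variance by roughly a factor of n, and a histogram of `means` would show the familiar bell shape.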

Centroid

The center point of mixture space within the specified constraints.

Class variable

See Categoric variable.

Coded factor level

See Coding.

Coding

A way to center and normalize factors, e.g., by converting low and high factor levels to −1 and +1, respectively.
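As a sketch of the usual coding formula (the function name is mine, not a standard API), a factor level is centered on the midpoint of its range and scaled by the half-range:

```python
def to_coded(x, low, high):
    """Convert an actual factor level to its coded value.

    The low level maps to -1, the high level to +1, and the
    midpoint to 0:  coded = (x - center) / half_range.
    """
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (x - center) / half_range
```

For example, with a temperature range of 100 to 200 degrees, 150 codes to 0, 100 to −1, and 200 to +1.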

Coefficient

See Model coefficient.

Coefficient of variation (C. V.)

Also known as the relative standard deviation, the coefficient of variation is a measure of residual variation of the data relative to the size of the mean. It is the standard deviation (root mean square error from ANOVA) divided by the dependent mean, expressed as a percent.
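A minimal sketch of the calculation for plain sample data (the function name is mine; in the ANOVA context described above, the standard deviation would be the root mean square error rather than the simple sample value):

```python
import math

def coefficient_of_variation(data):
    """Relative standard deviation: 100 * s / mean, as a percent."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divide by n - 1 degrees of freedom)
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 100.0 * s / mean
```

For example, the data 8, 10, 12 have mean 10 and sample standard deviation 2, giving a C.V. of 20 percent.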

Component

An ingredient of a mixture.

Confidence interval (CI)

A data-based interval constructed by a method that covers the true population value a stated percentage (typically 95%) of the time in repeated samplings.

Confounding

See Alias.

Constraint

Limit in respect to component ranges for a mixture experiment.

Continuous variable

See Numeric variable.

Contour plot

A topographical map drawn from a mathematical model, usually in conjunction with response surface methods (RSM) for experimental design. Each contour represents a continuous response fixed at some value.

Cook’s distance

A measure of how much the regression would change if that particular run were omitted from the analysis. Relatively large values are associated with cases with high leverage and large externally studentized residuals. Cases with large values relative to other cases may cause undue influence on the fitting and should be investigated. They could be caused by recording errors, an incorrect model, or a design point far from the remaining cases.

Corrected total

The total sum of squares (SS) corrected for the mean (calculated by taking the sum of the squared distances of each individual response value from its overall average).

Count data

Data based on discrete occurrences rather than from a continuous scale.

Crash and burn

Exceed the operating boundaries (envelope) of a process.

Cumulative probability

The proportion of individuals in a population that fall below a specified value.

Curvature

A measure of the offset at the center point of actual versus predicted values from a factorial model. If significant, consider going to a quadratic model, which can be fitted to data from a response surface design.

Degree of equation

The highest order of terms in a model. For example, in an equation of degree two, you will find terms with two factors multiplied together as well as squared terms.

Degrees of freedom (df)

The number of independent pieces of information available to estimate a parameter.

Dependent mean

The mean of the response over all the design points.

Design matrix

An array of values presented in rows and columns. The columns usually represent design factors. The values in the rows represent settings for each factor in the individual runs of the design.

Design parameters

The number of levels, factors, replicates, and blocks within the design.

Design of experiment space

An imaginary area bounded by the extremes of the tested factors. (It is also called the experimental region.)

Deterministic

An outcome that does not vary (i.e., it is always the same) for a given set of input factors.

Diagnostics

Statistics and plots, often involving model residuals, which assess the assumptions underlying a statistical analysis.

Distribution

A spatial array of data values.

D-optimal

A criterion for choosing design points that minimizes the volume of the joint confidence intervals for the model coefficients, thereby making them most precise.

Dot plot

A method for recording a response by simply putting points on a number line.

Effect

The change in average response when a factor, or interaction of factors, goes from its low level to its high level.
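As an illustrative sketch (the function name is mine), the effect of a factor in a two-level design is simply the difference between the average responses at its high and low coded levels:

```python
def effect(levels, response):
    """Effect = average response at the high level minus
    average response at the low level (levels coded -1 / +1)."""
    highs = [y for lvl, y in zip(levels, response) if lvl > 0]
    lows = [y for lvl, y in zip(levels, response) if lvl < 0]
    return sum(highs) / len(highs) - sum(lows) / len(lows)
```

For a 2² design with factor A's column coded [−1, +1, −1, +1] and responses [10, 20, 12, 26], the effect of A is (20 + 26)/2 − (10 + 12)/2 = 12.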

Envelope

The operating boundaries of a process.

Error term

The term in the model that represents random error. The data residuals are used to estimate the nature of the error term. The usual assumption is that the error term is normally and randomly distributed about zero, with a standard deviation of sigma.

Experiment

A series of test runs for the purpose of discovery.

Experimental region

See Design of experiment space.

Externally studentized residual

Also see Studentized residual. This statistic tests whether a run is consistent with other runs, assuming the chosen model holds. Model coefficients are calculated based on all design points except one. A prediction of the response at this point is then produced. The externally studentized residual measures the number of standard deviations difference between this new predicted value (lacking the point in question) and the actual response. As a rule of thumb, an externally studentized residual greater than 3.5 indicates that the point should be examined as a possible outlier. (For a more exact rule, apply the Bonferroni correction (α/n) and use the two-tailed t-statistic with residual df for limits.) Note: This statistic becomes undefined for points with leverages of one.
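A sketch of the calculation, assuming an ordinary least-squares fit with NumPy (function and variable names are mine). Rather than literally refitting n times, it uses the standard closed-form identity based on internally studentized residuals, which yields the same values; as the entry notes, the statistic is undefined where leverage equals one, since the denominator vanishes there.

```python
import numpy as np

def externally_studentized(X, y):
    """Externally studentized residuals for a least-squares fit y ~ X.

    Uses the identity t_i = r_i * sqrt((n-p-1) / (n-p-r_i^2)),
    where r_i is the internally studentized residual; this is
    equivalent to refitting the model without each point in turn.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat (leverage) matrix
    e = y - H @ y                          # raw residuals
    mse = (e @ e) / (n - p)
    h = np.diag(H)
    r = e / np.sqrt(mse * (1.0 - h))       # internally studentized
    return r * np.sqrt((n - p - 1) / (n - p - r ** 2))

# Demo: a straight line with mild alternating noise and one point
# knocked 5 units off the line (a simulated outlier).
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 * x + 0.5 * (-1.0) ** np.arange(10)
y[5] += 5.0
t = externally_studentized(X, y)
```

In this demo the outlier's statistic stands far above the 3.5 rule-of-thumb cutoff while the well-behaved points stay below 1.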

Factor

The independent variable to be manipulated in an experiment.

Factorial design

A series of runs in which combinations of factor levels are included.

F-distribution

A probability distribution used in analysis of variance. The F-distribution is dependent on the degrees of freedom (df) for the mean square in the numerator and the df of the mean square in the denominator of the F-ratio.

Foldover

A method for augmenting low-resolution, two-level factorial designs that requires adding runs with opposite signs to the existing block of factors.

Fractional factorial

An experimental design including only a subset of all possible combinations of factor levels, causing some of the effects to be aliased.

F-test

See F-value.

Full factorial

An experimental design including all possible combinations of factors at their designated levels.

F-value

The F-distribution is a probability distribution used to compare variances by examining their ratio. If they are equal, the F-value is 1. The F-value in the ANOVA table is the ratio of model mean square (MS) to the appropriate error mean square. The larger their ratio, the larger the F-value and the more likely that the variance contributed by the model is significantly larger than random error. (Also called the F-test.)

General factorial

A type of full factorial that includes some categoric factors at more than two levels, also known as “multilevel categoric.”

Half-normal

The normal distribution folded over to the right of the zero point by taking the absolute value of all data. Usually refers to a plot of effects developed by statistician Cuthbert Daniel.

Heredity

See Hierarchy.

Hierarchy

(Also referred to as heredity.) The ancestral lineage of effects flowing from main effects (parents) down through successive generations of higher-order interactions (children). For statistical reasons, models containing subsets of all possible effects should preserve hierarchy. Although the response may be predicted without maintaining hierarchy when using the coded variables, predictions in the actual factor levels will not be the same unless hierarchy is preserved. Without hierarchy, the model is scale-dependent.

Homogeneous

Consistent units such as lot-to-lot or operator-to-operator.

Hypothesis (H)

A mathematical proposition set forth as an explanation of a scientific phenomenon.

Hypothesis test

A statistical method to assess consistency of data with a stated hypothesis.

Identity column (I)

(Alternatively: Intercept.) A column of all pluses in the design matrix used to calculate the overall average.

Independence

A desirable statistical property where knowing the outcome of one event tells nothing about what will happen from another event.

Individuals

Discrete subjects or data from the population.

Interaction

The combined change in two factors that produces an effect different from the sum of the effects of the two factors. Interactions occur when the effect of one factor depends on the level of another factor.

Intercept

The constant in the regression equation. It represents the average response in a factorial model created from coded units.

Internally studentized residual

The residual divided by the estimated standard deviation of that residual.

Irregular fraction

A two-level fractional factorial design that contains a total number of runs that is not a power of two. For example, a 12-run fraction of the 16-run full-factorial design on four factors. This is a 3/4 irregular fraction.

Lack of fit (LOF)

A test that compares the deviation of actual points from the fitted surface, relative to pure error. If a model has a significant lack of fit, it should be investigated before being used for prediction.

Lake Wobegon Effect

A phenomenon that causes all parents to believe their children are above the mean. It is named after the mythical town in Minnesota, where, according to author Garrison Keillor, all women are strong, men are good looking, and children are above average.

Least significant difference (LSD)

A numerical value used as a benchmark for comparing treatment means. When the LSD is exceeded, the means are considered to be significantly different.

Least squares

See Regression analysis.

Level

The setting of a factor.

Leverage

The potential for a design point to influence its fitted value. Leverages near 1 should be avoided. If leverage is 1, then the model is forced to go through the point. Replicating such points reduces their leverage.

Linear model

A polynomial model containing only linear or main effect terms.

LSD bars

Plotted intervals around the means on effect graphs with lengths set at one-half the least significant difference. Bars that do not overlap indicate significant pair-wise differences between specific treatments.

Lurking variable

An unobserved factor (one not in the design) causing a change in response. A classic example is the study relating to the number of people and the number of storks in Oldenburg, which led to the spurious conclusion that storks cause babies.

Main effect

The change in response caused by changing a single factor.

Mean

The sum of all data divided by the number of data—a measure of location. (It also is called average.)

Mean square

A sum of squares divided by its degrees of freedom (SS/df). It is analogous to a variance.

Median

The middle value.

Mixed factorial

See General factorial.

Mixture model

See Scheffé polynomial.

Mode

The value that occurs most frequently.

Model

An equation, typically a polynomial, that is fit to the data.

Model coefficient

The coefficient of a factor in the regression model. (It is also called parameter or term.)

Multicollinearity

The problem of correlation of one variable with others, which arises when the predictor variables are highly interrelated (i.e., some predictors are nearly linear combinations of others). Highly collinear models tend to have unstable regression coefficient estimates.

Multilevel categoric

See General factorial.

Multiple response optimization

Method(s) for simultaneously finding the combination of factors giving the most desirable outcome for more than one response.

Nonlinear blending

A second-order effect in a mixture model that captures synergism or antagonism between components. This differs from a simpler factor interaction by the way it characterizes curvature in the response surface for predicted behavior of the mixture as a function of its ingredients.

Normal distribution

A frequency distribution for variable data, represented by a bell-shaped curve symmetrical about the mean with a dispersion specified by its standard deviation.

Normal probability plot

A graph with a y-axis that is scaled by cumulative probability (Z) so normal data plots as a straight line.

Null

Zero difference.

Numeric variable

A quantitative factor that can be varied on a continuous scale, such as temperature.

Observation

A record of factors and associated responses for a particular experimental run (trial).

OFAT

One-factor-at-a-time method of experimentation (as opposed to factorial design).

Order

A measure of complexity of a polynomial model. For example, first-order models contain only linear terms. Second-order models contain linear terms plus two-factor interaction terms and/or squared terms. The higher the order, the more complex shapes the polynomial model can approximate.

Orthogonal arrays

Test matrices exhibiting the property of orthogonality.

Orthogonality

A property of a design matrix that exhibits no correlation among its factors, thus allowing them to be estimated independently.

Outlier

A design point where the response does not fit the model.

Outlier t-test

See Externally studentized residual.

Parameter

See Model coefficient.

Pencil test

A quick and dirty method for determining whether a series of points fall on a line.

Plackett–Burman design

A class of saturated orthogonal (for main effects) fractional two-level factorial designs in which the number of runs is a multiple of four rather than a power of two (2^k). These designs are resolution III.

Poisson

A distribution characterizing discrete counts, such as the number of blemishes per unit area of a material surface.

Polynomials

Mathematical expressions, composed of powers of predictors with various orders, used to approximate a true relationship.

Population

A finite or infinite collection of all possible individuals who share a defined characteristic, e.g., all parts made by a specific process.

Power

The probability that a test will reveal an effect of stated size.

Power law

A relationship where one variable (e.g., standard deviation) is proportional to another variable (such as the true mean) raised to a power.

Predicted R-squared

Measures the amount of variation in new data explained by the model. It makes use of the predicted residual sum of squares (PRESS) as shown in the following equation: Predicted R-squared = 1 − SS_PRESS/(SS_Total − SS_Blocks).

Predicted residual sum of squares (PRESS)

A measure, the smaller the better, of how well the model fits each point in the design. The model is repeatedly refitted to all the design points except the one being predicted. The difference between the predicted value and actual value at each point is then squared and summed over all points to create the PRESS.
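A sketch assuming an ordinary least-squares fit with NumPy (names are mine): the loop follows the definition literally, refitting with each point left out, while an equivalent leverage-based shortcut avoids the repeated refits. The predicted R-squared here assumes an unblocked design, so SS_Blocks is zero.

```python
import numpy as np

def press_loo(X, y):
    """PRESS by literal leave-one-out refitting, as in the definition."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta) ** 2   # squared prediction error
    return total

def press_shortcut(X, y):
    """Equivalent closed form: PRESS = sum of (e_i / (1 - h_ii))^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat (leverage) matrix
    e = y - H @ y
    return float(np.sum((e / (1.0 - np.diag(H))) ** 2))

def predicted_r_squared(X, y):
    """Predicted R-squared = 1 - PRESS / SS_total (no blocking)."""
    y = np.asarray(y, dtype=float)
    ss_total = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - press_loo(X, y) / ss_total

# Demo: straight-line data with mild alternating noise.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 * x + 0.5 * (-1.0) ** np.arange(10)
p_loop = press_loo(X, y)
p_fast = press_shortcut(X, y)
r2_pred = predicted_r_squared(X, y)
```

The loop and the shortcut agree to machine precision, and the predicted R-squared falls just below the ordinary R-squared, as expected.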

Predicted value

The value of the response predicted by the mathematical model.

Prob > F (Probability of a larger F-value)

The p-value for a test conducted using an F-statistic. If the F-ratio lies near the upper tail of the F-distribution, the probability of a larger F is small and the variance ratio is judged to be significant. The F-distribution is dependent on the degrees of freedom (df) for the mean square in the numerator and the df of the mean square in the denominator of the F-ratio.

Prob > t (Probability of a larger t-value)

The p-value for a test conducted using a t-statistic. Small values of this probability indicate significance and rejection of the null hypothesis.

Probability paper

Graph paper with specially scaled y-axis for cumulative probability. The purpose of the normal probability paper is to display normally distributed data as a straight line. It is used for diagnostic purposes to validate the statistical assumption of normality.

Process

Any unit operation, or series of unit operations, with measurable inputs and outputs (responses).

Pure error

Experimental error, or pure error, is the normal variation in the response, which appears when an experiment is repeated. Repeated experiments rarely produce exactly the same results. Pure error is the minimum variation expected in a series of experiments. It can be estimated by replicating points in the design. The more replicated points, the better will be the estimate of the pure error.

p-value

Probability value, usually relating to the risk of falsely rejecting a given hypothesis.

Quadratic

A second order polynomial.

Qualitative

See Categoric variable.

Quantitative

See Numeric variable.

Randomization

Mixing up planned events so each event has an equal chance of occurring in each position, particularly important to ensure that lurking variables do not bias the outcome. Randomization of the order in which experiments are run is essential to satisfy the statistical requirement of independence of observations.

Range

The difference between the largest and smallest value—a measure of dispersion.

Regression analysis

A method by which data are fitted to a mathematical model. (It is also called the method of least squares.)

Replicate

An experimental run performed again from start to finish (not just resampled and/or remeasured). Replication provides an estimate of pure error in the design.

Residual (or “Residual error”)

The difference (sometimes referred to as error) between the observed (actual) response and the value predicted by the model for a particular design point.

Response

A measurable product or process characteristic thought to be affected by experimental factors.

Response surface methods (RSM)

A statistical technique for modeling responses via polynomial equations. The model becomes the basis for 2-D contour maps and 3-D surface plots for purposes of optimization.

Risk

The probability of making an error in judgment (i.e., falsely rejecting the null hypothesis). (Also see Significance level.)

Root mean square error

The square root of the residual mean square error. It estimates the standard deviation associated with experimental error.

R-squared

The coefficient of determination. It estimates the fraction (a number between zero and one) of the overall variation in the data accounted for by the model. This statistic indicates the degree of relationship of the response variable to the combined linear predictor variables. Because this raw R-squared statistic is biased, use the adjusted R-squared instead.

Rule of thumb

A crude method for determining whether a group of points exhibit a nonrandom pattern: If, after covering any point(s) with your thumb(s), the pattern disappears, there is no pattern.

Run

A specified setup of process factors that produces measured response(s) for experimental purposes. (It is also called a trial.)

Run order

Run order is the randomized order for experiments. Run numbers should start at one and include as many numbers as there are experiments. Runs must be continuous within each block.

Sample

A subset of individuals from a population, usually selected for the purpose of drawing conclusions about specific properties of the entire population.

Saturated

An experimental design with the minimum number of runs required to estimate all effects.

Scheffé polynomial

A form of mathematical predictive model designed specifically for mixtures. These models are derived from standard polynomials, of varying degrees, by accounting for the mixture constraint that all components sum to the whole. (It is also called a mixture model.)

Screening

Sifting through a number of variables to find the vital few. Resolution IV two-level fractional factorial designs are often chosen for this purpose.

Significance level

The level of risk, usually 0.05, established for rejection of the null hypothesis.

Simplex

A geometric figure with one more vertex than the number of dimensions. For example, the two-dimensional simplex is an equilateral triangle. This shape defines the space for three mixture components that each can vary from 0 to 100%.

Simplex centroid

A mixture design comprised of the purest blends, binary combinations, etc., up to and including a centroid blend of all components.

Sparsity of effects

A rule-of-thumb that about 20% of main effects and two-factor interactions will be active in any given system. The remainder of main effects, two-factor interactions, and all higher-order effects, are near zero, with a variation based on underlying error.

Split plot

An experiment design that conveniently groups hard-to-change (HTC) factors, which are set up in randomized order, within which the easy-to-change (ETC) factors vary according to a random plan. The groups are called whole plots and the splits are referred to as subplots.

Standard deviation

A measure of variation in the original units of measure, computed by taking the square root of the variance.

Standard error

The standard deviation usually associated with a parameter estimate rather than individuals.

Standard error of a parameter

The estimated standard deviation of a parameter or coefficient estimate.

Standard order

A conventional ordering of the array of low and high factor levels versus runs in a two-level factorial design.

Star points

Axial points in a central composite design.

Statistic

A quantity calculated from a sample to make an estimate of a population parameter.

Studentized

A value divided by its associated standard error. The resulting quantity is a Z-score (number of standard deviations) useful for purposes of comparison.

Stuff

Processed material such as food, pharmaceutical, or chemical (as opposed to “thing”).

Subplot

Experimental units that the whole plot is split into.

Sum of squares (SS)

The sum of the squared distances of the actual values from an estimate.

Synergism

A desirable interaction of two factors where the combination produces a response that is better than what would be expected from either one alone. The same concept can be applied to higher-order interactions.

Tetrahedron

A three-dimensional geometric figure with four vertices. It is a simplex. The tetrahedron resembles a pyramid, but its base is a triangle rather than a square, so it has three triangular sides rather than four.

Thing

Manufactured hard goods, such as electronics, cars, medical devices (as opposed to “stuff”).

Transformation

A mathematical conversion of response values (for example, logY).

Treatment

A procedure applied to an experimental unit. One usually designs experiments to estimate the effect of the procedures (treatments) on the responses of interest.

Trivial many

Nonactive effects. The sparsity of effects principle predicts that all interactions of third order or higher will fall into this category, as well as 80% of all main effects and two-factor interactions.

True

Related to the population rather than just the sample.

Trial

See Experiment.

t-value

A value associated with the t-distribution that measures the number of standard deviations separating the parameter estimate from zero.

Type 1 error

Saying something happened when it really didn’t (a false alarm).

Type 2 error

Not discovering that something really happened (failure to alarm).

Uniform distribution

A frequency distribution where the expected value is a constant, exhibiting a rectangular shape.

Variable

A factor or response that assumes assigned or measured values.

Variance

A measure of variability computed by summing the squared deviations of individual data from their mean, then dividing this quantity by the degrees of freedom.
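A minimal sketch of the definition (the function name is mine):

```python
def sample_variance(data):
    """Sum of squared deviations from the mean,
    divided by the degrees of freedom (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)
```

For example, the data 4, 8, 6, 2 have mean 5 and squared deviations 1, 9, 1, 9, giving a sample variance of 20/3; its square root is the sample standard deviation.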

Vertex

A point representing an extreme combination of input variables subject to constraints. Normally used in conjunction with mixture components.

Vital few

The active effects. (Also see Sparsity of effects.)

Whole plot

Largest experimental unit (group) in a split-plot design.

X-space

See Design of experiment space.

Y-bar (Y¯ )

A response mean.
