Glossary

#DIV/0!: An Excel error message caused by trying to divide by 0.
#NAME?: An Excel error message generally denoting the misspelling of an Excel function name.
#NUM!: An Excel error message generally caused by trying to perform an undefined mathematical procedure, such as taking the square root of a negative number, or by requesting a result that exceeds Excel's limits, such as =FACT(171).
#VALUE!: An Excel error message generally caused by including a nonnumerical value in a mathematical operation.
=AND(): Excel function that returns the result of two comparisons, TRUE if both comparisons are true and FALSE if either comparison is false.
=AVERAGE(): Excel function that returns the mean of a series of data.
=BINOMDIST(): Excel function that returns the probability for the appearance of any value from a binomial distribution, given the number of trials and the outcome probability for a single trial.
=CHIDIST(): Excel function that returns the probability of a chi-square value, given degrees of freedom.
=CHIINV(): Excel function that returns the chi-square value, given the probability of the chi-square value and degrees of freedom.
=CHITEST(): Excel function that returns the probability of a chi-square value, given the observed and expected values.
=COUNT(): Excel function that returns the number of values in a series of numerical data.
=COUNTIF(): Excel function that returns the number of times a given value appears in a series of data.
=EXP(): Excel function that returns the value of e (approximately 2.718282) raised to the power of the number in the parentheses.
=FACT(): Excel function that returns the factorial of the number in parentheses. Limited to numbers less than 171.
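The 170 ceiling is consistent with double-precision arithmetic, which Excel uses to store numbers: 170! still fits in a double, while 171! does not. A minimal Python sketch of that limit, not Excel itself; the helper name `fact_as_double` is hypothetical.

```python
import math


def fact_as_double(n: int) -> float:
    """Return n! converted to a double-precision float."""
    # math.factorial is exact for integers; float() raises OverflowError
    # when the value is larger than the largest finite double.
    return float(math.factorial(n))


print(fact_as_double(170))  # still representable as a double
try:
    fact_as_double(171)
except OverflowError:
    print("171! does not fit in a double")
```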
=FDIST(): Excel function that returns the probability of an F value, given degrees of freedom.
=FINV(): Excel function that returns the F value, given the probability of the F value and degrees of freedom.
=FREQUENCY(): Excel function that returns a frequency distribution for a series of data in terms of a series of categories.
=IF(): Excel function that returns the result of an if–then decision.
=MAX(): Excel function that returns the maximum value in a series of data.
=MDETERM(): Excel function that returns the determinant for a square matrix.
=MEDIAN(): Excel function that returns the median value for a series of data.
=MIN(): Excel function that returns the minimum value in a series of data.
=MINVERSE(): Excel function that returns the inverse of a matrix (array).
=MMULT(): Excel function that returns the product of two matrices (arrays).
=MODE(): Excel function that returns the modal value for a series of data. If the data have more than one mode, =MODE() will return the value of the numerically smallest mode.
=NORMDIST(): Excel function that returns the probability for any value from a normal distribution, given the mean and standard deviation of the distribution.
=OR(): Excel function that returns TRUE if either or both of two comparisons are true and FALSE if both comparisons are false.
=POISSON(): Excel function that returns the probability of the appearance of any value from a Poisson distribution, given the mean of the distribution.
=RAND(): Excel function that returns a uniform random number between 0 and 1.
=RANDBETWEEN(): Excel function that returns a uniform random integer between two selected numbers, inclusive.
=ROUND(): Excel function that returns the selected number rounded to the number of decimal places specified.
=SQRT(): Excel function that returns the square root of a number.
=STDEV(): Excel function that returns the standard deviation of a series of data assumed to represent a sample.
=SUM(): Excel function that returns the sum of a series of data.
=SUMPRODUCT(): Excel function that returns the sum of the product of the values of two series of data.
=SUMSQ(): Excel function that returns the sum of the squares of a series of data.
=TDIST(): Excel function that returns the probability of a t value, given degrees of freedom and a one- or two-tailed test.
=TINV(): Excel function that returns the t value, given the probability of the t value and degrees of freedom.
=TRANSPOSE(): Excel function that returns the transpose of a matrix (array).
=TRUNC(): Excel function that returns the integer portion of a number.
=TTEST(): Excel function that returns the probability of a t value, given a data set with a numerical dependent variable and a two-level categorical independent variable.
=VAR(): Excel function that returns the variance of a series of data assumed to represent a sample.
=VARP(): Excel function that returns the variance of a series of data assumed to represent a population.
=YEARFRAC(): Excel function that returns the number of years between two calendar dates.
A priori probability: Likelihood of the occurrence of an event that can be determined from the nature of the process that generates the event.
Alpha: The level of Type I error, usually set by the researcher at 0.05 or 0.01. See Type I error.
Analysis of variance (ANOVA): A test used to determine whether a numerical variable is independent of one or more categorical variables that may take on more than two values.
Analysis ToolPak: Package of statistical procedures that can be added to Excel to perform such things as t tests, analysis of variance, random sampling, and regression.
Array: A set of data in contiguous rows and columns. An Excel designation of a matrix. In Excel, it often refers to a set of cells that are linked so that no one cell can be changed independently of the others.
Bartlett test: A test for homogeneity of within group variance between more than two groups.
Bell-shaped curve: The shape of a normal distribution.
Bernoulli distribution: A probability distribution that contains only values of 1 or 0, the number of which depends on the probability of 1.
Best-fitting line: A line determined by an independent variable that passes closest to the values of a dependent variable in a two-dimensional graph. Usually defined as the line that minimizes the sum of squared differences between the line and the values of the dependent variable for all values of the independent variable.
Beta: The level of Type II error, usually not known unless a specific value is stated for an alternative hypothesis. See Type II error.
Between group variance: Variance that exists among the means of some value for two or more groups.
Binomial distribution: A probability distribution that represents the accumulation of a Bernoulli distribution for any number of trials and any value of the probability of 1.
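The accumulation of Bernoulli trials can be sketched with the standard binomial probability formula. This is an illustrative Python sketch, not the Excel =BINOMDIST() function; the name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(successes: int, trials: int, p: float) -> float:
    """Probability of exactly `successes` 1s in `trials` Bernoulli trials."""
    # comb() counts the ways the 1s can be arranged among the trials;
    # each arrangement has probability p**successes * (1 - p)**failures.
    failures = trials - successes
    return comb(trials, successes) * p**successes * (1 - p) ** failures


# Probabilities for 0 through 4 successes in 4 trials with p = 0.5.
distribution = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(distribution)
```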
Bins: Excel designation for the categories into which the =FREQUENCY() function accumulates a data series.
Categorical variable: A variable whose values are logically classified by names (e.g., male, female), as opposed to numbers. May be coded as numbers, however.
Causal variable: A variable whose values are assumed to influence the values of other variables in a given analysis but assumed not to be affected by these others.
Causality: The concept that the value of one variable may be a cause of the value of another variable.
Central tendency: A way of referring to the central or midpoint around which a data series clusters. Measured by the mean, median, or mode.
Chi-square statistic: A statistical test that assesses whether a categorical variable is independent of one or more other categorical variables.
Cluster sample: A sample drawn by first dividing the total population into several mutually exclusive and all-inclusive groups and then selecting some of the groups from which to take all members of the group or a sample of the group.
Conditional probability: The probability of some outcome, given knowledge of some other event—for example, the likelihood that a person will arrive at an emergency room with a true emergency, given that the person arrives during the night.
Confidence interval: Interval on the number scale within which a population value is expected to lie with some predetermined probability, such as 95 percent.
Constant: A number or characteristic that is assigned to members of a sample and that is identical for every member of the sample.
Contingency table: Simultaneous distribution of two usually categorical variables. See Cross-tabulation.
Continuous numerical variable: A numerical variable that can, theoretically, be infinitely divided, such as blood pressure.
Control group: Those persons in an experiment who do not receive the actual experimental intervention; they generally receive some placebo intervention that mimics the actual experimental intervention but is expected to have no effect.
Correlation: A value derived from a statistic that describes the relationship between two variables. May range from −1 for a perfect negative relationship to 1 for a perfect positive relationship. Zero indicates no relationship.
Critical value: The value of a test statistic (chi-square, t value, F), above which the hypothesis of interest is rejected.
Cross-tabulation (or Cross-tab): Simultaneous distribution of two usually categorical variables. See Contingency table.
Cumulative frequency: Frequency distribution that shows the accumulation of values from the lowest category to the highest.
Data range: A set of generally contiguous cells that represent data to be included in some Excel operation.
Degrees of freedom: A value that designates the number of options that can be exercised before no others are available.
Delimited: Refers to one of two ways that data may be stored in a .txt file. Each data element is followed by a common character that designates or delimits the end of that data element. See Fixed-length.
Dependent variable: A variable whose values are assumed to be affected or modified by the value of other variables in a given analysis.
Determinant: A single number that can be used to describe a square matrix. For a two-by-two matrix, equal to the product of the main diagonal elements minus the product of the off-diagonal elements.
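The two-by-two rule can be written out directly. A minimal Python sketch, assuming the matrix is given as two rows of two numbers; the name `det2` is hypothetical, and Excel users would use =MDETERM() instead.

```python
def det2(matrix):
    """Determinant of a two-by-two matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    # Product of the main diagonal minus product of the off-diagonal.
    return a * d - b * c


print(det2([[1, 2], [3, 4]]))  # 1*4 - 2*3 = -2
```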
Diagnosis-related group (DRG): A categorization of medical conditions used for determining payment by Medicare and Medicaid.
Discrete distribution: An Excel option that allows the user to define the probability of the selection of any value from a predetermined set of values.
Discrete numerical variable: A numerical variable that cannot be divided into units smaller than integers, such as the number of persons in a waiting room.
Dispersion: A way of referring to the variability in a set of data. Measured by the variance or standard deviation.
Double-blind random clinical trial: An experimental design in which neither the subjects under study, the persons administering the intervention of the study, nor the persons assessing the results of the study know which group or groups received which intervention.
Dummy variable: A categorical variable that takes on two values and is coded 1 and 0. For example, the two colors blue and red could be coded 1 for blue and 0 for red. They would then represent a dummy variable.
Dust Bowl empiricism: Term applied to the use of statistics to sift through data to find the best relationships, independent of theory.
Empirical probability: The likelihood of the occurrence of an event that can be determined only on the basis of historical data about similar events that have already occurred.
Event: The occurrence in a stochastic process to which a probability can be assigned.
Excel function: A built-in Excel option that will produce the result of a formula or algorithm. Accessible on the Formatting toolbar.
Exponential model: A regression model that is based on converting the dependent variable to its logarithmic value, either natural or base 10.
Factorial design: Analysis that includes more than one independent variable. Usually used in reference to analysis of variance.
Finite population correction (fpc): A multiplier for reducing the standard error of a measure taken from a finite population when the sample is large relative to the size of the population.
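The entry does not give the multiplier itself. A commonly used form is the square root of (N - n)/(N - 1), where N is the population size and n the sample size; the sketch below assumes that form, and the name `fpc` is hypothetical.

```python
import math


def fpc(population_size: int, sample_size: int) -> float:
    """Finite population correction, assuming the sqrt((N - n)/(N - 1)) form."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


# The multiplier shrinks toward 0 as the sample approaches the whole population.
for n in (1, 50, 100):
    print(n, fpc(100, n))
```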
Fisher exact test: An alternative to chi-square for two-by-two tables with extremely small expected values (less than 5) in any cell.
Fixed-length: Refers to one of two ways that data may be stored in a .txt file. Each data element is the same length. See Delimited.
Formula bar: See Formula line.
Formula line: The field at the top of an Excel spreadsheet that shows the content of the currently selected cell.
HDI (Human Development Index): A composite number ranging from 0 to 100, developed by the United Nations Development Programme (UNDP) for each country, from per capita income, literacy, and life expectancy.
Header row: The row at the top of a column that contains the name of the data in that column.
Histogram: A graph that shows data values as vertical columns.
Homogeneity of variance: Equal within group variation across two or more groups.
Hypothesis: A statement of belief about a population to be assessed, using data from a sample.
ICD-9 (International Classification of Diseases, ninth revision): A coding scheme maintained by the World Health Organization that provides a code for classifying mortality data from death certificates.
Identity matrix: A square matrix with 1 in the main diagonal cells and 0 in all other cells.
Independence: Formally, the understanding that conditional probabilities equal marginal probabilities; the recognition that two variables or two events are not dependent on each other.
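The formal condition can be checked on a small table of counts. The counts below are invented for illustration (they echo the emergency room example used elsewhere in this glossary), and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts: (arrival time, visit type) -> number of visits.
counts = {
    ("night", "emergency"): 30,
    ("night", "nonemergency"): 70,
    ("day", "emergency"): 60,
    ("day", "nonemergency"): 140,
}
total = sum(counts.values())


def marginal(visit):
    """P(visit type), ignoring arrival time."""
    return Fraction(sum(n for (_, v), n in counts.items() if v == visit), total)


def conditional(visit, time):
    """P(visit type | arrival time)."""
    in_time = sum(n for (t, _), n in counts.items() if t == time)
    return Fraction(counts[(time, visit)], in_time)


# With these counts the conditional equals the marginal, so the two
# variables are independent.
print(marginal("emergency"), conditional("emergency", "night"))
```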
Independent variable: A variable whose values are assumed to be unaffected by other variables in a given analysis. See also Causal variable, Predictor variable.
Information matrix: A matrix derived from Logit analysis that provides an intermediary step in the calculation of standard errors of coefficients derived using Logit.
Interaction effect: The joint effect of two or more independent variables on a dependent variable. See Main effect.
Interval scale: A numerical variable scale that has no real zero point, such as IQ or temperature measured in Celsius or Fahrenheit.
Inverse: The value by which a number must be multiplied for the product to be 1 or by which a matrix must be multiplied for the product to be an identity matrix. For a scalar (single number), the inverse of x is 1/x.
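For a two-by-two matrix the inverse can be written in closed form and checked against the identity matrix. A minimal Python sketch using exact fractions; the names `inverse2` and `matmul2` are hypothetical, and Excel users would use =MINVERSE() and =MMULT().

```python
from fractions import Fraction


def inverse2(matrix):
    """Inverse of a two-by-two matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = matrix
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    # Swap the main diagonal, negate the off-diagonal, divide by the determinant.
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two two-by-two matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


m = [[4, 7], [2, 6]]
# Multiplying a matrix by its inverse gives the identity matrix.
print(matmul2(m, inverse2(m)))
```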
Joint probability: The likelihood of two simultaneously occurring events—for example, the likelihood that a person will come to an emergency room during the day and will come for a true emergency.
Likert scale: An ordinal scale that is usually constructed with categories such as “Strongly agree,” “Agree,” “Undecided,” “Disagree,” and “Strongly disagree.”
Linear probability model: A regression model derived by using ordinary least squares to estimate regression coefficients for a dichotomous dependent variable.
Linear regression: A statistical technique for relating a dependent numerical variable to an independent numerical or two-level categorical variable that generally assumes a straight-line relationship.
Logarithmic model: Regression analysis in which the x axis variable is converted to a logarithm, either natural or base 10.
Logit: An analysis method that allows for the assessment of whether a two-level categorical variable is independent of one or more numerical or two-level categorical predictor variables. Based on maximum likelihood.
Main effect: The direct effect of a single independent variable on a dependent variable. See Interaction effect.
Marginal probability: The probability of some outcome without regard to any other event; for example, the likelihood that a person who arrives at an emergency room will come for a true emergency versus a nonemergency condition is a marginal probability.
Matrix: A set of data in contiguous rows and columns. See Array.
Maximum likelihood: The estimation of regression coefficients based on the maximization of a likelihood function (rather than on the minimization of the sum of squared errors).
Mean: The overall average of all values for a single variable. Calculated by summing all values and dividing by the total number of values.
Median: The midpoint of the values of a single variable. Calculated by finding the value for which half the observations are larger and half the observations are smaller.
Medicare: A program administered by the U.S. government that pays specified medical expenses, primarily for persons over the age of sixty-five.
Mode: The most commonly occurring number in a series of data.
Monte Carlo technique: A method of simulating the results of an analysis or process using randomly assigned values.
Moving average model: A regression-like model that is generated based on the average of two or more previous time periods.
Multicollinearity: A term descriptive of the relationship between two (usually independent or predictor) variables that vary together or are highly correlated with each other.
Multiple regression: A technique for determining whether a single numerical variable is independent of two or more other numerical or two-level categorical variables.
Mutually exclusive: Two or more outcomes of a stochastic process that cannot simultaneously occur.
Nested functions: Two or more Excel functions in the same cell. Generally used for multiple decisions.
Nominal variable: A categorical variable that is not ordered.
Nonlinear relationship: A relationship between two variables that does not show evidence of a straight-line relationship.
Normal distribution: A probability distribution in which values near the mean are more likely than values farther from the mean. Often referred to as a bell-shaped curve.
Numerical variable: A variable that is measured on a number scale. May be continuous or discrete.
Ordinal variable: A categorical variable that is ordered by magnitude or intensity (e.g., good, better, best).
Ordinary least squares (OLS): The least complex multiple regression technique. Estimates regression coefficients by minimizing the sum of squared errors.
Outcome: The results of the events in a stochastic process.
Parameter: Measure of a characteristic of a population.
Pareto chart: A graph that shows actual frequencies as a histogram and cumulative frequencies as a line graph. Always ordered from the largest data category on the left of the graph to the smallest on the right.
Patterned distribution: An Excel option that allows the user to generate a series of values in any pattern selected. Not a probability distribution.
Pivot table: An Excel designation for a frequency distribution or cross-tabulation created by using the Pivot Table Tool.
Poisson distribution: A probability distribution that represents the likelihood of a rare event.
Polynomial model: A regression analysis in which a single independent variable is converted to its square, cube, fourth power, and so on to describe a dependent variable.
Population: The group of persons or organizations about which there is an interest.
Power model: Regression model in which both the independent and the dependent variables are converted to a logarithmic value, either natural or base 10.
Predictor variable: A causal variable. A variable whose values are predictive of the values of other variables in a given analysis.
Probability: The likelihood of an outcome of an event.
Probit: An analysis method that allows for the assessment of whether a two-level categorical variable is independent of one or more numerical or two-level categorical predictor variables. Based on maximum likelihood.
Pseudorandom number: A random number generated by a computer. Called pseudorandom because any computer-generation scheme inevitably incorporates some selection pattern.
R square (R²): Proportion of variance in a dependent variable that can be accounted for or explained by knowledge of variation in an independent variable or variables.
Random number table: A table of numbers arranged in rows and columns. Each number is randomly ordered with respect to all other numbers in the table. Used to select random samples before the advent of computers.
Random sample: A subset of a larger population selected in such a way that every member of the larger population has a known and nonzero likelihood of being included.
Ratio variable: A numerical variable that has a real zero point. May be continuous, such as weight, or discrete, such as the number of persons in a physician's waiting room.
Regression analysis: A statistical analysis that seeks to determine whether a given numerical variable is independent of some set of other numerical variables or two-level categorical variables.
Regression coefficient: A value by which an independent variable can be multiplied to predict the values of a dependent variable.
Repeated measures: More than one measure of a variable on the same group of subjects (persons, organizations).
Sample: A subset of a population about which there is an interest. Selected to determine the values of interest for the population.
Sample space: All possible outcomes of a stochastic process.
Sampled population: The population that is actually sampled. May or may not be the same as the target population.
Scalar: A way of referring to a single number when discussed in the context of matrices or arrays.
Scatterplot (or scatter graph): A graph that shows the simultaneous distribution of the data points for two variables. Also called an XY graph.
Simple random sample: A sample drawn in such a way that every possible sample of a given size has an equal probability of being selected.
Skewed left: A designation for a data distribution that has a median value greater than its mean and a “tail” of data to the left side of the distribution.
Skewed right: A designation for a data distribution that has a median value less than its mean and a “tail” of data to the right side of the distribution.
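The mean and median comparison in the two skewness entries can be seen on small invented data sets. A minimal Python sketch; the data values are hypothetical.

```python
import statistics

# Hypothetical data: one large value pulls the mean to the right of the median.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
# Hypothetical data: one small value pulls the mean to the left of the median.
left_skewed = [1, 18, 19, 19, 20, 20, 20, 21]

for name, data in (("right", right_skewed), ("left", left_skewed)):
    print(name, "mean:", statistics.mean(data), "median:", statistics.median(data))
```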
Solver: An Excel add-in that solves a wide variety of optimization problems.
Spreadsheet: A computer-generated sheet of rows and columns. The area in Excel where work is done (a set of 1,048,576 numbered rows and 16,384 columns designated A through XFD).
Standard deviation: A measure of overall variation in a set of data, the square root of the variance. See Variance.
Standard error: A measure of the overall variation in the means from samples of a given size taken from a population. Equals the standard deviation divided by the square root of the sample size.
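The calculation in this entry can be sketched in a few lines. This Python sketch assumes the sample standard deviation (the counterpart of =STDEV()); the name `standard_error` and the data are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


print(standard_error([2, 4, 4, 4, 5, 5, 7, 9]))
```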
Statistic: Measure of a characteristic of a sample. Estimates a parameter.
Statistical significance: Refers to a statistical test result that leads to the rejection of the implicit or explicit hypothesis of independence between two or more variables.
Stepwise regression: A technique for using regression analysis to find a model that accounts for the largest possible share of the variance in the dependent variable while eliminating variables that do not contribute to the prediction. Can be forward inclusion or backward elimination.
Stochastic process: A series of events, the outcome of any one being determined by some probability.
Stratified sample: A sample drawn by first dividing the total population into two or more mutually exclusive and all-inclusive groups and then drawing samples from each of these groups.
Sums of squares: The sum of the squared differences between each value in a particular data set and the mean value for the data. May be calculated.
Systematic sample: A sample drawn by dividing the population into several ordered groups and then randomly selecting a first member of the sample from the first group and selecting each comparable member from all other groups.
t distribution: A probability distribution similar to the normal distribution but having fewer values near the mean and more in the tails, depending on degrees of freedom.
t test: A test that compares an estimated value from a sample with the standard error for that value. Used to determine whether a numerical variable is independent of a two-level categorical variable.
Target population: The population of interest from which a sample is desired.
Text Import Wizard: A set of dialog boxes invoked by the attempt to load a non-Excel file into an Excel spreadsheet that provides step-by-step direction in converting the non-Excel file into an Excel file.
Transpose: Operation by which the columns of an array (matrix) become rows and the rows become columns.
Trend line: The single line through an XY scatterplot that provides the best linear or nonlinear fit to the data.
Type I error: The likelihood of rejecting a hypothesis when it is true. Always set by the level of confidence selected.
Type II error: The likelihood of not rejecting a hypothesis when it is false. Known only if a specific value of an alternative hypothesis is given.
Uniform distribution: A probability distribution in which any number in a given range is equally likely. Often called a flat distribution.
Variable: A measure of some attribute for a set of entities, persons, or organizations that takes on more than one value.
Variance: A measure of overall variation in a set of data that represents the average squared difference between each value in the data set and the mean of all values.
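The sample and population versions differ only in the divisor, which is the distinction between =VAR() and =VARP(). A minimal Python sketch with hypothetical function names and invented data, not the Excel implementation.

```python
def sum_of_squared_deviations(values):
    """Sum of squared differences between each value and the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def population_variance(values):
    """Divide by n: data assumed to be the whole population."""
    return sum_of_squared_deviations(values) / len(values)


def sample_variance(values):
    """Divide by n - 1: data assumed to be a sample."""
    return sum_of_squared_deviations(values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), sample_variance(data))
```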
Vector: A matrix (array) made up of a single row or a single column.
Weighted least squares (WLS): A regression technique that takes account of possible unequal variation in the dependent variable at different levels of the predictor variables.
Within group variance: Variance that exists among the values for a specified group.
Workbook: An Excel computer file consisting of one or more spreadsheets.
x axis: A horizontal axis in a graph or chart. Usually considered to be the independent variable.
x variable: A variable generally considered to be the independent or causal variable.
y axis: A vertical axis in a graph or chart. Usually considered to be the dependent variable.
y variable: Variable generally considered to be the caused or dependent variable.
Yates's correction: A modification of the chi-square formula for two-by-two tables with expected values less than 10.
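The correction is usually described as subtracting 0.5 from each absolute difference between observed and expected counts before squaring. The Python sketch below assumes that form; the name `chi_square_2x2`, the floor at zero, and the example table are illustrative assumptions.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square statistic for a two-by-two table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Assumed form: reduce |O - E| by 0.5, never below zero.
                diff = max(diff - 0.5, 0.0)
            statistic += diff**2 / expected
    return statistic


observed = [[10, 20], [20, 10]]  # hypothetical counts
print(chi_square_2x2(observed), chi_square_2x2(observed, yates=True))
```

The corrected statistic is never larger than the uncorrected one, which makes the test more conservative.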
