Preface

I don't know whether it is the age we live in, or the age I have lived to, but whichever, I have lately found myself shouting at the TV screen disturbingly often. Part of the reason for this may be the unchecked growth of the crotchety side of my nature. But some of the blame for these untoward outbreaks can be traced directly to the remarkable dopiness that substitutes for wisdom in modern society.

Ideas whose worth diminishes with data and thought are too frequently offered as the only way to do things. Promulgators of these ideas either did not look for data to test their ideas, or worse, actively avoided considering evidence that might discredit them.

Ranting in the privacy of my own home spares me embarrassment, but alas does nothing to remediate the problem. So from time to time I would put pixel to screen and describe the proposal or policy that had aroused me along with empirical evidence that tests the idea. In a surprisingly short time I had collected enough of these excursions to provide a coherent story.

This book is that story.

It deals with education in general and the use of tests and test scores in support of educational goals in particular. My forty years of experience in this area have reduced the diffidence I ordinarily feel about offering opinions. But the opinions, and the work involved in finding and analyzing the data that support them, were not accomplished alone. I had help. It is my pleasure now to offer my gratitude publicly.

First, to my employer, the National Board of Medical Examiners, which provided the time and resources to prepare this work. And more specifically to Don Melnick, the Board's president, Ron Nungester, senior vice president, and Brian Clauser, assistant vice president, who, in addition to resources, provided the encouragement and quietude to get it completed.

Next to coauthors who collaborated with me in the original work that is anthologized here. Specifically, Peter Baldwin, Henry Braun, Paul Holland, William Lichten, and David Thissen. I thank you all. Without you, this would not have been possible. In addition there were a number of people who have read and commented on various pieces in preliminary forms: Wayne Camara, Stephen Clyman, Monica Cuddy, John Durso, Sam Palmer, Peter Scoles, and Linda Steinberg.

Of course, throughout all that I have done over the past eight years is the shadow of Editha Chase, who has been my right hand. Her smiling face always accompanies the resolution of whatever problem I set before her. A heartfelt thank-you is surely too small, but alas is all that I have to give.

And finally, the staff of Princeton University Press, whose intelligence, born of long experience, turned my raw manuscript into the finished product you now hold in your hand. And of course, Vickie Kearn, math editor at the Press and my friend. Her enthusiasm and support have always been, and remain now, very special to me.

In September 2008 the National Association for College Admission Counseling (NACAC) published a report that made a number of recommendations for changes in the admissions process. Three of the major ones were (a) to make standard admissions exams optional, (b) to substitute specific achievement tests for the more general aptitude tests currently used, and (c) to replace the PSAT as a screening test for Merit Scholarships with a more vigorous screener without a fixed minimum eligibility score. In the first three chapters I use evidence to examine the value of each of these proposals. In the subsequent chapters I discuss how well evidence and logic support (or don't) a number of other actions suggested or taken. Chapter 9 focuses on the use of student performance data to evaluate teachers. I am well aware of the current heated debates about the desirability of such an action. The challenges associated with accomplishing such an undertaking go well beyond dogma and involve subtle and important issues that are at the very heart of the validity of scientific inference.

What follows is an annotated table of contents. I have included these extended descriptions for two reasons:

 

a. To whet the appetite of the reader by providing more information about the contents of the chapter than could be accomplished in a short chapter title, and

b. To provide at least a summary of the contents for those too busy to read the whole chapter in hopes that it will increase their skepticism of the policy being discussed.

 

Chapter 1, “On the Value of Entrance Exams: What Happens When the SAT Is Made Optional?” Since 1969 colleges have begun to adopt an “SAT optional” policy for applicants. How has this policy affected the quality of entering classes? As luck would have it, essentially all applicants take the SAT, and then after receiving their scores decide whether or not to submit them. Through a special data-gathering effort we are able to compare the scores of those who submit them with the scores of those who decide not to. Not surprisingly, those who withhold their scores have done much worse. Paired with this, we also discover that their subsequent performance in college is about as much lower than that of the other students as would have been predicted from their lower SAT scores. However, by excluding lower-scoring students from the school's SAT average, the school's national rankings, which have mean SAT score as an important component, go up.

Chapter 2, “On Substituting Achievement Tests for Aptitude Tests in College Admissions.” This notion is one that only makes sense if you say it fast. Test scores are used by admissions officers to make comparisons among applicants. The idea of substituting specific achievement tests for more general aptitude tests has, as its basis, the idea that students will then be able to show off expertise directly in the content areas of their special competence. But then how are comparisons to be made? Is your French better than my physics? Was Babe Ruth a better hitter than Mozart was a composer? In this chapter I discuss the too often arcane, but, alas, critical, area of test equating, in which scores on different tests are made comparable, including the limits of the technology.

Chapter 3, “On Rigid Decision Rules for Scholarships.” Having hard and fast cutoffs has always been a tough sell. How can we say that someone who scores just above the cutoff is okay and someone just below is not? Surely our educational measuring instruments are not up to such precise decision-making. In this chapter I begin with the wise strategy adopted for the use of the nineteenth-century civil service exams in India, and show why the current approach is a far fairer solution to a difficult problem than the alternative recommended by NACAC.

Chapter 4, “The Aptitude-Achievement Connection: Using an Aptitude Test to Aid in Allocating Educational Resources.” Over the last decade there has been an enormous increase in the number of Advanced Placement courses offered in high schools, in the number of high schools that offer them, and in the number of students who enroll in them. In the face of scarce resources and low performance of students on national assessments, is this a sensible development? In this chapter I describe an analytic tool that predicts how well students will do on Advanced Placement exams and argue that using it would allow the wiser allocation of resources.

Chapter 5, “Comparing the Incomparable: On the Importance of Big Assumptions and Scant Evidence.” There is no limit to the distances some will go to in order to make invidious comparisons. In this chapter I discuss how international comparisons are made and why the validity of such comparisons rests on assumptions that are both shaky and untestable. I illustrate this argument with examples from the United States, Canada, and Israel.

Chapter 6, “On Examinee Choice in Educational Testing.” Modern multiple-choice tests have been criticized for not providing an authentic context for measurement. The critics prefer test formats that feature essays and other kinds of extended response questions. A difficulty with this approach arises in real-world situations because of limited testing time. One cannot ask examinees to answer a large number of essay questions. But if only a very small number of questions are asked, the possibility increases that an unfortunate choice of topics by the test developer may disadvantage some examinees. This has been ameliorated by allowing examinees to choose from among several essay topics. In this chapter I show that this strategy fails empirically, exacerbating intergroup differences and disadvantaging women.

Chapter 7, “What If Choice Is Part of the Test?” Suppose we consider the choice of which item to answer as part of the test. We can sensibly do this if the skills related to choosing wisely are well established and agreed by all participants to be a legitimate part of the curriculum that is being tested, and if all available options could plausibly be asked of each examinee. If this is all true, it leads to a remarkable and surprising result: we can offer choice on a test and obtain satisfactory results without grading the answers! In fact, the examinees don't even have to write an answer; it is enough that they merely indicate their preferences. In this chapter I demonstrate how this works out.

Chapter 8, “A Little Ignorance Is a Dangerous Thing: How Statistics Rescued a Damsel in Distress.” Whenever a testing organization makes a mistake (for example, an item that was misscored or some test answer sheets that were misread), we read about it in the paper. Yet the most grievous errors that take place in educational testing are perpetrated by the users of the test scores. In this chapter I describe how a third-grade teacher was suspended because her class did better than a superficial analysis of the scores on a statewide exam indicated was likely. It was concluded that the class performance reflected cheating. A subsequent, more careful analysis of the data indicated that the class's performance was in line with expectations. When these analyses were laid out in a court hearing, she was exonerated.

Chapter 9, “Assessing Teachers from Student Scores: On the Practicality of Value-Added Models.” Powerful forces are currently arrayed in support of the assessment of teachers using the gains demonstrated by their students' scores on exams given at the beginning and end of the school year. In this chapter I discuss the epistemological and practical problems with this approach.

Chapter 10, “Shopping for Colleges When What We Know Ain't.” We often hear that after our house, our car is the largest purchase we will make. In line with this claim, we are bombarded with information to help us make an informed decision. Yet as anyone who has recently paid the college tab for one or more children well understands, those expenses dwarf cars, and indeed three children at a private college can easily surpass the cost of the family house. Yet the advice we get about choosing colleges is remarkably short on hard evidence. In this chapter I evaluate a new approach to ranking colleges, and conclude that it falls far short of what is required.

Chapter 11, “Of CATs and Claims: The First Step toward Wisdom.” When we change the format in which tests are administered we must also change some of the rules. In this chapter I examine what went wrong when attempts were made to continue to use rules devised for traditional paper-and-pencil tests when the tests were administered intelligently by computer. Along the way, we find that misunderstanding increases when the distinctions between data and evidence are blurred.
