8

SCHOOLS

The quest for measurable results has been even more central to government policy regarding K-12 education. In the words of the historian of education (and erstwhile Department of Education official) Diane Ravitch, “Governors, corporate executives, the first Bush administration, and the Clinton administration agreed: They wanted measurable results; they wanted to know that the tax dollars invested in public education were getting a good return.”1 In the public sector, the show horse of metrics became “No Child Left Behind” (NCLB), a major piece of legislation enacted under George W. Bush in 2001, with bipartisan support, whose formal title was “An act to close the achievement gap with accountability, flexibility, and choice, so that no child is left behind.”

THE PROBLEM AND ITS PURPORTED SOLUTION

NCLB was meant to address a real problem: despite substantial state-level efforts to equalize spending among school districts, there were persisting differences in school performance among ethnic groups. Advocates of the reforms maintained that the act would counter the lack of accountability of teachers and principals, and create incentives for improved outcomes by aligning the behavior of teachers, students, and schools with “the performance goals of the system.”2 The culprit was presumed to be a lack of professionalism among public school teachers.

The legislation grew out of more than a decade of heavy lobbying by an extraordinarily heterogeneous coalition: business groups concerned about the quality of the workforce; civil rights groups distressed by differential group achievement; and educational reformers disturbed by what they saw as the failure of public schools to educate, who demanded national standards, tests, and assessment.3 The benefit of such measures was oversold in terms little short of utopian. William Kolberg of the National Alliance of Business asserted that “the establishment of a system of national standards, coupled with assessment, would ensure that every student leaves compulsory school with a demonstrated ability to read, write, compute and perform at world-class levels in general school subjects.”4

The first fruit of this effort, on the federal level, was the Improving America’s Schools Act, adopted under President Clinton in 1994. Meanwhile, in Texas, Governor George W. Bush became a champion of mandated testing and educational accountability. Under the NCLB act, enacted early in Bush’s presidency, states were to test every student in grades 3–8 each year in math, reading, and science. The act was meant to bring all students to “academic proficiency” by 2014, and to ensure that each group of students—including blacks and Hispanics, who were singled out for comparative evaluation—within each school made “adequate yearly progress” toward proficiency each year. It imposed an escalating series of penalties and sanctions for schools in which the designated groups of students did not make adequate progress. The act was co-sponsored by Sen. Edward Kennedy, and passed both houses of Congress with both Republican and Democratic support, despite opposition from conservative Republicans antipathetic to the spread of federal power over education, and from some liberal Democrats.5

Yet more than a decade after its implementation, the benefits of the accountability provisions of the NCLB remain elusive. (Other aspects of NCLB—which promoted greater school choice, the creation of charter schools, and higher qualifications for teachers—seem to have been more successful, but are beyond the scope of our subject.) Its advocates grasp at any evidence of improvement on any test at any grade in any demographic group for proof of NCLB’s efficacy. But test scores for primary school students went up only slightly, and no more quickly than before the legislation was enacted, and its impact upon the test scores of high school students has been more limited still.

The main impact of NCLB was to call greater attention to the “achievement gap”—the differences in academic performance among Asian, white, black, and Hispanic students.6 Asians tended to outscore whites, who in turn tended to outscore blacks and Hispanics. Most salient was the ongoing deficiency of African American students. Eight years after the introduction of NCLB, their relative scores had not changed. Average scores on national examinations such as the National Assessment of Educational Progress tests for English and mathematics for seventeen-year-olds remained virtually unchanged from the early 1970s through 2008. In fact, the scores for each group (Asian, white, black, and Hispanic) rose somewhat, but because of the changing ethnic composition of the pupils (especially the rising percentage of Hispanic students, who tended to score less well than their Asian or white counterparts), the average national scores remained steady.7
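The arithmetic behind that last point can be made concrete with a small sketch. The two groups, their scores, and their population shares below are hypothetical figures chosen only to illustrate the composition effect; they are not NAEP data. Each group's score rises, yet the overall average does not move, because the lower-scoring group makes up a larger share of pupils at the later date.

```python
# Hypothetical, illustrative numbers only; these are NOT NAEP results.

def overall_average(scores, percent_of_pupils):
    """Average score across all pupils: each group's mean score,
    weighted by that group's percentage of the pupil population."""
    total = sum(scores[g] * percent_of_pupils[g] for g in scores)
    return total / 100

# Two hypothetical groups at an earlier date.
scores_early = {"group_a": 300, "group_b": 270}
shares_early = {"group_a": 80, "group_b": 20}   # percent of pupils

# Later: both groups score 6 points higher, but group_b's share has doubled.
scores_late = {"group_a": 306, "group_b": 276}
shares_late = {"group_a": 60, "group_b": 40}    # percent of pupils

avg_early = overall_average(scores_early, shares_early)
avg_late = overall_average(scores_late, shares_late)

print(avg_early, avg_late)  # prints: 294.0 294.0
```

A flat national average is therefore compatible with improvement inside every group, which is why the aggregate figure alone says little about how any one group fared.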

UNINTENDED CONSEQUENCES

The unintended consequences of NCLB’s testing-and-accountability regime are more tangible, and exemplify many of the characteristic pitfalls of metric fixation. Under NCLB, scores on standardized tests are the numerical metric by which success and failure are judged. And the stakes are high for teachers and principals, whose raises in salary and whose very jobs sometimes depend on this performance indicator. It is no wonder, then, that teachers (encouraged by their principals) divert class time toward the subjects tested—mathematics and English—and away from other subjects, such as history, social studies, art, music, and physical education. Instruction in math and English is narrowly focused on the sorts of skills required by the test, rather than broader cognitive processes: that is, students too often learn test-taking strategies rather than substantive knowledge. As depicted in the HBO series The Wire, a great deal of class time is devoted to practicing for tests—hardly a source of stimulation for pupils. Because students in English are taught to answer multiple-choice and short-answer questions based on brief passages, the students are worse at reading extended texts and writing extended essays—much as Matthew Arnold had predicted a century and a half earlier.8

The problem does not lie in the use of standardized tests, which, when suitably refined, can serve as useful measures of student ability and progress. Value-added testing, which measures the changes in student performance from year to year, has real utility. It has helped to pinpoint poorly performing teachers, who have then left the system.9 More importantly, value-added testing can be genuinely useful as a diagnostic tool, used by the teachers themselves to discover which aspects of the curriculum work and which do not. But value-added tests work best when they are “low stakes.”10 It is the emphasis placed on these tests as the major criterion for evaluating schools that creates perverse incentives, including focusing on the tests themselves at the expense of the broader goals of the institution.

High-stakes testing leads to other dysfunctions as well, such as creaming: studies of schools in Texas and in Florida showed that average achievement levels were increased by reclassifying weaker students as disabled, thus removing them from the assessment pool.11 Or out-and-out cheating, as teachers alter student answers, or toss out tests by students likely to be low scorers—phenomena well documented in Atlanta, Chicago, Cleveland, Dallas, Houston, Washington, D.C., and other cities.12 Or mayors and governors moving the goalposts by diminishing the difficulty of tests or lowering the grades required to pass them, in order to raise the pass rate and thus demonstrate the success of their educational reforms.13

An emphasis on measured performance through standardized tests creates another perverse outcome, as Campbell’s Law (explained in chapter 1) predicts: it destroys the predictive validity of the tests themselves. Tests of performance are designed to evaluate the knowledge and ability that students have acquired in their general education. When that education becomes focused instead on developing the students’ performance on the tests, the test no longer measures what it was created to evaluate. If, for example, class time is diverted to practicing multiple choice questions that resemble those on the test (perhaps by using questions from past tests), students may attain higher test scores—but without having actually learned much about the subject tested.14

Just a few years before the adoption of NCLB, the British government adopted its own system of metric evaluations for the school system. In 2008 a parliamentary commission looking into the system found many of the same dysfunctions as in the United States.15

DOUBLING DOWN ON DATA

Despite the pitfalls of the testing and accountability regime of NCLB, the Obama administration’s Department of Education doubled down on accountability and metrics in K-12 education. In 2009 it introduced “Race to the Top,” a program that used funds from the American Recovery and Reinvestment Act to induce states “to adopt college- and career-ready standards and assessments; build data systems that measure student growth and success; and link student achievement to teachers and administrators.”16 Whereas NCLB had focused on measuring the performance of whole schools, “Race to the Top” extended performance metrics to individual teachers. It provided funds to states and to school districts willing to adopt its metric agenda. Teachers were now to be rewarded based upon the measurable changes in the achievement of their pupils. That was known as “value-added scoring” or “student progress.” It was understood that teachers could not be held responsible for how high or low the scores of their students were, since that clearly depended upon many external factors over which teachers had no control. But they were to be held responsible for how much their students learned during the year. The idea was to test pupils at the beginning and end of the academic year, to discover the “value added” (though this was adjusted for risk factors such as race and family background), and to reward teachers accordingly. In some states, value-added scores came to account for half of a teacher’s evaluation score. Generating the data needed to evaluate teachers under “Race to the Top” required another huge expansion of testing and assessments.17
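The distinction between judging teachers by the level of their pupils' scores and judging them by the change in those scores can be shown with a rough sketch. The teachers, pupils, and scores below are hypothetical, and the calculation is deliberately simplified: it takes the plain difference between a beginning-of-year and an end-of-year score and leaves out the adjustment for risk factors that actual value-added models applied.

```python
from statistics import mean

# Hypothetical records: (teacher, beginning-of-year score, end-of-year score).
# Invented for illustration; not drawn from any real evaluation system.
records = [
    ("Teacher X", 40, 55),
    ("Teacher X", 70, 82),
    ("Teacher Y", 85, 90),
    ("Teacher Y", 90, 94),
]

def average_level(records, teacher):
    """Mean end-of-year score of a teacher's pupils (how high they score)."""
    return mean(end for t, _, end in records if t == teacher)

def average_gain(records, teacher):
    """Mean change over the year for a teacher's pupils: a crude,
    unadjusted stand-in for 'value added'."""
    return mean(end - start for t, start, end in records if t == teacher)

for teacher in ("Teacher X", "Teacher Y"):
    print(teacher, average_level(records, teacher), average_gain(records, teacher))
```

In this invented case Teacher Y's pupils finish the year with higher scores, while Teacher X's pupils gain more over the year, so the two yardsticks rank the teachers in opposite order.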

The adoption of value-added performance metrics for teachers was spurred by the findings of economists. The early metrics showed that some teachers were indeed better than others, and that pupils assigned to them had greater educational success. Extrapolating from these limited metrics, some economists concluded that achievement gaps could be closed if only poor children could be taught by the top 15 percent of teachers, or if the lowest-scoring 25 percent of first-year teachers were dismissed. It later became clear, however, that the yearly value-added gains tended to fade.18

PAYING FOR PERFORMANCE

Motivated by the same logic that led to “Race to the Top,” school districts began to experiment with their own pay-for-performance schemes, offering bonuses to teachers based on their value-added metrics. The results were disappointing. A large-scale experiment in paying teachers for performance ran in New York City from 2007 to 2009. A study of the experiment by the economist Roland Fryer led him to conclude that there was “no evidence that teacher incentives increase student performance, attendance, or graduation, nor … any evidence that the incentives change student or teacher behavior.”19 So too with a 2011 study from the National Center on Performance Incentives at Vanderbilt University. It found that offering teachers in Nashville bonuses based on their value-added ratings had no discernible impact.20 Earlier studies, dating back to the mid-1980s, had already reached the same conclusion. Despite such evidence, faith in pay-for-performance is so strong that its inadequacies must nevertheless be constantly rediscovered.21

The failure of pay-for-measured-performance schemes to achieve results has not stopped the federal government from pouring ever greater resources into such efforts. In 2010, for example, the Department of Education selected sixty-two programs in twenty-seven states to receive some $1.2 billion over the course of five years from its Teacher Incentive Fund. Nor is the United States unique in such efforts. Similar schemes to link teacher raises, tenure, and promotion to measured performance were undertaken in the United Kingdom, Portugal, Australia, Chile, Mexico, Israel, and India.22

THE NEVER-CLOSING “ACHIEVEMENT GAP”

Perhaps the preeminent concern of advocates of one or another form of metrics in the field of American education is the disparity in educational attainment among ethnically or racially defined groupings. That was a major motive behind the predecessors of “No Child Left Behind” and of the act itself, and it remained central to the policy of the Department of Education during the Obama administration, and to the reauthorized revision of NCLB, the “Every Student Succeeds Act,” passed in late 2015. (Like “No Child Left Behind” or “Operation Iraqi Freedom,” the title of the act expressed a pious hope.) Nor is that concern confined to the federal level: it is salient in the educational policy of many states and countless municipalities, and it dominates the agenda of teachers colleges. Schools are increasingly conceived as “gap-closing factories.”23

Yet it is striking that after decades of gathering and publicizing these metrics, the outcome has remained more or less unchanged. The positions of blacks and Hispanics relative to whites are remarkably stable. While there have been some minor fluctuations when students are measured in grades 4 and 8, there is almost no change in the ultimate result—the metrics in grade 12, that is, at the end of high school.

Pupils throughout the United States are administered tests of reading and mathematics in grades 4, 8, and 12. These are the NAEP (National Assessment of Educational Progress) tests. Experts regard them as relatively reliable indicators of performance, because, unlike some other tests, they involve “low stakes”: that is to say, the fortunes of the students, the teachers, or the schools are not affected by the outcomes, and so there is less incentive for teachers to skim the testing pool, to teach to the test, or to fabricate results. The National Center for Education Statistics publishes an annual report, Status and Trends in the Educational Achievement of Racial and Ethnic Groups, comparing the relative rates of achievement among Asians, whites, Hispanics, and blacks (as well as some subdivisions of each of these groups) over time.

Its findings are telling. For those who took the test in grade 12, the reading achievement gap between whites and Hispanics (22 points on a scale of 500, where the average score in 2013 was 288) was no different in 2013 than it had been in 1992. The gap between whites and blacks was actually larger in 2013 (30 points) than it had been in 1992 (24 points). As for math, the report compares the relative performance of each group in 2005, 2009, and 2013. The result: the gap in scores between whites and their black and Hispanic peers remained unchanged.24

The inability of the schools to influence the relative level of educational attainment should come as no surprise. Since at least the Coleman Report, “Equality of Educational Opportunity” (1966), commissioned during the Johnson administration, it has been known that the output of schools depends largely upon the inputs: student performance correlates closely with the social, economic, and educational attainment of their parents.25 “Good schools” tend to be those populated by pupils who are brighter, more curious, and more self-controlled; and these tend to be the offspring of people who are themselves relatively bright, curious, and self-disciplined. Since these traits are conducive to success, and since they tend to be passed down in families, more successful parents tend to send to schools children who are more likely to achieve educationally.

General improvements in schooling do not therefore lead to greater equality of outcomes. As the political scientist Edward Banfield noted a generation ago, “All education favors the middle- and upper-class child, because to be middle- or upper-class is to have qualities that make one particularly educable.” Improvements in the quality of schools may elevate overall educational outcomes, but they tend to increase, rather than diminish, the gap in achievement between children from families with different levels of human capital.26

Such outcomes might lead one to conclude that the achievement gap cannot in fact be closed by education—and that the reasons lie beyond the schoolhouse door. Yet measuring continues unabated. That is perhaps because, as Banfield noted, the idea that some problems are insoluble is morally unacceptable to a substantial portion of educated Americans.27 When it comes to gaps in school achievement, it seems that in the absence of discernible progress in results, the devotion of resources to ongoing measurement itself becomes a sign of moral earnestness.

THE COSTS OF ATTEMPTED GAP-CLOSING

Of course, the scores on English and math achievement tests cannot measure the full benefits of K-12 education. That is not because the NAEP scores are distorted or insignificant. They do provide a useful measure of student knowledge of the subjects tested. But there is much more to school than the learning of English and mathematics: not only other academic subjects but also the stimulation of interest in the world, and the cultivation of habits of behavior (self-control, perseverance, ability to cooperate with others) that increase the likelihood of success in the adult world. Development of these noncognitive qualities may well be going on in classrooms and schools without being reflected in performance metrics based on test scores.28

In fact, the growing emphasis on testing students in English and math as early as kindergarten may come at the expense of nonacademic activities, such as creative play and the arts, that contribute to individual development but are not easily measured.29 Moreover, though exposing students to better teachers may lead to gains in academic achievement, those gains tend to fade away over time. The noncognitive gains, however, appear to persist.30 Character development matters—which has led some legislatures to try to incorporate measurement of character into their accountability systems!31

The costs of trying to use metrics to turn schools into gap-closing factories are therefore not only monetary. The broader mission of schools to instruct in history and in civics is neglected as attention is focused on attempting to improve the reading and math scores of lower-performing groups. Pedagogic strategies that may be effective for lower-achieving students (such as longer school days and shorter summer vacations) are extended to students for whom these strategies are counterproductive. And resources are diverted away from maximizing learning on the part of the more gifted and talented—who may in fact hold the key to national economic performance.32

The emphasis on measuring the achievement gap and the pressure to close it have other troubling effects. One is the blame heaped upon teachers and schools for their failure to accomplish what may be beyond their reach, and for reasons that have little to do with their own limitations. The logic of NCLB, “Race to the Top,” and similar programs places the responsibility for closing achievement gaps on those who may have neither the power nor the ability to do so. That itself is a recipe for the demoralization of teachers. Add to that the dilemma presented to teachers: pursuing the multiple aims of education versus teaching to the test; following their broad vocational mission versus adhering to the narrow criteria upon which they are to be remunerated. Whichever course they choose, they lose. In addition, many teachers perceive the regimen created by the culture of testing and measured accountability as robbing them of their autonomy, and of the ability to use their discretion and creativity in designing and implementing the curriculum of their students. The result has been a wave of retirements by experienced teachers, and a movement by the more creative away from public and toward private schools, which are not bound by the regime of metric accountability.33

Thus, the self-congratulations of those who insist upon rewarding measured educational performance in order to close achievement gaps come at the expense of those actually engaged in trying to educate children. Not everything that can be measured can be improved—at least, not by measurement.
