Ask smart people—workforce experts, labor economists, university faculty—about the skills that schools should be teaching students, and they all talk about the same things: problem solving, analyzing and synthesizing information, thinking creatively, communicating clearly. These are the sorts of skills that higher-paying jobs increasingly demand—skills that reformers of various stripes have been pushing public education to teach since a blistering US Department of Education–funded critique of American education in the early 1980s ignited the school reform movement that continues today.
Whether the context is the changing nature of work, international competitiveness, or, most recently, calls for common standards, the premium today is not merely on students’ acquiring information but on recognizing what kind of information matters, why it matters, and how to combine it with other information—what many people now call twenty-first-century skills. Remembering information is no longer the highest priority in classrooms; instead, the emphasis is on figuring out what students can do with that knowledge in new situations. And with new research revealing that young children are far more able to engage in complex thinking skills than once thought—and that problem-solving skills are the basics on which other skills are built—teaching students how to become analytical and strategic in applying what they learn is now important in elementary classrooms as well as in high schools.
The federal No Child Left Behind Act (NCLB), passed by Congress in 2001 to promote school improvement by holding local educators accountable for their students’ achievement, called for “high standards of academic achievement [for] all public elementary school and secondary school students.” The law put the entire public education system on a new outcomes-based footing, and it cast in sharp relief the second-class educational status of students of color and those from disadvantaged backgrounds. However, the standardized tests that states introduced to comply with NCLB have generally not sought to gauge students’ grasp of the thinking skills that experts say students should master. Instead, tight NCLB testing time lines, the scale of testing required under the federal law, and pressure from state elected officials to lower costs led to tests that rely heavily on multiple-choice questions measuring mostly low-level tasks like the recall of information in reading passages. These questions can be administered and scored rapidly and inexpensively, but by their very nature, they are not well suited to judging students’ ability to express points of view, marshal evidence, and display other advanced skills.
A number of states that implemented performance assessments in the early 1990s scaled them back as a result of technical concerns, implementation burdens, or costs, especially when NCLB increased testing requirements to reach every child every year. In addition, the federal Department of Education was often unwilling to approve innovative testing systems.
Because teachers tend to teach what is tested, especially when high stakes are attached to the scores, the expansion of multiple-choice measures of simple skills has narrowed the opportunities for lower-achieving students to attain the higher standards that NCLB has sought for them, and it has placed a glass ceiling over many more advanced students who are unable to demonstrate the depth and breadth of their abilities on such exams. The tests have discouraged teachers from teaching more challenging skills and from having students conduct experiments, make oral presentations, write research papers, use new technologies, and do other activities that teach such skills and pique students’ interest in learning at the same time.
The NCLB school accountability model and the standardized testing that undergirds it may have established an academic floor for the nation’s students, but they have not catalyzed the pursuit of genuinely higher standards of thinking and performance. It is therefore perhaps not surprising that US students outperform many of their international counterparts on measures of knowledge such as the Trends in International Mathematics and Science Study (TIMSS), which measures knowledge as given, while they do much less well on international tests that gauge students’ ability to apply knowledge, such as the Program for International Student Assessment (PISA). With the exception of a few states like Massachusetts, we are today, under NCLB, still pursuing the basic skills testing that was introduced in the 1980s, when policymakers took their first tentative steps toward holding schools accountable for their students’ performance, and long before it became clear that success in a rapidly changing and increasingly complex world required students to master much more than the ability to recognize one answer out of five.
The advent of the Common Core State Standards, the Next Generation Science Standards, and the emergence of new accountability systems under federally approved waivers from NCLB provide a potential opportunity to address this fundamental misalignment between our aspirations for students and the assessments we use to measure whether they are achieving those goals. The United States has an opportunity to create a new generation of assessments that build on NCLB’s strengths, including its commitment to accountability for the education of traditionally underserved groups of students, while measuring a wider range of skills and expanding the definition of accountability to include the teaching of such skills.
These new assessments would rely more heavily on the kinds of performance measures described in this book: tasks requiring students to craft their own responses to complex problems rather than merely select from among multiple-choice answers that in many instances require little thinking and reward guessing. They range from short-answer tasks such as constructing and explaining a problem solution to extended work like writing essays, engaging in research, and conducting laboratory investigations. Like the road test that virtually all adults have taken to gain a driver’s license, these performance assessments ask students to demonstrate what they can actually do with their knowledge when it is applied in practice.
As we have described, there are many examples of large-scale performance assessments in the United States and other countries that feature tasks in virtually all subject areas. They range from Kentucky’s long-standing writing portfolio and the New York State Regents Examinations to the hands-on science experiments and computer-simulated tasks of the National Assessment of Educational Progress (NAEP), Connecticut’s and Vermont’s high school science assessments, the Collegiate Learning Assessment, England’s General Certificate of Secondary Education exams, and similar assessments in Hong Kong, Singapore, and Australia.
Research shows that well-designed performance assessments yield a more complete picture of students’ abilities and weaknesses and can overcome some of the validity challenges of assessing English learners and students with disabilities. The use of performance measures has been found to increase the intellectual challenge in classrooms and support higher-quality teaching. Students who routinely engage in instruction where they are expected to demonstrate applications of their knowledge and explain and defend their answers have often been found to outscore other students on both traditional tests and more complex measures.
When teachers are involved in scoring essays and other performance measures, as they are in the assessment systems of high-achieving nations and some states today, they can become more knowledgeable about how to evaluate and teach to challenging standards. Teacher involvement in scoring has been found to offer a powerful professional development opportunity that translates into a stronger ability to design and implement standards-based curriculum. Such tests are thus tied more closely to the improvement of classroom instruction and can support more expansive and productive student learning.
All of these factors are driving the increased use of performance assessments around the world. As the Hong Kong Examinations and Assessment Authority (2009) explained while introducing new school-based performance assessments into its examination system:
The primary rationale for school-based assessments (SBA) is to enhance the validity of the assessment, by including the assessment of outcomes that cannot be readily assessed within the context of a one-off public examination, which may not always provide the most reliable indication of the actual abilities of candidates. . . . SBA typically involves students in activities such as making oral presentations, developing a portfolio of work, undertaking fieldwork, carrying out an investigation, doing practical laboratory work or completing a design project, [that] help students to acquire important skills, knowledge and work habits that cannot readily be assessed or promoted through paper-and-pencil testing. Not only are they outcomes that are essential to learning within the disciplines, they are also outcomes that are valued by tertiary institutions and by employers.
There are numerous challenges to using performance measures on a much wider scale, such as ensuring the measures’ rigor and reliability. But valuable lessons for addressing such challenges can be learned from a growing number of high-achieving industrialized nations that have successfully implemented performance assessments for many years, from a series of state experiments with performance assessments in the 1990s, from the expansion of the Advanced Placement (AP) and International Baccalaureate (IB) programs, and from the growth of performance measures in the military and other sectors, developments that have been aided by substantial advances in testing technology. This large body of work suggests that performance assessments pay significant dividends to students, teachers, and policymakers alike and that the assessments can be built to produce confident comparisons of individual student performance over time and comparisons across schools, school systems, and states.
Our goal in this book has been to provide a thorough analysis of the prospects for and challenges of introducing and sustaining standardized performance assessments on a large scale. We have studied extensively the history and current uses of performance assessments in the United States and abroad, the technical advances that have been made, and the impacts that have been documented by researchers.
The challenges associated with using performance measures on a large scale include the need to ensure the tests’ rigor and technical reliability and to manage their cost and time requirements. The experiences of a growing number of high-achieving nations that use large-scale performance assessments effectively, the record of the IB and AP testing programs, successful state experiences with performance assessments, and the growth of performance measures in the military and other sectors illustrate how such assessments can be reliably and cost-effectively incorporated into testing systems. And studies have demonstrated that performance tasks can be designed in ways that allow them to measure student achievement accurately and permit the comparison of results across students and schools and from year to year—necessary features of tests used to hold schools accountable for their students’ results.
The research reviewed in this book shows that reliable, valid, feasible, and cost-effective performance assessments can be developed with attention to these topics:
Costs, especially for scoring, are another concern. Studies have found that performance-based tests tend to be about twice as expensive as tests that rely exclusively on multiple-choice questions. But a detailed cost modeling study grounded in real-world prices shows that it is possible to construct large-scale assessments that combine multiple-choice questions and performance measures for no more than today’s much-less-informative tests—about twenty-five dollars per pupil for English language arts and math combined. This can be accomplished by taking advantage of the economies of scale that will accompany states banding together in consortia, tapping the efficiencies of technology in administering tests and supporting scoring, and using teachers strategically in the scoring of performance items.
Appropriate, affordable, and educationally supportive scoring models must be developed. In most European and Asian systems, and in those used in several US states, scoring of assessments is conducted by teachers and time is set aside for this aspect of teachers’ work and learning. While teacher time to create and score the assessments can be substantial, these activities lead to more skilled and engaged teachers. Teachers often report that some of the best professional development of their careers occurs when they have opportunities to examine, score, and discuss student work. Importantly, international assessments have strategically captured teacher professional development time to evaluate and validate student work. Capitalizing on this time can both lower costs and establish a common language around curriculum standards and assessment.
While the use of performance tasks does require time and expertise, educators and policymakers in high-achieving nations believe that the value of rich performance assessments far outweighs their cost. Nations around the world have expanded their use of performance tasks because these deeply engage teachers and students in learning, make rigorous and cognitively demanding instruction commonplace, and, leaders argue, increase students’ achievement levels and readiness for college and careers. While looking to economize, it is also important to put the costs of high-quality assessment into perspective. Even if states spent fifty dollars per pupil on assessments each year (more than twice the estimated costs of a balanced system), this would still be far less than 1 percent of the costs of United States education overall.
Performance assessment is a key component in a balanced assessment system that responds to fast-paced changes placing greater demands on education and knowledge development in the United States and around the rest of the world. Images of what students will need to do with their knowledge should help shape formulations of curriculum, instruction, and assessment policy at the national, state, and local levels. As a starting point for the development of the next generation of assessments, policymakers must begin with a vision of young people as lifelong learners who deeply understand core concepts and modes of inquiry within the disciplines and who can also work across disciplines to evaluate evidence, frame and solve problems, express and defend their ideas, and create new ideas, technologies, and solutions.
We have noted that consortia of states have undertaken efforts to refine standards for learning, so that they are internationally benchmarked and are fewer, higher, and deeper. To ensure that new assessments are developed to fully represent the new standards, federal and state policy should:
Current accountability reforms are based on the idea that standards can serve as a catalyst for states to be explicit about learning goals, and the act of measuring progress toward meeting these standards is an important force toward developing high levels of achievement for all students. However, an on-demand test taken in a limited period of time on a single day cannot measure all that is important for students to know and be able to do. As described by Achieve (2004), a national organization of governors, business leaders, and education leaders, the limitation of traditional on-demand tests is that they cannot measure many of the skills that matter most for success in the worlds of work and higher education:
States . . . will need to move beyond large-scale assessments because, as critical as they are, they cannot measure everything that matters in a young person’s education. The ability to make effective oral arguments and conduct significant research projects are considered essential skills by both employers and postsecondary educators, but these skills are very difficult to assess on a paper-and-pencil test. (p. 3)
Balanced systems of assessment that include performance assessments have the potential to strengthen curriculum and instruction by evaluating the full range of standards in valid and appropriate ways, providing rich information about student learning that is useful to classroom teachers, and providing diverse means for students to demonstrate their learning. Developed carefully and used properly, such assessments can stimulate more thoughtful teaching, become an engine for ongoing improvement and professional development, and create a commitment to standards that shape more powerful learning.