Chapter 7
Supporting Teacher Learning through Performance Assessment

Linda Darling-Hammond and Beverly Falk

As the internationally benchmarked Common Core State Standards (CCSS) have been adopted by forty-five states across the country, educators are seeking ways to support an increasingly diverse student population to meet these more demanding expectations. The likelihood that students will achieve the aims of the standards will be substantially shaped by how well teachers teach these more challenging academic skills in ways that support a wide range of learners. Teachers’ understanding of the standards and their grasp of how to teach them will also influence whether the new assessments provide useful insights, rather than harmful side effects, particularly for students who have historically been least well served by their schools.

As we have outlined in previous chapters, performance assessments will be needed to evaluate the college- and career-ready skills that the CCSS intend. Indeed, performance assessments themselves offer important opportunities for teacher learning about how to develop these skills. Research and experience suggest that teachers’ involvement in developing, scoring, and analyzing the results of performance-based student assessments can help them learn about the standards, their students, and their teaching practice. This kind of professional learning can help them acquire the tools to teach the more complex skills and knowledge that are crucial to preparing our citizens for the global workforce.

Designing professional learning opportunities that can strengthen the capacities of teachers to support more ambitious teaching, and thus enhance learning for all students, is challenging. The limitations of short-term training models, such as the all-too-common one-shot workshop designed to transmit information to passive recipients, are well known (Lieberman & Miller, 2001; McLaughlin, 2005; Wei, Darling-Hammond, Andree, Richardson, & Orphanos, 2009). In fact, research shows that what teachers have dubbed “drive-by” professional development has little effect on teacher practice and virtually none on student achievement, despite the fact that this form of teacher development is ubiquitous in American schools (Yoon, Duncan, Lee, Scarloss, & Shapley, 2007). By contrast, significant gains in student achievement can result from strategies that engage teachers in content-specific activities linked to collegial analyses of student work and learning over a more sustained period of time (Wei, Darling-Hammond, & Adamson, 2010).

This chapter describes how teacher learning through involvement with student performance assessments has been accomplished around the world, particularly in countries that have been recognized for their high-performing educational systems. We discuss how teachers’ engagement with performance assessments influences their understanding of the standards and their students’ abilities. This discussion includes teachers’ reports about their experiences with performance assessments, as well as the results of published research. Finally, we recommend how these kinds of performance assessment opportunities can be planted and scaled up as states and districts implement CCSS and deepen their efforts to teach twenty-first-century skills.

TEACHER ENGAGEMENT IN ASSESSMENT IN HIGH-PERFORMING COUNTRIES

Contemporary efforts to raise standards in the United States have developed in response to the more competitive expectations and outcomes for student learning around the globe. High-achieving jurisdictions—including Finland, Japan, Singapore, New Zealand, and Hong Kong—teach fewer topics in more depth, focus more on reasoning skills and applications of knowledge, and have a well-worked-out sequence of expectations based on how students typically progress in mastering specific skills over time (Schmidt, Wang, & McKnight, 2005). In the CCSS, as in some other countries’ standards, these expectations, called learning progressions, can help guide teachers’ judgments about what to teach next.

As we described in chapter 4 of this volume, high-performing nations increasingly use open-ended performance tasks to give students opportunities to develop and demonstrate twenty-first-century skills, such as the ability to find and organize information to solve problems, frame and conduct investigations, analyze and synthesize data, and apply learning to new situations. Students solve extended problems in mathematics and the sciences, showing and explaining how they are approaching the task; compare and synthesize evidence from different kinds of data and texts; and compose essays that explain and defend their thinking.

The growing emphasis on project-based, inquiry-oriented learning in high-performing nations has also caused many of these countries to introduce school-based tasks into their assessment systems: research projects, science investigations, and development of products ranging from software solutions to engineering designs. These tasks, incorporated into examination scores in contexts as far-ranging as Britain, Canada, Singapore, Australia, New Zealand, and the International Baccalaureate program, focus teaching and learning on the development of higher-order skills and the use of knowledge to solve problems.

Rather than attempting to keep testing separate from the teaching and learning process, these systems integrate curriculum, instruction, and assessment in ways that improve both teaching and students’ learning. Teachers are engaged throughout the assessment process in developing, reviewing, scoring, and analyzing the results of students’ assessments, which enables them to understand the standards and develop stronger instruction. Tests are not kept remote and mysterious. Developing, reviewing, and scoring assessments—including those that are used for summative accountability purposes—is often part of teaching work.

The use of curriculum-embedded assessments provides teachers with models of good curriculum and assessment practice, enhances curriculum equity within and across schools, and allows teachers to see and evaluate student learning in ways that can inform instructional and curriculum decisions. Such curriculum-embedded assessments can also build students’ capacity to assess and guide their own learning.

Teachers score these open-ended tasks through a process called moderation, in which they receive training and then score and discuss model answers until their judgments are reliable—that is, they accurately represent the standards and are consistent with one another. Sometimes these moderation processes occur within schools; at other times, teachers are assembled from across a region. Teachers use benchmark examples of student work at different levels, along with a rubric or set of scoring criteria, to calibrate their own judgments. As teachers learn to look for the key features of the work expressed in the criteria, they become more aware of the elements of strong student performance. As they continue to score and discuss the work, they fine-tune their capacity to evaluate so that high rates of reliability are achieved.

Equally important, the scoring process and the discussions around student work help teachers reflect on their curriculum, teaching, and assessment strategies, thus becoming more effective at teaching the standards (Darling-Hammond, 2010). Such involvement heightens the probability that teachers—the critical players for enacting educational change—will come to understand and embrace the standards and be able to use the data from the new assessments. Lauren Resnick (1995), professor and codirector of the Institute for Learning at the University of Pittsburgh, emphasizes this fact in her writings about teachers’ work with standards:

Standards documents, even elegant ones with benchmarks and commentary, can affect achievement only if the standards come to be held as personal goals by teachers and students. . . . That will happen only if a concerted effort is made to engage teachers and students in a massive and continuing conversation about what students should learn, what kinds of work they should do, and how well they should be expected to do it. (p. 210)

Involving teachers in scoring assessments is powerful professional development because it connects teacher learning directly to their examination of student learning and gives them the opportunity to think together about how to improve that learning. It also sends an important message by signaling that teachers can be active participants in shaping the direction of school change. As this kind of professional development acknowledges the critical role of teachers in supporting students’ learning, it puts teachers in their rightful place: at center stage in the school improvement process.

TEACHERS’ INVOLVEMENT IN PERFORMANCE ASSESSMENT IN THE UNITED STATES

Like behind-the-wheel drivers’ tests, performance assessments require people to show what they know by demonstrating their skills in action. In education, performance assessments can include everything from extended written responses and oral presentations of research and communication skills, to graphical and other representations of problem solving, to the conduct and reporting of science experiments, or even musical or artistic presentations.

As noted in earlier chapters, many states used performance assessments in the 1990s, prior to passage of the No Child Left Behind Act (NCLB). Although these assessments were discontinued in most places because of US Department of Education regulations and the costs associated with the every-child, every-year testing requirements, some states and localities continued to use performance assessments because of their commitment to teaching and measuring higher-order skills. (See chapter 3, this volume.) New initiatives are now under way as a consequence of the CCSS and the movement to expand students’ opportunities to develop twenty-first-century skills. We describe how these initiatives have involved teachers and supported their learning.

Studies of the implementation of performance assessments in California, Connecticut, Kentucky, Maine, Maryland, Missouri, New Hampshire, New York, Ohio, Rhode Island, Vermont, and Washington State found that the portfolios and performance tasks could ultimately be scored quite reliably by teachers. Furthermore, the assessments supported improvements in instruction and student learning (Darling-Hammond & Rustique-Forrester, 2005). The results were most positive when states or districts developed teachers’ expertise for designing, scoring, and evaluating the results of the assessments (Borko, Elliott, & Uchiyama, 2002; Darling-Hammond, 2004; Falk & Ort, 1998; Sheingold, Heller, & Paulukonis, 1995; Wolf, Borko, McIver, & Elliott, 1999).

Researchers found that scoring the assessments led teachers to work on instruction. Examining students’ work helps teachers to learn more about what their students know and can do, as well as how they think. Doing this in the context of standards and well-designed performance tasks stimulates teachers to consider their own curriculum and teaching. Together, teachers can then share specific instructional approaches that can be used to support the strengths and needs of their students.

“Sitting Down” to Score

Scoring sessions typically begin with an orientation process that helps teachers learn to use standards as a reference for evaluating students’ responses. A scoring orientation usually goes something like this. Aided by a facilitator, teachers, working in small groups, look over each assessment task and discuss the specific standards that the tasks assess. They then review each task’s criteria and discuss what students need to do to accomplish the task. Together teachers examine sample student responses, referencing the scoring guide—also known as a rubric—for descriptions of what the completed work looks like at different levels of proficiency. Discussing the student work detail by detail, they then compare the evidence in each response to the rubric’s indicators until they arrive at a consensus for a score.

Differences in opinions and perspectives among teachers can be mediated by using the rubric’s clearly articulated criteria for performance and the requirement that teachers always justify their evaluation using evidence from the student work. Although viewpoints may initially vary, teachers begin to agree on the scores that they assign after going through several sample responses. Recurring consensus signals completion of the orientation. Only then do teachers move on to begin the independent scoring of tasks used to assign official scores.

Using Standards to Guide Evaluation

Learning how to use a rubric helps teachers evaluate students’ work based on evidence rather than on feelings or assumptions. In the course of scoring, they learn to apply common criteria and standards to the work of all their students rather than just comparing one student’s work to another’s. Learning to use evidence as a result of participating in standards-based scoring often transforms the way teachers evaluate student work. As one elementary teacher who participated in a statewide performance assessment of student work project put it: “I moved away from thinking about work in an A, B, C, or D way, to thinking about the criteria for performance and the evidence that would justify my evaluation” (Falk & Ort, 1998).

As teachers work with content standards, rubrics, and student benchmark papers during scoring, they come to think more deeply about their teaching. Content standards provide an overarching conception of the discipline itself as they specify what the important aspects of a subject area are. Scoring rubrics identify the features of student work that are important and guide teachers in looking for these features. Actual examples of student work at differing levels of performance offer reference points for understanding to what degree standards are met. In these ways, participating in scoring helps teachers to clarify goals and expectations for their teaching and for their students’ learning. In addition, it deepens their knowledge of their discipline, reveals important information about what their students know and can do, and offers them insights to improve their teaching.

Learning to evaluate student work with a clear, objective eye also helps to safeguard against biases that teachers may have about students’ capabilities. When teachers carry this approach back into the classroom, they can better recognize the varying strengths of diverse learners. This understanding in turn makes it harder to attach labels and judgments to students, which often have the unintended effect of becoming self-fulfilling prophecies. In addition to safeguarding against bias, keener observation of students and their work helps teachers make more informed decisions and provide better supports for learning (Falk, 2001).

In addition, engagement in scoring sessions strengthens teachers’ sense of professionalism, heightens their understandings about the workings of the system of which they are a part, and reaffirms the central importance of teachers to the evaluation process. Instead of having to rely on testing companies to judge the outcomes of students’ work, teacher involvement in scoring places assessment back into the domain of teaching where it can be readily accessed to inform and support learning. Several important aspects of this process are discussed below.

Conversing about the Standards

One of the most valuable aspects of this work is the opportunity that scoring sessions provide for collegial conversations. These discussions—which can take place before, during, or after scoring—enable teachers to learn about state or district expectations for their students, hear about how other teachers interpret the standards, and see how the big ideas embodied in standards play out in real student work. Working with the standards helps teachers gain greater perspective about what is valued and valuable in their broader community (Falk & Ort, 1998). In addition, the scoring experience helps them to develop shared understandings and a common language about the essentials of their disciplines, which develops a sense of professional community and can facilitate more coherent instruction across classrooms.

Looking at Performance on High-Quality Tasks

Scoring worthy tasks gives teachers a window into what their students can do as well as how their students think. Well-designed tasks are contextualized in real-world situations, they ask students to show and explain their work, and they allow students to demonstrate their abilities in multiple ways (Darling-Hammond & Wood, 2008). From this evidence, teachers learn more about the variety of ways students approach and solve problems.

Furthermore, because the expectations are publicly articulated, students have a better chance of achieving them. This makes the assessments fairer and more accessible for different kinds of learners (Abedi & Herman, 2010). These features of assessments can help expand teachers’ visions of what good work may look like when it comes from a wide range of students across many locales and backgrounds. According to an elementary teacher who administered and scored a math/science/technology task, “Looking at student responses to the assessment tasks reinforces the idea that good work can look very different and can take on many forms” (Falk & Ort, 1998, p. 62).

Exploring Teaching

Many teachers say that participation in scoring motivates them to strengthen their practice, not only to better prepare their students for tests but also to improve the teaching and learning that go on in their classrooms. The following comments from New York public school teachers illustrate the types of changes they planned to make to their practice as a result of participating in a recent performance assessment project in the state:

“I plan to give kids rubrics detailing what makes ‘quality’ work.”—Elementary teacher

“I will provide more opportunities for revision, self-analysis, and evaluation.”—Elementary school teacher

“I want to make the open-ended questions I ask clear enough to get the information I want to get from the students. I will also make grading criteria very clear, very related to the question, and available to students ahead of time.”—Middle school teacher

“I will do more testing requiring justifications—help students to become more comfortable explaining their understandings.”—High school mathematics teacher

Teachers’ discussions about students’ work in scoring conferences offer them opportunities to learn from each other about new practices and educational processes and to validate their knowledge as competent professionals (Little, Curry, Gearhart, & Kafka, 2003). Teachers involved with scoring state performance assessments note how much they appreciate this opportunity:

“Meeting with dedicated, concerned teachers was most valuable to me. I learned from their positive attitudes, from discussing concerns about my students and about future directions for my discipline.”—High school science teacher

“I don’t think you can underestimate the need that folks have for getting together and having quality time to reflect on all of the changes that are happening in our schools.”—High school social studies teacher

“Scoring sessions provide valuable professional dialogue. It is a great way to do teacher in-service.”—High school English teacher

CURRENT PERFORMANCE ASSESSMENT INITIATIVES

Performance assessment initiatives are once again developing across the country as states have adopted the Common Core State Standards, which call for higher-order thinking skills not measured on traditional multiple-choice tests. At least a dozen states have some form of performance assessment activity occurring at the state or local level,1 and a number of others have plans in the works. In this section, we discuss initiatives in California, Ohio, New York, and New England that take distinctive forms, offering a range of lessons about how teacher engagement with performance-based assessments improves teaching and learning.

The Silicon Valley Mathematics Assessment Collaborative

Since 1998 a group of school districts in California’s Silicon Valley have supplemented the state testing system with a common set of performance assessments (Foster, Noyce, & Spiegel, 2007). The Mathematics Assessment Collaborative (MAC) uses tasks designed by the Mathematics Assessment Resource Service (MARS) from second grade through high school. The MARS assessments, which are developed by mathematics teachers working with researchers, feature tasks that test key mathematical concepts along with the mathematical practices now codified in the Common Core State Standards: problem solving, modeling, reasoning, and communication. This example is illustrative:

Teachers score the tasks using a rubric that takes into account how students approach the problem, their solutions, and their ability to justify or generalize their solutions. MAC also offers professional development for teachers and data supports for districts. Although it is still centered in California, MAC has grown to more than ninety school districts and charter school networks located in several states. Between forty thousand and eighty thousand student assessment papers are hand-scored annually (Paek & Foster, 2012).

The ongoing professional development and coaching offered by the project begins with teachers coming together within their districts to score the MARS exams on professional development days. They receive training and calibration support to score reliably to the standards. It is worth noting that annual audits of a randomly selected 5 percent of the student papers find very high reliability: a recent analysis found the mean difference between the original score and the audit score was only 0.01 point (Foster et al., 2007).

At the end of the scoring day, teachers spend time reflecting on students’ successes and challenges and implications for instruction. Teachers, supervisors, and coaches all see this aspect of the project as valuable for teacher learning:

“Scoring the MARS test is the single most valuable professional development we have done with our teachers in mathematics. The full day of scoring the tests leads to rich conversations about what we expect from students and how our students think mathematically. We see real buy-in from teachers.”—Assistant superintendent of instruction, suburban district

“We joined the Silicon Valley Mathematics Initiative and decided to give the MARS test. We didn’t know what we signed on for or how much work it would be. At one point, I thought we were over our heads. But we continued to forge ahead with the scoring session. I have to say it is one of the most rewarding days I have had in education. We got them all scored and the teachers were great. They really felt they had had an opportunity to explore what was in the students’ heads. They came away convinced this way of scoring student work had changed forever the way they will teach.”—Math coach, low-income urban school district

“At first when we were training to learn to score the MARS task, I was very skeptical about the process. There were a lot of concerns among the teachers. Some of us really pushed back on the facilitator. But after going through the standardizing papers and especially after spending the full day scoring tests, it became very obvious we were focusing on what the students really knew and could explain. We all seemed to discover the same problems the students were having doing real math.”—Sixth-grade teacher

Researchers who have evaluated the MARS process explain how this learning occurs:

To be able to score a MARS exam task accurately, teachers must fully explore the mathematics of the task. Analyzing different approaches that students might take to the content within each task helps the scorers assess and improve their own conceptual knowledge. The scoring process sheds light on students’ thinking, as well as on common student errors and misconceptions. As one teacher said, “I have learned how to look at student work in a whole different way, to really say, ‘What do these marks on this page tell me about [the student’s] understanding?’” Recognizing misconceptions is crucial if a teacher is to target instruction so that students can clarify their thinking and gain understanding. The emphasis on understanding core ideas helps teachers build a sound sequence of lessons, no matter what curriculum they are using. (Foster et al., 2007, p. 141)

The learning does not stop there. Once the papers are scored, they are returned to the schools, along with a copy of the master scoring sheets, for teachers to review and use as a guide for further instruction. It is important to note that teachers receive the tasks and rubrics, along with real student work, not just abstract scores. Districts also receive data for both the MARS tests and the state exams, along with a set of reports that can inform professional development, district policy, and instruction.

These reports include an annually produced document, Tools for Teachers, which includes an analysis of student thinking, understandings, errors, and misconceptions derived from an in-depth study of the tasks. The study’s findings are combined with sample student work and a set of suggestions, strategies, and questions for teachers to use to inform and improve instruction. MAC offers training and coaching for teachers, teacher leaders, and coaches around these tools. This combination of assessment materials, student work, and professional supports contributes to improvements in curriculum development and instruction, as participants explain:

“We believe the value of the MARS tasks is for formative assessment. We coaches meet with our teachers during early release days. They bring the student work from a MARS task they have given to their classes. We score it, analyze it, and discuss it with the teachers. Afterwards we plan reengagement lessons. They are designed to address students’ misconceptions and errors. This process has caught on, and teachers are having success deepening students’ understanding.”—Math coach

“Our grade-level team uses the MARS tasks a lot. We use them to make sure the students can do real-life math problems. We also like reading the students’ explanations. At the start of the year the students do not know how to explain, and they barely write anything. So we use them regularly, and by December they are writing more, and it becomes insightful for our team to go through their solutions.”—Third-grade teacher

Research over a thirteen-year period shows that as teachers and schools participate in this process, students’ mathematics performance improves significantly on both the MAC tasks and the more traditional state tests (Paek & Foster, 2012). As teachers learn how to evaluate student needs and design their instruction to produce stronger mathematical understanding, their students improve more. Across thirty-five districts participating in one recent study, students in all grades (third grade through tenth grade) had better outcomes the longer they had been taught by teachers who had participated in scoring, coaching, and professional development through the Mathematics Assessment Collaborative. Furthermore, students of teachers who had received more intensive coaching around formative assessment uses of the tasks had stronger results (Paek & Foster, 2012).

These outcomes result from a combination of factors: the performance assessment, the scoring sessions, the performance reports, and use of MARS assessments and rubrics for formative assessment, instruction, coaching, and professional development. Researchers note:

The scored tests themselves become valuable curriculum materials for teachers to use in their classes. MAC teachers are encouraged to review the tasks with their students. They share the scoring information with their students, and build on the errors and approaches that students have demonstrated on the exams. . . . The Mathematics Assessment Collaborative fights teachers’ sense of isolation and helplessness by sharing everything it learns about students. It identifies common issues and potential solutions. It helps teachers understand how learning at their particular grade level is situated within a continuum of students’ growing mathematical understanding. It promotes communication across classrooms, schools, and grade levels. It encourages teachers to take a longer, deeper view of what they are working to achieve with students. (Foster et al., 2007, p. 141)

Together these supports result in strong teaching and stronger learning within and across classrooms, schools, and districts.

Ohio Performance Assessment Project

Extended curriculum-embedded performance tasks are the basis for the Ohio Performance Assessment Pilot Project (OPAPP). (See also chapter 3, this volume.) In 2008, Ohio undertook a statewide project to develop and pilot performance-based assessments developed in partnership with the Stanford Center for Assessment, Learning, and Equity (SCALE) (Wei, Schultz, & Pecheone, 2012). The assessments in English language arts, mathematics, and science are mapped to the Common Core State Standards and the anticipated Next Generation Science Standards and are designed to promote and evaluate students’ learning of content and skills that will prepare them to be successful in college and in careers. Within the context of Ohio policy, the assessments may be used as project components for course examinations, as proficiency measures to grant credit based on competence instead of seat time, or as options for the senior project encouraged in Ohio schools. They are intended not only to contribute to the development of a new multiple-measures assessment system, but also to support improvements in instructional practice.

The project has involved educators from thirty schools in constructing and piloting a set of curriculum-embedded, one- to four-week-long formative learning tasks and conceptually related but shorter (one- to three-day) assessment tasks. The Ohio Department of Education is now expanding the project to include history/social science and career-technical education and will build a task bank. In addition, it plans to expand the network of districts in the pilot project; build the capacity of local educators, administrators, coaches, and regional assistance centers to carry on this work into the future; and build a technology platform that will support the scaling up and implementation of a performance-based assessment system.

The curriculum-embedded performance tasks include in-class, collaborative components, which are supported through a set of lessons and interim products, followed by individual student products that are scored. In the English language arts tasks, for example, students read and take notes on required or self-selected texts and have an opportunity to engage in discussions with classmates on those texts prior to using the texts in their final essays, which they develop individually. Some tasks have group collaboration built in, along with peer and teacher feedback on drafts of products. The following example, Got Relieve IT? is one of them:

The tasks are standardized with respect to guidelines for the kinds of supports, instructional scaffolds, and feedback teachers are allowed to provide to students without decreasing the rigor and challenge of the tasks. In the learning tasks, there is also some flexibility: teachers can choose when to implement the tasks, and students or teachers may have choices within the task, such as choices of texts. The shorter assessment tasks, which are focused more explicitly on measurement, are more standardized.

Teachers initially receive two days of professional development to help them understand how to use the tasks, including an opportunity to complete key elements of the tasks themselves and reflect together about their approaches or solution strategies. In addition, the Ohio Department of Education and districts provide ongoing coaching during the year to support implementation. Teachers are trained to score, and their scoring is calibrated until it becomes consistent. They are allowed time after scoring to analyze student work and reflect on their experiences and implications for future instruction.

The OPAPP performance assessments are scored using genre-specific, descriptive four-level analytic rubrics. These are introduced to students before they undertake the task and are used to provide detailed feedback to students about their performance after they finish. Furthermore, as the developers of the OPAPP assessments explain:

Since the rubrics are genre-specific rather than task-specific, and the same dimensions are scored across tasks, it becomes possible to track a student’s progress along the same dimensions of performance over time across years and courses in the same discipline (e.g., science inquiry, math problem-solving). (Wei et al., 2012, p. 11)

Many teachers were involved in developing and fine-tuning the OPAPP assessment tasks and rubrics, as well as piloting and scoring the tasks. Surveys revealed that virtually all of the teachers felt positively about the professional development they received, including participating in the tasks themselves and planning together for conducting the tasks (Wei et al., 2012). Teachers also reported learning from conducting the tasks, as well as scoring them. In science, for example, many teachers had not previously been involved with inquiry-based teaching, and the project provided insights into their students’ capacities and about how to provide more engaging instruction. Here is what some teachers had to say:

I learned that I need to have higher expectations for my students and to do more inquiry based labs. My students exceeded my expectations. They were able to complete the task with very little direction from me.

I saw students who were very engaged and invested in the designs of their cars. They were having great evidence-based discussions to determine which changes to make on their cars. I asked the kids if they liked what they were doing and they answered a resounding “yes.”

My students came in voluntarily during lunch and after school to research the normal values for each of the diagnostic tests and what the results meant related to the “Medical Mystery task” . . . that is the first time I have seen that in 20 years of teaching. (Wei et al., 2012, p. 42)

Some students also expressed their enthusiasm for tasks that allowed them to think critically, make decisions, and learn for themselves. According to one student, “This task was better than what my teacher usually does because we were able to make decisions and figure out how to solve the problem, instead of just mindlessly following his directions” (Wei et al., 2012, p. 41).

As in other performance assessment contexts, the scoring sessions were often viewed as “the most useful for improving my teaching and understanding of where my students are in comparison to where they should be.” More than 97 percent found the opportunity to discuss their experiences with other teachers highly useful. According to teachers, many kinds of learning resulted from the scoring sessions:

I learned how better to apply rubric expectations in my instruction by clarifying what those things “look like” within student examples.

I learned just how difficult it is for students to do the reflection piece of the project. I will try to construct some models and activities to help students to better accomplish this task.

Students did a lot of good work, but had trouble labeling and explaining. [I] need to emphasize that more in the classroom.

I will be incorporating communication skills in “bite-sized” chunks into day-to-day teaching; and incorporating smaller “rich” problems into homework. (Wei et al., 2012, p. 45)

The researchers noted that in order to support professional learning and improvements in practice, it is important that teachers have the following:

  • Opportunities to learn and practice the content and skills necessary to implement the performance tasks, as well as administrative support
  • Access to communities of practice engaged in performance assessment work, with opportunities to collaborate in planning for and reflecting on implementation
  • Opportunities to analyze student work samples and scores and to learn to score using a common set of evaluation criteria, engage in calibrating conversations with colleagues, and score at least one class set of their own students’ work in order to analyze patterns of performance across students

The Quality Performance Assessment Initiative

The Quality Performance Assessment (QPA) program was launched by the Center for Collaborative Education (CCE), a Boston-based, nonprofit organization that partners with public schools and districts in several New England states to create and sustain effective and equitable schooling.2 CCE aims to strengthen and document local assessment systems by introducing common tasks and moderated-scoring processes. The QPA work has led to the creation of a set of CCSS-aligned performance tasks with teacher guides and student work samples that have been field-tested in schools and are documented in the book Quality Performance Assessment: A Guide for Schools and Districts (Center for Collaborative Education, 2012). In addition to the disciplinary focus of the CCSS, the initiative has placed an explicit emphasis on “habits of mind,” a term that refers to the complex, critical thinking, problem-solving, and communication skills students will need in college and throughout their careers and civic life.

The original initiative started with schools that practiced performance assessment at the individual classroom and school level. These schools developed the QPA model and field-tested the process with common performance assessments used across schools. The work of QPA has since expanded beyond the original network to schools, districts, and states where QPA provides professional development and resources for implementation. These partners develop and use their own unique assessments while also working to develop and score common tasks that can contribute to high-quality, reliable, large-scale performance assessment systems. QPA brings greater rigor to the process of task development and scoring, as well as analysis of student work and instruction.

LEARNING FROM SCORING

Many teachers who have attended QPA scoring sessions note that their experience deepened their understanding about how to teach to twenty-first-century skills. This quote from one of them captures the sentiment of the group: “[This work] has got me thinking about how to use twenty-first-century skills in my assessments and [how to] grade the work not the kid.”3

William Hart, assistant superintendent of the Pentucket Regional School District in Massachusetts, explains:

Teachers are now using the common rubrics to guide the type of project or task they can develop to marry concept/content acquisition and twenty-first-century skills. They are asking questions like: “How do I shift the instructional environment to do both? How am I helping kids develop as collaborators?”

Measuring worthy skills and knowledge in this way has driven the context of the classroom.

As teachers define these worthy skills, they also learn to measure them based on evidence rather than subjective hunches. Laurie Gagnon, QPA director, notes:

There is great power in grounding the conversation in evidence . . . in discussing, “What do I mean by a well-chosen and supported quote? What does it mean to write a good thesis statement? What does this really look like?” Conversations around such questions have yielded big learning for teachers.4

Christina Brown, director of QPA’s Principal Residency Network, compares the process of learning to score student work with what happens in umpire school: “[Just as] the prospective umpire learns to distinguish between a ball and a strike, and to know the criteria for what each means, [scorers] learn the details of what proficiency actually looks like.”5

Using common assessments and rubrics and engaging in collective scoring of student work helps to create coherence for the teaching that takes place across grades in a school. Jeanne Sturgess, a staff developer at Souhegan High School in Amherst, New Hampshire, explains:

Before [working with the QPA initiative] we did not always have consistent alignment of learning outcomes across teams and classes. Teachers might have had rubrics aligned with standards, but the work was not necessarily comparable across classes. The work we have done with QPA over the past two years has focused on trying to ensure that if ninth-grade science teachers are all doing the same project with the same rubrics, they will all make a similar judgment about the students’ work. Using common rubrics and performance tasks has created great opportunities for teachers to push their thinking about the level of rigor we ask of students and the level of equity we provide. Although this work presents a huge challenge, the commonness of our work—the shared accountability—offers the best of what standards can bring.6

Addressing the Opportunity Gap

In addition to developing coherence across schools and districts, a consequence of using common tasks and rubrics is that teachers begin to understand how their own students match up to commonly held expectations about necessary knowledge and skills. This can be an eye-opening experience for teachers who work in varying contexts. They sometimes find discrepancies between what has been considered proficient in one type of community and what has been considered proficient in a community with different demographics. Again, according to QPA director Gagnon, “Surfacing these differences can lead to tough conversations among educators about issues of equity—conversations that may be hard but important to think about. Naming the challenge and having the evidence to substantiate it can lead to positive conversation and change.”7

QPA principal network director Brown underscores the importance of having such conversations as a means of providing access to excellence for those who have not had equitable access to educational opportunities:

To determine proficiency in learning, it doesn’t matter in the long run if a kid comes from a hard life. We do kids a disservice if they are not held to a high standard. This is an equity issue. We need to hold everyone to the same high standard. We need to give accurate feedback—tell them when they don’t meet the standard and help them understand what we are looking for in actual work and in defense of that work. We need to get away from the soft and mushy. When teachers do this together they are developing knowledge of an agreed-upon standard for the community. They are holding students accountable and themselves accountable.8

Building Communities of Practice

Hard but necessary conversations about equity are most productive when they are supported by a community of practitioners who hold common goals. The work of sharing common resources and assessments as well as engaging in cross-school scoring nurtures this kind of professional community.

Todd Wallingford, curriculum director for secondary English language arts and social studies at Hudson Public Schools in Massachusetts, which is part of the QPA network, underscores the point: “For the past three years, Hudson’s collaboration with QPA has steered us toward building a stronger professional culture founded upon the development and scoring of common performance assessments aligned to Common Core standards.” Likewise, Priti Johari, the redesign administrator for Chelsea High School in Massachusetts, notes that using common performance assessments and rubrics in her school has nurtured not only collaboration but also a culture of inquiry among teachers:

Our work of creating common assessments and rubrics and scoring them across classrooms has created a culture of inquiry and a collaborative atmosphere. Four years ago classroom doors were closed and there was no collaboration. Twenty-five percent of the teachers in the school were a professional learning community. Now I believe 100 percent of the teachers experience themselves that way. This is a result of our process of learning about the Common Core, unpacking standards, writing lesson plans and tasks, sharing those plans, giving each other feedback, creating common rubrics, and collectively examining student work.9

MOVING FROM A CULTURE OF TESTING TO A CULTURE OF TEACHING

Engagement with performance assessment has the potential to change discussions among educators about what teaching and learning should be. In the Pentucket Regional School District in Massachusetts, every student in every school has a portfolio of work that demonstrates standards for the district-agreed-on habits of learning. This portfolio is presented in a public forum in grades 4, 6, 8, and 11 and is evaluated using rubrics that are common across the district. The district is currently in the sixth year of this work, and assistant superintendent William Hart notes its power:

Like a lot of districts, for many years, standardized testing had dominated the thinking of teachers in our district and defined the practices about what kids should know and be able to do. Teachers and administrators had focused their energies on finding the most expedient way to prepare kids for those exams. The unintended impact of this focus was more didactic teaching. As we have engaged in performance assessment development and scoring, a new balance has been brought to people’s efforts. Now we assess our students’ thinking, collaboration, independence, creative exploration as well as state standards.10

The public nature of this work helps parents and family members understand and support their children’s learning. Hart explains:

We have parents attend their children’s presentations so that they can see what we do. It is not infrequent that parents leave these events in tears because they are just blown away by the deeper kind of learning and the broader set of skills, attributes, and habits they see that they never before saw in work for the old tests.11

Improving Teaching and Student Learning

The effort and time invested in having teachers design common assessment tasks and then score them through common rubrics yield dividends in regard to both teacher learning and the quality of student outcomes. Amy Woods, an eighth-grade English teacher at Cape Cod Lighthouse Charter School in East Harwich, Massachusetts, speaks to this point:

We have been doing performance assessment in our school since its inception. At the beginning, each teacher developed his/her own assessments and rubrics and was scaffolding teaching and student assignments differently so that each class was being prepared differently. Now, with our common assessments, we have developed continuity in our rubrics across the grades. Our collective scoring of that work has given us a common language and more coherence in the school in terms of preparing kids across the continuum of development.12

The Pentucket Regional School District in Massachusetts had a similar experience. As assistant superintendent William Hart notes:

Using common performance assessments with common rubrics has had a positive impact on our teachers as well as our students. Teachers are now using the common rubrics to guide the type of project or task they can develop to marry concept/content acquisition and twenty-first-century skills. They are asking questions like: How do I shift the instructional environment to do both? How am I helping kids develop as collaborators? Measuring worthy skills and knowledge in this way has driven the context of the classroom. The work takes time and collaboration, but huge dividends are evident in the student work. Our high school state test performance is extremely positive. And what’s interesting is that where performance assessments are being implemented with the greatest fidelity, we are getting the best test performance. One of our high schools was identified as an exemplar by our state commissioner. Our elementary school is the top in the state. The message I take from this is: if you do performance assessment well, then it is just good teaching and learning, and kids are going to achieve.13

Collaborative work in performance assessment that is managed skillfully can have an impact on all layers of the school system. Laurie Gagnon, QPA director, explains:

The work takes a systems approach. It gives kids opportunities to learn and demonstrate their learning, embedded in a cycle of teacher learning and in what teachers need to be supported by their district and community. It touches on all the different pieces of the system. It has led to deep thinking among teachers and administrators about rubrics and how to communicate with kids about what their next steps are.

Before [performance assessment work such as ours] teachers used to teach something and then move on. Now they are thinking about reteaching and about making connections within a single source and across disciplines. They are identifying skills that are common across disciplines and helping kids understand how to read and use rubrics. They are learning from what kids are doing and helping them get to the next level. This work is an ongoing cycle of inquiry, both within and across disciplines. It takes a long time to get this right, but the impact is clear on the kids.14

A belief that performance assessment with extensive teacher and leader involvement can systemically strengthen learning for students and teachers is central to the policy initiatives of New Hampshire’s State Department of Education. In New Hampshire, an effort is under way to create an accountability system that includes performance assessment in addition to paper-and-pencil tests. The goal is for all students to participate in complex performance tasks that measure deeper learning over time, to expect teachers to be involved in the development and scoring of these tasks, and to create a pool of tasks that teachers can choose from. Paul Leather, deputy commissioner of education in New Hampshire, explains:

We have been involved in performance assessment for years and looking at student learning for years. But what we found was that it was very spotty. Some folks did a great job with it, and others didn’t know what we were talking about. Realizing this has led us to ask what we need to do better. We came to understand that preparation is not set up to help them [teachers] support the degree of expected learning for students. And schools are not set up to encourage personalized learning going deeper. So we are trying to deconstruct and reconstruct our whole system so that teachers feel more supported. We want to move forward on a continuum toward deeper assessment that is more challenging for students and teachers too. We are aiming eventually to have a system where the students create their own tasks and teachers score them with common rubrics. Right now, though, teachers are creating the tasks and developing common rubrics that they use to assess against established competencies.15

Leather believes that this performance assessment development and collaborative scoring work has had a positive impact not only on teacher learning but on student outcomes as well. As evidence for this claim, he points to a decline in New Hampshire’s high school dropout rate, an increase in high school graduation rates, and an increase in the number of the state’s graduates who go on to college. Leather attributes this to the more personalized teaching that results from the process of teachers’ involvement with collaboratively developing and scoring complex performance assessments. He explains:

They are placing a lot more attention on depth of knowledge of the learning process. They are looking at assessment questions—are we asking students to do things that are going to be asked of them in the realities of their lives? We are encouraging teachers and students to take on deeper learning. We want to make sure that the assessments we use will incent the kind of teaching we want. This [whole process] has been a breath of fresh air for our teachers as well as our students.16

CONCLUSION

Anthony Alvarado, the renowned educator and administrator whose successful school reforms in New York City and San Diego, California, are well researched (see, e.g., Elmore & Burney, 1999; Darling-Hammond et al., 2005), called attention to the fact that in order to support improved student learning, educators need “to find ways of getting deeply into the specifics of how to help students master subject matter . . . and to create contexts that support changes in thinking and pedagogy on the part of teachers” (Alvarado, 1998). Alvarado wisely understood that a focus on how to better support teacher learning is critical to efforts aimed at improved student learning. What’s more, he knew that the ability of schools to develop students to meet the challenging demands of the future depends on teachers who are knowledgeable about the critical elements of learning and can employ the strategies that are needed to connect these elements with the understandings of diverse learners. Involving teachers in the design, use, and scoring of standards-based, performance-based assessments is a powerful way to help teachers develop this knowledge and these skills.

Thus, the use of common standards–based performance assessments that are designed and collaboratively evaluated by teachers can have many benefits, including these:

  • Providing teachers with more direct and valid information about student progress than is offered by traditional assessments, especially on the deeper learning skills that characterize the Common Core State Standards
  • Enabling teachers to engage in evidence-based work: reflecting more clearly and analytically on student work to inform their instructional decisions
  • Yielding information that enhances teachers’ knowledge of students, standards, curriculum, and teaching, especially when scoring is combined with debriefing and discussing next steps with other teachers

By examining the work of their students, teachers can increase their knowledge of individual students, become better informed about their students’ capacities, and receive guidance about what they need to do next to support students’ development. Teacher involvement in assessment helps teachers to clarify their goals and purposes for teaching, make expectations of students explicit, create learning experiences that apply knowledge to real-life contexts, and provide many different ways for students to demonstrate their abilities and skills. It supports teachers’ learning about state standards, their discipline, their students, and their teaching practices. The approach also offers teachers a forum for collaboration and an opportunity to learn. In other words, teacher involvement in standards-based and performance-based assessments lays the groundwork for better teaching and learning. As leaders in the Mathematics Assessment Collaborative note:

Teachers benefit from this approach as much as students do. . . . Teachers laboring to improve student performance on a high-stakes exam can come to feel isolated, beaten down, and mystified about how to improve. . . . The exigencies of test security mean that teachers often receive little specific information about where their students’ performance excelled or fell short. When high-stakes test results come back, often months after the exam, teachers can do little with the results but regard them as a final grade that marks them as a success or failure.

Assessment that requires students to display their work . . . is a tool for building the capacity of the teaching community to improve its work over time. The discipline of exploring together [what] we want students to know and the evidence of what they have learned is simultaneously humbling and energizing. Knowing that they are always learning and improving creates among educators a healthy, rich environment for change. To improve instruction requires that teachers become wiser about the subject they teach and the ways that students learn it. Performance assessment of students, with detailed formative feedback to teachers accompanied by targeted professional development, helps to build the teacher wisdom we need. (Foster et al., 2007, pp. 152–153)

While standards and assessments provide teachers and students with explanations of and access to images of excellence, this awareness is not sufficient. An open and public guide to expectations for teaching and learning offers one way to help level the playing field between students who have experienced vastly unequal opportunities, resources, and supports for their learning. However, adequate supports also need to be provided to build the capacities of all teachers and students to achieve these new and more challenging standards. Special attention and resources need to be allocated to provide teachers and students, especially those in historically underserved communities, with the appropriate opportunities to learn so that they can be sufficiently prepared to reach higher levels of success.

In addition, if tests are to support system learning, they must be used for information rather than punishment. If they are to be educative, assessments cannot be used to allocate sanctions for teachers or schools in ways that create competition where collaboration is needed, use fear to shut down teacher learning, or create incentives for keeping or pushing struggling students out of schools in order to boost scores.

Based on the evidence we have reviewed here, we recommend the following steps for states, localities, and assessment developers:

  • Ensure that assessment is embedded in a learning system. At the state and local levels, assessment should be considered as part of a learning system for both students and adults, connected to curriculum, instruction, and professional development.
  • Include performance tasks as a part of assessment. States and multistate consortia should include performance tasks as part of their systems of assessment and involve teachers in the design, use, and scoring of these tasks.
  • Make sure that criteria and rubrics for scoring tasks are clear and explicit for students and teachers. Sample tasks evaluating key standards should be publicly available for formative use in classrooms.
  • Involve teachers in collaborative scoring sessions. States and districts should bring teachers together for training and moderation to learn to score reliably and should include opportunities for teachers to discuss the implications of the standards and assessment results for their teaching.
  • Expand opportunities for teachers to engage in analysis of student work. Although teachers may score the work of students other than their own for purposes of summative evaluation, they should have timely opportunities to see the work of their own students (completed tasks and rubrics) to inform and guide their practice.
  • Provide teachers with coaching and professional development to develop standards-based instruction. Teachers’ engagement in examining and scoring student work should be supplemented with coaching and professional development focused on teaching strategies to implement the new standards.
  • Build communities of practice. States, districts, and schools should build communities of practice engaged in performance assessment work with iterative opportunities to build tasks, collaboratively plan instruction, analyze student work and scores, and continually fine-tune practice.

Teacher involvement in the design, use, and scoring of performance assessments has the potential to powerfully link instruction, assessment, student learning, and teachers’ professional development. If used wisely, it has the potential to address multiple important goals through one concentrated investment. It offers a powerful way to evaluate what students know and can do while also affirming teachers’ knowledge and supporting their learning.

Continued use of high-quality standards and performance assessments over time has been shown to improve teaching and learning. And as teachers become more expert about teaching, continued improvement and progress on the part of students can be expected. Not only will overall pedagogical capacity be enhanced, but teaching and assessing will stay focused on their central purpose: the support of learning for all involved.

NOTES
