Chapter 8
A New Conceptual Framework for Cost Analysis

Lawrence O. Picus, Frank Adamson, William Montague, and Margaret Owens

The use of assessments in schools continues to expand alongside our growing expectations of what school children need to know and be able to do when they graduate. The standards-based approach to education reform has focused attention on the role of standardized tests and created higher stakes for both students and the education professionals who serve them. At the same time, the press for continued improvement in student learning, as measured by state-established learning and performance standards, has caused schools to rely more heavily on assessment tools as part of data-driven decision-making processes. Despite the growing importance of assessments in our education system, relatively little is known about the economic costs and benefits of these assessments that are such a large part of every student’s educational experience.

What is clear, however, is that following passage of the No Child Left Behind (NCLB) Act in 2001 and its requirement that states measure the progress students make toward proficiency goals, the amount and level of testing in schools have increased dramatically, along with the costs of those assessments. The US Government Accountability Office (US GAO, 2009) estimated that in the forty states responding to its survey, payments to testing vendors to develop, administer, score, and report the results of assessments meeting NCLB requirements exceeded $640 million in 2007–2008. This figure includes neither the costs of NCLB testing in the other ten states nor the costs of tests developed and administered by the fifty states themselves. One recent estimate places the total cost of NCLB-required testing in the United States at $1.7 billion annually (Chingos, 2012), with costs greater for tests offering more open-ended items and performance tasks.

Because testing costs are closely related to the kinds of items and tasks used and because those tasks significantly influence the cognitive demands of the tests and their potential influence on instruction, it is important to evaluate the costs and benefits of performance assessments. This chapter updates earlier work on estimating the costs of alternative assessments within the current policy context (Picus, 1994; Picus, Tralli, & Tasheny, 1996; Picus & Tralli, 1998), with a focus on the costs of developing, administering, scoring, and reporting the results of performance assessments. As in the earlier studies, efforts are made to distinguish between the concept of economic or opportunity costs (the use of teacher time that is already “paid for” through the contract and used as part of the assessment process rather than for some other activity or function), and the direct expenditures made for assessment.

As this chapter shows, the bulk of the costs for any assessment system are the time teachers and other school and district personnel spend in the development, administration, grading, and use of the results of assessments, not the costs of the assessment activities themselves. Similarly, and not unexpectedly, the benefits of assessments depend on the extent to which those same individuals are able to use the data from the assessments to improve student learning or performance.

It is relatively straightforward to determine how much a school or school district spends on assessment instruments and reporting of the test results, but it is much more difficult to determine how much time is devoted to preparing for and administering the tests, and even harder to determine the costs of how the results of those assessments are used by school staff to improve learning and instruction. These estimates are made more complex by the growing realization that teacher collaboration through professional learning communities (DuFour, DuFour, Eaker, & Many, 2006) or similar teacher efforts are critical to improving student learning.

Much of the literature on improving student performance describes collaborative, data-driven approaches that rely heavily on analysis and use of student assessments. These efforts are time-consuming for school staff and often require extensive training to implement fully. When used as part of a strategy to improve student learning, these systems offer multiple benefits as well. Knowing how performance assessments can lead to higher student performance through better identification of student needs and more appropriate approaches to teaching is a critical component of analyzing the costs of assessment systems.

One of the concerns frequently raised about performance assessments is the cost of scoring more extended, open-ended items in relation to the costs of machine-scored multiple-choice tests. However, many states and nations have maintained performance assessment systems that are manageable and affordable. At least two aspects of this problem need to be explored and estimated in cost-benefit terms: first, the manner in which the assessments are (or are not) integrated into teachers’ work—with scoring managed as part of teachers’ core work or professional development (which has both cost and benefit implications)—and, second, the extent to which the benefits of teachers’ participation in this work translate into improved instruction and learning for students.

Another issue for consideration when estimating the costs and benefits of performance assessment is the identification of benefits. The traditional literature on cost-benefit analysis (see, for example, Mishan & Quah, 2007) focuses mostly on the monetization of costs and benefits. While this analysis establishes a framework for and some initial estimates of the costs of assessment practices, the benefits of assessment are measured in terms of student performance, which is not easily translated into dollars. Cost-effectiveness analysis (Levin & McEwan, 2000) offers an alternative framework, which we consider.

This chapter focuses on the use of assessments to improve learning and offers a framework for estimating some of the costs and benefits of performance assessments, including the influences on both costs and benefits of different scoring models (e.g., widespread training and involvement of teachers as part of their ongoing work and professional development; external scorers unrelated to the classroom; uses of technology platforms for facilitating scoring).

The chapter starts with a framework for considering assessments by establishing working definitions of formative, benchmark, and summative assessments as they will be used in this chapter. The second section advances earlier work on the concept of costs of assessments compared to direct expenditures for assessments in schools. The third section focuses on the benefits in terms of improved student performance and is followed by a table that summarizes the costs, expenditures, and benefits of various types of assessments. The final section summarizes the framework and offers suggestions for improving the analysis of the costs and benefits of performance assessments.

FRAMEWORK FOR CATEGORIZING ASSESSMENTS

In today’s standards-based environment, assessment of student performance is an accepted and regular part of the expectations of all schools. Students are familiar with the annual tradition of standardized tests—generally required by each state and given in the spring of each school year to measure how students and schools perform. Parents, the media, and even real estate agents eagerly await the reporting of these test data to “see how well their schools are doing.” In many states, one can visit a website to compare school-level test results (along with related student demographic information) across schools and school districts. Some states, such as California, have even distilled the reporting of test results (along with other measures of a school’s “success”) into a single index number. Moreover, the press by schools and districts to make Adequate Yearly Progress (AYP) and avoid the sanctions identified in the federal NCLB law has all schools working to help students do better on those standardized tests.

Yet statewide standardized tests are only one part of the entire student performance assessment system available for use by schools. Many argue that to fully understand student needs, provide instructional programs to meet those needs, and assess the effectiveness of the programs, a tiered structure of assessment is needed (see Boudett, City, & Murnane, 2008; chapter 10, this volume). If one is to identify the costs of assessment programs, the distinctions among what are known as formative, benchmark, and summative assessments become important. Unfortunately, educators do not have clear agreement on the distinctions among these three levels of assessment. Thus, before estimating costs and benefits, it is important to establish a framework for types of assessment.

As used in this chapter, formative assessments are diagnostic and include teacher-developed strategies and tools to understand what students know and need to know. Benchmark assessments are periodic tests to check understanding to ensure students have mastered the material they have been taught. Summative assessments are used to make judgments about what students have learned and include the standardized annual tests given in virtually every public school. These standardized assessments are increasingly used to measure school quality.

Formative Assessments

Odden (2009) describes formative assessment as being diagnostic in nature and given with relative frequency—sometimes as often as weekly or even daily. Teachers use these assessments to determine how to teach specific curriculum units and to monitor student progress on a regular basis. Boudett et al. (2008) point out that short-term data can be generated continuously as teachers use students’ regular work (assignments and tests) to assess their progress, diagnose problems with understanding, and tailor instruction to areas where students need additional help or attention. This effort goes beyond the simple grading of papers, quizzes, and tests; it requires teachers to link students’ work with the learning goals of each unit through examination of that work, observation of student participation, and regular conferences with students (Boudett et al., 2008).

The information derived from formative assessments is not always easy to translate into instructional practice and can require considerable work on the part of a teacher. Wylie and Lyon (2009) suggest that substantial professional development (PD) efforts are needed to ensure that teachers can develop, use, and take full advantage of formative assessment processes. They argue that formative assessments require school-based PD for teachers, backed by coherent district support for those efforts.

The advantage of strong formative assessments and tools is that they allow teachers to focus instructional activities on the exact learning status and needs of the students in their classroom. Odden (2009) states that strong formative assessments, by allowing teachers to emphasize what students still need to learn and to move more quickly over material students have mastered, could be thought of as making instruction more efficient, a concept critical to the analysis of costs and benefits.

Benchmark Assessments

Teachers and schools need periodic assessments of student progress and learning. As used in this chapter, benchmark assessments provide these guideposts to educators so they can measure student progress more frequently than the once-a-year standardized state tests. The purpose of these assessments is to track student progress during a school year; they might include locally developed instruments or commercially available tests (Boudett et al., 2008). The value of benchmark assessment tools is that they give teachers routine and regular progress reports on student learning, enabling them to adjust their teaching strategies and pacing to ensure students are mastering the material. Whereas a formative assessment helps a teacher understand what students already know, a benchmark assessment gives regular, periodic information on what students have actually learned from the material presented.

Where material in one semester builds on material learned the previous semester—or any time block that is relevant to the subject matter being taught—benchmark tests enable educators to know if students are prepared for the new material. If not, reteaching of that material may be more efficient than developing interventions for students who are unable to keep up, a potential benefit that could be ascribed to assessments.

Summative Assessments

Summative tests can include any measures used to assess a student’s knowledge and skills at a moment in time for the purpose of drawing inferences about his or her achievement and informing decisions. Today most policy-relevant discussions of student performance and school success focus on annual statewide standardized test scores, although these are not the only summative tests in use. Statewide test data, which are often used for accountability at district and state levels, provide a snapshot of a school’s performance. Schools can use them to identify areas needing improvement over time, and the data can give a good picture of gaps in achievement among different groups of students in a school or district. Unfortunately, these tests are often given in the spring with results provided in the summer or following fall, limiting their usefulness for addressing the learning needs of the students who took them.

Summary

In summary, for the purpose of the cost framework that follows:

  • Formative assessments are often highly individualized by teachers, who use them frequently and in various forms to identify what students need to learn to master the material being taught. Formative information can also be secured from large-scale assessments if they are sufficiently rich and if data are delivered to teachers in a detailed and timely way.
  • Benchmark assessments are given on a regular and periodic basis throughout the school year and are designed to measure how well students have learned the material presented. They allow teachers to make corrections in their teaching to ensure that students have the knowledge from early units needed to master the more difficult skills and knowledge required in higher units of the curriculum.
  • Summative assessments in today’s US policy system are typically annual standardized tests that intend to show how well schools are meeting state-established standards over time, as well as how they compare with other schools in the district and state.

MEASURING THE COSTS OF ASSESSMENT

Before developing a conceptual framework for measuring the costs of performance assessment, it is important to establish the definitions of and differences between costs and expenditures.

Expenditures

A common approach to comparing the costs of alternative programs in educational institutions is to determine the monetary value of the resources necessary to implement each program, and compare the total expenditures across programs. Economists point out that this process implicitly assumes the two programs are intended to accomplish the same goals and that both have identical efficiencies or inefficiencies in their operation. If these conditions do not hold, and there is little reason to expect that they do, comparisons of expenditures are invalid and can be misleading (Monk, 1990; Belfield, 2000).

If, as is often the case in education, multiple goals are established for an assessment program, then estimation of the costs of that program must include all of the resources necessary to accomplish all of those goals. The difficulty is that a program’s goals can be hard to quantify or may even be contradictory. For example, among the many goals that have been attributed to performance assessment are to change what is taught and learned in schools by focusing more on problem solving and critical thinking, to raise expectations of students, and to motivate student interest and effort in learning. Determining the resources necessary to achieve each of these goals is at best a complex task. Because of this difficulty, many analysts stop short of estimating the true costs of a program and instead focus on the expenditures required for its implementation.

In K–12 educational institutions, even determining the actual expenditures for a specific program can be difficult. Most state accounting systems require school districts to report spending by object (e.g., salaries, benefits, supplies) and sometimes by function (e.g., instruction, administration, instructional support, maintenance and operations, transportation). Odden and Picus (2014) point out that often these expenditure data are reported at the district level, and there is little information about how funds are used at the school or classroom level. Moreover, detailed information about specific programs within a district is often hard to discern from school district financial reports. In an object-oriented system, estimating the expenditures for student assessment might require determining the salaries and benefits of staff members who work in that program, estimating what portion of their time is devoted to the assessment program, and then determining which of the district’s expenditures for supplies and materials (including the tests) should be attributed to the program. These expenditures may be coded in different places in the district’s accounting reports, making their estimation more difficult (Hartman, 2002).

Even in districts able to provide detailed information about the expenditures made for their assessment program, this information provides only a partial delineation of the full economic costs of the assessment program. The other factors that must be considered when estimating the full costs of a program are described below.

Costs

The textbook definition of the cost of a program is the benefit forgone from the best alternative use of the same resources. Thus, if a resource is devoted to some use, the benefits associated with the best possible alternative use of that resource represent the opportunity cost of the program. Unfortunately, it is not always possible to determine what the best alternative use of those resources might be. Moreover, even if that alternative can be identified, determining its benefits may be a considerable problem. For example, if a district is considering the implementation of a new performance assessment program, the opportunity cost of that program would equal the benefits of the best alternative reform that was not implemented.
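Stated compactly, in our own notation rather than a formula drawn from the cost literature: if \(A\) is the set of feasible uses of a given bundle of resources and \(B(a)\) is the benefit of use \(a\), then the opportunity cost of committing those resources to program \(x\) is

\[ \mathrm{cost}(x) \;=\; \max_{a \in A,\; a \neq x} B(a), \]

the benefit of the best forgone alternative.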

In analyzing the costs of performance assessments, the range of opportunity costs could be thought of as the benefits of all the possible alternative programs the district could establish to improve student performance. In this case, the benefits derived from the performance assessment would be compared to the benefits derived from the best option facing the district other than the assessment program. The more beneficial the alternative given up is, the more it will cost to devote resources to performance assessment (Monk, 1995). However, before the benefits of a program can be measured, agreement must be reached as to the goals of the forgone activity.

In some cases, it may be appropriate to restrict the alternatives considered. For example, in analyzing the costs of an assessment program, it may be that the decision to be made is whether to replace the existing conventional assessment system with a new form of assessment. In that case, the relevant set of alternatives is limited to the assessment program currently in place, and the costs of the new assessment program will be measured on the basis of the forgone benefits of the old assessment program.

To make a cost analysis useful to decision makers, cost analysts need to develop a common metric to measure the benefits of alternatives. Unfortunately, there is no simple way to compare the benefits of programs that have disparate goals. Since agreement on the spectrum of benefits may also be difficult to achieve, and since estimation of the benefits of a forgone alternative may require a great deal of time for what could be considered an activity with little value (after all, why calculate the benefits of something you do not plan to do?), many analysts simplify the issue by estimating the expenditures necessary to operate the alternative program. One approach is to use the dollar value of the actual or anticipated expenditures as a measure of the project’s costs. Often called the ingredients method (Levin & McEwan, 2000), this approach relies exclusively on expenditures to measure costs; as Monk (1995) argues, it leads to confusion about the difference between expenditures and costs.
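To make the bookkeeping concrete, here is a minimal sketch of an ingredients-style expenditure tally. Every ingredient name, quantity, and unit price below is a hypothetical illustration, not a figure from any study cited in this chapter:

```python
# Minimal sketch of an ingredients-method expenditure tally
# (Levin & McEwan, 2000). All quantities and unit prices are
# hypothetical, chosen only to illustrate the bookkeeping.

def total_expenditure(ingredients):
    """Sum of quantity * unit price over all listed ingredients."""
    return sum(qty * unit_price for _, qty, unit_price in ingredients)

assessment_program = [
    # (ingredient, quantity, unit price in dollars)
    ("teacher scoring hours",   1_200,     40.00),  # hours x hourly salary
    ("coordinator days",           20,    350.00),
    ("printed test booklets",   5_000,      2.50),
    ("online scoring platform",     1, 15_000.00),
]

total = total_expenditure(assessment_program)
n_students = 5_000
print(f"Total: ${total:,.0f}; per student: ${total / n_students:.2f}")
# Total: $82,500; per student: $16.50
```

Note that, consistent with Monk’s critique, nothing in such a tally captures forgone benefits; it measures expenditures, not economic costs.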

If one believes that the benefits to be derived from an alternative assessment dramatically exceed those of the system being replaced, or if one anticipates improvements in student learning as a result of the new assessment system (clearly a hoped-for outcome of today’s assessment spectrum), then using the expenditures devoted to the performance assessment program may in fact overstate its true costs, since the benefits derived exceed the benefits from the program or programs it replaces. Unfortunately, there is no way to estimate the size of this exaggeration. To resolve this problem, one needs to make explicit assumptions about what factors could cause the overstatement and then estimate costs with and without an adjustment for them. In the framework established in this chapter, the ingredients approach to costs is used; where necessary, potential overstatements are identified and possible adjustments considered.

ADDRESSING THE BENEFITS OF PERFORMANCE ASSESSMENT

Assessing the benefits of performance assessment is complex. Although no one doubts the value of assessing student performance, a growing number of educators argue that time spent on assessment activities takes away from the time needed to teach material to students. It is hard to evaluate the accuracy of such claims because the answer depends on the quality of the assessment and the way it is used. Following are several ways to think about the benefits (positive and negative) of assessment:

  • If assessments are properly aligned with state or local standards, they are a useful tool to help assess student progress and learning needs, and they can help schools and districts identify their strengths and weaknesses.
  • Standardized tests provide data for only one point in time, and the results are often not available until the next year when a student has moved to another classroom and often to another school. Nor do standardized tests measure progress when there is a high incidence of student mobility.
  • Benchmark assessments will give schools and teachers regular information on student progress during the school year and help them focus their instruction.
  • Well-designed formative assessments can be used as tools to sharpen instruction to focus on student needs, leading to improved student outcomes over time.
  • Assessment results can form the basis of teacher collaboration and planning efforts to design instruction programs that are coherent and focused directly on student learning needs, leading to improved student outcomes.
  • A strong assessment system will enable teachers to focus on student learning problems earlier, resulting in fewer expensive interventions.
  • Assessment takes time away from instruction and limits student learning.
  • Standardized assessments are often misused and are not representative of what is taught in classes, resulting in little value for teachers or students.

The problem with measuring all of these potential benefits is that there are so many, and so much of the benefit to be derived is dependent on the way the assessments are implemented, analyzed, and then used to drive and, it is hoped, improve instruction. Thus, unless assessment is an integral component of an overall school strategy to improve learning, it is unlikely that any assessment program will provide a high level of benefits by itself. This of course complicates the measurement of those benefits.

For example, in work we conducted in Wyoming and Little Rock (Odden, Picus, Archibald, & Smith, 2009; Picus and Associates, 2010), we found that relatively less time was focused on initial instruction and more on providing interventions for students who had fallen behind. In most of the schools we visited, specific times during the day were set aside for all students to receive interventions, under the assumption that they would need additional help. This time came at the expense of instructional time in the core classes of math, science, language arts, social studies, and world languages.

Better formative assessments, combined with a focus on high-quality instruction to start with, would, it seems, reduce the need for expensive interventions. To the extent that well-designed formative and benchmark assessments are used to identify and resolve student learning concerns early and quickly, parsing out the benefits that can be attributed to the assessment is probably impossible—yet it is clear that a well-designed model like this could result in more high-quality instruction for all students and fewer but more timely interventions for students when they do struggle with the material.

The benefits of performance assessment then seem to fall into several categories. They provide more information to help teachers identify and correct learning deficiencies early. They give teachers and site leaders information about how well their students are learning the material required by a state’s standards, and they provide long-term information on changes in overall student performance across schools and districts. All of this information can be used to focus and design instruction to improve student learning, and is clearly a benefit of performance assessment.

ESTABLISHING A FRAMEWORK FOR IDENTIFICATION OF COSTS AND EXPENDITURES

In earlier work on the issue of assessment costs (Picus, 1994; Picus et al., 1996; Picus & Tralli, 1998), Picus proposed three dimensions of costs or expenditures for assessments, identified as components, kinds, and levels. The factors identified in each category are outlined in table 8.1.

Table 8.1 Dimensions of Costs and Expenditures for Performance Assessments

  • Kind: personnel; materials; supplies, travel, and food
  • Component: development; production; training; instruction; test administration; management; scoring; reporting; program evaluation
  • Level: national; state; district; school; classroom; private market

The costs and expenditures identified in table 8.1 might be thought of as forming a three-dimensional matrix in which a given cost or expenditure can be located in the cell defined by its kind, component, and level. For example, personnel costs and expenditures are likely to be incurred at all levels and for most of the components of assessment (e.g., management and scoring).

The basic costs and expenditures identified in table 8.1 have not changed noticeably in the past fifteen years, but the relative allocation of each has changed considerably. For example, the availability of online testing capacity has changed the way many assessments are now scored, enabling teachers to have the results of these benchmark tests the next day—data they can use to adjust instruction as they move forward with lesson plans. Also, the adaptive approach used in many of the computer-based assessment systems, which adjusts the questions asked of students based on their responses, may provide much more precise measures of student knowledge and skills.
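A toy sketch may help convey the adaptive idea; operational systems select items using item response theory, so the simple staircase rule below is a deliberate simplification, with all numbers invented for illustration:

```python
# Toy sketch of adaptive testing: nudge an ability estimate up
# after each correct answer and down after each incorrect one,
# so later items can be drawn near the student's estimated level.
# Operational systems use item response theory, not this rule.

def run_adaptive_test(responses, start=5.0, step=0.5):
    """responses: iterable of booleans (True = answered correctly)."""
    ability = start
    for correct in responses:
        ability += step if correct else -step
    return ability

print(run_adaptive_test([True, True, False, True]))  # 6.0
```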

At the same time, the press for frequent formative and even benchmark assessments may lead to the use of more of the resource factors identified in table 8.1 than has been the case in the past. The important question that policymakers, school district administrators, school site leaders, teachers, and even parents need to address is the extent to which the use of multiple assessment strategies offers benefits in terms of enhanced or improved student achievement. Given the multiple variables that are important to the administration of such assessments (identified below), it is likely impossible to assign direct benefits to the cost and expenditures of assessment programs. However, substantial evidence suggests that well-thought-out student assessment programs in an overall school reform strategy can be part of an effective program that leads to improved student performance (Odden & Picus, 2014; Odden & Archibald, 2009; Odden, 2009).

In appendix C, we present tables that identify the nature of costs, expenditures, and benefits that schools, districts, and states would encounter in the development, implementation, and analysis of performance assessment programs. We organize the analysis in separate tables for formative, benchmark, and summative assessments, using the components identified in the center column of table 8.1: development, production, training, instruction, test administration, management, scoring, reporting, and program evaluation. These tables illustrate that most of the costs relate to personnel time that can be allocated in different ways. Thus, there may be little change in personnel expenditures, but an important change in the focus of what teachers, counselors, site leaders, and others at a school site are doing. The actual expenditures for materials and supplies are likely to be relatively low.

Analyzing Expenditures for Assessment

Most work analyzing the costs of assessment considers state-, district-, and school-level expenditures for assessment programs. In their 2002 work on test-based accountability, Hamilton, Stecher, and Klein point out that while improved testing systems are likely to cost more money, few good estimates of the costs of improved accountability systems in relation to their benefits have been developed. More than a decade later, little has changed.

In this section, we review estimates of costs and expenditures for performance assessments and suggest how more complete analyses could be shaped in the future.

GAO Cost Analyses

The GAO has conducted a number of analyses of assessment and testing. In 1993, a GAO study estimating the cost of a national assessment included two components: purchase cost and time cost. The study defined purchase cost as the money spent on test-related goods and services, a category in line with what we call expenditures. The GAO also estimated cost in terms of the time teachers, administrators, and other school personnel spent on all test-related activities, including development, test preparation for students, test administration training for teachers, the activities of test administration itself, scoring, analysis, and reporting of results. The GAO then converted this time into a dollar amount by multiplying the total time spent on test-related activities by the average salary in each district. Unfortunately, aggregating these different uses of time disguises important differences among them, differences that have become more salient in the NCLB era than in previous decades. In particular, the time students spend preparing for tests has become the subject of a national debate about how much class time teachers devote to teaching to the test.
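Restated as a formula (our notation; the GAO report does not present one): if \(h_i\) is the time person \(i\) spends on test-related activities and \(\bar{w}\) is the district’s average salary rate for that time, the study’s time cost is

\[ C_{\text{time}} \;=\; \Big(\sum_i h_i\Big)\,\bar{w}. \]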

Including these personnel costs, the GAO study indicated that a strictly multiple-choice test would cost about fifteen dollars per student, while an assessment with performance-based items in addition to multiple-choice items would cost about twenty dollars per student, and a solely performance-based assessment would cost about thirty-three dollars per student (US GAO, 1993). Adjusting for inflation, those estimates would be closer to $24, $32, and $53, respectively, in 2009 dollars. While the GAO’s estimates identified a 65 percent larger cost for performance assessment than multiple-choice testing, this still represented only 0.7 percent of per student expenditures in 1991 (US GAO, 1993).
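The inflation adjustment behind these figures is a standard price-index rescaling; the factor of roughly 1.6 is inferred from the reported numbers themselves (e.g., \$24/\$15), not taken from the GAO report:

\[ C_{2009} \;=\; C_{1991} \times \frac{\mathrm{CPI}_{2009}}{\mathrm{CPI}_{1991}} \;\approx\; \$15 \times 1.6 \;\approx\; \$24. \]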

The GAO (1993) made several points that highlighted potential cost-saving efficiencies. First, it reported a large spread in the cost of performance assessment, from $16 to $64 (with an average of $33). This spread suggests the potential for economies of scale and experience in developing and implementing performance assessments. When including more students in test administrations, the study found that costs fell, with fixed costs distributed over a larger number of students. In addition, when a test administration had several purposes, such as testing the same student population in more than one subject area, the per-subject-area cost of a test also declined as fixed costs were divided over a larger number of subjects. Finally, GAO researchers found performance assessment costs to be the lowest in the states and Canadian provinces with the most years of experience administering a performance assessment, pointing toward a possible learning curve in performance assessment efficiency (US GAO, 1993). In these two regions, the cost of performance assessment averaged only $22 per student (approximately $35 in 2009 dollars)—about 33 percent less than the average figure.
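A toy model of this scale effect, with the fixed and marginal cost figures assumed purely for illustration rather than taken from the GAO study:

```python
# Toy illustration of the economies of scale the GAO observed:
# development is a fixed cost spread over all examinees, while
# administration and scoring scale with the number of students.
# The $500,000 and $12 figures are assumptions, not GAO data.

def per_student_cost(fixed_cost: float, marginal_cost: float, n: int) -> float:
    return fixed_cost / n + marginal_cost

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} students: ${per_student_cost(500_000, 12.0, n):,.2f} each")
#   1,000 students: $512.00 each
#  10,000 students: $62.00 each
# 100,000 students: $17.00 each
```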

State Studies of Assessment Costs

As Topol and colleagues report in chapter 9 of this book, the GAO dollar estimates and their proportions of current spending on education are consistent with those reported in more current studies of state spending, as are the variability in costs across assessments and the factors that can produce cost savings.

The ratio of spending for performance-based assessments relative to multiple-choice tests also appears to hold across multiple studies. Costs appear to increase by about 50 percent when a significant portion of the test becomes performance based, and to just over double when the test becomes entirely performance based. However, they remain very small as a proportion of overall education spending: less than 1 percent.

For example, in a study of state-level testing expenditures in Kentucky and North Carolina in the early to mid-1990s, Picus et al. (1996) found that the more traditional multiple-choice North Carolina tests averaged $4.59 per test administered, while Kentucky’s much more performance-based assessments averaged about $7.51 per test. (For a description of the Kentucky Instructional Results and Information System [KIRIS], see chapter 2, this volume.) This represented 0.26 percent of state expenditures on K–12 education in North Carolina and about 0.45 percent in Kentucky.

In a later study, Picus and Tralli (1998) added estimates of both district expenditures and spending on personnel time, which were not included in the earlier study. They found that spending associated with the assessment system in Kentucky—when measuring both district expenditures and the total price of all the time teachers and other district employees spent on KIRIS-related activities—would amount in 1995–1996 to upward of $140 per student tested.

However, these expenditure estimates overstate the true cost of KIRIS because much of the time that teachers reported spending on KIRIS-related activities was actually a source of benefits resulting from the instructional, professional development, and class-preparation time teachers spent working with students and colleagues in ways that improved their teaching of writing and mathematics. The time teachers spent working on the portfolio and performance tasks that the system required was time spent helping students develop and learn to use the higher-level thinking skills represented on the assessments.

The New England Common Assessment Program (NECAP), which serves New Hampshire, Rhode Island, Vermont, and Maine, illustrates both the benefits associated with higher-quality assessments and the strategies that can make them more affordable. The NECAP assessment program includes open-ended responses on both the reading and math tests, comprising about half of the score on each, plus a writing assessment. (For more detail, see chapter 3, this volume.) The total cost of developing, administering, scoring, and reporting these assessments for the 2009–2010 school year was roughly twelve dollars per test and twenty-nine dollars overall per student tested, suggesting that the average student took roughly two and a half of the tests.

Evidence from NECAP illustrates that these expenditures produce benefits that offset the costs. For example, rather than detracting from teaching time, the director of assessment for Vermont believes that NECAP testing has become “an embedded part of the curriculum,” giving teachers valuable data to identify gaps in student knowledge. Vermont encourages its teachers to use released NECAP items as a model for crafting their own assessments, which state officials argue leads to the development of higher-quality classroom assessments capable of producing more meaningful results. Such improved teacher professional development and practice represents a potential benefit of switching to performance assessment, one that has the capacity to offset the marginal increase in price that states will likely incur by switching.

In addition to dividing among themselves the fixed costs of the program, which represent about 20 percent of the total, the NECAP states save money by realizing a number of economies of scale, as predicted by the GAO’s 1993 cost study. In fact, New Hampshire’s assessment director believes that economies of scale exist in the process of scoring open-ended items, which cost the state considerably more to score than multiple-choice items and represent one of the primary factors driving the somewhat higher price of performance assessment. As the director put it, “The first 1,000 constructed-response items are a lot more expensive to grade than the last 1,000,” because as graders become more experienced at scoring an item, their efficiency and reliability increase.

Component Costs

Hardy (1995) analyzed states’ expenditures on performance assessment in three areas: development, administration, and scoring. Development includes creating tasks and conducting quality-control activities that lead to an assessment exercise ready for large-scale use and interpretation. Development activities might include the identification and specification of the learning and assessment objectives; exercise writing; editing, review, and other quality-control procedures; small-scale pretesting; developing guidelines for scoring and interpretation; and possibly norming. When these are developed by an external agency, prices for performance assessments reflect these expenditures. However, in-house development by current staff makes these expenditures harder to determine. Estimates for developing and pilot-testing performance tasks from state departments of education and testing agencies ranged from roughly $5,000 to $7,500 per task.

Spending estimates for administration of performance assessments include expenditures for any staff time and materials required to administer an assessment to students as well as for any special training for teachers, test coordinators, or other school personnel involved in the administration of assessment tasks. In the most complex cases, where specially trained task administrators have gone to schools with special testing materials, such as science kits and other hands-on materials, administrative costs ranged from $3 to $5 per student, with the higher costs associated with the use of manipulatives.

With respect to scoring, the study included training for teachers, other professionals, and, in some cases, clerical staff to assign numerical scores, narrative comments, or other forms of evaluation to student responses to assessment tasks. Costs in this area are significant because scoring is done by hand. However, under our new framework, the costs of teacher-moderated scoring, among other elements, should be evaluated through a cost-benefit lens that considers how these expenditures may also contribute to better teaching and learning.

Hardy (1995) offered a list of performance-assessment scoring elements from recruiting and training raters to paying for their time to evaluate each response. Costs ranged from $0.54 to as much as $5.88 per student, depending on the number of raters and the degree of content specialization needed; the length and complexity of the student response; the type of rating scheme (holistic or analytic); diagnostic reporting; and the extent of involvement of classroom teachers. As much as 60 percent of the costs of performance assessments is associated with the involvement of teachers or other raters in scoring. Where teachers are the raters, the real location for these costs of performance assessment should be in the categories of teacher professional development and training—where the benefits should also be understood and calculated.
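Combining Hardy’s three components into a single per-student figure might look like the sketch below; the task count, student count, and the specific points chosen within each range are our assumptions for illustration only:

```python
# Back-of-envelope per-student estimate built from Hardy's (1995)
# component ranges. Task and student counts, and the points chosen
# within each range, are illustrative assumptions.

def per_student_estimate(n_students: int,
                         n_tasks: int,
                         dev_per_task: float = 6_250.0,    # mid of $5,000-$7,500
                         admin_per_student: float = 4.0,   # within $3-$5
                         scoring_per_student: float = 3.0  # within $0.54-$5.88
                         ) -> float:
    development = n_tasks * dev_per_task / n_students  # fixed cost, spread out
    return development + admin_per_student + scoring_per_student

print(f"${per_student_estimate(50_000, 10):.2f} per student")  # $8.25
```

Even doubling the scoring figure would leave this total well under the GAO’s inflation-adjusted estimate of $53 per student for a fully performance-based assessment.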

AN EVIDENCE-BASED MODEL OF SCHOOL FINANCE

One model that offers some insight into the costs of performance assessment is the evidence-based model of school finance adequacy developed by Odden and Picus (2014). Their model lays out a research-based approach to the organization of schools that often changes how certificated staff are used and provides funds for instructional supplies and materials. Generally the model enables a school to implement the ten strategies outlined by Odden (2009) that have been identified as frequently leading to strong gains in student performance. Among the strategies most aligned with strong performance assessment practice are a focus on planning and collaboration time for teachers, large investments in professional development that include additional paid days for teachers to meet during the summer to plan instruction (and the measures of their success in instruction), funds for instructional coaches to help teachers analyze assessment data and improve instruction, and money to purchase the contract services of experts as identified by the school or district.

The actual cost of an evidence-based system varies substantially from state to state depending on the current level of spending and the number of certificated personnel in each district and school. In some states, there are enough certificated staff to fill the roles identified in the model; in others, additional staff may be required. Moreover, the assessment aspects of the model are one part of a systemic view of improving schools (Odden & Archibald, 2009; Odden, 2009), making it hard to distinguish the costs of the assessment system by itself.

That said, there are some direct expenditures a school or district must make to implement any assessment program. In our work in a number of states, we have estimated this to be approximately twenty-five dollars per student, hardly a major expenditure compared to current levels of per pupil spending in the states (see Odden, Picus, & Goetz, 2006; Odden et al., 2007; Picus, Odden, Aportela, Mangan, & Goetz, 2008). This expenditure would include resources for testing materials and enough funds to purchase an online system. It does not include the costs of staff (either new positions or replacement of alternative activities by staff) for implementing an assessment system.

CONCLUSION

Research on estimating the costs of performance assessments in the United States can help inform new systems of assessment, especially if we use a framework that can distinguish between expenditures and costs and can incorporate the student and classroom benefits of various kinds of assessment systems.

The key to fully identifying the costs and benefits of assessment programs is to understand how personnel time is used in the development, design, preparation for, administration, and evaluation of assessments. The lion’s share of the costs is the personnel time devoted to these steps; the benefits accrue to the extent that the assessments help educators support and improve student learning. Research on the total costs of assessments therefore needs to focus on how personnel time is reallocated under different assessment strategies, with benefits measured by improvements in the quality of instruction and in student outcomes.

If one were simply to look at the expenditures devoted to various forms of performance assessment, one would find that they constitute a very small component of a school district’s budget. Yet a comprehensive system of formative, benchmark, and summative assessments requires considerable time on the part of teachers, school site leaders, and the central office. While current standardized tests are often viewed as reducing time for learning because they are remote proxies for actual student work, curriculum-embedded performance assessments that provide learning experiences are typically viewed by educators as enhancing instruction rather than impeding it. Research suggests that this is often the case. While any assessment program has considerable costs in terms of the personnel time devoted to conducting, evaluating, reporting, and using assessment results, useful information—about what students know and can do—and support for teachers’ understanding of standards, curriculum, teaching, and learning are important benefits of high-quality assessment programs.
