Chapter 6
Adapting Performance Assessments for English Language Learners

Jamal Abedi

Traditional standardized achievement test outcomes are used for high-stakes decisions in assessment and accountability systems throughout the United States. Often developed and field-tested for the mainstream student population, these assessments may not be sensitive to the needs of some subgroups of students, such as English language learners (ELLs), who already face challenging academic demands. Research clearly demonstrates that variables unrelated to the focal measurement construct (e.g., unnecessary linguistic complexity, cultural biases in the construction of items) can affect the quality of high-stakes assessments for these students (Abedi, 2006; Solano-Flores & Li, 2006; Solano-Flores & Trumbull, 2003; Solano-Flores, 2008). Therefore, the outcomes of these assessments may not be reliable and valid, and they may not yield sufficient evidence for making important decisions about a student's academic career.

Despite efforts to make state and national standardized achievement tests more accessible for ELLs, the outcomes of these assessments may not be useful in evaluating student learning and informing instruction due to their inherent limitations. Conducted mainly for accountability purposes, these end-of-year assessments do not afford an opportunity for students to present a comprehensive picture of what they know and are able to do in content areas such as math, science, and reading and language arts. More important, “accountability is not only about measuring student learning, but actually improving it” (Darling-Hammond, 2004, p. 1078).

Performance assessments can help to fill this gap, because they not only engage these students and give them a chance to demonstrate their knowledge but also disclose more in-depth information on students’ academic needs. Performance assessments can be less affected by unnecessary linguistic complexity for two reasons. First, language is often not the only medium of presenting an assessment task. For example, in a science hands-on performance task, students are presented with a set of physical materials—batteries, wires, and bulbs (see figure 6.1)—and asked to “determine what is inside an electric mystery box by constructing and reasoning about circuits.” The scoring process focuses on how they use evidence and the quality of their explanation (Ayala, Shavelson, & Ayala, 2001, p. 25).


Figure 6.1 Electric Mysteries Performance Assessment

In another task, this one for fourth graders, students are presented with a pencil and containers of saltwater and freshwater and asked to measure how the pencil floats in each.

Second, students have access to these physical materials as they formulate assessment responses, thereby reducing reliance on language. In science, students can experience manipulating hands-on objects and use that familiarity as they formulate written or oral responses. Performance assessments thus reduce linguistic complexity both as tasks are presented and as responses are prepared.

This chapter describes how performance assessments can help ELL students demonstrate what they know and are able to do. Unfortunately, the research literature on performance assessments for these students is thin, but it nevertheless offers evidence on their effectiveness and usefulness and on strategies to support access and fairness in these kinds of assessments.

PERFORMANCE ASSESSMENTS AND ENGLISH LANGUAGE LEARNERS

Bass, Glaser, and Magone (2002) observe that performance assessments allow all students, especially those with different language backgrounds, to engage in cognitively complex activities such as generating strategies, monitoring work, analyzing information, and applying reasoning skills. At the same time, concerns are often raised that ELL students' performance assessment responses may not fully reflect their content knowledge because those responses may be confounded with writing skill and vocabulary. That is, language factors in performance assessments may have an even greater impact on the performance of ELL students than on that of native speakers of English.

In response to this concern, a distinction should be made between language related to the construct being measured (construct relevant) and language not necessarily relevant to the content (construct irrelevant). In performance assessments, students’ actual performance rather than their expressive language may more clearly convey the content being measured. For example, in the floating pencil exercise described above, students not only hear the test instruction to measure the length of a pencil floating in both fresh- and saltwater but also see the pencil under both conditions. More important, since the object of measurement is presented in multiple sensory modes (floating pencil and freshwater and saltwater), language is not the only way to present the performance assessment task (PAT) and obtain student responses.

As Linn and Burton (1994) have indicated, performance assessments have appeal as assessments that better reflect good instructional activities, are often thought to be more engaging for students, and are better reflections of the criterion performances that are of importance outside the classroom (they are said to be more authentic). Similarly, as Darling-Hammond (2006) indicated, “Performance assessments that require students to evaluate and solve complex problems, conduct research, write extensively, and demonstrate their learning in projects, papers, and exhibitions have proven key to motivating students and attaining high levels of learning in redesigned high schools” (p. 655). Increasing the level of motivation is important for English learners because they need encouragement and support in their academic endeavors.

Research clearly demonstrates that language factors have a major impact on the outcomes of assessments for ELLs (Solano-Flores & Li, 2006; Solano-Flores & Trumbull, 2003; Solano-Flores, 2008). Including unnecessary linguistic complexity in an assessment widens the performance gap between ELL and non-ELL students.

Benefits of Performance Assessments for ELLs

Performance assessments can also help to ameliorate language factors that influence assessment outcomes. For example, Mislevy, Steinberg, and Almond (2002) showed that task-based language assessments (TBLAs) can assess language in more realistic and complex settings than traditional discrete-skills assessments can, with the latter offering narrower, more artificial opportunities for receptive and expressive language use. In discussing design and analysis in TBLAs, Mislevy et al. (2002) showed that linguistic competency, which includes knowledge of vocabulary and grammar, is not by itself sufficient for evaluating communicative competence. The authors pointed out that "the concern in TBLA extends beyond knowledge and language per se, to the ability to deploy language knowledge appropriately and effectively in educationally or professionally important language use settings" (p. 3). The authors believe that an assessment such as TBLA, which includes sociolinguistic competence, strategic competence, and discourse competence, can present a broader conception of communicative competence. They refer to the listening portion of the Test of English as a Foreign Language as an example of a TBLA, in which all items require students to process information presented as a short conversation in a college environment.

Another recent study found student responses to a writing prompt less affected by student background variables, including English learner status, than were scores on a commercially developed language arts test, largely comprising multiple-choice items (Goldschmidt et al., 2007). Similarly, Wang, Niemi, and Wang (2007a) found that performance assessment outcomes are not sensitive to elements of students’ background status such as socioeconomic status (SES) and ethnicity. In other words, student strengths were more fully demonstrated on the performance assessments without undue influence of some of the sources of construct-irrelevant variables.

One goal of a performance assessment is to judge students’ level of competency in reading and language arts, science, and mathematics (Parker, Louie, & O’Dwyer, 2009). Therefore, performance assessments can also produce useful information for diagnostic purposes to assess what students know, and they can help teachers decide where to begin instruction or determine which groups of students need special attention. These assessment strategies can also be used to monitor students’ processing skills and problem-solving approaches, as well as their competence in particular areas while simulating learning activities. These characteristics can be extremely beneficial for special needs student populations, including ELLs, since these students may not have received equal education opportunities because of their linguistic needs (Abedi & Herman, 2010). These students often exhibit greater interest and a higher level of learning when they are required to organize facts around major concepts and actively construct their own understanding of the concepts in a rich variety of contexts.

Performance tasks are also instructional, allowing students to engage in worthwhile learning activities within the classroom. In performance assessment settings, students may be encouraged to seek out additional information or try various approaches, and in some situations they can work in teams. These assessment strategies are all beneficial for ELLs, who benefit from active engagement in classroom activities.

Furthermore, performance assessments are more accessible because many of the variables affecting large-scale state and national assessments have less impact on performance assessments and learning environments (Boscardin, Aguirre-Munoz, Chinen, Leon, & Shin, 2004; Wang, Niemi, & Wang, 2007a, 2007b). Under the right circumstances, open-ended assessments improve the chances for ELL students to engage with language production and learning, offering them opportunities to express their knowledge in a broader sense than the limited linguistic opportunities given to them in traditional multiple-choice items.

Performance assessments can be presented in many forms, yet are comprehensive in nature and allow students to present a more thorough indication of their understanding of certain content areas. A prime example of a performance assessment is a situation where students are asked to communicate in a second language or design and conduct research on a topic of interest. In this situation, the ELL students’ speaking and writing abilities could be directly evaluated on the basis of the actual presentations and texts that are created by these students.

Linn, Baker, and Dunbar (1991) indicated that direct assessments of writing, for example, provide instances of the tasks that we would like students to be able to perform, whereas questions about proper grammar are designed to measure what are best termed “enabling skills,” or partial indicators of actual ability to write. Indirect proxies for the knowledge or skills to be measured can raise validity concerns. For example, assessing writing by asking questions about grammar may not be highly correlated with actual writing ability. Furthermore, questions about nuances of grammar and syntax may be particularly disadvantageous for ELL students whose first language is based on differing rules. By contrast, direct assessments of writing give students more opportunity to demonstrate their ability to convey ideas.

More examples in the chapter illustrate how performance assessment tasks can help improve the quality of education for ELL students. These examples compare and contrast the quality of measurement outcomes of performance assessments against those from traditional standardized achievement tests. This helps demonstrate that outcomes from performance assessments can be more informative for teachers, students, and parents of all students, particularly ELLs; these assessments are not as greatly affected by extraneous variables as are traditional high-stakes assessments (Boscardin et al., 2004; Wang et al., 2007a, 2007b).

Limitations of Standardized Assessments for ELLs

Standardized achievement tests are commonly used for assessment and accountability because they have established objectivity, reliability, and validity, as well as ease of administration and cost efficiency in scoring (Burger & Burger, 1994). These tests may also refer more succinctly to content standards and have more easily verifiable content representation than many current classroom measures (Chung, Delacruz, & Bewley, 2006; Mehrens, 1992).

However, research has pointed to many limitations in the achievement tests commonly used for student assessment and accountability purposes (Linn et al., 1991; Archbald & Newmann, 1988; Shepard, 1991). These issues and limitations are particularly serious for ELLs, who are often at the lower levels of the academic performance distribution. Research shows that high-stakes testing policies can create inequity for low-achieving students when schools or districts systematically exclude them from these assessments in order to demonstrate gain in overall student achievement. For example, Heilig and Darling-Hammond (2008) found widespread exclusion of English learners from testing in a high-stakes accountability environment through mechanisms that rendered them “missing” on testing days and policies and practices that pushed older students out of school entirely. Other studies on grade retention, exclusion from state and local assessment testing (and school), and dropout rate also identify these sources of gaming.

These findings raise an important concern for ELLs, because they are often among the lowest-performing students in schools as a result of inappropriate assessments coupled with inequity in their access to quality instruction. If they are being categorically excluded from grade-appropriate high-stakes assessments in order to inflate a school’s or district’s scores or if they are excluded from rich content-based instruction to be drilled for the multiple-choice tests, then their progression and achievement may be misrepresented in schools that rely solely on standardized tests as a barometer of performance.

At the same time, it may not help to include these students without appropriately assessing them. As Ayala et al. (2001) indicated, “Although multiple-choice tests are useful for ascertaining a child’s conceptual knowledge, an assessment of actual performance may be more appropriate” (p. 1). Among the reasons that Miller and Linn (2000) present for using performance assessments instead of standardized tests is concern about the possible unintended negative effects of multiple-choice assessments that lead to narrower curriculum and encourage teaching to the test.

There are other technical issues with multiple-choice assessments. For example, a major threat to their validity lies in technical problems associated with the distractors in multiple-choice items. In addition to the possibility of unequal response frequencies across the distractors, various subgroups of students show dissimilar trends in selecting distractors when they are not sure of the correct response. For example, Abedi, Leon, and Kao (2008) found that the pattern of selecting distractors varied by students' disability status. Students without disabilities chose the distractors that seemed most likely to be the correct answer, while students with disabilities selected distractors randomly rather than making educated guesses. The same was true of students at lower levels of English proficiency.

Similarly, in a study of eighth-grade students on multiple-choice standardized tests across three states, Abedi and colleagues (2010) found that ELL students most often selected distractors with a high incidence of academic vocabulary. It appeared that these distractors with academic terms were more attractive, and ELL students tended to select those responses in spite of their being distractors.
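Differential distractor analysis of this kind can be made concrete with a small computation. The sketch below is illustrative only: the counts are invented, and the chapter does not specify the statistical test the cited studies used. Here, a chi-square test of independence asks whether distractor selection depends on group membership.

```python
# Illustrative sketch (hypothetical counts; the choice of a chi-square
# test is an assumption, not the method of the cited studies).
from scipy.stats import chi2_contingency

# Counts of incorrect responders choosing each distractor on one item
# (the keyed answer is excluded).
#                 distractor:  A    B    D
ell_choices     = [112,  95, 143]
non_ell_choices = [210,  74,  66]

chi2, p, dof, expected = chi2_contingency([ell_choices, non_ell_choices])
print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p:.4g}")
# A small p-value suggests the two groups favor different distractors --
# a cue to inspect the item, e.g., for academic vocabulary loaded onto
# one option, as in the Abedi and colleagues (2010) finding above.
```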

Issues of Linguistic Complexity

Analyses of national and state data show a substantial gap between the academic performance of ELLs and that of native speakers of English (Abedi, 2006, 2008). Abedi, Leon, and Mirocha (2003) compared the performance of ELLs and non-ELLs on tests across several content areas and states. Results of their analyses suggest that the higher the language demand of the test items, the larger the performance gap between ELLs and non-ELLs.

Some of the findings of analyses of National Assessment of Educational Progress (NAEP) data illustrate this point (Abedi et al., 2003). Results of these analyses show that the construct-relevant performance differences between ELLs and non-ELLs in grades 10 and 11 are highest in reading, a subject with a significant language demand that is considered construct relevant since the focal construct is language. The performance gap decreases in science and mathematics. Averaging scores for students in grades 10 and 11, the performance gap between ELL and non-ELL students was 15.0 score points (on the normal curve equivalent, or NCE, scale) in reading, which was reduced to 10.5 points in science and further reduced to 1.3 points in math computation.

Abedi (2008) analyzed data from several states and compared the performance of ELLs and non-ELLs using a disparity index (DI), which is based on the percentage difference between the two groups. Conducted on pre–No Child Left Behind (NCLB) data, these analyses again showed that the performance gap between ELLs and non-ELLs is larger in areas with more complex linguistic structure than in those with less complex linguistic demands. For example, at site 1, the DI for ELLs in reading for grade 3 students was −53.4, suggesting that ELLs underperformed non-ELLs by 53.4 percent; the DI for these students in math was −14.5, a gap substantially smaller in magnitude than that in reading. Results from site 3 furnish another example and are consistent with those from site 1: the data suggest a substantial performance gap between ELLs and non-ELLs, with a DI for ELLs in reading for grade 5 students of −33.4, meaning that ELLs performed 33.4 percent lower than non-ELLs, as compared with a DI of −22.6 in math. The trend for post-NCLB data was quite similar to that for pre-NCLB data (Abedi, 2008): ELLs performed substantially lower in all content areas than non-ELLs, with a greater performance gap in reading than in math, implying that language factors play a major role in this gap.
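The chapter reports DI values without spelling out the formula. Below is a minimal sketch, assuming the DI is the percentage difference between the two groups' mean scores relative to non-ELL performance, which is the interpretation suggested by the phrase "underperformed non-ELLs by 53.4 percent"; the score values are hypothetical.

```python
# Minimal sketch, assuming DI = percentage difference relative to the
# non-ELL group; the chapter describes the DI this way but gives no formula.
def disparity_index(ell_mean: float, non_ell_mean: float) -> float:
    """Negative values indicate ELLs scoring below non-ELLs."""
    return 100.0 * (ell_mean - non_ell_mean) / non_ell_mean

# Hypothetical means chosen to reproduce the grade 3 reading figure above.
print(round(disparity_index(23.3, 50.0), 1))  # -53.4
```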

Figure 6.2 is an example of a multiple-choice assessment question from the 2007 NAEP eighth-grade math test that poses challenges for ELL students. Although this item may work well for non-ELL students, it is less likely to function well for ELL students for two reasons. First, both the stem and the choices for this item are long. As indicated earlier, Abedi et al. (1997) found that ELL students performed substantially lower than their non-ELL peers on NAEP items with more than three lines of stem and more than a line in any of the choices. Second, and more important, the choices for this item vary in length. The first three choices are relatively short, and the last two are long. ELL students, particularly those with lower English proficiency, may select shorter choices just to finish the item, especially if they have difficulty understanding the language in the long choices. This could widen the performance gap between ELLs and non-ELLs and produce differential distractor selection patterns (see, for example, Abedi et al., 2008), which in turn affect the scoring and scaling of test items.


Figure 6.2 A NAEP Math Item for Eighth Graders

Source: National Center for Education Statistics (2005, grade 8, year 2007, item 13, block M7).

In addition to the item length and distractor issues, this item poses linguistic complexity issues for ELLs. For instance, two experts rated the grammatical complexity of the item on six linguistic features: (1) passive voice, (2) complex verbs, (3) relative clauses, (4) subordinate clauses, (5) noun phrases, and (6) entities. These features have been found to slow readers down and make interpretation of text difficult (Abedi et al., 2010; Abedi & Lord, 2001). The two raters consistently identified eight instances of relative clauses, six instances of noun phrases, and six instances of entities. Furthermore, the item combines two activities that could easily fall outside the cultural and linguistic experience of ELLs: taking tests and eating fish.

A less obscure version of the mathematical concepts tested in the item is found in figure 6.3, which makes linguistic modifications that can support ELLs without changing the construct or the mathematical difficulty level of the item.


Figure 6.3 A Linguistically Modified NAEP Multiple-Choice Item

Performance assessment tasks may also suffer from unnecessary linguistic complexity, because students have to read test directions and contextual information in the items, which may be linguistically complex, and have to write to explain and justify their responses, adding to the linguistic demands of these tasks. The context in which linguistic complexity occurs, however, is a major distinction between the two types of assessments. In standardized test items, it is difficult to differentiate between complex linguistic structures that are the target of assessment and those that are unrelated to the construct being measured. It is also difficult to ascertain whether student responses result from linguistic challenges with the prompt or with the set of multiple-choice answers, including distractors. In performance assessment tasks, complexity may occur mostly in areas not related to the construct being measured (e.g., directions, context). Therefore it is less challenging to simplify the linguistic structure in performance tasks without altering the construct being measured. Below, I discuss how this can be done.

HOW PERFORMANCE ASSESSMENTS CAN BE MADE MOST VALID FOR ELLS

Because the language of assessment is among the most influential factors affecting the outcomes of assessment for ELL students, I elaborate on the impact of such factors and offer recommendations on how to improve the quality of performance assessment with more linguistically accessible outcomes. As noted, ELL students and students with disabilities sometimes perform better on performance tasks than on multiple-choice tests. This has been the case in the New Jersey Special Review Assessments (SRAs) offered to students failing the state high school exit exam. (See chapter 3, this volume.) These open-ended performance tasks test the same standards and concepts as items on the multiple-choice test but have proved more accessible to these populations of students.

Nonetheless, in any kind of test, careful design can make a difference in validity for special populations. Research on ELL students has identified a number of linguistic features of test items that slow readers down and increase the chances of misinterpretation, among them, language load, complex linguistic structures, and length (Abedi et al., 1997). Researchers have found that linguistic modifications reducing the complexity of sentence structures and replacing unfamiliar vocabulary with more familiar words increase the performance of English learners, as well as other students in low- and average-level classes (Abedi & Lord, 2001).

Linguistic modifications can be used in designing performance assessments to help ensure a valid and fair assessment, not only for ELLs but for other students having difficulty with reading. Table 6.1 shows how a task from the New Jersey SRA can be made even more accessible with linguistic modifications, without altering the knowledge and skills being measured. These modifications reduce the length of the task by more than 25 percent (from 264 words to 184), eliminate conditional clauses and grammatical complexities (such as passive voice), and use more familiar words and concepts. Although the modified task is easier to read and understand, it still tests the same mathematics skills.

Table 6.1 Performance Task Item, Modified for Linguistic Access

Sources: For the original item: New Jersey Department of Education (2003), 2002–03 SRA Mathematics Performance Assessment Task. For the modified item: Abedi (2010).

Original Item
Dorothy is running for president of the student body and wants to create campaign posters to hang throughout the school. She has determined that there are four main hallways that need six posters each. A single poster takes one person 30 minutes to create and costs a total of $1.50.
What would be the total cost for Dorothy to create all the needed posters? Show your work.
If two people working together can create a poster in 20 minutes, how much total time would Dorothy save by getting a friend to help her? Show your work.
If Dorothy works alone for 3 hours and is then joined by her friend, calculate exactly how much total time it will take to create all the necessary posters. Show your work.
Omar, Dorothy’s opponent, decided to create his posters on a Saturday and get his friends Janice and Beth to help. He knows that he can create 24 posters in 12 hours if he works alone. He also knows that Janice can create 24 posters in 10 hours and Beth can create 24 posters in 9 hours. How long will it take them, if all three of them work together, to create the 24 posters? Round all decimals to the nearest hundredth. Show your work.
When Omar went to purchase his posters, he discovered that the cost of creating a poster had increased by 20 percent. How many posters will he be able to create if he wants to spend the same amount of money on his posters as Dorothy? Justify your answer.
Linguistically Modified Item

You want to plant 6 roses in each of four large pots. Planting a single rose takes you 30 minutes and costs $1.50.
What is the total cost to plant all the roses? Show your work.
With a friend’s help, you can plant a rose in 20 minutes. How much total time do you save by getting a friend to help? Show your work.
You work alone for 3 hours, and then a friend joins you. Now how much total time will it take to plant all the roses? Show your work.
You can plant 24 roses in 12 hours. Your friend Al can plant 24 in 10 hours and your friend Kim can plant 24 in 9 hours. How long does it take the three of you to plant 24 roses together? Round all decimals to the nearest hundredth. Show your work.
You just discovered that the cost of purchasing a rose increased by 20 percent. How many roses can you plant with the same amount of money that you spent when a rose cost $1.50? Justify your answer.
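As a quick arithmetic check of the word-count reduction reported before the table:

```python
# Verifying the reduction claimed for the modified task (264 to 184 words).
original_words, modified_words = 264, 184
reduction_pct = 100 * (original_words - modified_words) / original_words
print(f"{reduction_pct:.1f}% fewer words")  # 30.3% -- more than 25 percent
```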

It is important to note that linguistically sound items do not avoid technical language appropriate to the content being measured. A performance assessment task in mathematics from the 2002–2003 New Jersey SRA, High School Proficiency Assessment (HSPA), illustrates this point: the language in that item, which includes sophisticated mathematical terms, is related to the content being measured and has a minimal level of unnecessary linguistic complexity.

Performance assessments presented in clear language and not affected by cultural biases may offer a better opportunity for ELLs to present a valid picture of what they know and are able to do. I believe that using a performance assessment approach to measure ELLs’ content knowledge has the potential of incorporating research findings on how to reduce sources of threats to accessibility of assessment for these students.

SCORING PERFORMANCE ASSESSMENT TASKS

An exemplary performance assessment task, no matter how well written, may not produce desirable outcomes if it is not scored properly for ELL students of diverse linguistic and cultural backgrounds. Proper scoring of performance assessment tasks is of paramount importance for ELL students because correct responses in content-based areas (such as math and science) may be masked by difficulties with grammar and spelling. The components involved in scoring performance assessments include (1) creating a scoring rubric, (2) training scorers, and (3) establishing sufficient interscorer reliability. Careful attention is needed in all three areas to ensure reliable and valid scoring of a performance assessment.

As Lane noted in chapter 5 (this volume), the first and most important step in ensuring objective and valid scoring of PATs is access to a well-developed, objective, and validated scoring rubric. A detailed and objective scoring rubric always accompanies a well-designed series of performance tasks because the rubric helps to ensure accurate and consistent scoring, and a set of anchor responses scored with the rubric helps operationalize it. Scoring rubrics, particularly in large-scale assessments, are often validated (Jonsson & Svingby, 2007). The process of validating scoring rubrics for use with ELL students should include clear instructions to avoid scoring on factors unrelated to the content. For example, in content-based areas such as math and science, where spelling and grammar are not the focal construct of measurement, students should not be penalized for spelling or grammar errors. A scoring rubric developed for the New Jersey SRA illustrates this point (New Jersey Department of Education, 2004): for each PAT, a four-point scoring rubric is developed with clear performance-level descriptors and clear instructions for scoring (see appendix A).
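To make this concrete, here is one hypothetical way such a rubric and its scoring guidance might be encoded. The level descriptors and notes below are invented for illustration; they are not the New Jersey SRA rubric text (see appendix A for the actual rubric).

```python
# Hypothetical encoding of a four-point content-area rubric (levels 0-3).
# Descriptors and notes are invented for illustration only.
RUBRIC = {
    3: "Complete, correct solution; all work shown and reasoning supported.",
    2: "Largely correct solution; a minor computational error or small gap.",
    1: "Partial understanding; major errors or incomplete reasoning.",
    0: "No meaningful attempt, or an entirely incorrect approach.",
}

SCORING_NOTES = [
    "Score the content only; do not deduct for spelling or grammar errors.",
    "Accept responses in nonstandard English when the reasoning is clear.",
    "Resolve borderline cases by comparing against the anchor responses.",
]

def print_scorer_reference() -> None:
    """Print the levels and notes as a one-page scorer reference sheet."""
    for level in sorted(RUBRIC, reverse=True):
        print(f"{level}: {RUBRIC[level]}")
    for note in SCORING_NOTES:
        print(f"- {note}")

print_scorer_reference()
```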

The next step in objectively scoring PATs is properly training scorers, who should have relevant knowledge and experience before participating in the training sessions. In the training session, the goal of the performance assessment should be clearly discussed, and the rubric developed and validated for the assessment should be introduced to the scorers. A set of released PATs can be used for scoring exercises, and interscorer reliability can be computed on site after each round of scoring; low interscorer reliability should prompt more focused training. Discussion of the issues concerning scoring of PATs for subgroups such as ELL students should also be included in the training session.

Establishing interscorer reliability is the final step in objective scoring. Scoring by various individuals with dissimilar backgrounds may be inconsistent if the rubric is not clear enough and, more important, if student background issues that are not directly related to the PATs intervene in the scoring process. For example, for ELLs, the students’ language background may negatively affect the level of consistency among scorers. More than one scorer should review a sample of the PATs; then the interscorer reliability should be computed using appropriate statistics such as kappa or intraclass correlation (see Abedi, 1996, for a discussion of rater reliability coefficients and how to compute them). A low interscorer reliability coefficient (kappa of .50 or below) should call for revision of the rubric and more intense training.
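As a concrete illustration, the sketch below computes Cohen's kappa for two hypothetical raters scoring the same ten responses on a 0-to-3 rubric, using scikit-learn. For ordinal rubric scores, a weighted kappa (e.g., weights="quadratic" in the same function) is often preferable, since it penalizes near-misses less than large disagreements.

```python
# Minimal sketch: Cohen's kappa for two raters scoring the same sample
# of performance tasks. The scores below are invented for illustration.
from sklearn.metrics import cohen_kappa_score

rater_1 = [3, 2, 2, 0, 1, 3, 2, 1, 0, 2]
rater_2 = [3, 2, 1, 0, 1, 3, 2, 2, 0, 2]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
# Per the guideline in the text, a kappa of .50 or below would call for
# revising the rubric and more intense scorer training.
```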

USING PERFORMANCE ASSESSMENTS TO IMPROVE TEACHING QUALITY

Performance assessments make a major contribution to the academic careers of students, particularly those with challenging academic lives, by informing instruction and supporting higher-quality teaching and learning (Darling-Hammond, 2007). Miller and Linn (2000) believe that “much of the impetus for [performance assessments] is that they should mirror the teaching and learning process and provide a better measure of accountability” (p. 373). The authors reported on teacher attitudes and practices toward state-mandated performance assessments in five states. The results showed that teachers tried to align instruction with state performance assessments and supported using this type of mandatory assessment to improve instruction.

Firestone, Mayrowetz, and Fairman (1998) point out that “performance-based assessment can change specific behaviors and procedures in the classroom more easily than the general paradigm for teaching a subject” (p. 111). They suggest that working closely with performance assessments may improve teachers’ instructional knowledge. Similarly, Linn et al. (1991) indicated that teaching for test items emphasizing problem solving is not limited to a single right answer; rather, there could be many ways of solving problems—an important aspect of good instruction.

Analyses of student writing reveal that students often lack understanding of expected language use in performing academic tasks (Aguirre-Munoz et al., 2006). In a study addressing this shortfall, Aguirre-Munoz et al. (2006) trained teachers using a comprehensive functional linguistic approach to academic language, which enhances a student’s own interpretation of characters and events (Christie, 1986, 2002). They found that teachers believed this approach has potential in improving ELL writing development. It offers more explicit instruction and permits performance assessment of genre-specific features of academic language to enhance reading comprehension and writing skills.

Using the outcomes from performance assessment to inform instruction could be of great value for English learners because it helps their teachers improve the quality of instruction for these students. For example, Abedi and Herman (2010) found that ELL students report a lower level of opportunity to learn, defined as content coverage, than their native-English-speaking peers, even within the same classroom. These findings may suggest that ELL students did not fully benefit from teachers’ instructions, possibly because of language issues.

INFORMING TEACHING THROUGH PERFORMANCE ASSESSMENT

The characteristic common to all ELL students is their need for linguistic support. Therefore, performance assessments supplying comprehensive information on a student's level of English proficiency can be extremely helpful in informing curriculum and instructional planning. Performance assessment outcomes focusing on English language content can furnish the information needed to understand how unnecessary linguistic complexity influences the assessment of English learners. Performance assessments can enable students to demonstrate their knowledge unencumbered by confusion caused by linguistic complexity.

ASSESSING ENGLISH LANGUAGE PROFICIENCY

In most schools, ELL students' knowledge of the English language is assessed both to determine their level of English proficiency, as required by NCLB Title III, and to measure their literacy level in English, as required by NCLB Title I. Both measures influence ELL students' academic careers. However, few schools have access to tools that fully capture what ELL students know and can do or that offer sufficient diagnostic information for instruction.

NCLB Title III requires assessing ELL students' level of English proficiency in the four domains of reading, writing, listening, and speaking. These assessments should be based on the concept of academic English and aligned with states' English language proficiency (ELP) and content standards. Prior to NCLB, there were many tests for measuring ELL students' level of English proficiency. These assessments, which were mainly composed of multiple-choice questions, had dissimilar content, measured different language constructs, and varied in format and psychometric properties (Abedi, 2007, 2008; Parker et al., 2009). After NCLB, several consortia of states developed and field-tested batteries of ELP tests.

Reviewers of these more recent assessments indicated areas that needed improvement (see, for example, Abedi, 2008). The listening and reading components of the tests use a traditional multiple-choice item format; the speaking and writing components are in the form of performance-based tasks and are scored against their respective rubrics. In the speaking test, tasks are administered individually in an interview format. Writing items are in a short-answer or essay format and administered in a group setting (Bauman, Boals, Cranley, Gottlieb, & Kenyon, 2007). The performance assessment format was clearly the optimal approach for these assessments.

I believe that assessing ELL students' English proficiency through performance assessment procedures can offer more comprehensive and useful outcomes than traditional selected-response tests, including more valuable information for instruction. For instance, one could design performance assessments that directly involve ELL students in classroom conversation and encourage them to report their understanding of teachers' instruction, particularly instruction with complex language. The outcomes of this type of performance assessment would help teachers evaluate a student's listening and speaking knowledge through observation and student presentations.

Another potential instructional tool is the think-aloud protocol. For example, Boscardin et al. (2004) designed the Language Arts Performance Assessments, which require students to construct their own responses to open-ended prompts about literary works and involve substantial integration of text-based information; these assessments generate outcomes not easily obtained with multiple-choice or short constructed-response questions. Bass et al. (2002) used a think-aloud protocol to examine the thinking and reasoning of fourth- and eighth-grade students engaged in two hands-on science tasks in NAEP. The think-aloud protocol would be valuable for ELL students because, through this approach, teachers and others involved in educating ELL students can obtain a clearer picture of where content and language may be confounded.

CLASSROOM PERFORMANCE ASSESSMENT IN ACTION

This section presents two examples of performance assessments for ELL students that have been developed by an educator with both teaching and assessment experience. The first example focuses on applying performance assessment in reading and writing, and the second shows how to apply a performance assessment approach in social studies. The former is of particular relevance because language assessment is one of the most important aspects of ELL students’ academic careers.

In both examples, students are using language with scaffolding that allows them to engage authentically with spoken and written texts while acquiring knowledge of both language and content. These opportunities not only advance their competence, but also provide rich data to their teacher about what they know and how to guide their learning.

Reading and Writing

Adele Fiderer is a fourth-grade classroom teacher and language arts developer who has incorporated performance assessment into her teaching practice. In her reading class, Adele always encouraged students to choose their own books and read them independently, and then she would ask her students to talk and write about what they were reading. Throughout the years, she developed portfolios of each student’s best work, but she thought this was not a completely accurate measure of a student’s learning. As Adele was looking for a comprehensive assessment technique to support the natural act of reading and responding to a story, she discovered through research that the performance assessment approach would give a better picture of what she wanted to know about her students’ reading performance.

She adopted a performance assessment requiring a common reading and writing task for the entire class. She begins by selecting a text that her students have not read—for example, a book with a significant theme appropriate for her students, a clearly identifiable problem and resolution, well-developed characters, and high interest. She then creates a writing task that encourages students to think about the story. Before they write their final draft, she gives them prewriting organizers such as webs, maps, and Venn diagrams, allowing her students to make notes about their ideas, and gives them enough time to complete the performance assessment. A similar type of writing prompt for primary students would ask them to think about how to retell a story they have read to a friend. Students prewrite using a story map to outline the important parts of the story and then write about the story in final draft form.

Adele developed a rubric with ratings on a scale of 0 to 3 to evaluate children's writing about a story problem. Within the rubric, she provides detailed performance-level descriptions for all four score levels. For example, to obtain a perfect score of 3, a student's written response must be complete, indicating good understanding of the story and its problem, and it must give accurate and relevant details, information, and supportive reasoning.

Social Studies

Adele strongly believes that the process used to develop performance assessments in reading and writing can be used for other subject areas. For social studies, she begins the development process by looking for interesting nonfiction stories and articles relating directly to students’ current studies in history and social studies, and then she designs a writing prompt asking them to think in depth about the subject. She cites an example of this process as it relates to the topic of immigration within her school’s sixth-grade curriculum:

The sixth graders read “The Letters of Rosie O’Brien, a Convict in New South Wales” (Cobblestone, April–May 1987). On one side of a planning sheet, students listed words and phrases that they thought described Rosie. On the other side of the sheet, they provided evidence from the story that backed up their opinions. The students were then asked to use their knowledge of Rosie’s character to write a letter from Rosie to her sister (Fiderer, 2009).

CONCLUSION

Assessment outcomes have a major impact on the academic careers of ELLs because they can influence a student’s classification, promotion, and graduation. The outcomes of these assessments are also used for accountability purposes, which may influence academic performance. Unfortunately, research identifies major problems with traditional statewide assessments for these students. Linguistic complexity and cultural bias may affect the outcome of assessments; therefore current assessments may not be a valid measure of what these students know and can do (Solano-Flores & Li, 2006; Solano-Flores & Trumbull, 2003; Solano-Flores, 2008).

Performance assessments are a powerful alternative to the traditional standardized achievement test. They can engage ELL students in the assessment tasks and more comprehensively demonstrate their knowledge in content-based areas. These assessments also supply more in-depth information on academic needs and create an environment for students to engage in more cognitively stimulating activities.

In addition, the outcomes of performance assessments help us understand the nature of the performance gap between ELL and non-ELL students to see whether such a contrast is due to lack of content knowledge or inadequacy in English proficiency. Performance assessments are often thought to be more engaging for students and to better reflect activities inside the classroom. As a result, they increase the level of motivation and effort for all students, particularly for ELLs, who traditionally face challenging situations in taking standardized achievement tests and need encouragement and support in their academic undertaking. Performance assessments requiring students to evaluate and solve complex problems, conduct research, write extensively, and demonstrate learning in projects are motivating for students in attaining a high level of learning.

Performance assessments can also yield information for diagnostic purposes to assess what students know. They stimulate learning activities and can help teachers decide where to start or determine which groups of students need special attention. Such strategies can monitor processing skills and problem solving as well as competence in particular areas; this is beneficial for ELL students who may not have received equal education opportunities owing to their linguistic needs (Abedi & Herman, 2010). ELL students often exhibit greater interest within the classroom and learn more when they are required to organize facts around major concepts and actively construct their own understanding of the concepts in a rich variety of contexts.

Performance assessment can contribute to the academic careers of students, particularly those with a challenging academic life, by informing instruction and supporting higher-quality teaching and learning. Such tasks are also instructional, allowing students to actively engage in worthwhile learning activities in the classroom. In performance assessment settings, students may be encouraged to seek out additional information or try a range of approaches, and in some situations they work in teams. These assessment strategies benefit ELL students because many variables affecting large-scale state and national assessments have less impact on the learning environment of the classroom.

As with many existing standardized achievement tests, performance assessment tasks may suffer from excessive language load. For example, students must read and understand directions to the tests and contexts in which they are embedded, and write and justify their responses. As this chapter shows, performance assessment tasks can be written in a linguistically accessible manner, without compromising the content and richness of the assessment items, to ensure linguistic complexity is not a source of construct-irrelevant variance.
