CHAPTER 32

Classroom Assessment of Science Learning

Beverley Bell

University of Waikato, New Zealand

In this chapter, classroom assessment of science learning is taken as that assessment done by the teacher of science, in the classroom, for formative and summative purposes, for use by the teacher and student. Other chapters review the research on assessment in science education for the purposes of accountability and international comparisons, for summative assessment for reporting to others outside the classroom and for external qualifications, and for program evaluation. In other words, classroom assessment is viewed as teacher assessment for the participants of the classroom—the teacher and students.

Classroom assessment is an important part of science teaching and learning. Its importance is indicated by the extensive and valuable reviews of classroom assessment of science learning in two previous handbooks of science education (Black, 1998a; Doran, Lawrenz, & Helgeson, 1993; Gitomer & Duschl, 1998; Parker & Rennie, 1998; Tamir, 1998). This review builds on these and other seminal reviews of classroom assessment research in education generally (Black & Wiliam, 1998a; Crooks, 1988; Natriello, 1987) to highlight the continuing as well as new trends in research on classroom assessment of science learning.

The importance of classroom assessment is also indicated in that classroom assessment is viewed as a highly skilled and complex task that is a major component of classroom teacher practice (Bell & Cowie, 2001b; Doran et al., 1993). Hattie and Jaeger (1998) assert that the factors that are most effective at improving student learning are clearly in the hands of teachers and include the giving of feedback during some assessment tasks. Most assessments of science learning are carried out by teachers of science in classrooms; it is the teacher who is responsible for either initiating or implementing changes in assessment in the classroom, and it is the teacher who must ultimately judge the educational worth, significance, and use of different assessment practices.

This chapter reviews the literature in classroom assessment of learning in general, as well as that in science education, science education being a subset of education. The two bodies of literature form the structure of this integrative review, as does the work in early childhood, primary, secondary, and tertiary science education.

By way of an overview, the two main trends in assessment of science learning (as well as assessment of learning in general) are the following:

1. Assessment in education is moving from being viewed as using only traditional psychometric testing and psychological measurement based on a unitary trait view of intelligence and true score theory (Black, 2001), to educational assessment, that is, from assessment to prove learning to assessment to improve learning (Gipps, 1994a, 1994b). This move requires a different view of the learner; a different relationship between the pupil and assessor; an acknowledgment that context, pupil motivation, and the characteristics of the task will affect the performance (the demonstration of the competence on a particular occasion or under particular circumstances) as distinct from the competence (the basic ability to perform); and different notions of quality (Gipps, 1994b).

2. Educational assessment is being “perceived less [as] a technical matter of measurement and more a human act of judgment, albeit based on sound evidence” (Broadfoot, 2002; Harlen & James, 1997, p. 378). This concurs with Eisner's notion of “connoisseurship” (Eisner, 1985). Hence, assessment in classrooms is being seen as a teacher and student practice embedded in political, historical, social, and cultural contexts (Broadfoot, 2002).

It is acknowledged that psychometric testing and measurement and their associated technical considerations still have a place in assessment (Black, 2001; Hattie & Jaeger, 1998), and as Black (2001) stated, “the apparatus of psychometric statistics will still be needed, but in the service of new endeavours, not as keeper and promoter of the old traditions” (p. 80).

Hence, this review focuses specifically on the two more recent and developing trends of educational assessment and assessment as a sociocultural practice of teachers and students. These two trends are evident in the political contexts of assessment, the multiple purposes for assessment, the assessment of multiple goals, assessment for formative and summative purposes, the notion of quality in assessment, and theorizing of assessment. Each of these aspects of assessment of science learning is discussed in this review.

ASSESSMENT AND ITS POLITICAL CONTEXTS

One trend evident in assessment in science classrooms (as in other classrooms) is that it continues to be determined as much by politics as it is by educational theorizing (linking assessment with learning) or psychometrics (the measuring of individual differences) (Donnelly & Jenkins, 2001; Tamir, 1998; Torrance, 2000). Assessment is a part of the political enterprise that is education (Apple, 1982, 1996; Delpit, 1995; Gipps, 1998). In other words, a researcher's findings indicating a need for change will not automatically be accepted by politicians, teachers, students, or the community. Research of itself does not necessarily instigate change. All shareholders in the practice called assessment need to be convinced of the necessity for change. What a researcher values in assessment may not be what is valued by other shareholders.

In the last 20 years, an increasing number of political views have been given voice and have demanded attention in educational developments. An increasing number of different stakeholders/shareholders with legitimate interests in the purposes for which the assessments are carried out seek to influence the assessment purposes, practices, and reporting of assessment in classrooms. Shareholders may be government officials seeking to develop and implement government policy, teachers, parents, employers, and discipline authorities. Their influence is particularly visible when the assessments are high stakes; that is, there is much riding on the outcomes, such as teacher appraisals, curriculum evaluations, school funding based on the outcomes, student career paths, or student entry to further education. Accounts are given in the literature of national or local requirements for certification and accountability exerting a powerful influence on the practice of assessment in classrooms in the New Zealand context (Bell, Jones, & Carr, 1995; Codd, McAlpine, & Poskitt, 1995; Crooks, 2002b; Gilmore, 2002; Hill, 1999), English context (Black, 1995a, 1995d, 1998b, 2000; Donnelly & Jenkins, 2001; Reay & Wiliam, 1999), Scottish context (Harlen, 1995a), Canadian context (Doran et al., 1993; Mawhinney, 1998; Orpwood, 1995), Korean context (Han, 1995), Australian context (Butler, 1995), and U.S. context (Atkin, Black, Coffey, & National Research Council [U.S.], Committee on Classroom Assessment and the National Science Education Standards, 2001; Berlak, 2000; Collins, 1995, 1998; Doran et al., 1993; Johnston, Guice, Baker, Malone, & Michelson, 1995).

Increasingly, teachers’ professional voice and assessment judgment making, if they are included at all, are negotiated with other shareholders in the political process that is education (Atkin, 2002). The importance of the values held by politicians should not be underestimated. An example is politicians’ view of the professionalism of teachers: is it something to be fostered to improve the quality of education, or a barrier to be removed if quality is to be increased (Atkin et al., 2001; Atkin, 2002; Donnelly & Jenkins, 2001)? Politics can also be viewed in the post-structural sense as the discourses and grand narratives of a society or culture, and the micro-politics of the classroom (Bell, 2000; Torrance, 2000; Torrance & Pryor, 1998). In this view, all human practices and actions can be seen to have a dimension of power and hence a political aspect.

Hence, assessment policies, practices, and issues will vary from country to country or state to state, in accordance with the social, political, and historical contexts in which the assessments are done (for examples see Doran et al., 1993). Therefore, this review discusses the larger trends evident in the research literature, with the details of the debates specific to an individual country or state given in the cited references. Some or all of these trends may or may not be relevant in the country of each reader.

ASSESSMENT OF SCIENCE LEARNING FOR MULTIPLE PURPOSES

The second broad trend, derived from the first, is that the research confirms what science teachers (and other teachers) already know: that they are increasingly being asked to do assessment in their classrooms for multiple purposes. And the number of these purposes has increased with the number of different shareholders wishing to use the assessment information generated by teachers and students in classrooms for different purposes. These purposes are linked not only to the goals of education, but also to the political nature of assessment.

This international trend, of multiple purposes for classroom assessment, was brought into sharp focus in the 1990s, when politicians and others wanting to hold educationalists accountable looked to assessment to provide the information required for the accountability process, for example, to audit teacher effectiveness (Bell & Cowie, 2001b). This added to the existing demands for assessment information by people who operate outside the classroom, for example, care-givers, principals, school governing bodies, local or national government officials, awarders of national qualifications, selection panels for tertiary education programs, and employers. These multiple purposes can include auditing of schools, national monitoring, school leaver documentation, awarding of national qualifications, appraisal of teachers, curriculum evaluation, and the improvement of teaching and learning. While these purposes are often mandated by people who operate outside the classroom, the assessments themselves are often done by teachers on their behalf. There are three cornerstones of this education accountability process—a prescribed set of standards, an auditing and monitoring process to ascertain if the standards have been attained, and a way of raising standards if low standards have been indicated in the audits (Bell & Cowie, 2001b). Classroom assessment of (science) learning is seen as a way of raising standards (Atkin et al., 2001; Ministry of Education, 1993).

This more recent addition of assessment for accountability purposes is reflected in this statement of the three main purposes for assessment in education:

Assessment has multiple purposes. One purpose is to monitor educational progress or improvement. Educators, policymakers, parents and the public want to know how much students are learning compared to the standards of performance or to their peers. This purpose, often called summative assessment, is becoming more significant as states and school districts invest more resources in educational reform.

A second purpose is to provide teachers and students with feedback. The teachers can use the feedback to revise their classroom practices, and the students can use the feedback to monitor their own learning. This purpose, often called formative assessment, is also receiving greater attention with the spread of new teaching methods.

A third purpose of assessment is to drive changes in practice and policy by holding people accountable for achieving the desired reforms. This purpose, called accountability assessment, is very much in the forefront as states and school districts design systems that attach strong incentives and sanctions to performance on state and local assessments. (National Research Council, 1999)

The multiple purposes for which classroom assessment is done give rise to several issues of interest to the science education research community:

1. There can be confusion about the use of the terms formative and summative assessment (Black & Wiliam, 1998a; Brookhart, 2001; Harlen & James, 1997). The terms formative and summative were first used by Scriven (1967, 1990) to distinguish between the two roles that evaluation may play in education, but the current use of these terms is as adjectives to describe the different purposes of assessment, not evaluation or assessment tasks. There is often confusion reported in the literature over the terms formative, summative, and continuous summative assessment (Bell & Cowie, 2001b; Glover & Thomas, 1999; Harlen & James, 1997; Scriven, 1990). For example, teacher assessments, which are collected mainly for summative purposes, usually over the course of the year, are often aggregated into a single score or grade for reporting purposes. This assessment practice is called “continuous assessment” (Nitko, 1995), and although the assessments may have a weak formative role, the main purpose for the assessment information being generated is summative. In the same way, assessment for accountability purposes may have a weak formative purpose informing future teaching and learning, but the time delay is extended when compared with classroom assessments undertaken primarily for formative purposes. Harlen (1998) suggested the use of a continuum from procedures and purposes of assessment that are strongly formative through to those that have a strong summative focus.

2. The terms summative, formative, and accountability describe the purpose for which the assessment is done, not the task itself, as one assessment task might be used for both formative and summative purposes. For example, a concept-mapping task may be used for formative purposes during the teaching unit, or summatively after the teaching (Francisco, Nakhleh, Nurrenbern, & Miller, 2002; Toth, Suthers, & Lesgold, 2002), as can portfolios (Childers & Lowry, 1997; Treagust, 1995; Tripp, Murphy, Stafford, & Childers, 1997) and demonstration assessments (Deese, Ramsey, Walczyk, & Eddy, 2000). Although it is acknowledged that teachers do assessment for both formative and summative purposes (Biggs, 1998; Black, 1993, 1995b, 2002) and that the two purposes overlap, the issue arises as to whether the information collected by a teacher for formative purposes (that is, to inform the learning and teaching) can be used at a later time for summative purposes (that is, at the end of the teaching and learning) or accountability purposes. Similarly, can assessment information collected for summative purposes be used by others for accountability purposes, given concerns of ethics; quality, such as reliability, validity, and fairness; manageability; and the tensions between the teacher's dual roles of advisor and adjudicator? Harlen (1998) highlighted these problems in that “assessment for formative purposes is pupil-referenced and judgments are made by pupils and teachers about next steps, whilst assessment for summative purposes requires judgments to be made by teachers against public standards or criteria” (p. 9). In the United Kingdom, as in New Zealand, the teacher judgments are often used to allocate students’ achievement to a curriculum “level.” Harlen saw no problem in using the raw information, but not the judgments, collected for formative purposes for later summative purposes. Hence, she advocated using the evidence of pupil learning but not the results (emphasis in original). A growing number of authors are arguing that the interaction of all three purposes is both possible and desirable, given the burdens of assessment on teachers and students (Biggs, 1998; Black, 1993; Gilmore, 2002; Harlen & James, 1997). A note of caution was given by Bell and Cowie (1997, 2001b), who presented research findings to indicate that the students in the science classrooms studied were able to determine the purpose of the assessment as being formative or summative, even if the teacher had not been explicit. Moreover, their performance was influenced by the level of disclosure they felt comfortable with, given the level of trust that existed, thereby influencing the quality of the assessment.

3. In the classroom, the practices of curriculum, pedagogy, learning, and assessment are connected, interdependent, and in tension (Carr et al., 2000), as are the purposes for doing assessment. Changes to the assessment practices in a classroom will affect the other practices and vice versa. For example:

  • curriculum change can influence assessment practices (Black & Wiliam, 1998b; Gitomer & Duschl, 1995; Orpwood, 2001);
  • assessment practices will affect curriculum, learning, and pedagogy (Buchan & Welford, 1994; Carr et al., 2000; Crooks, 2002a; Hill, 2001; Mavrommatis, 1997; Smith, Hounshell, Copolo, & Wilkerson, 1992), for example, the impact of high-stakes, standardized national testing on school learning (Black & Wiliam, 1998a);
  • for assessment practices to change, curricula and pedagogy must change also (Black & Wiliam, 1998b; Cheng & Cheung, 2001; Dori, 2003; Treagust, Jacobowitz, Gallagher, & Parker, 2001).

In the same way, the three key purposes of assessment, formative, summative, and accountability, are interdependent and interactive:

  • Some classroom assessments for school summative purposes may be used for accountability purposes if schools are required to produce evidence to audit agencies that they are making a difference (Hill, 1999). This strong emphasis on assessment for summative and accountability purposes, and especially the high-stakes assessment, may decrease the assessment for formative purposes that teachers feel able to do (Cowie & Bell, 1999).
  • There is a tension between classroom assessments (for formative and summative purposes) and national assessment for accountability and monitoring purposes, with opportunities for doing assessment for formative purposes declining with the straitjacket of national assessment (Black & Harrison, 2001a, 2001b; Black & Wiliam, 1998a; Broadfoot, 1996; Daws & Singh, 1996; Eley & Caygill, 2001, 2002; Gipps, Brown, McCallum, & McAlister, 1995; Keiler & Woolnough, 2002; Preece & Skinner, 1999; Smith et al., 1992; Swain, 1996, 1997). A strong emphasis on national or state assessment for accountability purposes may lead to teaching to the test, increased teaching time on the school work assessed by such tests, or even less teaching time available (Gipps et al., 1995). However, the primary teachers (of science) in the Gipps et al. (1995) study on the impact of national testing became more knowledgeable in assessment; redirected the focus of their teaching, with resulting improvements in national assessments of basic skills; undertook more detailed planning; made more use of a systematic approach to collecting evidence from the students and written records, rather than intuition; gained a better understanding of an individual's progress; and developed increased levels of discussion and collegiality with other staff.
  • Multiple purposes for assessment mean that there are multiple audiences, raising the issue of whether one assessment task can provide information for several assessment purposes and audiences (Black, 1998a).

Research on assessment for accountability purposes is reviewed in Chapters 31 and 33. The multiple purposes of classroom assessment of science learning are seen as giving rise to two broad categories of assessment: assessment for formative (including diagnostic) and summative purposes. These two purposes are discussed in the fourth and fifth sections.

ASSESSMENT AND MULTIPLE GOALS OF SCIENCE LEARNING

The question of what is assessed is an important one, as it links assessment strategies, learning goals, and curriculum together and is part of what is called “validity inquiry” in the psychometric literature (Messick, 1989, 1994). In the 1993 review of assessment of science learning, the list of what is assessed was given as knowledge of facts and concepts, science process skills, higher order science thinking skills, problem-solving skills, skills needed to manipulate laboratory equipment, and attitudes of science (Doran et al., 1993). A strong criticism of assessment in the past has been that only learning goals that could be readily assessed, say by use of recall to answer multiple-choice tests, were assessed, with a subsequent negative impact on the curriculum, pedagogy, learning, and learners in the classroom (Crooks, 1988). There is now recognition that all learning goals need to be assessed, not just the recall and understanding of science concepts that are easy to test for (Osborne & Ratcliffe, 2002). The additional science learning goals to be assessed include:

  • the nature of science views held by students (Aikenhead, 1987; Lederman, Abd-El-Khalick, Bell, & Schwartz, 2002);
  • what matters in the discipline of science (Gitomer & Duschl, 1995): knowledge and skills that are deemed important within the discipline, that is, knowledge and science experiences through investigation procedures similar to those scientists employ; the changing nature of scientific knowledge being acquired from an investigation; the accepted rules of practice that guide scientific practice; meanings and background knowledge of the scientific discipline;
  • knowing that science is culturally and historically embedded, and contextualized (Fusco & Barton, 2001);
  • the ideas and evidence—the processes and practices—of science, that is, how we know as well as what we know (Osborne & Ratcliffe, 2002);
  • learning dispositions (habits of mind), such as resilience, playfulness, reciprocity, curiosity, friendliness, being bossy; confidence, curiosity, intentionality, self-control, relatedness, communication, cooperation, courage, playfulness, perseverance, responsibility, selectivity, experimentation, reflection, opportunism, and conviviality (Allal, 2002; Carr, 2001; Carr & Claxton, 2002), as well as effective learning skills in the learning-to-learn literature (Baird & Northfield, 1992).

In the next section, the research on assessment of science learning for formative purposes is reviewed.

ASSESSMENT OF SCIENCE LEARNING FOR FORMATIVE PURPOSES

Another trend in the research on classroom assessment is the increasing emphasis on assessment for formative purposes. This trend has arisen because of recent political demands for increased accountability of teachers for learning outcomes, the research on the role of feedback in learning and teaching, and the research on teaching and assessment for conceptual development in science education (Bell & Cowie, 2001b; Treagust et al., 2001).

Definitions and Characteristics

The term formative assessment is increasingly being used to refer only to assessment that provides feedback to students (and teachers) about the learning that is occurring, during the teaching and learning, and not after (Cowie, 1997). The feedback or dialogue is seen as an essential component of formative assessment interaction, where the intention is to support learning (Black, 1995b; Black & Wiliam, 1998a; Gipps, 1994a; Hattie, 1999; Hattie & Jaeger, 1998; Perrenoud, 1998; Ramaprasad, 1983; Sadler, 1989). And assessment can be considered formative only if it results in action by the teacher and students to enhance student learning (Black, 1993). These components are reflected in various definitions of formative assessment, for example, “The process used by teachers and students to recognise and respond to student learning in order to enhance that learning, during the learning” (Bell & Cowie, 2001b, p. 8). It is through the teacher-student interactions during learning activities (Newman, Griffin, & Cole, 1989) that formative assessment is done and that students receive feedback on what they know, understand, and can do and receive teaching to learn further. Formative assessment is at the intersection of teaching and learning (Gipps, 1994a), and, in this way, teaching, learning, and assessment are integrated in the curriculum (Hattie & Jaeger, 1998). The term formative interaction (Jones, Cowie, & Moreland, 2003; Moreland, Jones, & Northover, 2001) may be used instead of formative assessment to highlight this interactive nature of formative assessment—that teacher-student interactions are the core of formative assessments. Assessment for diagnostic purposes (for example, Barker & Carr, 1989; Feltham & Downs, 2002; Simpson, 1993) is therefore included, as is embedded assessment (Treagust et al., 2001; Volkmann & Abell, 2003). Harlen and James (1997), in a review of literature, summarized that the characteristics of formative assessment, to distinguish it from summative assessment, are that it is positive, is a part of teaching, takes into account the progress of students, can elicit inconsistencies that can provide diagnostic information, places more value on validity and usefulness than reliability, and requires the students to be actively involved in monitoring their own progress and improving their learning. Harlen (1998) describes assessment for formative purposes as that which is “embedded in a pedagogy of which it is an essential part; shares learning goals with students; involves students in self-assessment; provides feedback which leads to students recognizing ‘the gap’ and closing it; underpinned by confidence that every student can improve; and involves reviewing and reflecting on assessment data” (p. 3).

Bell and Cowie (1997, 2001b), in reporting the findings of a major research project, summarized the characteristics of formative assessment on the basis of the qualitative data generated by the 10 teachers of year 7–8 students (aged 10–14) in a two-year research project into classroom assessment and science education. Over the two years, the teachers were asked to describe what it was that they were doing when they were doing assessment for formative purposes. Their ability to articulate and make explicit this often tacit practice increased over the two years as a shared understanding and use of a shared language also grew. The nine characteristics of formative assessment discussed by the teachers were that it was responsive (that is, dynamic and progressive, informal, interactive, unplanned as well as planned, proactive as well as reactive, responding with individuals and with the whole class, involving uncertainty and risk taking, and having degrees of responsiveness); used written, oral, and nonverbal sources of evidence; was a tacit process; used professional knowledge and experiences; was an integral part of teaching and learning; was done by both teachers and students to improve teaching as well as learning; was highly contextualized; and involved managing dilemmas (Bell & Cowie, 1997, 2001a, 2001b).

A descriptive and interpretive account of some characteristics of formative assessment was also given by Treagust et al. (2001). In this study, the classroom practices of an acknowledged exemplary U.S. middle-school teacher of 23 grade 8 students studying sound were researched with an interpretive research methodology over a period of 3 weeks. The researchers explored and documented how the teacher incorporated assessment tasks as an integral component of her teaching about the topic sound. They reported that the teacher used “the information to inform her teaching, nearly every activity had an assessment component integrated into it, that students had a wide range of opportunities to express their knowledge and understanding through writing tasks and oral questioning, and that individual students responded to and benefited from the different assessment techniques in various ways” (p. 137).

Importance of Formative Assessment

Formative assessment, like assessment in general, does influence learning (Crooks, 2002a; Gipps & James, 1998). The case for formative assessment was made in a report commissioned by the British Educational Research Association to argue for raising achievement through the use of assessment for formative purposes, rather than through large-scale testing for accountability purposes. This seminal review by Black and Wiliam (1998a, 1998b) of the research reported in 578 articles states the importance of formative assessment for learning: “The research reported here shows conclusively that formative assessment does improve learning. The gains in achievement appear to be quite considerable, and as noted earlier, amongst the largest ever reported for educational interventions” (p. 61).

The science education research included in the Black and Wiliam review to provide evidence for this knowledge claim included that of Frederiksen and White (1997). Likewise, Hattie (1999) concluded his meta-analysis evaluating the relative effects of different teaching approaches and different components of teaching by stating that the single most powerful moderator that enhances achievement is feedback.

Having reviewed the literature to document the evidence that formative assessment can indeed raise standards, Black and Wiliam (1998a) then addressed the question, “is there evidence that there is room for improvement?” They concluded that there is research evidence “that formative assessment is not well understood by teachers and is weak in practice; that the context of national or local requirements for certification and accountability will exert a powerful influence on its practice; and that its implementation calls for rather deep changes both in teachers’ perceptions of their own role in relation to their students and in their classroom practice” (Black & Wiliam, 1998a, p. 20). The science education literature supporting this knowledge claim is found in a number of studies (Black, 1993; Bol & Strage, 1996; Daws & Singh, 1996; Duschl & Gitomer, 1997; Lorsbach, Tobin, Briscoe, & LaMaster, 1992).

Although there has been much advocacy by science educators on the importance of formative assessment to improve learning and standards of achievement (for example, Atkin et al., 2001; Black, 1995b, 1995c, 1998a; Black & Wiliam, 1998a, 1998b; Harlen, 1995b; Harlen & James, 1997; Hunt & Pellegrino, 2002), there have been only a few actual research studies on the process of formative assessment of science learning. These are now reviewed.

Models of Formative Assessment

In addition to the empirical research on formative assessment in the general assessment research literature, for example, the work of Torrance and Pryor (1998, 2001) on formative assessment in UK primary classrooms (with students aged 4–7), there are three major research studies in science education: Bell and Cowie (2001b), Cowie (2000), and Treagust et al. (2001). The first study sought to research the nature of assessment for formative purposes so as to make the often tacit knowledge of teachers explicit. In this way, the research could help teachers to share their knowledge of this highly skilled pedagogy during teacher development situations. The findings are summarized in a model of formative assessment (Bell, 2000; Bell & Cowie, 2001b; Cowie & Bell, 1999). This model is notable in that it was developed by the teachers involved in the research project (Bell & Cowie, 1999). The 10 teachers were asked to develop a model that would communicate to teachers not involved in the research what it was that they were doing when they were doing formative assessment. The primary teachers in the research by Torrance and Pryor (2001) similarly developed a “practical classroom model.” The teachers within the Bell and Cowie study reported that they undertook two forms of formative assessment: planned formative assessment and interactive formative assessment.

The process of planned formative assessment was characterized by the teachers eliciting, interpreting, and acting on assessment information. The main purpose for which the teachers said they used planned formative assessment was to obtain information from the whole class about progress in learning science as specified in the curriculum. This assessment was planned in that the teacher had planned to undertake a specific activity (for example, a survey or brainstorming) to obtain assessment information on which some action would be taken. The teachers considered that the information collected as a part of the planned formative assessment was “general” and “blunt” and concerned their “big” purposes. It gave them information that was valuable in informing their interactions with the class as a whole with respect to “getting through the curriculum.” This form of formative assessment was planned by the teacher mainly to obtain feedback to inform her or his teaching. The purpose for doing the assessment strongly influenced the other three aspects of the planned formative assessment process of eliciting information, interpreting, and taking action to enhance the students’ learning. Acting on the interpreted information is the essential aspect of formative assessment that distinguishes it from continuous summative assessment. To do this, the teacher needed to plan to have a flexible program and to allow for ways in which she or he could act in response to the information gathered. It also helped to be able to act in a variety of ways in response to that gathered information.

The second form of formative assessment identified by the teachers was interactive formative assessment, which can be characterized as the teachers’ noticing, recognizing, and responding. Interactive formative assessment was what took place during student-teacher interactions. It differed from the first form—planned formative assessment—in that a specific assessment activity was not planned. The interactive assessment arose out of a learning activity. Hence, the details of this kind of formative assessment were not planned and could not be anticipated. Although the teachers often planned or prepared to do interactive formative assessment, they could not plan for or predict what exactly they and the students would be doing, or when it would occur. As interactive formative assessment occurred during student-teacher interaction, it had the potential to occur any time students and teachers interacted. The teachers and students within the project interacted in whole-class, small-group, and one-to-one situations. The main purpose for which the teachers said they did interactive formative assessment was to mediate the learning of individual students with respect to the intended learning. The teachers’ specific purposes for interactive formative assessment emerged in response to what sense they found the students were making. Interactive formative assessment was therefore embedded in and strongly linked to learning and teaching activities. The teachers indicated that through their interactive formative assessment, they refined their short-term goals for the students’ learning within the framework of their long-term goals. The teachers indicated that their purposes for learning could be delayed and negotiated between the teacher and the students through formative assessment feedback. The teachers described interactive formative assessment as teacher and student driven rather than curriculum driven. The response to what the teacher had noticed and recognized was the essential aspect of interactive formative assessment. The response by the teachers was similar to the acting in planned formative assessment, except that the time frame was different—it was more immediate. Within the process of interactive formative assessment, the teachers often had to make quick decisions in circumstances in which they did not have all the necessary information, using “teacher wisdom” rather than intuition or instinct (Jaworski, 1994).

The teachers in the study commented that the two kinds of formative assessment were linked through the purposes for formative assessment; that some teachers used interactive formative assessment more than other teachers; and that a teacher moved from planned to interactive and back. The link between the two parts of the model was seen to be centered around the purposes for doing formative assessment. It is of interest to note that the teachers in the Torrance and Pryor (2001) study placed “making task and quality criteria explicit” at the center of their classroom practice model, which is linked to purpose. The teachers in the New Zealand study indicated that their assessment for formative purposes tended to be decreased if there was too much emphasis on assessment for summative and accountability purposes (Bell & Cowie, 2001b), reflecting the views of Black (1995c), Harlen and James (1997), and Hill (1999).

The key features of the model are now discussed, in association with other research on formative assessment. First, the model of formative assessment developed by the teachers included the notion of planning, which has also been highlighted by other researchers (Fairbrother, Black, & Gill, 1995; Harlen, 1995b; Torrance & Pryor, 1995).

A second key feature of the model is that formative assessment is described as a complex, highly skilled task, as it is in other research (Torrance & Pryor, 1998), one that relied on the following knowledge bases (Shulman, 1987): content knowledge (for example, knowing the scientific understanding of the concepts being taught); general pedagogical knowledge (for example, of classroom management); curriculum knowledge (for example, of the learning objectives in the curriculum being taught); pedagogical content knowledge (for example, knowing how best to teach atomic theory to a class of 14-year-olds); a knowledge of learners in general and of the students in the class; knowledge of educational contexts (for example, the assessment practices in the school); and a knowledge of educational aims and purposes (for example, a possible “science-for-all” emphasis in a national curriculum). To this list, the teachers’ knowledge of progression in students’ learning of specific concepts can be added (Bell & Cowie, 2001c; Jones et al., 2003). The formative assessment also relied upon the processes of pedagogical reasoning and action (Shulman, 1987), including the transformation of content knowledge into pedagogical knowledge, through preparation, representation, selection, and adaptation. The teachers felt that the use of both forms of formative assessment, and the ability to switch between them, was the hallmark of a competent teacher.

A third key feature of the model is teachers’ interaction skills; the nature of the relationships they had established with the students was also seen as important. It was felt that the teachers needed a disposition to carry out interactive formative assessment; that is, the teachers needed to value and want to interact with the students to find out what they were thinking about. Cowie (2000) also commented on the relationships that developed between the teachers and students as individuals, groups, and a class. Mutual trust and respect were identified as the key factors mediating student willingness to disclose their ideas to teachers and peers, and hence enabling formative assessment interactions to occur. Hence the findings support the contention by Tittle (1994) that the views and beliefs of the interpreters and users of assessment information (here teachers and students) are an important dimension of any theory of educational assessment.

A fourth key feature of the model is the central role given to purpose in both forms of formative assessment by the teachers.

A fifth key feature of the model is the action taken as part of both planned and interactive formative assessment, for it distinguishes assessment for formative purposes from that for summative and accountability purposes. The action means that formative assessment can be described as an integral part of teaching and learning and that it is responsive to students. Much of the current literature, for example, Driver, Squires, Rushworth, and Wood-Robinson (1994), on conceptual development in science education involves a consideration of the teacher being responsive to the thinking of students, often phrased as “taking into account students’ thinking.” To respond to and mediate students’ thinking involves the teacher finding out what the thinking is, evaluating the thinking, and responding to it. These are the three components in both planned and interactive formative assessment. The teachers in the research made the claim that they did not think they could promote learning in science unless they were doing formative assessment (Bell & Cowie, 1997). The role of the teacher included providing opportunities for formative assessment to be done (for example, having the students discuss in small groups their and scientists’ meanings of “electric current,” rather than listening only to a lecture by the teacher) and using the opportunity to do formative assessment (for example, interacting with the students while they are doing small-group discussion work about their conceptual understandings of electric currents). In addition, the action taken as part of both planned and interactive formative assessment was seen by the teachers as a part of teaching and by the students as a part of learning. The teachers acted and responded to the assessment information they obtained in science-referenced (criterion-referenced), student-referenced (ipsative), and care-referenced ways. In the care-referenced actions, the teachers took action to sustain and enhance the quality of interactions and relationships between the students and between themselves and the students. Other research has also noted the dual ipsative and criterion-referenced nature of formative assessment (Harlen & James, 1997) and the care aspect of formative assessment (Treagust et al., 2001). A single action or response might have one or more of these aspects in it. It was the action part of planned and interactive formative assessment that the teachers felt they needed more help with in future teacher development.

One important aspect of the “taking action” is the feedback to the student from the teacher or another student. The feedback is more effective in improving learning outcomes if it is about the substance of the work, not superficial aspects (Crooks, 1988; Harlen, 1999); linked with goal setting (Black & Wiliam, 1998a; Gipps & Tunstall, 1996b; Hattie, 1999; Hattie & Jaeger, 1998); and linked to the student's strengths and weaknesses on the task, rather than to just the self, as in praise (Black & Wiliam, 1998a; Hattie, Biggs, & Purdie, 1996). The quality of the feedback may involve a comparison between the student's achievement or performance and that of other students (norm-referenced), standards or learning goals (criterion-referenced), or the student's previous achievements (ipsative). In assessment for formative purposes, the ipsative frame of reference for feedback is important. Another important aspect of the taking action part of the formative assessment process has become known as “feedforward,” to distinguish it from feedback (Bell & Cowie, 2001b). Whereas “feedback” was used to refer to the response given to the student by the teacher (Sadler, 1989) or another student about the correctness of their learning, the term “feedforward” was used to refer to those aspects of formative assessment in which the teacher was helping students to close the gap between what they know and can do, and what is required of them as indicated in the standards or curriculum objectives (Sadler, 1989). Hence, to provide both feedback and feedforward, a teacher must know the curriculum content and standards or curriculum objectives, the progression of students’ learning, and the scaffolding required for learning in the Zone of Proximal Development, after Vygotsky (Allal & Ducrey, 2000; Torrance & Pryor, 1998).

A sixth key feature of the teachers’ model of formative assessment was the central role of self-assessment and self-monitoring. This is distinct from feedback, which is given by another person (Sadler, 1989). Research on this aspect of assessment for formative purposes was also reviewed in the meta-analysis by Hattie et al. (1996), who concluded that interventions that are integrated to suit the individual's self-assessment, orchestrated to the demands of the particular task and context, and self-regulated with discretion were “highly effective in all domains (performance, study skills and affect) over all ages and abilities, but were particularly useful with high-ability and older students” (p. 128). Such interventions were also more effective than the typical study skills training packages. The research by Jones et al. (2003), Moreland and Jones (2000), and Moreland et al. (2001) with teachers of technology indicated that, to be able to give effective feedback and feedforward, the pedagogical content knowledge as well as the pedagogical approaches of teachers had to be enhanced. This was so that the teachers could make a judgment about where a student's learning was in relation to the intended curriculum learning goals, communicate this to the student, and suggest steps for the student to improve his or her learning, based on their knowledge of progression in learning a specific skill or concept.

The following have been suggested for teachers and students, on the basis of the review of the research evidence, as interventions to improve the use of assessment for formative purposes: “feedback to any pupil should be about the particular quality of his or her work, with advice on what he or she can do to improve, and should avoid comparisons with other pupils”; “for formative assessment to be productive, pupils should be trained in self-assessment so that they can understand the main purposes of their learning and thereby grasp what they need to achieve”; “opportunities for pupils to express their understanding should be designed into any piece of teaching, for this will initiate the interaction whereby formative assessment aids learning”; “the dialogue between pupils and a teacher should be thoughtful, reflective, focused to evoke and explore understandings, and conducted so that all pupils have an opportunity to think and to express their ideas”; “tests and homework exercises can be an invaluable guide to learning, but the exercises must be clear and relevant to learning aims. The feedback on them should give each pupil guidance on how to improve, and each must be given opportunity and help to work at the improvement” (Black & Wiliam, 1998b, pp. 9–13). This was later summarized as follows:

The research indicates that improving learning through assessment depends on five, deceptively simple key factors: the provision of effective feedback to pupils; the active involvement of students in their own learning; adjusting teaching to take account of the results of assessment; a recognition of the profound influence assessment has on the motivation and self-esteem of pupils, both of which are crucial influences on learning; and the need for pupils to be able to assess themselves and understand how to improve. (Assessment Reform Group, 1999, p. 4) and:

sharing learning goals with pupils; involving pupils in self-assessment; providing feedback which leads to pupils recognizing their next steps and how to take them; underpinned by confidence that every pupil can improve. (Assessment Reform Group, 1999, p. 7)

Publications for teachers, based on the above research reviews, have been produced to help them improve assessment for formative purposes in the classroom (Atkin et al., 2001; Clarke, 1998; Clarke, Timperley, & Hattie, 2003). For example, the last of these has a chapter on each of the following components of formative assessment:

clarifying learning intentions at the planning stage, as a condition for formative assessment to take place in the classroom; sharing learning intentions at the beginning of lessons; involving children in self-evaluation against learning intentions; focusing oral and written feedback around the learning intentions of lessons and tasks; organizing individual target setting so that children's achievement is based on previous achievement as well as aiming for the next level up; appropriate questioning; and raising children's self-esteem via the language of the classroom and the ways achievement is celebrated. (Clarke et al., 2003, p. 14)

The last key feature of the model is the teacher development that occurred in its development by the teachers (Bell & Cowie, 1997; Bell & Cowie, 2001c), providing some information to answer Black and Wiliam's question, “Is there evidence about how to improve formative assessment?” Information has also been provided by other researchers, using collaborative action-research (Torrance & Pryor, 2001), reflective surveys (Black & Harrison, 2001a, 2001b), and reflection on teachers’ knowledge bases (Jones et al., 2003). A notable feature of the literature is that teacher development for assessment for formative purposes also involves changing one's overall pedagogy, not just the assessment aspects (Ash & Levitt, 2003; Bell & Cowie, 2001c; Black & Wiliam, 1998a, 1998b; Treagust et al., 2001).

Research on Students and Assessment for Formative Purposes

Students have considerable agency in the practice of formative assessment. But although feedback, feedforward, and self-assessment position both the teacher and student as taking action during formative assessment, the core activity of formative assessment lies in “the sequence of two actions. The first is the perception by the learner of a gap between the desired goal and his or her present state (of knowledge, and/or understanding, and/or skill). The second is the action taken by the learner to close that gap in order to attain the desired goal (Ramaprasad, 1983; Sadler, 1989)” (Black & Wiliam, 1998a, emphases added).

There is a growing interest in the wider education research literature in the views of students on teaching, learning, and assessment (as distinguished from their views on the subject matter content), for example, Heady (2001) and Morgan and Morris (1999). Although there have been reviews on the impact of assessment on students (Crooks, 1988; Hattie & Jaeger, 1998; Natriello, 1987), there has been little research until recently on students’ views of assessment (Brookhart, 2001; Gipps & Tunstall, 1996a; Jones et al., 2003). Two studies are worthy of discussion here. In the primary school, Pollard, Triggs, Broadfoot, McNess, and Osborn (2000) reported on the findings of the Primary Assessment, Curriculum and Experience (PACE) project in the United Kingdom, stating that “the picture of children's experience of classroom assessment that emerges from different sections of the pupil interviews is remarkably consistent. They are aware of assessment only as a summative activity and use criteria of neatness, correctness, quantity and effort when commenting on their own and others’ work” (p. 152). In the research by Brookhart (2001), the able students did not keep distinct the formative and summative purposes of assessment, but rather these successful students integrated the two. However, there has been research where students have been interviewed on their views of science assessment for formative purposes (Cowie, 2000). In this doctoral research, 75 students (years 7 to 10, or ages 11 to 14) were interviewed in either individual or group situations about their views on classroom assessment. The findings indicate that the students constructed themselves as active and intentional participants in learning, its assessment, and their self-assessment of it. The criteria for judging the success of their learning, reported by students, included the ability to perform a task, gaining good marks or grades, the teacher confirming their ideas were correct, and feelings of completeness and coherence. Another finding was that students viewed formative assessment as embedded in and accomplished through interaction with teachers, peers, and parents. Disclosure was another aspect of students’ views of formative assessment. The students were very aware that their questions, actions, and book work had the potential to disclose not only what they knew but also what they did not know, to peers and teachers, who may or may not make positive judgments and actions on the basis of these disclosures. Students indicated that they withheld disclosure if the classroom was not safe, disclosing only within a trusting relationship with the teacher and peers. This would influence not just the validity but the essence of formative assessment. Student disclosure is central to formative assessment, and participation in assessment interactions could lead to both benefit and harm in learning, social, and relationship constructions. Torrance and Pryor (1998) described the teacher during formative assessment as using power-with and power-for students in their learning.

Cowie (2000) also detailed the ways in which student perspectives of formative assessment in the classroom contributed to the mutual construction of what it means to be a student and a teacher in that classroom, that is, notions of identity. For example, “student perceptions that time and attention were limited and teachers assessed what was important to them, meant that teacher assessment served to communicate to the students who and what was important to the teacher” (p. 260); “for students, the key feature of formative assessment as a meaning making activity was that it contributed to their identity in the classroom … the students contended that disclosing their ideas in an attempt to enhance their understanding could lead their peers and the teacher to perceive them as ‘try hards’, ‘bright’ or ‘dumb’ and to their learning being enhanced or them being embarrassed and feeling stupid. They indicated that for them, assessment and learning were intimately connected and inherently linked with who they were and how they felt” (p. 261). The students and teachers in Cowie's study were seen as actors (that is, taking action) in formative assessment. The students were actors in formative assessment in three ways: their academic and social goals and interests mediated their interactions; they sought to manage the disclosure of their learning by choosing (or not choosing) to ask questions and by acting to restrict teachers’ incidental access to their book work; and they assessed the teacher to ascertain how the teacher reacted to their questions and therefore to find out what was seen as important by the teacher. In summary, Cowie (2000) stated that the students were both active in the formative assessment process and profoundly affected by it, as did Reay and Wiliam (1999). The students in the Cowie (2000) study insisted that

their teachers could only assess their learning through face-to-face interaction with them. Face-to-face interaction was considered to enhance the fidelity (Wiliam, 1992) of teacher formative assessment, because students could negotiate the meaning of teacher questions and because students were more prepared to ask questions, thereby disclosing their views. Teachers were said to provide more useful feedback during one-to-one and small group interactions. (p. 267)

Students’ views of assessment are also embedded in the use of questions generated by students as assessment information for diagnostic and formative purposes (Biddulph, 1989; Rop, 2002; Zeegers, 2003).

ASSESSMENT OF SCIENCE LEARNING FOR SUMMATIVE PURPOSES

A fifth trend in classroom assessment of science learning is the ongoing research and development of assessment for summative purposes. Classroom assessment of science learning for summative purposes is that which summarizes the learning achieved after teaching is completed and includes end-of-unit tests, teacher assessments for qualifications, and teacher assessments for reporting to parents, caregivers, and others outside the classroom. Whereas assessment for formative purposes is assessment to improve learning, assessment for summative purposes is assessment of learning (Crooks, 2001). Included in this section is ongoing or continuous summative assessment, in which a series of short summative assessments is aggregated in some way, usually to reduce the assessment information into a single score or grade (Harlen, 1998).

In the past decade, the bulk of the development in classroom assessment of science learning has been about assessment for summative purposes and has been concerned with developing both the summative assessment of a wider range of science learning outcomes and the use of a wider range of assessment tasks or formats, in order to increase the quality of the summative assessments, especially their validity. This trend is noticeable in the handbook-type publications for teachers on classroom assessment (for example, Phye, 1997) and on assessment of science learning (Enger & Yager, 1998; Mintzes, Wandersee, & Novak, 1999, 2001; Shepardson, 2001); exemplars of assessment as curriculum support (Ministry of Education, downloaded 2003); released assessment items and information from assessment for national and international accountability (for example, Crooks & Flockton, 1996; Eley & Caygill, 2001); resource banks (Gilmore & Hattie, 2001; Marston & Croft, 1999); pre-post-assessment items used in research into science learning (Barker & Carr, 1989); and released national/state examination papers.

The last decade of research on the assessment of science learning for summative purposes has included researching the assessment of a wider range of learning outcomes through performance assessment and researching the use of a wider range of assessment formats, including performance assessment, concept maps, portfolios, interviews, prediction tasks, learning stories, observations, dynamic assessments, experimental and customized challenges, group assessment, and computer-based assessment. These are now discussed.

THE ASSESSMENT OF A WIDER RANGE OF LEARNING OUTCOMES USING PERFORMANCE ASSESSMENT

Performance assessment to assess a wider range of learning goals, in diverse learning situations (Solano-Flores & Shavelson, 1997; Stoddart, Abrams, Gasper, & Canaday, 2000), has been a focus of research on classroom assessment of science learning. It was felt that traditional pen-and-paper tests, requiring recall and recognition of knowledge “about,” did not validly assess the learning of some goals of science education (Fusco & Barton, 2001). Performance assessments were developed to assess production or performance and hence enable the assessment of all the curriculum goals, not just those that were readily assessed by multiple choice or short answers. Performance-based assessment has been defined as “the execution of some task or process which has to be assessed through actual demonstration, that is, a productive activity (Wiggins, 1993)” (Cumming & Maxwell, 1999, p. 180). It is the assessment of actual performance, showing what a student can do rather than simply what a student can recall. The term may also include an emphasis on the integration of knowledge, practices, and holistic applications (that is, applications to the whole and not just separate parts); multiple opportunities for teaching and learning (Wolfe, Bixby, Glenn, & Gardner, 1991); assessment in contexts that mirror real-life science or science in everyday life (Lubben & Ramsden, 1998); collaborative inquiry, problem solving, co-construction of understandings, and knowledge-building communities (Fusco & Barton, 2001; Rogoff, 1990); and many different kinds of performance, rather than just one kind (Eisner, 1993; McGinn & Roth, 1998). The literature indicates that performance assessments are highly sensitive not only to the tasks and the occasions sampled, but also to the method (Ruiz-Primo & Shavelson, 1996b) and the kinds of knowledges that students need to access to complete a performance task (Erickson & Meyer, 1998).

The use of performance assessment has been most notable in the assessment for summative purposes of laboratory or “practical” work, not by the use of pencil-and-paper assessment of knowledge-about and knowledge-how-to, but by assessment of performance (Bednarski, 2003; DeTure, Fraser, Giddings, & Doran, 1995; Doran et al., 1993; Erickson & Meyer, 1998; Fairbrother, 1993; Gott, Welford, & Foulds, 1998; Harlen, 1999; Stefani & Tariq, 1996; Tariq, Stefani, Butcher, & Heylings, 1998). Critiques in the research of the use of performance assessments to assess investigative work include the influence of the always-present content and context in the assessment task (Harlen, 1999); the use of the visiting examiner when the classroom performance assessments are being done for national qualifications (Kennedy & Bennett, 2001); the need to select an investigation that could satisfy the assessment criteria, the perceived tension between the assessment and teaching roles of the teacher and the allocation of marks (Lubben & Ramsden, 1998); calls for caution when using performance-based assessments concerning the establishment of their validity and reliability (for example, Hattie & Jaeger, 1998; Shaw, 1997); the need to address “the psychometric findings that have highlighted the importance of effective scoring protocols, the judgments used in setting cut-offs and standards, and the importance of ensuring the construct representation of performance tests particularly given the costs and the hazards of construct under-representation of these tests” (Hattie & Jaeger, 1998); and a need to expand the current visions of performance assessment (Fusco & Barton, 2001) to include three ideals encompassed in critical, inclusive, feminist, and multicultural views of science education: that “performance/assessment addresses the value-laden decisions about what and whose science is learned and assessed and include multiple world-views, that performance assessment in science simultaneously emerges in response to local needs, and that the performance/assessment is a method as well as an ongoing search for method” (Fusco & Barton, 2001, p. 339).

RESEARCHING A WIDER RANGE OF ASSESSMENT FORMATS

A second aspect of research on classroom assessment of science learning for summative purposes is research on the use of a wider range of assessment task formats, often called alternative assessments, including performance assessments, concept maps, portfolios, interviews including think-aloud protocols, learning stories, observational methods, dynamic assessment, self-assessment, and self-reports (Dori, 2003). The main rationale for widening the range of assessment tasks has been to match the means of generating assessment information with the learning outcomes, or standards, thereby increasing validity: the wider range of learning goals being assessed requires different assessment tasks to maintain validity and to develop authentic assessments, that is, assessments embedded in the teaching and learning and not disconnected from it. Hence, there has been an increased use of the terms “school-based,” “alternative,” “embedded,” and “authentic” assessments (Dori, 2003), as well as “performance assessments,” “problem-based assessments,” and “competence-based assessments” under the general umbrella of “authentic assessments” (Cumming & Maxwell, 1999), to describe this wider range of assessment task formats. Their use has not been limited to the classroom; they are also included in nationwide assessments (Dori, 2003). Examples of the research on this wider range of assessment task formats are now discussed.

Concept Maps

Previous reviews of concept mapping as an assessment tool provide a valuable introduction to the use of concept maps in the classroom assessment of science learning (Edmondson, 1999; Fisher, Wandersee, & Moody, 2000; White & Gunstone, 1992). Concept maps have been researched for use in assessment for diagnostic and formative purposes (for example, Treagust, 1995; Childers & Lowry, 1997; Tripp et al., 1997); as a research tool (for example, Wallace & Mintzes, 1990); and as a tool for assessment for summative purposes (for example, Barenholz & Tamir, 1992; Childers & Lowry, 1997; Kinchin, 2001; Liu & Hinchey, 1996; McClure, Sonak, & Suen, 1999; Mintzes et al., 1999; Rice, Ryan, & Samson, 1998; Roth & Roychoudhury, 1993; Ruiz-Primo & Shavelson, 1996a; Stoddart et al., 2000; Wilson, 1996). Ruiz-Primo and Shavelson (1996a) reviewed the literature on the use of concept maps in science assessment and described a concept map used as an assessment tool as: “(a) a task that elicits evidence bearing on a student's knowledge structure in a domain, (b) a format for the student's response, and (c) a scoring system by which the student's concept map can be evaluated accurately and consistently” (p. 569).

A concern raised in these studies is the validity and reliability of concept maps as assessment tools, in particular the use of a scoring rubric and/or the comparison/correlation between concept-map scores and those on conventional tests (Kinchin, 2001; Liu & Hinchey, 1996; McClure et al., 1999; Rye & Rubba, 2002; White & Gunstone, 1992). For example, Stoddart et al. (2000) documented the development and evaluation of concept mapping as an assessment tool for summative purposes. They document in some detail the development of a concept-mapping method for a specific learning activity, using a scoring rubric that extracts quantitative information about the quality of understanding from each map in three stages: vocabulary review, content scoring, and a content validity check. Hence, they did not use measures of the elaborateness of the maps or the number of links and map components. The core of the rubric was based on three propositional variables (accuracy, level of explanation, and complexity). Measures of inter-rater agreement and inter-rater reliability indicated that this concept-map scoring rubric was reliable, valid, and practical.

Portfolios

Portfolios are another assessment task format in the trend toward more authentic assessment and are described as: “a container of collected evidence with a purpose. Evidence is documentation that can be used by one person or groups of persons to infer another person's knowledge, skill, and/or disposition” (Collins, 1992, p. 453). Other descriptions of portfolios note that they may contain a sample of student work; evidence of reflection and self-evaluation, representing the students’ understanding of the assessment criteria and where their achievements sit in relation to these; the students’ incremental development in their learning of science; and a rich and broad array of evidence of learning (Anderson & Bachor, 1998).

The issues for the teacher are what multiple goals for science learning are being assessed and what counts as evidence of learning—both the progress toward the goals and the achievement of the goals. What might the portfolios be used for? Weekly or yearly assessments? Reporting to parents, the school, or next year's teacher? (Collins, 1992). Other issues include the method of scoring, which needs to be commensurate with the degree of complexity and the multifaceted nature of the assessment tasks and the learning. Although a single score may not be appropriate to maintain this complexity, the school or state/national assessment system may require a single score or grade. Other ways of “scoring” therefore need to be developed, such as holistic expert judgment.

Anderson and Bachor (1998), when reviewing the Canadian research on the use of portfolios, list the issues with the use of portfolios for assessment as the consistency of portfolio contents between students; the validity of the contents with respect to learning goals; the level of agreement between judges; the stability of estimates of student achievements; the rigor of standards; the reliability of scoring criteria and rubrics used in evaluating the contents of portfolios; the costs and feasibility of portfolio use; and the extent to which students are involved in making judgments, for example, in co-constructing the criteria, selecting the samples of work, and applying the criteria in the marking process, in so-called learner-centered pedagogies and curricula. They also attribute the fall in portfolio use as students move to higher grades to increased subject specialization, larger student loads per teacher, and an increased focus on obtaining marks and grades for reporting student achievement to those outside the classroom. They report that rubrics are being used in two ways: the teacher selects, perhaps in conjunction with his or her students, the criteria to be used in evaluating the portfolio; and the students use the criteria to help them decide what to include in their portfolios. The notions of validity and reliability in the context of the use of portfolios for assessment of student achievement have been reexamined, raising the question of whether these terms, once used for large-scale pen-and-paper testing and measurement, are still appropriate here: Are classroom validity and reliability different from those of large-scale testing?

The reported benefits of using portfolios in the literature include: “students taking more responsibility for their own learning by assessing their own work, learning and its assessment being viewed as a developmental process that occurs over extended time periods, and the encouragement of learning activities that are consistent with current notions of how people learn and what is worth learning” (Gitomer & Duschl, 1995, p. 299); a tool for changing instructional practice in fundamental ways (Duschl & Gitomer, 1997); “students … engaging in learning activities consistent with current psychological, historical, and sociological conceptions of growth of scientific knowledge … teaching is organized to encourage conceptual change, learners are active constructors of meaning … and assessment is an invaluable tool that teachers as well as students use to make instructional decisions” (Gitomer & Duschl, 1995); learners successfully organizing and integrating newly acquired scientific knowledge, feeling less anxious about learning physics, devoting considerable time to reading and studying outside class, internalizing and personalizing the content material, and enjoying the learning experience, although there may be no significant difference in learner achievement (Slater, Ryan, & Samson, 1997; Slater, 1997); more of a match with learner-centered curricula and pedagogies (Anderson & Bachor, 1998), especially when the learner is involved in the decision making about the identification of relevant learning outcomes, the samples of student work, and the marking; and the use of portfolios to document, from the critical science perspective, a public story of science for community change with homeless youths (Fusco & Barton, 2001).

There is, however, little empirical evidence to support the use of portfolios (Gitomer & Duschl, 1998), and reliability and validity results may be disappointing (Cizek, 1997; Shapley & Bush, 1999). The questions being asked are, Can portfolios show evidence of complex scientific thinking in several domains? Is there any consistency of high performance over all pieces of work, that is, homogeneity?

Interviews and Conversations

Another assessment format is that of interviews and conversations. Interviews have been used to assess for summative purposes (Bell, 1995), to elicit student thinking as to whether they have learned the intended science learning outcomes or to elicit what they have learned, whether intended or not. The interview-about-instances and interview-about-events formats, initially developed for research purposes (Osborne & Gilbert, 1980), have been adapted for use in classrooms (Bell, Osborne, & Tasker, 1985). Likewise, Griffard and Wandersee (2001) used a think-aloud task to diagnose alternative conceptions of photosynthesis, in conjunction with a traditional pen-and-paper test. Lederman et al. (2002) documented research findings on the use of an open-ended instrument, the Views of Nature of Science Questionnaire, which in conjunction with individual interviews aims to provide meaningful assessments of students’ nature of science views. They argued against mass assessments of large samples, aimed at describing or evaluating student beliefs using standardized forced-choice pencil-and-paper assessment instruments, and instead argued for individual classroom interventions aimed at enhancing learners’ nature of science views and hence assessment for formative purposes.

Predict-and-Explain Situations

Another assessment task format is that of providing the students with a situation or phenomenon, about which they have to make a prediction and then give an explanation for what actually does happen (Lawrence & Pallrand, 2000; White & Gunstone, 1992). The prediction and explanation can provide information for assessment for summative purposes.

Learning Stories

Learning stories (Carr, 2001) have been developed for assessment for summative (as well as formative) purposes in an early childhood setting. Learning stories are described as

structured observations in everyday or “authentic” settings, designed to provide a cumulative series of qualitative “snapshots” or written vignettes of individual children displaying one or more of the target learning dispositions… . Practitioners collect “critical incidents” that highlight one or more of these dispositions and a series of learning stories over time, for a particular child, can be put together and scanned for what Carr has called “learning narratives”: what we might call in the present context, “developmental trajectories” of learning dispositions. Children's stories are kept in a portfolio; often they include photographs or photocopies of children's work and children's comments (Carr & Claxton, 2002, p. 22).

Observational Methods

Observations may be used in the classroom teaching and learning situation as a source of assessment information for formative and summative purposes. For example, Leat and Nichols (2000) used “mysteries” as an assessment tool for formative and diagnostic purposes with 13- and 14-year-old UK pupils.

Dynamic Assessments

Dynamic assessments (as distinct from static intelligence tests) have been used as assessment formats for summative purposes (Lidz, 1987). For example, the study by Grigorenko and Sternberg (1998) involved “the assessor setting ‘examinees’ a task too hard for them and observing how they respond and how they make use of standardised prompts and hints they are offered” (Carr & Claxton, 2002, p. 19). Dynamic assessments are linked theoretically to Vygotsky's notion of the Zone of Proximal Development and reportedly measure the “learning power” of the student, that is, what the student is capable of generating through scaffolded interaction with the assessor. Dynamic assessments have also been used to investigate students’ mental models of chemical equilibrium and the resulting positive influence of tutoring in a cognitive apprenticeship, such as coaching, modeling, scaffolding, articulation, reflection, and exploration (Chiu, Chou, & Liu, 2002).

Experimental and Customized Challenges

Experimental and customized challenges (e.g., jigsaws, problem situations; Norris, 1992) are another way of eliciting assessment information for summative purposes.

Self and Peer Assessments

Self and peer assessments (Claxton, 1995; Gale, Martin, & McQueen, 2002; Stefani & Tariq, 1996; Taras, 2002; Wiediger & Hutchinson, 2002; Zoller, Fastow, Lubezky, & Tsaparlis, 1999) have also been used for the assessment of science learning for summative purposes and are often an integral component of other assessment task formats, for example, self-reports, journals, questionnaires, interviews, and portfolios.

Group Assessment

Group assessment is the assessment of a group's collective learning rather than that of an individual. The use of group work in science education is increasing, as the ability to work cooperatively as part of a team (research and development teams) to achieve a common goal is highly valued by employers in the science and technology sector and is therefore included in the goals of some science curricula. It is also advocated by research into learning science from a sociocultural viewpoint (Rogoff, 1990). Lowe and Fisher (2000), as part of doctoral research, studied the effect on motivation and attitudes toward science of year 9 and year 10 New Zealand students working in small cooperative groups, in which the students completed all their assignments, tests, laboratory work, and fieldwork. “All members of the group received the same mark for any given assessment exercise and students were encouraged to communicate and work co-operatively during these activities” (p. 131). The protocol for group organization regarding assessment was as follows: “For written tests, students were arranged in their groups at the laboratory benches to allow them to work together with a minimum of contact with other groups. Talk within the group was permissible but talk between groups was not. Answers were by consensus and one group member had the task of writing the script, which was handed in and marked. All members received the same grade” (p. 133). The study reported that the students interviewed stated that “they preferred working in groups, especially during tests where they reported they felt they were learning from their peers even as they completed tests. The students stated that they felt less nervous when doing their tests since they were doing it with their friends” (p. 141). The teachers expressed some initial concern about group work, mostly in relation to assessment; they reported positively on the formative purposes of the tests in groups; and they spent significantly less time carrying out assessment, particularly marking. No correlation data between group assessments and the usual individual assessments were reported, nor were any measures of the quality of the assessments.

Computer-Based Assessment

The use of computer technology to assess for summative purposes has been documented (Fisher et al., 2000). Sewell, Stevens, and Lewis (1995) used multimedia computer technology as a tool for teaching and assessing biological science with university students. They found a high rank correlation (0.96, Spearman's correlation) between the computer-based assessment of the knowledge base gained from the teaching program and the marks obtained in the sessional written examination.

In summary, alternatives to paper-and-pen testing formats have been developed and researched in science education. However, concerns about the use of a wider range of assessment tasks have been raised and include time constraints, financial constraints, teacher and student knowledge of assessment, the difficulty in creating authentic tasks, the quality of the wider range of tasks, especially validity and reliability (Lester, Lambdin, & Preston, 1997), and the need for professional development (Gitomer & Duschl, 1998; Ruiz-Primo & Shavelson, 1996b).

A parallel aspect of this trend to use alternative assessment tasks has been the move from using norm-referenced and standardized, commercially made (by people external to the classroom) tests to criterion-referenced, construct-referenced, or ipsative-referenced teacher-made assessments. Norm-referenced assessment is where individuals are compared with the norm of a group, indicating whether they can do something better or less well than others, not what an individual can or cannot do. Criterion-referenced assessment compares a student's learning with a well-defined objective, that is, the desired learning goals. Ipsative assessment compares a student's performance with her or his previous performance, and construct-referenced assessment is that made within the context of the school and marked by teachers, of a particular idea or construct (Wiliam, 1992). Others have also noted that criterion-referenced assessment has tended to move away from overspecification toward a more holistic approach (Gipps, 1994a; Moss, 1992; Popham, 1987, 2003), allowing for the assessment of more complex skills and processes than detailed specifications can. A disadvantage is that this can result in less reliability, but this can be addressed by the use of exemplars of student work at particular levels and group moderation (Gipps et al., 1995).

INCREASING THE QUALITY OF ASSESSMENTS

The sixth of the trends in assessment in science classrooms (and in other classrooms) has been research on the development of high-quality assessment procedures and is based in the debates over the shift from a paradigm of measurement and psychometric approaches based on true score theory (Black, 2001; Cumming & Maxwell, 1999) to a “new paradigm of assessment” (Gipps, 1994a). In educational assessment, quality is not just a technical issue, as assessment involves making and acting on choices and judgments, which are underpinned by social values (Messick, 1994; Berlak et al., 1992; Gitomer & Duschl, 1998) and discourses of power (Cherryholmes, 1988). Assessment can be seen as a social practice determined by the specific social, historical, and political contexts in which it is undertaken (Gipps, 1999). Given today's social values, for example, on equity, and given the move from psychometric testing and measurement toward educational assessment, quality is no longer thought of only in terms of the initial uses of the terms validity and reliability. The notion of quality in educational assessment has been developed to reflect the notions of assessment for educational purposes, that is, formative assessment (Cowie & Bell, 1996), embedded assessment (Treagust et al., 2001), authentic assessment (Cumming & Maxwell, 1999), and holistic assessment (Wiliam, 1994), and the use of quality assessment terms such as validity, equity, trustworthiness, and fairness (Gipps, 1998; Gipps & Murphy, 1994); inference, generalizability, consequences, and social values (Gitomer & Duschl, 1998); manageability, facility, and discrimination (Osborne & Ratcliffe, 2002); reliability, dependability, validity, disclosure, and fidelity (Wiliam, 1992); confidence (Black, 1993); and equity, trustworthiness, and appropriateness (Cowie & Bell, 1996).

Reliability is only a small aspect of the dependability of a test, and the traditional statistical techniques for estimating reliability (test-retest, mark-remark, parallel-forms, and split-half reliability) are not relevant to classroom assessment of learning; other indicators of quality are therefore of more use in classroom assessment (Wiliam, 1992).

A key indication of the quality of educational assessments is that of validity. In the 1970s and 1980s, there was much criticism of the low validity of summative assessments used by teachers in classroom-based assessment (Doran et al., 1993) and in external testing and examinations, for example, for national qualifications (Gauld, 1980; Keeves & Alagumalai, 1998). The meaning of validity expanded as alternatives to pen-and-paper testing were developed (Crooks, Kane, & Cohen, 1996). Whereas reliability is affirmed by statistical means, validity relies “heavily on human judgment and is therefore harder to carry out, report and defend” (Crooks et al., 1996, p. 266). The initial meaning of validity as “measuring what it purports to measure” in relation to traditional multiple choice and pen-and-paper tests has been expanded as the notion has been developed with respect to the quality of alternative assessments, such as performance assessment (Moss, 1992). Crooks et al. (1996) indicate the breadth of current understandings of validity in their account of eight different stages of the assessment “chain” and the threats to validity associated with each stage. For Crooks et al. (1996), the validity of the entire assessment procedure is constrained by the strength of the weakest of the eight links in the validity chain.

Whereas some view validity and reliability as independent in some circumstances (Moss, 1994), others see the two notions as interdependent (Crooks et al., 1996; Gitomer & Duschl, 1998), viewing some degree of generalizability (reliability) as essential for validity. For assessment for formative purposes, the validity of the assessments is more important than the reliability (Harlen & James, 1997; Moss, 1994). There is a tension, in devising assessment procedures, between local validity and beyond-local reliability (Carr, 2001; Carr & Claxton, 2002). Cumming and Maxwell (1999) argued for more attention to be given to authentic learning goals or objectives, teaching practices, and assessment tasks as interdependencies. Then “the validity of an assessment can be evaluated in terms of the extent to which the assessment relates to the ascribed educational values, learning theories and teaching theories as well as to the realisation of the desired assessment theory” (p. 193).

Recent studies on the reliability and validity of newly developed assessments of science learning include researching multiple-choice diagnostic instruments to assess high school students’ understanding of inorganic chemistry qualitative analysis (Tan, Goh, Chia, & Treagust, 2002); student competence in conducting scientific inquiry (Zachos, Hick, Doane, & Sargent, 2000); ascertaining whether students have attained specific ideas in benchmarks and standards (Stern & Ahlgren, 2002); the sensitivity of close and proximal assessments to the changes in students’ pre- and post-test performances (Ruiz-Primo, Shavelson, Hamilton, & Klein, 2002); multiple-choice and open-ended formats to assess students’ understanding of protein synthesis (Pittman, 1999); alternative methods of answering and scoring multiple-choice tests (Taylor & Gardner, 1999); nature of science views (Lederman et al., 2002; Aikenhead, 1987; Taylor & Gardner, 1999); concept mapping (Stoddart et al., 2000; Liu & Hinchey, 1996; Ruiz-Primo & Shavelson, 1996a); time-series design in assessments (Lin & Frances, 1999); inconsistency in test grading by teachers of science (Klein, 2002); the use of distractor-driven multiple choice tests to assess children's conceptions (Sadler, 1998); the use of examinations to elicit “misconceptions” in college chemistry (Zoller, 1996); an assessment scheme for practical science in Hong Kong (Cheung, Hattie, Bucat, & Douglas, 1996); assessment tasks to assess the ideas and evidence—the processes and practices—of science, that is, how we know as well as what we know (Osborne & Ratcliffe, 2002); the use of rubrics (Osborne & Ratcliffe, 2002; Toth et al., 2002); the science achievement outcomes for different subgroups of students using different assessment formats (Lawrenz, Huffman, & Welch, 2001); and the quality of interviews as an assessment tool, in the classroom and in research (Welzel & Roth, 1998).

Researchers have argued that new forms of educational assessment (often called alternative assessments) cannot be fairly appraised unless the older definition of validity is broadened (Linn, Baker, & Dunbar, 1991). Research on the broader notion of quality of assessments of science learning includes those addressing consequences, equity, fairness, cultural validity, trustworthiness, appropriateness, manageability, fidelity, and authenticity. Each of these newer notions of quality is now discussed.

Consequences

Gitomer and Duschl (1998) argued that, typically, the validity of assessments has been considered only in terms of construct validity—how well the evidence supports the interpretations made on the basis of the assessment. However, Messick (1989) raised the prominence of a second consideration of validity, the consequences of an assessment, that is, consequential validity, which is centrally important to assessment for formative purposes, given that the definition of formative assessment is based on the taking of action to improve learning (Black, 1993; Cowie & Bell, 1996; Crooks, 2001) and given that the appraisal is made in relation to its effectiveness in improving learning. In considering the concept of consequences, Cowie (2000) asserted that the consequences of formative assessment—cognitive, social, and emotional—cannot be separated out, and “therefore adequate and appropriate (valid) ways of generating, interpreting and responding to information gained from students and their learning, are those that benefit and not harm student learning, identity, feelings and relationships with others” (Cowie, 2000, p. 281).

Equity and Fairness

Equity is an important factor in considering the quality of an assessment and is associated with issues of moral and social justice (Darling-Hammond, 1994) and the equitable and inclusive practice and production of science, multiple worldviews, and science assessment across diversity (Fusco & Barton, 2001; Roth & McGinn, 1998). It implies practices and interpretation of results that are fair and just to all groups, and a definition of achievement which applies to all students, not just a subgroup (Gipps & Murphy, 1994); equal opportunity to sit and achieve within an exam (Wiliam, 1994); and providing opportunities for all students to participate in communication and particularly in the classroom interactions that are the heart of assessment for formative purposes (Cowie & Bell, 1996; Torrance, 1993; Crooks, 1988), even if different modes of communication and task formats have to be used (Kent, 1996; Lawrenz et al., 2001).

Fairness is an aspect of equity and validity. Students do not come to school with identical experiences, nor do they have identical experiences at school. Therefore, multiple opportunities for assessment might be needed to provide fairness and comparable treatment for all students in a class—students who will have differing educational experiences—to demonstrate their achievement if they are disadvantaged by any one assessment in a program (Gipps, 1998). The notion of fairness can be viewed as having three aspects: in the sense of assessing students on a fair basis, in the sense of not jeopardizing students’ chances to learn the subject matter while they were being assessed, and in the sense of not depriving students of opportunities of receiving an all-around education (Yung, 2001). Equity and fairness may be in terms of gender (Gipps, 1998; Gipps & Murphy, 1994) or ethnicity (Darling-Hammond, 1994; Gipps, 1998; Lawrenz et al., 2001; Lee, 1999, 2001). The two main messages here are that where differences in performance are ignored and not monitored, patterns of inequality will increase, and that to ensure assessments are as fair as possible, we need to address the curriculum content (the constructs) being taught and assessed, teacher attitudes toward different groups of students, and the assessment mode and item format (Gipps, 1998).

Cultural Validity

Cultural validity has been suggested as an indication of quality in science assessment (Klein et al., 1997; Lokan, Adams, & Doig, 1999; Solano-Flores & Nelson-Barber, 2001). To attain cultural validity, the development of the assessments must consider how the sociocultural context in which students live influences the ways in which they make sense of science items and the ways in which they solve them. These sociocultural influences include the values, beliefs, experiences, communication patterns, teaching and learning styles, and epistemologies inherent in the students’ cultural backgrounds, as well as the socioeconomic conditions prevailing in their cultural groups. Solano-Flores and Nelson-Barber (2001) contended that current approaches to handling student diversity in assessment (e.g., adapting or translating tests, providing assessment accommodations, estimating test cultural bias) are limited and lack a sociocultural perspective. They asserted that there are five aspects to cultural validity: “student epistemology, student language proficiency, cultural world views, cultural communication and socialization styles and student life context and values” (Solano-Flores & Nelson-Barber, 2001, p. 566).

Trustworthiness

Trustworthiness relates to whether something or someone can be trusted in the classroom setting; it is based on the perceptions of both teachers and students and is an essential element of teaching, learning, and assessment, particularly formative assessment (Bell & Cowie, 1997; Cowie & Bell, 1996). Teachers must trust students to provide them with reasonably honest and representative information about their understandings and misunderstandings. Students must trust teachers to provide them with learning opportunities, to show interest in and support for their ideas and questions, and to act in good faith on what they find out, and students must have faith and trust in the assessment practices. Trust in the relationship between a student and teacher in the practice of formative assessment also affects the disclosure by the students of what they know and can do (Cowie, 2000). Cowie stated that, from a student perspective, a valid formative assessment is trustworthy: one in which students can have trust in the process as well as the person, where both support and do not undermine student understanding, affect, and relationships; give all students access to opportunities to participate in formative assessment; and encourage them to participate in and respond to formative feedback.

Appropriateness

To be judged appropriate by teachers and students, assessment must be beneficial and not harmful to student learning (Crooks, 1988). Hence, appropriate formative assessment, for example, is that which is first equitable and trustworthy but also supportive of learning (Black, 1995c), is indicative of what counts as learning (Crooks, 1988), matches the views of teaching and learning used in the classroom (Gipps, 1994a; Torrance, 1993), and addresses the importance of students’ views and the ongoing interactive nature of the practice of assessment for formative purposes (Bell & Cowie, 1997; Cowie & Bell, 1996). Validity concerns are raised when students do not give the fullest responses they are capable of (Eley & Caygill, 2001, 2002; Gauld, 1980; Kent, 1996; Shaw, 1997).

Manageability

An aspect of quality that is of great concern for teachers is that of manageability, that the assessment can be managed within the busy classroom life of teachers and students, and not take time away from teaching and learning the set curriculum (McClure et al., 1999; Stoddart et al., 2000).

Fidelity and Disclosure

Wiliam (1992) identified two issues, disclosure and fidelity, which may limit the information teachers have to notice and recognize in interactive formative assessment. The disclosure of an assessment strategy is the extent to which it produces evidence of attainment (or nonattainment) from an individual in the area being assessed (Cowie & Bell, 1996). Wiliam (1992) defined fidelity as the extent to which evidence of attainment, which has been disclosed, is observed faithfully. He claimed that fidelity is undermined if evidence of attainment is disclosed but not observed. For example, the teacher may not hear a small-group discussion in which the students demonstrate they understand a concept. Fidelity is also undermined if the evidence is observed but incorrectly interpreted. For example, if the teacher did not understand the student's thinking, it is possible that there is insufficient commonality in the teacher's and the students’ thinking.

Authenticity

Another aspect of quality is that of authentic assessment, a term used mostly in the United States (see, for example, Brown, 1992; Kamen, 1996). It includes a construct of teaching, learning, and assessment that is contextualized and meaningful for students; holistic rather than extremely specific (Erickson & Meyer, 1998); representative of activities actually done in out-of-school settings (Atkin et al., 2001); and interacting with/in the world in informed, reflective, critical, and agentic ways (Fusco & Barton, 2001; Rodriguez, 1998); it also includes authentic learning goals and the inclusion of performance assessments of complex tasks, problem-based assessments, and competence-based assessments (Cumming & Maxwell, 1999; Darling-Hammond, 1995; Darling-Hammond & Snyder, 2000; Doran et al., 1993). An assessment is authentic (and of sufficient quality) if the form and criteria for success are explicit and public; it involves collaboration (which is not seen as cheating); it is contextualized; it represents realistic and fair practices in the discipline; it uses scoring commensurate with the complexity and multifaceted nature of the assessments; it identifies strengths; it is multipurpose; it enables the integration of knowledge and skills learned from different sources; it is dynamic, as evidence can be added or removed during its development; and it encourages metacognition and reflection along with peer and self-evaluation (Collins, 1992; Wiggins, 1989). Concerns about authentic assessment include “camouflage,” which Cumming and Maxwell (1999) describe as occurring “when a traditional form of assessment is ‘dressed up’ to appear authentic, often by the introduction of ‘real world’ elements or tokenism” (p. 188). The extra reading demands of the camouflage may not facilitate a solution and in fact may even add to the literacy demands of the task for some students. Another concern is that the assessment task will invariably be in a context, as will the teaching. If the context is familiar, it may be measuring only recall and comprehension, not higher order cognitive skills such as argumentation that examines evidence critically. If the context is too unfamiliar and demands too high a level of literacy, the wording of the contextual information may be a distraction for students answering the question, add a reading comprehension problem, or confuse the students’ interpretation of the demands of the assessment (Osborne & Ratcliffe, 2002; Eley & Caygill, 2001).

THEORIZING ASSESSMENT

A seventh trend in research and development in assessment in science classrooms (as in other classrooms) has been to consider pedagogy, learning, assessment, and curriculum together, rather than individually in an analytical, reductionist approach. Hence, a discussion of assessment cannot be divorced from a discussion of teaching and learning, or from its curriculum and political contexts (Carr et al., 2000). If teaching, learning, assessment, and curriculum are considered in an integrated and interdependent way, one might theorize them in similar ways, and therefore assessments should create a “learning environment in which students are engaging in learning activities consistent with current psychological, philosophical, historical, and sociological conceptions of the growth of scientific knowledge” (Gitomer & Duschl, 1995, p. 300). This match/mismatch between theorizings, and between practices and theorizing, is discussed by Bol and Strage (1996) and Hickey and Zuiker (2003).

As theorizing about learning and teaching has developed from a behaviorist to cognitive science to sociocultural views (Bell & Gilbert, 1996; Duit & Treagust, 1998), so too has theorizing of assessment.

For example, there has been the development of views of assessment to match constructivist views of learning (Berlak et al., 1992; Gipps, 1994a; Wiliam, 1994), a growing number of studies theorizing assessment as a sociocultural practice (Bell & Cowie, 2001b; Broadfoot, 1996; Carr, 2001; Chiu et al., 2002; Filer, 1995; Filer, 2000; Filer & Pollard, 2000; Gipps, 1999; Keys, 1995; McGinn & Roth, 1998; Roth & McGinn, 1997; Welzel & Roth, 1998; Cowie, 2000; Fusco & Barton, 2001; Hickey & Zuiker, 2003; Pryor & Torrance, 2000; Solano-Flores & Nelson-Barber, 2001; Torrance & Pryor, 1998), and theorizing of assessment with discursive, post-structuralist perspectives (Bell, 2000; Fusco & Barton, 2001; Gipps, 1999; Sfard, 1998; Torrance, 2000). As Berlak (2000) stated, “there is an overwhelming body of research conducted over the last two decades documenting, beyond a shadow of a doubt, that the school context, the particularities of its history, the immediate and wider socio-economic context, the language, the race and social class of the students and their families, and the culture of the school itself have an enormous bearing on students’ interest in and performance on all school tasks, including taking standardised tests and examinations” (p. 193). To view assessment as a sociocultural practice is to view it as value laden, socially constructed, and historically, socially, and politically situated. That is, one can never do assessment separate from one's own history (individual or social) or outside of its contexts. As Gipps (1999) said, “to see assessment as a scientific, objective activity is mistaken; assessment is not an exact science” (p. 370). Assessment may be viewed as a purposeful, intentional, responsive activity involving meaning making and giving feedback to students and teachers, to improve learning; an integral part of teaching and learning; a situated and contextualized activity; a partnership between teacher and students; and an activity involving the use of language to communicate meaning (Bell, 2000; Bell & Cowie, 2001b).

Theorizing assessment as a sociocultural practice raises several issues for researchers. One is the issue of whose theorizing and the purpose of the theorizing (Bell, 2000; Bell & Cowie, 2001b, 2001c; Torrance & Pryor, 1998, 2001). In both of these major research studies, the teachers and university researchers involved have been encouraged to theorize their own assessment practices and to develop classroom practice models, using their own shared vocabulary. This theorizing was identified by the teachers as an important aspect of their teacher development practices. Another is the unit of analysis, which is “the event, rather than the individual is the primary unit of analysis for evaluating learning environments from a socio-cultural perspective… . The key issue in studying innovative curricula is the knowledge practices in which learners collectively participate” (Hickey & Zuiker, 2003, p. 548). This is evident in the use of cameos (Bell & Cowie, 1997, 2001b) and “incidents” (Torrance & Pryor, 1998) in research on formative assessment.

If assessment is theorized in terms of a sociocultural view of mind, the implications (Gipps, 1999) include that assessment can only be fully understood if the social, cultural, and political contexts in the classroom are taken into account; the practices of assessment reflect the values, culture of the classroom, and, in particular, those of the teacher; assessment is a social practice, constructed within social and cultural norms of the classroom; what is assessed is what is socially and culturally valued; the cultural and social knowledge of the teacher and students will mediate their responses to assessment; assessments are value-laden and socially constructed; a distinction needs to be made between what a student can typically do (without mediational tools) and best performance (with the use of mediational tools); assessments need to give feedback to students on the assessment process itself to enable them to do self and peer formative assessment; and teachers and students need to negotiate the process of assessment to be used, the criteria for achievement, and what counts as acceptable knowledge.

FURTHER RESEARCH ON CLASSROOM ASSESSMENT OF SCIENCE LEARNING

Despite the wealth of research reviewed in this chapter, there are many opportunities for further research, including professional development (preservice, inservice, and higher education) for teachers of science on classroom assessment of science learning (Bell & Cowie, 2001c; Campbell & Evans, 2000; Higgins, Hartley, & Skelton, 2002; Yorke, 2003); online teaching and assessment, especially that using web-based sites (Buchanan, 2000; Peat & Franklin, 2002), where feedback and feedforward are given to students without face-to-face contact; group assessment (Black, 2001); students’ views of assessment practices, as well as of teaching practices; a critique of the research paradigms and methods used to date, including action research, case studies, cameos, incidents, classroom observations, interviews, and pre- and post-testing; the progression in students’ learning and how these research findings might be used in assessment for formative and summative purposes (Osborne & Ratcliffe, 2002); the quality of assessment for summative purposes, given the trend from norm-referenced to criterion-, construct-, and ipsative-referenced assessment; and the interaction between classroom assessments for formative and summative purposes.

Further research on the quality of assessments will continue as new assessment task formats are developed, particularly on the effect of context on performance and construct validity (Gipps, 1998).

ACKNOWLEDGMENTS

Thanks to Audrey Champagne and John Pryor, who reviewed this chapter.

REFERENCES

Aikenhead, G. (1987). Views on science-technology-society (question book and Canadian standard responses). Saskatchewan: Department of Curriculum Studies, University of Saskatchewan.

Allal, L. (2002). The assessment of learning dispositions in the classroom. Assessment in Education, 9(1), 55.

Allal, L., & Ducrey, G. P. (2000). Assessment of—or in—the zone of proximal development. Learning and Instruction, 10(2), 137–152.

Anderson, J. O., & Bachor, D. (1998). A Canadian perspective on portfolio use in student assessment. Assessment in Education, 5(3), 327–353.

Apple, M. (1982). Education and power. Boston: Routledge & Kegan Paul.

Apple, M. (1996). Cultural politics and power. Buckingham, UK: Open University Press.

Ash, D., & Levitt, K. (2003). Working within the zone of proximal development: Formative assessment as professional development. Journal of Science Teacher Education, 14(1), 23.

Assessment Reform Group. (1999). Assessment for learning: beyond the black box. Cambridge, UK: University of Cambridge.

Atkin, J. M., Black, P., Coffey, J., & National Research Council (U.S.). Committee on Classroom Assessment and the National Science Education Standards. (2001). Classroom assessment and the National Science Education Standards. Washington, DC: Center for Education National Research Council, National Academy Press.

Atkin, M. (2002). How science teachers lose power. Studies in Science Education, 37, 163–171.

Baird, J., & Northfield, J. (Eds.). (1992). Learning from the PEEL experience. Melbourne, Australia: Monash University.

Barenholz, H., & Tamir, P. (1992). A comprehensive use of concept mapping in design, instruction and assessment. Research in Science and Technology Education, 10, 37–52.

Barker, M., & Carr, M. (1989). Teaching and learning about photosynthesis. Part 1: An assessment in terms of students' prior knowledge. International Journal of Science Education, 11(1), 49–56.

Bednarski, M. (2003). Assessing performance tasks. The Science Teacher, 70(4), 34.

Bell, B. (1995). Interviewing: a technique for assessing science knowledge. In R. Duit (Ed.), Learning science in the schools: Research reforming practice. Mahwah, NJ: Lawrence Erlbaum Associates.

Bell, B. (2000). Formative assessment and science education: modelling and theorising. In R. Millar, J. Leach, & J. Osborne (Eds.), Improving science education: the contribution of research. Buckingham, UK: Open University Press.

Bell, B., & Cowie, B. (1997). Formative assessment and science education: Research report of the Learning in Science Project (Assessment). Hamilton, New Zealand: Centre for Science Mathematics Technology Education Research, University of Waikato.

Bell, B., & Cowie, B. (1999). Researching teachers doing formative assessment. In J. Loughran (Ed.), Researching teaching. London: Falmer Press.

Bell, B., & Cowie, B. (2001a). The characteristics of formative assessment in science education. Science Education, 85(5), 536–553.

Bell, B., & Cowie, B. (2001b). Formative assessment and science education. Dordrecht and Boston: Kluwer Academic.

Bell, B., & Cowie, B. (2001c). Teacher development and formative assessment. Waikato Journal of Education, 7, 37–50.

Bell, B., & Gilbert, J. (1996). Views of learning to underpin teacher development. In Teacher development: A model from science education (pp. 38–69). London: Falmer Press.

Bell, B., Jones, A., & Carr, M. (1995). The development of the recent national New Zealand Science Curriculum. Studies in Science Education, 26.

Bell, B. F., Osborne, R., & Tasker, R. (1985). Finding out what children think. In P. Freyberg (Ed.), Learning in science: the implications of children's science. Auckland, New Zealand: Heinemann.

Berlak, H. (2000). Cultural politics, the science of assessment and democratic renewal of public education. In A. Filer (Ed.), Assessment: social practice and social product. London: RoutledgeFalmer.

Berlak, H., Newmann, E., Adams, E., Archbald, D., Burgess, T., Raven, J., et al. (Eds.). (1992). Toward a New Science of Educational Testing and Assessment. Albany: State University of New York Press.

Biddulph, F. (1989). Children's questions: their place in primary science education. Unpublished D.Phil. thesis, University of Waikato, Hamilton, New Zealand.

Biggs, J. (1998). Assessment and classroom learning: A role for summative assessment? Assessment in Education, 5(5), 103–110.

Black, P. (1993). Formative and summative assessment by teachers. Studies in Science Education, 21, 49–97.

Black, P. (1995a). 1987–1995—The struggle to formulate a national science curriculum for science in England and Wales. Studies in Science Education, 26, 159–188.

Black, P. (1995b). Assessment and feedback in science education. Studies in Educational Evaluation, 21(3), 257.

Black, P. (1995c). Can teachers use assessment to improve learning? British Journal of Curriculum and Assessment, 5(2), 7–11.

Black, P. (1995d). Curriculum and assessment in science education: The policy interface. International Journal of Science Education, 17, 453.

Black, P. (1998a). Assessment by teachers and the improvement of students’ learning. In K. Tobin (Ed.), International handbook of science education (pp. 811–822). London: Kluwer Academic.

Black, P. (1998b). Learning, league tables and national assessment: Opportunity lost or hope deferred? Oxford Review of Education, 24(1), 57–68.

Black, P. (2000). Research and the development of educational assessment. Oxford Review of Education, 26(3–4), 407–419.

Black, P. (2001). Dreams, strategies and systems: portraits of assessment past, present and future. Assessment in Education, 8(1), 65–85.

Black, P. (2002). Report to the Qualifications Development Group, Ministry of Education, New Zealand, on the proposals for development of the National Certificate of Educational Achievement. London: King's College London.

Black, P., & Harrison, C. (2001a). Feedback in questioning and marking: the science teacher's role in formative assessment. School Science Review, 82(301), 55–61.

Black, P., & Harrison, C. (2001b). Self- and peer-assessment and taking responsibility. School Science Review, 83(302).

Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education, 5(1), 7–74.

Black, P., & Wiliam, D. (1998b). Inside the black box—Raising standards through classroom assessment. Phi Delta Kappan, 80(2), 139.

Bol, L., & Strage, A. (1996). The contradictions between teachers’ instructional goals and their assessment practices in high school biology courses. Science Education, 80(2), 145–163.

Broadfoot, P. (1996). Education, assessment and society. Buckingham, UK: Open University Press.

Broadfoot, P. (2002). Editorial. Dynamic versus arbitrary standards: recognising the human factor in assessment. Assessment in Education, 9(2), 157–159.

Brookhart, S. M. (2001). Successful students’ formative and summative uses of assessment information. Assessment in Education, 8(2), 153–169.

Brown, R. (Ed.). (1992). Authentic assessment: A collection. Melbourne: Hawker Brownlow Education.

Buchan, A., & Welford, G. (1994). Policy into practice: the effects of practical assessment on the teaching of science. Research in Science & Technological Education, 12(1), 21–29.

Buchanan, T. (2000). The efficacy of a World-Wide Web mediated formative assessment. Journal of Computer Assisted Learning, 16(3), 193–200.

Butler, J. (1995). Teachers judging standards in senior science subjects: Fifteen years of the Queensland Experiment. Studies in Science Education, 26, 135–157.

Campbell, C., & Evans, J. A. (2000). Investigation of preservice teachers’ classroom assessment practices during student teaching. Journal of Educational Research, 93(6), 350–355.

Carr, M. (2001). Assessment in early childhood: Learning stories in learning places. London: Paul Chapman.

Carr, M., & Claxton, G. (2002). Tracking the development of learning dispositions. Assessment in Education, 9(1), 9–37.

Carr, M., McGee, C., Jones, A., McKinley, E., Bell, B., Barr, H., et al. (2000). Strategic research: Initiative literature review: The effects of curricula and assessment on pedagogical approaches and on educational outcomes. Wellington, New Zealand: Ministry of Education.

Cheng, M. H., & Cheung, F. W. M. (2001). Science and biology assessment in relation to the recently proposed education reform in Hong Kong. Journal of Biological Education, 35(4), 170.

Cherryholmes, C. (1988). Construct validity and discourses of research. American Journal of Education, 96, 421–457.

Cheung, D., Hattie, J., Bucat, R., & Douglas, G. (1996). Measuring the degree of implementation of school-based assessment schemes for practical science. Research in Science Education, 26(4), 375–389.

Childers, P. B., & Lowry, M. (1997). Engaging students through formative assessment in science. The Clearing House, 71(2), 97.

Chiu, M.-H., Chou, C.-C., & Liu, C.-J. (2002). Dynamic processes of conceptual change: Analysis of constructing mental models of chemical equilibrium. Journal of Research in Science Teaching, 39(8), 688.

Cizek, G. J. (1997). Learning, achievement and assessment: constructs at the crossroads. In G. D. Phye (Ed.), Handbook of classroom assessment: learning, adjustment and achievement. San Diego: Academic Press.

Clarke, S. (1998). Targeting assessment in the primary classroom. London: Hodder & Stoughton.

Clarke, S., Timperley, H., & Hattie, J. (2003). Unlocking formative assessment: practical strategies for enhancing students’ learning in the primary and intermediate classroom. New Zealand version. Auckland: Hodder Moa Beckett.

Claxton, G. (1995). What kind of learning does self-assessment drive? Assessment in Education, 2(3), 335–339.

Codd, J., McAlpine, D., & Poskitt, J. (1995). Assessment policies in New Zealand: Educational reform or political agenda. In B. Tuck (Ed.), Setting the standards. Palmerston North, New Zealand: Dunmore Press.

Collins, A. (1992). Portfolios for science education: Issues in purpose, structure, and authenticity. Science Education, 76(4), 451.

Collins, A. (1995). National Science Education Standards in the United States: A process and a product. Studies in Science Education, 26, 7–37.

Collins, A. (1998). National science education standards: A political document. Journal of Research in Science Teaching, 35(7), 711.

Cowie, B. (1997). Formative assessment and science classrooms. In B. Bell (Ed.), Developing the science curriculum in Aotearoa New Zealand. Auckland: Addison Wesley Longman.

Cowie, B. (2000). Formative assessment in science classrooms. Unpublished Ph.D. thesis, University of Waikato, Hamilton, New Zealand.

Cowie, B., & Bell, B. (1996). Validity and formative assessment in the classroom. Paper presented at the International Symposium on Validity in Educational Assessment, University of Otago, Dunedin, New Zealand, June 28–30, 1996.

Cowie, B., & Bell, B. (1999). A model of formative assessment in science education. Assessment in Education, 6(1), 102–116.

Crooks, T. (1988). The impact of classroom evaluation practices on students. Review of Educational Research, 58(4), 438–481.

Crooks, T. (2001). The validity of formative assessments. Paper presented at the annual meeting of the British Educational Research Association, Leeds, UK, September 13–15, 2001.

Crooks, T. (2002a). Assessment, accountability and achievement—Principles, possibilities and pitfalls. Paper presented at the annual conference of the New Zealand Association for Research in Education, Palmerston North, New Zealand, December 5–8, 2002.

Crooks, T. (2002b). Educational assessment in New Zealand schools. Assessment in Education, 9(2), 217–237.

Crooks, T., & Flockton, L. (1996). National Education Monitoring Report 1: Science assessment results 1995. Dunedin, New Zealand: Educational Assessment Research Unit, University of Otago.

Crooks, T., Kane, M., & Cohen, A. (1996). Threats to the valid use of assessments. Assessment in Education, 3(3), 265–285.

Cumming, J., & Maxwell, G. (1999). Contextualising authentic assessment. Assessment in Education, 6(2), 177–194.

Darling-Hammond, L. (1994). Performance-based assessment and educational equity. Harvard Educational Review, 64(1), 5–30.

Darling-Hammond, L. (1995). Authentic assessment in action. New York: Teachers’ College Press.

Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and Teacher Education, 16(5–6), 523–545.

Daws, N., & Singh, B. (1996). Formative assessment: to what extent is its potential to enhance student learning being realised? School Science Review, 77(281), 93–100.

Deese, W. C., Ramsey, L. L., Walczyk, J., & Eddy, D. (2000). Using demonstration assessments to improve learning. Journal of Chemical Education, 77(11), 1511.

Delpit, L. (1995). Other people's children: Cultural conflict in the classroom. New York: The New Press.

DeTure, L., Fraser, B. J., Giddings, J., & Doran, R. L. (1995). Assessment and investigation of science laboratory skills among year 5 students. Research in Science Education, 25(3), 253–266.

Donnelly, J. F., & Jenkins, E. W. (2001). Science education. Policy, professionalism and change. London: Paul Chapman.

Doran, R. L., Lawrenz, F., & Helgeson, S. (1993). Research on assessment in science. In D. Gabel (Ed.), Handbook of research in science teaching and learning (pp. 388–442). New York: Macmillan.

Dori, Y. (2003). From nationwide standardised testing to school-based alternative embedded assessment in Israel: Students’ performance in the Matriculation 2000 Project. Journal of Research in Science Teaching, 40(1), 34–52.

Driver, R., Squires, A., Rushworth, P., & Wood-Robinson, V. (1994). Making sense of secondary science: Research into children's ideas. London: Routledge.

Duit, R., & Treagust, D. F. (1998). Learning in science—From behaviorism towards social constructivism and beyond. In K. Tobin (Ed.), International handbook of science education. Dordrecht, the Netherlands: Kluwer Academic.

Duschl, R., & Gitomer, D. H. (1997). Strategies and challenges to changing the focus of assessment and instruction in science classrooms. Educational Assessment, 4, 37–73.

Edmondson, K. (1999). Assessing science understanding through concept maps. In J. Novak (Ed.), Assessing science understanding: a human constructivist view. San Diego: Academic Press.

Eisner, E. (1985). The educational imagination: On the design and evaluation of school programs. New York: Macmillan.

Eisner, E. (1993). Reshaping assessment in education: some criteria in search of practice. Journal of Curriculum Studies, 25, 219–233.

Eley, L., & Caygill, R. (2001). Making the most of testing: Examination of different assessment formats. SET: Research Information for Teachers, 2, 20–23.

Eley, L., & Caygill, R. (2002). One test fits all? An examination of differing assessment task formats. New Zealand Journal of Educational Studies, 37(1), 27–38.

Enger, S. K., & Yager, R. (1998). The Iowa assessment handbook. Iowa City: Science Education Center, University of Iowa.

Erickson, G., & Meyer, K. (1998). Performance assessment tasks in science: What are they measuring? In K. Tobin (Ed.), International handbook of science education. London: Kluwer Academic.

Fairbrother, B. (1993). Problems in the assessment of scientific skills. In D. West (Ed.), Teaching, learning and assessment in science education. London: Paul Chapman.

Fairbrother, B., Black, P., & Gill, P. (Eds.). (1995). Teachers assessing pupils. London: Association for Science Education.

Feltham, N. F., & Downs, C. T. (2002). Three forms of assessment of prior knowledge, and improved performance following an enrichment programme, of English second language biology students within the context of a marine theme. International Journal of Science Education, 24(2), 157–184.

Filer, A. (1995). Teacher Assessment: social process and social product. Assessment in Education, 2(1).

Filer, A. (Ed.). (2000). Assessment: social practice and social product. London: RoutledgeFalmer.

Filer, A., & Pollard, A. (2000). The social world of pupil assessment: Processes and contexts of primary schooling. London: Continuum.

Fisher, K., Wandersee, J. H., & Moody, D. (2000). Mapping biology knowledge. Dordrecht, the Netherlands: Kluwer Academic.

Francisco, J. S., Nakhleh, M. B., Nurrenbern, S. C., & Miller, M. L. (2002). Assessing student understanding of general chemistry with concept mapping. Journal of Chemical Education, 79(2), 248.

Frederiksen, J., & White, B. (1997). Reflective assessment of students’ research within an inquiry-based middle school science curriculum. Paper presented at the annual meeting of the AERA, Chicago.

Fusco, D., & Barton, A. C. (2001). Representing student achievements in science. Journal of Research in Science Teaching, 38(3), 337–354.

Gale, K., Martin, K., & McQueen, G. (2002). Triadic assessment. Assessment and Evaluation in Higher Education, 27, 557–567.

Gauld, C. (1980). Subject orientated test construction. Research in Science Education, 10, 77–82.

Gilmore, A. (2002). Large-scale assessment and teachers’ assessment capacity: Learning opportunities for teachers in the National Monitoring Project in New Zealand. Assessment in Education, 9(3), 319.

Gilmore, A., & Hattie, J. (2001). Understanding usage of an Internet based information resource for teachers: The assessment resource banks. New Zealand Journal of Educational Studies, 36(2), 237–257.

Gipps, C. (1994a). Beyond testing: Towards a theory of educational assessment. London: The Falmer Press.

Gipps, C. (1994b). Developments in educational assessment or what makes a good test? Assessment in Education, 1(3).

Gipps, C. (1998). Equity in education and assessment. Paper presented at the annual conference of the New Zealand Association for Research in Education, Dunedin, December 1998.

Gipps, C. (1999). Socio-cultural aspects of assessment. Review of Research in Education, 24, 355–392.

Gipps, C., Brown, M., McCallum, B., & McAlister, S. (1995). Intuition or evidence? Buckingham, UK: Open University Press.

Gipps, C., & James, M. (1998). Broadening the basis of assessment to prevent the narrowing of learning. The Curriculum Journal, 9(3), 285–297.

Gipps, C., & Murphy, P. (1994). A fair test? Buckingham, UK: Open University Press.

Gipps, C., & Tunstall, P. (1996a). “How does your teacher help you to make your work better?” Children's understanding of formative assessment. Curriculum Journal, 7(2), 185–203.

Gipps, C., & Tunstall, P. (1996b). Teacher feedback to young children in formative assessment: A typology. British Educational Research Journal, 22(4), 389–404.

Gitomer, D. H., & Duschl, R. (1995). Moving towards a portfolio culture in science education. In R. Duit (Ed.), Learning science in schools: Research reforming practice. Hillsdale, NJ: Lawrence Erlbaum Associates.

Gitomer, D. H., & Duschl, R. (1998). Emerging issues and practices in science assessment. In K. Tobin (Ed.), International handbook of science education. London: Kluwer Academic.

Glover, P., & Thomas, R. (1999). Coming to grips with continuous assessment. Assessment in Education, 6(1), 111–117.

Gott, R., Welford, G., & Foulds, K. (1998). The assessment of practical work in science. Oxford: Blackwell.

Griffard, P. B., & Wandersee, J. H. (2001). The two-tier instrument on photosynthesis: What does it diagnose? International Journal of Science Education, 23(10), 1039–1052.

Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124(1), 75–111.

Han, J.-J. (1995). The quest for national standards in science education in Korea. Studies in Science Education, 26, 59–71.

Harlen, W. (1995a). Standards and science education in Scottish schools. Studies in Science Education, 26, 107–134.

Harlen, W. (1995b). To the rescue of formative assessment. Primary Science Review, 37, 14–16.

Harlen, W. (1998). Classroom assessment: A dimension of purposes and procedures. Paper presented at the annual conference of the New Zealand Association for Research in Education, Dunedin, December 1998.

Harlen, W. (1999). Purposes and procedures for assessing science process skills. Assessment in Education, 6(1), 129.

Harlen, W., & James, M. (1997). Assessment and learning. Assessment in Education, 4(3), 365.

Hattie, J. (1999). Influences on student learning. Paper presented at the Inaugural Professorial lecture, University of Auckland. Retrieved 1 August 2003 from http://www.arts.auckland.ac.nz/edu/staff/jhattie/Inaugural.html.

Hattie, J., Biggs, J., & Purdie, N. (1996). Effects of learning skills interventions on student learning: A meta-analysis. Review of Educational Research, 66, 99–136.

Hattie, J., & Jaeger, R. (1998). Assessment and classroom learning: a deductive approach. Assessment in Education, 5(1), 111.

Heady, J. E. (2001). Gauging students’ learning in the classroom. Journal of College Science Teaching, 31(3), 157.

Hickey, D. T., & Zuiker, S. J. (2003). A new perspective for evaluating innovative science programs. Science Education, 87(4), 539–563.

Higgins, R., Hartley, P., & Skelton, A. (2002). The conscientious consumer: Reconsidering the role of assessment feedback in student learning. Studies in Higher Education, 27(1), 53–64.

Hill, M. (1999). Assessment in self-managing schools: Primary teachers balancing learning and accountability demands in the 1990s. New Zealand Journal of Educational Studies, 34(1), 176–185.

Hill, M. (2001). Dot, slash, cross: How assessment can drive teachers to ticking instead of teaching. SET: Research information for teachers, 1, 21–25.

Hunt, E., & Pellegrino, J. (2002). Issues, examples, and challenges of formative assessment. New Directions for Teaching & Learning, 89, 73.

Jaworski, B. (1994). Investigating mathematics teaching: A constructivist enquiry. London: The Falmer Press.

Johnston, P., Guice, S., Baker, K., Malone, J., & Michelson, N. (1995). Assessment of teaching and learning in “literature-based” classrooms. Teaching and Teacher Education, 11(4), 359–371.

Jones, A., Cowie, B., & Moreland, J. (2003). Enhancing formative interactions in science and technology: a synthesis of teacher student perspectives. Paper presented at the NARST Annual Conference, Philadelphia, March 23–26, 2003.

Kamen, M. (1996). A teacher's implementation of authentic assessment in an elementary science classroom. Journal of Research in Science Teaching, 33(8), 859–877.

Keeves, J., & Alagumalai, S. (1998). Advances in measurement in science education. In K. Tobin (Ed.), International handbook of science education. London: Kluwer Academic.

Keiler, L., & Woolnough, B. (2002). Practical work in school science: The dominance of assessment. School Science Review, 83(304), 83–88.

Kennedy, D., & Bennett, J. (2001). Practical work at the upper high school level: The evaluation of a new model of assessment. International Journal of Science Education, 23(1), 97–110.

Kent, L. (1996). How shall we know them? Comparison of Maori student responses for written and oral assessment tasks. Unpublished M.Ed. thesis, University of Waikato, Hamilton, New Zealand.

Keys, C. W. (1995). An interpretive study of students’ use of scientific reasoning during a collaborative report writing intervention in ninth grade general science. Science Education, 79(4), 415.

Kinchin, I. M. (2001). If concept mapping is so helpful to learning biology, why aren't we all doing it? International Journal of Science Education, 23(12), 1257–1269.

Klein, J. (2002). The failure of a decision support system: Inconsistency in test grading by teachers. Teaching and Teacher Education, 18(8), 1023.

Klein, S., Jovanovic, J., Stecher, B., McCaffrey, D., Shavelson, R., Haertel, E., et al. (1997). Gender and racial/ethnic differences on performance assessments in science. Educational Evaluation and Policy Analysis, 19, 83–97.

Lawrence, M., & Pallrand, G. (2000). A case study of the effectiveness of teacher experience in the use of explanation-based assessment in high school physics. School Science and Mathematics, 100(1), 36.

Lawrenz, F., Huffman, D., & Welch, W. (2001). The science achievement of various subgroups on alternative assessment formats. Science Education, 85(3), 279–290.

Leat, D., & Nichols, A. (2000). Brains on the table: Diagnostic and formative assessment through observation. Assessment in Education, 7(1), 103.

Lederman, N. G., Abd-El-Khalick, F., Bell, R. L., & Schwartz, R. S. (2002). Views of nature of science questionnaire: Toward valid and meaningful assessment of learners’ conceptions of nature of science. Journal of Research in Science Teaching, 39(6), 497–521.

Lee, O. (1999). Equity implications based on the conceptions of science achievement in major reform documents. Review of Educational Research, 69(1), 83.

Lee, O. (2001). Culture and language in science education: what do we know and what do we need to know? Journal of Research in Science Teaching, 38(5), 499.

Lester, F., Lambdin, D., & Preston, R. (1997). A new vision of the nature and purposes of assessment in the mathematics classroom. In G. D. Phye (Ed.), Handbook of classroom assessment: learning, adjustment and achievement. San Diego: Academic Press.

Lidz, C. S. (1987). Dynamic assessment: An interactional approach to evaluating learning potential. New York: Guilford Press.

Lin, H. S., & Lawrenz, F. (1999). Using time-series design in the assessment of teaching effectiveness. Science Education, 83(4), 409.

Linn, R. L., Baker, E., & Dunbar, S. (1991). Complex, performance-based assessment: expectations and validity criteria. Educational Researcher, 20, 15–21.

Liu, X., & Hinchey, M. (1996). The internal consistency of a concept mapping scoring scheme and its effect on prediction validity. International Journal of Science Education, 18(8), 921–937.

Lokan, J., Adams, R., & Doig, B. (1999). Broadening assessment, improving fairness? Some examples from school science. Assessment in Education, 6(1), 83.

Lorsbach, A., Tobin, K., Briscoe, C., & LaMaster, S. (1992). An interpretation of assessment methods in middle school science. International Journal of Science Education, 14, 305–317.

Lowe, P., & Fisher, D. L. (2000). Peer power: The effect of group work and assessment on student attitudes in science. SAMEpapers 2000, 129–147.

Lubben, F., & Ramsden, J. B. (1998). Assessing pre-university students through extended individual investigations: teachers’ and examiners’ views. International Journal of Science Education, 20(7), 833–848.

Marston, C., & Croft, C. (1999). What do students know in science? Analysis of data from the assessment resource banks. SET: Research Information for Teachers, 12(2), 1–4.

Mavrommatis, Y. (1997). Understanding assessment in the classroom: Phases of the assessment process—the assessment episode. Assessment in Education, 4(3), 381.

Mawhinney, H. B. (1998). Patterns of social control in assessment practices in Canadian frameworks for accountability in education. Educational Policy, 12(1–2), 98–109.

McClure, J. R., Sonak, B., & Suen, H. K. (1999). Concept map assessment of classroom learning: Reliability, validity, and logistical practicality. Journal of Research in Science Teaching, 36(4), 475–492.

McGinn, M. K., & Roth, W. M. (1998). Assessing students’ understanding about levers: Better test instruments are not enough. International Journal of Science Education, 20(7), 813–832.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement. Washington, DC: American Council on Education and National Council on Measurement in Education.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessment. Educational Researcher, 23(2), 13–23.

Ministry of Education. (1993). The New Zealand curriculum framework. Wellington: Learning Media.

Ministry of Education. (2003). Assessment exemplars. Wellington, NZ: Ministry of Education. Retrieved from http://www.tki.org.nz/e/community/ncea/

Mintzes, J., Wandersee, J. H., & Novak, J. (Eds.). (1999). Assessing science understanding: A human constructivist view. San Diego: Academic Press.

Mintzes, J., Wandersee, J. H., & Novak, J. (2001). Assessing understanding in biology. Journal of Biological Education, 35(3), 118–124.

Moreland, J., & Jones, A. (2000). Emerging assessment practices in an emergent curriculum: Implications for technology. International Journal of Technology and Design Education, 10(3), 283–305.

Moreland, J., Jones, A., & Northover, A. (2001). Enhancing teachers’ technological knowledge and assessment practices to enhance student learning in technology: A two year classroom study. Research in Science Education, 31(1), 155–176.

Morgan, C., & Morris, G. (1999). Good teaching and learning: Pupils and teachers speak. Buckingham, UK: Open University Press.

Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications of performance assessment. Review of Educational Research, 62(3), 229–258.

Moss, P. A. (1994). Can there be validity without reliability? Educational Researcher, 23(2), 5–12.

National Research Council. (1999). The assessment of science meets the science of assessment. Washington, DC: National Academy Press.

Natriello, G. (1987). The impact of evaluation processes on students. Educational Psychologist, 22, 155–175.

Newman, D., Griffin, P., & Cole, M. (1989). The construction zone: Working for cognitive change in school. Cambridge, UK: Cambridge University Press.

Nitko, A. (1995). Curriculum-based continuous assessment: a framework for concepts, procedures and policy. Assessment in Education, 2(3), 321.

Norris, S. P. (1992). Testing for the disposition to think critically. Informal Logic, 2/3, 157–164.

Orpwood, G. (1995). Juggling educational needs and political realities in Canada: National standards, provincial control and teachers’ professionalism. Studies in Science Education, 26, 39–57.

Orpwood, G. (2001). The role of assessment in science curriculum reform. Assessment in Education, 8(2), 135.

Osborne, J., & Ratcliffe, M. (2002). Developing effective methods of assessing ideas and evidence. School Science Review, 83(305), 113–123.

Osborne, R., & Gilbert, J. (1980). A method for the investigation of concept understanding in science. European Journal of Science Education, 2(3), 311–321.

Parker, J., & Rennie, L. (1998). Equitable assessment issues. In K. Tobin (Ed.), International handbook of science education (Vol. 2, pp. 897–910). London: Kluwer Academic.

Peat, M., & Franklin, S. (2002). Supporting student learning: the use of computer-based formative assessment modules. British Journal of Educational Technology, 33(5), 515–523.

Perrenoud, P. (1998). From formative evaluation to a controlled regulation of learning processes. Towards a wider conceptual field. Assessment in Education, 5(1), 85–102.

Phye, G. D. (Ed.). (1997). Handbook of classroom assessment: Learning, adjustment and achievement. San Diego: Academic Press.

Pittman, K. M. (1999). Student-generated analogies: Another way of knowing? Journal of Research in Science Teaching, 36(1), 1–22.

Pollard, A., Triggs, P., Broadfoot, P., McNess, E., & Osborn, M. (2000). What pupils say: Changing policy and practice in primary education. London: Continuum.

Popham, W. J. (1987). Two decades of educational objectives. International Journal of Educational Research, 11(1).

Popham, W. J. (2003). Trouble with testing. The American School Board Journal, 190(2), 14.

Preece, P. F. W., & Skinner, N. C. (1999). The national assessment in science at Key Stage 3 in England and Wales and its impact on teaching and learning. Assessment in Education, 6(1), 11.

Pryor, J., & Torrance, H. (2000). Questioning the three bears: The social construction of classroom assessment. In A. Filer (Ed.), Assessment: social practice and social product. London: RoutledgeFalmer.

Ramaprasad, A. (1983). On the definition of feedback. Behavioural Science, 28(1), 4–13.

Reay, D., & Wiliam, D. (1999). “I’ll be a nothing”: Structure, agency and the construction of identity through assessment. British Educational Research Journal, 25(3), 343–354.

Rice, D. C., Ryan, J., & Samson, S. (1998). Using concept maps to assess student learning in the science classroom: must different methods compete? Journal of Research in Science Teaching, 35(10), 1103–1127.

Rodriguez, A. J. (1998). Strategies for counterresistance: Toward sociotransformative constructivism and learning to teach science for diversity and for understanding. Journal of Research in Science Teaching, 35, 589–622.

Rogoff, B. (1990). Apprenticeship in thinking. New York: Cambridge University Press.

Rop, C. J. (2002). The meaning of student inquiry questions: A teacher's beliefs and responses. International Journal of Science Education, 24(7), 717.

Roth, W.-M., & McGinn, M. K. (1997). Graphing: Cognitive ability or practice? Science Education, 81(1), 91.

Roth, W. M., & McGinn, M. K. (1998). UnDELETE science education: Lives/work/voices. Journal of Research in Science Teaching, 35, 399–421.

Roth, W. M., & Roychoudhury, A. (1993). The concept map as a tool for the collaborative construction of knowledge: A microanalysis of high school physics students. Journal of Research in Science Teaching, 30(5), 503–534.

Ruiz-Primo, M. A., & Shavelson, R. J. (1996a). Problems and issues in the use of concept maps in science assessment. Journal of Research in Science Teaching, 33(6), 569–600.

Ruiz-Primo, M. A., & Shavelson, R. J. (1996b). Rhetoric and reality in science performance assessments: An update. Journal of Research in Science Teaching, 33(10), 1045–1063.

Ruiz-Primo, M. A., Shavelson, R. J., Hamilton, L., & Klein, S. (2002). On the evaluation of systemic science education reform: Searching for instructional sensitivity. Journal of Research in Science Teaching, 39(5), 369–393.

Rye, J. A., & Rubba, P. A. (2002). Scoring concept maps: An expert map-based scheme weighted for relationships. School Science and Mathematics, 102(1), 33.

Sadler, P. (1998). Psychometric models of student conceptions in science: Reconciling qualitative studies and distractor-driven assessment instruments. Journal of Research in Science Teaching, 35(3), 265–296.

Sadler, R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.

Sfard, A. (1998). On two metaphors for learning and the dangers of choosing just one. Educational Researcher, 27(2), 4–13.

Scriven, M. (1967). The methodology of evaluation. In M. Scriven (Ed.), Perspectives of curriculum evaluation. Chicago: Rand McNally.

Scriven, M. (1990). Beyond formative and summative evaluation. In K. J. Rehage, M. McLaughlin, and D. Phillips (Eds.), Evaluation and education: At quarter century. NSSE yearbook. Chicago: NSSE.

Sewell, R. D. E., Stevens, R. G., & Lewis, D. J. A. (1995). Multimedia computer technology as a tool for teaching and assessment of biological science. Journal of Biological Education, 29, 27.

Shapley, K. S., & Bush, M. J. (1999). Developing a valid and reliable portfolio assessment in the primary grades: Building on practical experience. Applied Measurement in Education, 12(2), 111–132.

Shaw, J. (1997). Threats to the validity of science performance assessments for English language learners. Journal of Research in Science Teaching, 34(7), 721–743.

Shepardson, D. P. (Ed.). (2001). Assessment in science: A guide to professional development and classroom practice. Dordrecht, the Netherlands: Kluwer Academic.

Shulman, L. (1987). Knowledge and teaching: foundations of the new reforms. Harvard Educational Review, 57, 1–22.

Simpson, M. (1993). Diagnostic assessment and its contribution to pupils’ learning. In D. West (Ed.), Teaching, learning and assessment in science education. London: Paul Chapman Publishing.

Slater, T. F. (1997). The effectiveness of portfolio assessments in science. Journal of College Science Teaching, 26(5), 315.

Slater, T., Ryan, J., & Samson, S. (1997). Impact and dynamics of portfolio assessment and traditional assessment in a college physics course. Journal of Research in Science Teaching, 34(3), 255–271.

Smith, P. S., Hounshell, P., Copolo, C., & Wilkerson, S. (1992). The impact of end-of-course testing in chemistry on curriculum and instruction. Science Education, 76(5), 523–530.

Solano-Flores, G., & Nelson-Barber, S. (2001). On the cultural validity of science assessments. Journal of Research in Science Teaching, 38(5), 553–573.

Solano-Flores, G., & Shavelson, R. J. (1997). Development of performance assessment in science: conceptual, practical and logistical issues. Educational Measurement: Issues and Practice, 16(3), 16–24.

Stefani, L. A. J., & Tariq, V. N. (1996). Running group practical projects for first-year undergraduate students. Journal of Biological Education, 30, 36.

Stern, L., & Ahlgren, A. (2002). Analysis of students’ assessments in middle school curriculum materials: Aiming precisely at benchmarks and standards. Journal of Research in Science Teaching, 39(9), 889–910.

Stoddart, T., Abrams, R., Gasper, E., & Canaday, D. (2000). Concept maps as assessment in science inquiry learning—a report of methodology. International Journal of Science Education, 22(12), 1221–1246.

Swain, J. (1996). The impact and effect of key stage 3 science tests. School Science Review, 78(283), 79–90.

Swain, J. (1997). The impact and effect of key stage 3 science tasks. School Science Review, 78(284), 99–104.

Tamir, P. (1998). Assessment and evaluation in science education: Opportunities to learn and outcomes. In K. Tobin (Ed.), International Handbook of Science Education (pp. 761–789). London: Kluwer Academic.

Tan, K. C. D., Goh, N. K., Chia, L. S., & Treagust, D. F. (2002). Development and application of a two-tier multiple choice diagnostic instrument to assess high school students’ understanding of inorganic chemistry qualitative analysis. Journal of Research in Science Teaching, 39(4), 283–301.

Taras, M. (2002). Using assessment for learning and learning from assessment. Assessment and Evaluation in Higher Education, 27(6), 501–510.

Tariq, V. N., Stefani, L. A. J., Butcher, A. C., & Heylings, D. J. A. (1998). Developing a new approach to the assessment of project work. Assessment and Evaluation in Higher Education, 23(3), 221.

Taylor, C., & Gardner, P. (1999). An alternative method of answering and scoring multiple choice tests. Research in Science Education, 29(3), 353–363.

Tittle, C. (1994). Toward an educational psychology of assessment for teaching and learning: Theories, contexts and validation arguments. Educational Psychologist, 29(3), 149–162.

Torrance, H. (1993). Formative assessment: Some theoretical problems and empirical questions. Cambridge Journal of Education, 23(3), 333–343.

Torrance, H. (2000). Post-modernism and educational assessment. In A. Filer (Ed.), Assessment: social practice and social product. London: RoutledgeFalmer.

Torrance, H., & Pryor, J. (1995). Investigating teacher assessment in infant classrooms: Methodological problems and emerging issues. Assessment in Education, 2(3), 305–320.

Torrance, H., & Pryor, J. (1998). Investigating formative assessment: Teaching, learning and assessment in the classroom. Buckingham, UK: Open University Press.

Torrance, H., & Pryor, J. (2001). Developing formative assessment in the classroom: using action research to explore and modify theory. British Educational Research Journal, 27(5), 615–631.

Toth, E. E., Suthers, D. D., & Lesgold, A. M. (2002). “Mapping to know”: The effects of representational guidance and reflective assessment on scientific inquiry. Science Education, 86(2), 264–286.

Treagust, D. F. (1995). Diagnostic assessment. In R. Duit (Ed.), Learning science in the schools: Research reforming practice. Hillsdale, NJ: Lawrence Erlbaum Associates.

Treagust, D. F., Jacobowitz, R., Gallagher, J. L., & Parker, J. (2001). Using assessment as a guide in teaching for understanding: A case study of a middle school science class learning about sound. Science Education, 85(2), 137–157.

Tripp, G., Murphy, A., Stafford, B., & Childers, P. B. (1997). Peer tutors and students work with formative assessment. The Clearing House, 71(2), 103.

Volkmann, M. J., & Abell, S. K. (2003). Seamless assessment. Science and Children, 40(8), 41.

Wallace, J., & Mintzes, J. (1990). The concept map as a research tool: Exploring conceptual change in biology. Journal of Research in Science Teaching, 27(10), 1033–1052.

Welzel, M., & Roth, W. M. (1998). Do interviews really assess students’ knowledge? International Journal of Science Education, 20(1), 25–44.

White, R., & Gunstone, R. (1992). Probing understanding. London: The Falmer Press.

Wiediger, S. D., & Hutchinson, J. S. (2002). The significance of accurate student self-assessment in understanding of chemical concepts. Journal of Chemical Education, 79(1), 120.

Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 703–713.

Wiggins, G. (1993). Assessing student performance. San Francisco: Jossey-Bass.

Wiliam, D. (1992). Some technical issues in assessment: A user's guide. British Journal of Curriculum and Assessment, 2(3), 11–20.

Wiliam, D. (1994). Towards a philosophy for educational assessment. Paper presented at the annual conference of the British Educational Research Association, Bath, England.

Wilson, J. (1996). Concept maps about chemical equilibrium and students’ achievement scores. Research in Science Education, 26(2), 169–185.

Wolf, D., Bixby, J., Glenn, J., III, & Gardner, H. (1991). To use their minds well: Investigating new forms of student assessment. Review of Research in Education, 17, 31–74.

Yorke, M. (2003). Formative assessment in higher education: Moves towards theory and the enhancement of pedagogic practice. Higher Education, 45(4), 477–501.

Yung, B. H. W. (2001). Three views of fairness in a school-based assessment scheme of practical work in biology. International Journal of Science Education, 23(10), 985–1005.

Zachos, P., Hick, T. L., Doane, W. E. J., & Sargent, C. (2000). Setting theoretical and empirical foundations for assessing scientific inquiry and discovery in educational programs. Journal of Research in Science Teaching, 37(9), 938–962.

Zeegers, Y. (2003). Pedagogical content knowledge or pedagogical reasoning about science teaching and learning. Paper presented at the annual conference of the Australasian Science Education Research Association, Melbourne, July 2003.

Zoller, U. (1996). The use of examinations for revealing and distinguishing between students’ misconceptions, misunderstandings and “no conceptions” in college chemistry. Research in Science Education, 26(3), 317–326.

Zoller, U., Fastow, M., Lubezky, A., & Tsaparlis, G. (1999). Students’ self-assessment in chemistry examinations requiring higher- and lower-order cognitive skills. Journal of Chemical Education, 76(1), 112.
