Chapter 9
Measured Lives

The rise of assessment as the engine of change in English schools

James, M. (2000) ‘Measured lives: the rise of assessment as the engine of change in English schools’, The Curriculum Journal, 11(3): 343–364.

The Class of 2000

In the summer of 2000, 16 year olds sitting General Certificate of Secondary Education (GCSE) examinations in England were the first cohort of students to have followed the National Curriculum throughout their eleven years of compulsory schooling from its introduction in September 1989. The National Curriculum was implemented without piloting so they have, in a sense, been the guinea pigs for the new system and its assessment.

In 1991, aged 7, at the end of Year 2 and Key Stage 1, they experienced the first standard assessment tasks (SATs) in the core subjects of English, mathematics and science. These introduced: a reading test in which the determination of attainment level could rest crucially on the number of mistakes made in reading a 100-word passage from a story book; an activity based on a mathematics game that was much praised as a learning activity but proved difficult and time-consuming for teachers to assess; and the notorious ‘floating and sinking’ activity in science which puzzled many 7 year olds, especially those who had the beginnings of an intuitive understanding of density or mass as different from weight (James and Conner, 1993). At this time the SATs were administered and marked by teachers and moderated locally with guidance from the national assessment body, the Schools Examinations and Assessment Council (SEAC) (James, 1994). Scores were calculated on a 10-level scale designed to calibrate attainment from 5 to 16, with the average child expected to progress one level every one to two years; level 2 was then expected to ‘stretch a typical 7 year old’. Teacher assessment (TA) was also carried out using the hundreds of Statements of Attainment (SoAs) that then constituted the criteria for attainment targets in each subject. This was the time of the checklist, as teachers resorted to ticking boxes as a way of recording whether, when and to what degree children had demonstrated the criteria. Their records and judgements were expected to reflect their observations over the key stage but since they were moderated towards the end of Year 2 they often felt obliged to keep the evidence. Cupboards were stuffed with bulging portfolios of children’s work. For the first time parents received statutory reports with the national levels of attainment attached to teachers’ comments on their child’s progress.

Four years later, in 1995, at the end of Key Stage 2, this same cohort were the first to take Year 6 externally set and marked tests in the core subjects, although speaking and listening (Attainment Target 1 in English), using and applying (Attainment Target 1 in mathematics) and scientific investigation (Attainment Target 1 in science) were still assessed by teachers whose own assessments of the attainment targets covered by the tests were also reported to parents alongside test results. At this point the bluntness of the 10-level scale for measuring individual attainment from 5 to 16 became apparent because many children, while achieving a level 2 or 3 at age 7, were assessed as having made what looked like minimal progress of perhaps one level by age 11. To some extent this was the result of the many adjustments made to the curriculum and assessments at that time (for instance, the introduction of a separate spelling test to sharpen up the assessment of basic skills in English) but it was also an intrinsic weakness of a system that did not compare like with like. Both the curriculum and the tests were different at age 11 from age 7, but the common scale and criteria gave the impression that there was strict comparability.

By 1998, when this cohort reached the end of Year 9 and Key Stage 3, they took another set of statutory tests in the core subjects. The National Curriculum was introduced in secondary schools shortly after its start in primary schools, so 14 year olds had been taking Key Stage 3 tests since 1992. However, six years later problems persisted: on this occasion, the government had trouble with the software agency it had employed to process the data. As a consequence, having sat their tests in May, students did not receive their results until the end of September – far too late to inform decisions about courses in Key Stage 4. Again, there were inconsistencies between assessments at the same level from Key Stage 2 to Key Stage 3 which needed to be explained to parents to allay fears that their children were making little or no progress.

As before, teacher assessments in the core subjects were reported alongside test results, as were teacher assessments in the foundation subjects of design technology, history, geography, modern foreign languages and information technology. Key Stage 3 optional tests and tasks in non-core subjects were published in 1996 by the School Curriculum and Assessment Authority (SCAA) (which had replaced SEAC and the National Curriculum Council) for use by teachers, although evidence about how they were used at this time is limited (Gipps and Clarke, 1996). Anecdotal evidence suggests that some tests and tasks (e.g. in geography) were valued more highly than others.

The revisions to the National Curriculum and its assessment arrangements made in 1996, following the Dearing review in 1993/4, and stimulated partly by a boycott of Key Stage 3 English tests by teachers in 1992/3, had eliminated levels 9 and 10 from the national scale and substituted ‘exceptional performance’ for attainment above level 8. The hundreds of atomistic statements of attainment had also gone and been replaced by composite level descriptions which were intended to be used more holistically in ‘best fit’ judgements by teachers. Music, art and physical education, however, were assessed using ‘end of key stage statements’ although many schools misunderstood the guidance they were given and used the A–D grades, which they were asked to utilize when reporting these results to government, as a shorthand for reporting to parents. This might not have been a problem had the grades been used more conventionally. However, ‘exceptional performance’ was allocated a ‘D’ grade while poor performance carried an ‘A’, causing considerable confusion all round.

The system now required that a child’s individual test and TA results should be reported to parents alongside statistics for the whole year group in the school and national statistics for the previous year. This was designed to assist them in judging how the performance of their child compared with others. Government did not, and still does not, publish performance tables of school results at Key Stage 3 although there were some early, but largely unfounded, fears that unscrupulous local journalists might compile their own simply by getting hold of the report of just one child in each school.

Although there had been much debate about the relationship of the Key Stage 4 curriculum to the GCSE (Stobart, 1991), by the time that this cohort embarked on Key Stage 4 courses, the GCSE (and in some cases the General National Vocational Qualification (GNVQ)) was firmly established as the main means of assessing attainment on the National Curriculum. The number of examining boards had fallen as they coalesced into three main examining groups, which had to seek approval from the Qualifications and Curriculum Authority (which had replaced the SCAA) for syllabuses in National Curriculum subjects. The 1996 revisions to the curriculum had, however, reintroduced some flexibility to the Key Stage 4 curriculum. Whereas in 1989 this cohort had been expected to continue all ten original National Curriculum subjects to 16, by the time they entered Year 10 the compulsory curriculum had reduced to the core subjects plus design and technology, a modern foreign language, physical education and religious education, with information technology emerging as a separate subject in line with the government’s new emphasis on the key skills of literacy, numeracy and information and communication technology (ICT) competence. Thus the arts and humanities became optional at this key stage. To all intents and purposes the broad and balanced compulsory curriculum ended with Key Stage 3.

Although information had been published about the alignment of National Curriculum levels with GCSE grades, the A* to G grading system of GCSE quickly dominated discourse at Key Stage 4 as students and schools came under increasing pressure to produce the five A* to C passes that (as the inheritance of comparison with pre-1988 O level examinations) continued to be the measure of success in published performance tables and as the passport to advanced further education. Despite attempts to celebrate performance at the lower levels, and in different aspects of achievement through the National Record of Achievement (NRA), this magic number of higher grade GCSEs remains the touchstone for performance in the eyes of many parents, employers and the media, and perhaps many students and teachers as well.

For those in the Class of 2000 who hope to continue with their full-time education post-16, their guinea pig status is likely to persist. They will be the first group taking the new Advanced Subsidiary (AS) examinations at the end of Year 12 before deciding whether to continue to full A levels by taking A2 examinations in Year 13. However, when most 15–16 year olds were asked to consider sixth form and further education courses in spring term 2000, many of the new AS/A level syllabuses had not yet been seen by the teachers who would teach them. There is surely some argument that this cohort, having negotiated the obstacle course that has been their experience since 1989, should receive not a maroon NRA folder on leaving school, but a red beret!

One would expect all this change in the life of a single cohort of students to be destabilizing and stressful. There is considerable evidence for this. The Primary Assessment, Curriculum and Experience (PACE) project has monitored the impact of policy changes on the experience of headteachers, teachers and students in English primary schools from 1989 to 1997. This includes a longitudinal study of a sample of pupils who were in Year 1 in 1989 – now the GCSE Class of 2000. Findings from PACE suggest that:

The attitudes and learning behaviour exhibited by the children as they experienced an increasing tightening of curriculum control and of the impact of external and overt assessment became more strongly evident at the end of Key Stage 2. They were ‘performance orientated’ rather than ‘learning orientated’. The children in the study were very aware of the importance of good marks, getting things right. In a climate of explicit and formal assessment, however, many of them avoided challenge and had a low tolerance of ambiguity.

(Pollard and Triggs, 2000, cited in Broadfoot, 1998: 9)

Other, devastating, evidence is provided by ChildLine which, in 1999, received almost 800 calls about test and examination stress and nineteen of these children said that they were so stressed that they had contemplated or attempted suicide. Three young people actually killed themselves during the exam period (Slater, 2000).

In April 2000, however, the Chief Inspector of Schools, Chris Woodhead, argued that if young children, particularly, feel the pressure of tests it is because teachers communicate it to them. He may have a point. Indeed, the PACE project emphasized the way in which policy directives have been translated into classroom practice through a series of ‘mediations’, including the mediations of teachers. Thus the pressures that teachers experience are communicated to students, although the PACE researchers point out that this is not a case of simple transference, i.e. passing on the pressure and the stress to relieve their own.

My son is one of the Class of 2000. He took his first written GCSE paper on the day I started this article. On the morning of the examination he looked marginally more tense than he had before other tests, which he had been relaxed about, and I asked why. He replied, ‘Well, these matter to me.’ This is not to say that he had not felt the pressure of previous tests. He said that teachers had constantly tried to impress him and his peers with the importance of getting good scores. However, he believed that increasingly frequent testing had inoculated a proportion of students against taking them too seriously, particularly as tests taken before the end of Key Stage 4 had no very important consequences for them personally in terms of selection for future schools or courses. While the pressure on students is real, it may vary according to the extent to which they perceive high stakes for themselves, and the extent to which teachers protect them from the pressure schools now experience. The publication of league tables was designed to encourage public pressure on schools through open competition for students. The use of school results in the Office for Standards in Education (OFSTED) inspection process (introduced from 1992), in statutory school target setting (made a requirement from 1998), and in the provision of evidence of students’ progress to support teachers’ applications to go through the pay threshold (from September 2000), have also added to the pressures on teachers and schools from this source.

The next generation

There is no immediate prospect that 4- and 5-year-old children who start school in 2000 will have a better experience. Indeed it could be worse as far as testing is concerned. Baseline assessment is now established in Year 1 and can be the focus of the very first, if rather perplexing, encounter with the class teacher, as Torrance and Pryor (1998: 69–76) so graphically illustrate. Statutory end of Key Stage 1 assessment in Year 2 continues, although its form has changed considerably and level categories are now subdivided. On the basis of value-added analyses of cohort data, level 2B is currently regarded as the goal for 7 year olds if they are to achieve the national target of level 4 by age 11.

Optional tests in English, mathematics and science were first produced for Years 3, 4 and 5 (8, 9 and 10 year olds) in 1997, originally to provide national benchmark data. QCA received 20,000 requests for these tests in 2000, which would account for around 80 per cent of primary schools in the country. While the tests remain optional, there is speculation (The Times Educational Supplement, 12 May 2000: 1) that we will soon see widespread use, especially if teachers are pressed to provide evidence of the progress students make in their classes for the purposes of claiming performance-related pay increases. There are also plans for the introduction by 2002 of tests for the most able 9 year olds as part of the government’s ‘Excellence in Cities’ initiative. These will give them access to university summer schools and master classes at specialist and private schools.

The results of end of Key Stage 2 tests have become increasingly high profile as they are reported in performance tables for primary schools. The Secretary of State, David Blunkett, has staked his job on achieving ambitious national targets of 80 per cent of 11 year olds achieving level 4 in English and 75 per cent achieving level 4 in mathematics by 2002. This is challenging because level 4 was originally intended to stretch the average 11 year old. There is increasing evidence that this is now having a marked backwash effect on curriculum, organization and teaching, especially in Year 6. The following account illustrates how the pressure of accountability (a managerial concept) is putting strain on a middle school teacher’s sense of her educational and pastoral responsibility to children (a moral concept).

In the six years that I have been teaching I have found my role changing quite significantly. These six years have all been spent within one school and in the same year group (Year 6). When I began my teaching career I kept my class for most of their timetable and taught the majority of subjects to them. We, therefore, had very close pastoral, as well as academic, contact.

Six years later, I have my class for only nine out of 35 lessons in a school week. This is not a trend that just affects myself, as over the last year in particular, setting has become commonplace in both maths and English even within Years 5 and 6. Previously these lessons had been class-based but now are taught by a narrower team of staff with a more specialist knowledge of their subjects. Increasing pressure to perform academically means that I often use my class lesson to fit in extra science work – a time that I would have used to discuss issues that had arisen during the week within the class. Year 6 pupils have returned to school during the Easter holidays and on Saturday mornings as part of the pre-SATs (Standard Assessment Tests) preparations. The focus of these lessons has been on raising the standard of academic achievement within maths and English through small-group, intensive teaching.

The SATs in Year 6 may exaggerate this trend, particularly where League Tables of published results are important in a small town that has three Middle Schools. Reading Test and Maths scores in Year 6 and Year 8 are used as an indicator of the success of the school as part of the Local Authority Value-Added initiative. Consequently, academic achievement has a high priority throughout the four Middle School year groups, often, I feel, at the expense of the pastoral system.

These developments leave me confused. While I am determined to give the pupils in my care the best academic start that I can, I instinctively feel that I cannot achieve this unless the children are happy and cared for in terms of their general well-being. My pastoral role has increasingly been squeezed into breaktimes and lunchtimes, almost as a side-line interest, and yet the children seem to need this time, particularly as the academic pressure increases around the time of the May SATs.

(Farrow, 1999: 2–3)

This is a clear illustration of Broadfoot’s (1998: 12) point that:

External accountability has increased and although personal and moral responsibility was [in the PACE study] still seen as important, there was some evidence of a shift in climate from a covenant based on trust to a contract based on the delivery of education to meet national economic goals rather than as a form of personal development. Some teachers expressed fragmented identities, torn between a discourse which emphasized technical and managerial skills and values which continued to emphasize the importance of an emotional and affective dimension to teaching.

Broadfoot claims that this conflict was particularly characteristic of older teachers with ‘newer teachers more likely to find satisfaction with a more constrained and instrumental role without losing their commitment to the affective side of teaching’. Farrow’s account, however, suggests that some younger teachers also experience role dissonance.

Concerns with the transition between Key Stage 2 and Key Stage 3, and the evidence that a proportion of students underachieve, stand still or even regress at the start of the secondary phase, have stimulated a range of government initiatives such as booster classes, summer schools and catch-up programmes in literacy and numeracy. In order to assess the extent to which Year 7 students manage to catch up, ‘progress’ tests in English and mathematics, using the same tests as 11 year olds, were piloted by QCA with 12 year olds in May 2000. These are likely to be offered in 2001 to all students who failed to achieve level 4 at the end of Key Stage 2.

End of Key Stage 3 tests are now well established in Year 9 and the GCSE/GNVQ continues to be the measure of performance at the end of Key Stage 4 in Year 11. Although no national tests have been introduced in Year 10, most schools conduct their own Year 10 examinations and ‘mock’ GCSE examinations in the autumn or spring term of Year 11. As mentioned earlier, changes in the post-16 curriculum mean that all students, including those following academic courses, can now expect to sit external examinations in both Years 12 and 13. In addition, by 2002, the most able academic students will be offered an additional ‘world class’ test. Known as the Advanced Extension examination this will enable top universities to choose between students with equally good grades: something that admissions tutors at Oxbridge have been demanding for years.

The curriculum for 14 to 18 year olds remains in a state of flux and it is likely that it will remain so for some time as policy-makers continue to re-examine the relationship between vocational and academic courses and between pre-16 and post-16 provision. It is conceivable that the role and value of the GCSE as the main qualification at 16, which when it was initiated was described as ‘divisive, bureaucratic, retrogressive and obsolescent’ (Nuttall, 1984: 143), will at last be reassessed by policy-makers. In 2000 the Secretary of State agreed to allow some students starting Key Stage 4 to skip GCSEs in some subjects and move straight to AS level and GNVQ courses. This suggests, however, that rather than ending external examinations at 16+, which would bring us in line with other European countries (Broadfoot, 1996), we may simply see more flexibility about which examinations are taken when.

Taken as a whole, the picture of current activity indicates that many children starting school in 2000 should expect to take some form of external test or examination every year with the sole exception of Year 8, although even here some new tests are planned for the most able, for the same purpose as the proposed extension tests for 9 year olds. Given the attention that government is now turning to Key Stage 3, and the research evidence that suggests students regard Year 8 as a year for ‘social exploration’ rather than ‘real work’ (Rudduck et al., 1996: 133), one might ask whether some form of test for the whole of Year 8 can be far away.

How did it come to be like this?

England has now achieved the dubious distinction of subjecting its school students to more external tests than any other country in the world and spending more money on doing so (Whetton, 1999). There is little evidence that other countries are rushing to follow our lead; indeed, countries like Japan are earnestly trying to extricate themselves from their ‘examination hell’. So why do we appear to be so bent on pursuing this course? And why has the general direction towards more and more formal assessment been maintained despite a change in government?

From an historical and sociological perspective, Broadfoot (1999) notes that assessment procedures in England have always played a key role in controlling an otherwise almost anarchic system. Prior to 1988, in the absence of anything approaching a National Curriculum, tests such as the 11+ and examinations at 16 and 18 were powerful in determining what was taught in schools by defining what was necessary to gain access to the next stage. In this way they were instrumental in establishing some consensus and common direction in a basically laissez-faire system. The broader uses of assessment, beyond the certification and accreditation of individual achievement, to influence the behaviour of teachers and schools, have therefore long been recognized, if not always explicitly, by policy-makers and practitioners alike.

Given this inheritance, it is not so surprising that the Task Group on Assessment and Testing (TGAT), which was set up in 1987 to design the assessment system for the new National Curriculum, was asked by Kenneth Baker, the Conservative Secretary of State for Education at the time, to come up with a system to fulfil a number of different roles. According to Daugherty (1995), who has very fully documented the development of National Curriculum assessment from 1987 to 1994, the group’s original brief was to produce a system to serve both ‘informative’ and diagnostic purposes. By the time that TGAT reported these had become four purposes: formative, diagnostic, summative and evaluative. If formative and summative purposes can both be considered ‘informative’ in some way, then the clear addition was the ‘evaluative’ purpose that was elaborated as ‘mainly concerned with publicizing and evaluating the work of the education service’. (In Britain the term ‘evaluation’ usually refers to the appraisal of programmes or organizations, rather than individuals. This distinguishes it from the American usage that refers to the judgement-forming aspect of assessment, which follows the collection of evidence through measurement, and can apply to either individuals or programmes.)

From this point, as Daugherty (1995: 22) notes, the evaluative purpose was firmly in the frame although it was not perhaps given the kind of consideration in the TGAT report that policy-makers might have wished. Commenting on the whole TGAT experience from the distance of almost ten years, when nothing but the vestiges of the 10-level system was left of the TGAT proposals, Paul Black, who chaired the group, reflected on its political naivety:

The TGAT proposed that assessment results be reported in a context of interpretation so that they would not mislead those they were meant to inform. With hindsight, it was naive to imagine that the government, with its commitment to a market where choice would be guided by tests, would support a complex approach. On the other hand, no convincing alternative has been produced. It is naïve to think that publication of results can be resisted, and equally naïve to propose that the public should agree that the interpretation is so complex that they can’t be allowed to see the raw scores.

(Black, 1997: 41)

Possibly the group should have foreseen this problem and done more to confront it, but they were probably lulled into a false sense of security by the initial remit that made only brief mention of publication. Another (false) comfort was the membership of the group, which was made up almost entirely of education professionals, and the fact that they were allowed to get on with their work almost undisturbed. Thus, they were able to concentrate on designing a system to meet formative and summative purposes in combination, with the formative purpose uppermost. They did this by building on four principles: the system would be criterion-referenced, formative, moderated and designed to enable and demonstrate progression. The involvement of teachers was the central tenet; they would assess students’ work using criteria embodied in attainment targets, and using standard assessment tasks to check their judgements post hoc. Group moderation would assist them to develop common judgements concerning appropriate standards and the 10-level scale would enable them to monitor progression. They would sum up their judgements for reporting purposes but the chief value of their involvement would be their direct access to evidence of students’ learning which would enable them to plan the next steps in teaching.

There were many technical issues to be resolved but the TGAT report was a design brief, not a total solution, and the group estimated that the preparation and trial of new methods would take at least five years. Of course, no politician in England is prepared to wait five years for a new system to be implemented so, despite a warm welcome from the Secretary of State, there were early indications that all would not go well. Margaret Thatcher’s memoirs record her differences with her minister and her perception that the TGAT report was a subversion by a left-wing ‘educational establishment’:

that it was then welcomed by the Labour party, the National Union of Teachers and the Times Educational Establishment was enough to confirm for me that its approach was suspect.

(Thatcher, 1993: 595)

Others saw it quite differently: as a Trojan horse of the political right. Black (1997: 29) claims that the only really hostile reception he directly experienced was at the hands of academics at a meeting of the British Educational Research Association; they were concerned that the proposals would lead the way to increased summative testing and the introduction of crude performance indicators. According to an anonymous civil servant (quoted by Taylor, 1995: 181): ‘TGAT was not wanted . . . the “right-wing” camp, if you like, were trying to recover ground and as soon as they saw the chinks opening up, they went in.’ So, fears were well founded.

The first idea to be rejected was group moderation by teachers. Then the notion of judgement on broad profile components was replaced by complicated aggregation rules using atomized criteria in statements of attainment. Tasks were replaced by tests and teacher assessment was effectively downgraded. Finally, the 10-level system was modified to take out Key Stage 4. By the time of the Dearing review in 1993/4 nothing much of the TGAT proposals was left. There is much to regret in this because, as Desmond Nuttall said, shortly after the TGAT report was published:

There is some complacency in the Report: there is some misrepresentation; but by and large the proposals make assessment the servant of the curriculum rather than its dominating master.

(Nuttall, 1988: 9)

Since Dearing, the direction of development has been one of consolidation. Some effort has been made to support teacher assessment through, for example, the exemplification of standards material, but moderation procedures (a source of teachers’ professional development) have fallen away. And, as the earlier sections of this article illustrate, the major effort and resource has gone into the production, administration, analysis and reporting of more and more tests. But what purpose do they serve?

Monitoring, evaluation and assumptions about change

Despite the rhetoric about informing parents of their children’s progress, or providing teachers and students with the information needed to plan the next steps in teaching and learning, all the evidence points to the fourth of TGAT’s purposes – the evaluative purpose – as having become the most influential in government policy. As described in the first part of this article, assessment is increasingly used for measuring and judging the performance of teachers, schools, local education authorities and the system as a whole. This is where the high stakes lie. This contrasts markedly with, say, Japan’s ‘examination hell’ where the high stakes are for students who strive to gain places in the most prestigious junior high schools and senior high schools, as a route to the best universities. The use of test results for selection means that the summative purpose is uppermost. In contrast with England, there is no suggestion that these results should be used to evaluate the performance of teachers and schools. The culture of the country, still strongly influenced by Confucius, leads most people to believe that results are dependent on students’ own efforts. It is up to students to take advantage of the knowledge imparted by teachers, who are not held accountable if students fail to do so.

In England, as yet, there is little evidence that tests and examinations, below the age of 16, are being used for high stakes selection. Thus there are fewer consequences for individual students, although the positive or negative effects on motivation of knowing how one is ranked should not be underestimated (Broadfoot, 1999). The highest stakes are for schools, whose aggregate results at Key Stage 2, Key Stage 4 and at 18+ are published in performance tables, and who stand to lose students, and therefore resources, if they fail to perform well. The same is true for local education authorities and is likely to become the experience of individual teachers in the context of performance-related pay.

So the Chief Inspector of Schools was probably right when he told teachers’ unions that if children are stressed then it is because teachers are communicating their own anxieties to their students. Yet who can blame them? Each year all schools now receive, from the Department for Education and Employment (DfEE), a copy of the Autumn Package for each key stage. These are between 50 and 80 pages each and contain national summary results, national benchmark information and national value-added information. Schools also receive an even more detailed Performance and Assessment (PANDA) report from OFSTED which is constructed on the basis of school inspection data and shows how schools compare with others, taking account of factors such as electoral ward statistics, the number of children in overcrowded households, single-parent families and the proportion of adults in higher education. Local education authorities (LEAs) also conduct their own analyses and distribute these to schools. Schools are then expected to use all this information – as well as their own information, which may include value-added analyses using commercial packages – to compare their results with similar schools and to set targets. (The amount of time this consumes is enormous, if they take it seriously.) These targets have to be agreed with LEAs who then incorporate them into their own targets which are in turn monitored by the DfEE and OFSTED to ensure that they are ambitious enough. It is easy to see, therefore, how the national targets, upon which the Secretary of State has staked his reputation, get mediated through LEAs to schools. But it does not stop there. With the introduction of the pay threshold and individual target-setting for students, the pressure from the system is increasingly felt at the level of the individual. 
There are increasing reports in the media of the stress on teachers, including cases of suicide, and it is inevitable that some of this pressure will be communicated to students. The Class of 2000 may not all have felt this pressure but the next generation of students is unlikely to be so lucky. In large measure this pressure is a direct consequence of the particular change strategy that successive governments have chosen to adopt.

The social and political goals of Conservative and New Labour education policies have been different in important respects. The Conservative governments of Margaret Thatcher and John Major were primarily interested in pursuing economic goals. They believed these could be advanced, in part, by creating a market in education through increased competition. Diversity and choice were their watchwords. The New Labour government of Tony Blair, while still concerned with securing the country’s economic growth, holds that this will only be achieved if more of the population become economically active. However, old manual skills-based labour markets have disappeared and the knowledge economy of the twenty-first century demands that workers have higher levels of cognitive and interpersonal skill. The 30/30/40 society of the 1990s, described by Hutton (1995) as consisting of 30 per cent of the population unemployed, 30 per cent in insecure jobs and only 40 per cent in jobs with longer term prospects, was unsustainable both economically and socially. So the goal of the present government is to increase access and inclusion but to raise standards in education and training at the same time. There has been no attempt to deny the value of meritocracy – indeed meritocratic principles are actively promoted – but there is a concern that those who rise to the top should include all classes, not just those privileged by birth or wealth. In June 2000, the Prime Minister was quoted as saying:

Gordon Brown [the Chancellor] and I believe passionately in extending opportunity for all. But none of us will have any truck with old-fashioned egalitarianism that levels down. We are unashamed supporters of excellence. But we need to give far more of our kids a shot at it.

(Quoted in The Times Higher Education Supplement, 9 June 2000: 1)

This is all rather reminiscent of some of the writing of the communist Antonio Gramsci who, in the 1930s, argued that:

our aim is to produce a new stratum of intellectuals, including those capable of the highest degree of specialization, from a social group which has not traditionally developed the appropriate attitudes . . .

(Gramsci, 1971: 43)

Gramsci was critical of reforms to education in Mussolini’s Italy, based on ‘active’ learning and ‘education’ rather than ‘instruction’. He believed that instead of being democratic, as intended, the reforms would perpetuate social differences. Based on his own experience of success at school and university, which is described by his editors as a triumph of intellectual purpose over ill-health, undernourishment and overwork, he advocated a traditional education which would:

take the child up to the threshold of his choice of job, forming him during this time as a person capable of thinking, studying, and ruling – or controlling those who rule.

(40)

A similar intolerance of liberal progressivist educational ideologies, perhaps for similar reasons, seems to characterize the current Labour government and might explain why, on the surface, it seems to have deviated little from the course of previous Conservative governments with respect to policy on curriculum and assessment. Another possible reason is that, although they differ on ends, both Conservative and Labour governments appear to agree on means. They share a view on how to bring about improvement. The dominant metaphor has been the need ‘to drive up standards’ and the implication has been that standards will only improve if schools are made to ‘try harder’ – and that a ‘power-coercive’ strategy (Bennis, Benne and Chin, 1969) is the way to achieve this. To some extent their faith in this political solution has been borne out. Trend data published in the Autumn Packages 1999 indicate that standards of attainment related to national target levels have been rising steadily over the last five years, but more obviously in Key Stages 2 and 4 where results are reported through published performance tables.

So, should we celebrate this state of affairs and be content that the policy continues, convinced that it will be possible to achieve ever higher standards of performance? Undoubtedly, there is room for improvement and some, perhaps many, schools and teachers are complacent or too accepting of poor standards. Thus, the coercive approach has an impact. Whether this can be usefully sustained into the foreseeable future is another question. It seems to me that there are two major problems with uncritical and indefinite adherence to this policy: the first concerns definitions of achievement and the second concerns the model of change.

As is illustrated by Farrow’s account and the PACE study, there is evidence of backwash from the tests onto the curriculum and teaching. Educational achievement is increasingly associated with performance on tests and evidence is emerging that tests of the core subjects, especially in Key Stage 2, are narrowing the curriculum by becoming ends in themselves. Thus, there are reports (The Times Educational Supplement, 26 May 2000: 9) that tests in May are followed by two ‘dead months’ when the Key Stage 2 curriculum is deemed to have ended. In many ways this replicates the experience at Key Stage 4 when some students ceremoniously burn their books after their GCSE exams. The idea that education ends with a test or examination is inimical to the idea of lifelong education that government is also trying to promote; so is the idea that what is worth learning is defined by what can be demonstrated in pencil and paper tests. If assessment is to contribute positively to lifelong learning, it will need to support learning how to learn and to develop new modes to capture ‘deep’ learning (knowledge of concepts, principles and processes that can be applied in creative ways in novel contexts) rather than ‘surface’ learning (of factual information and procedures that may only be memorized for tests) (see James and Gipps, 1998, for a fuller account).

Concerning the model of change, there is good evidence from other systems (Linn, 2000) that attainment results generally rise after the introduction of a new test but then plateau or begin to fall away after a few years. Perhaps the English National Curriculum assessments are still too new for this phenomenon to be observed, but one can imagine that this will happen eventually if for no other reason than that there is a limit to what can be done simply by urging schools to try harder. To sustain improvement, schools, teachers and students need to work ‘smarter’ also. They need to become more effective by challenging and changing some of their deeply held ideas and behaviours. This can only come about if they are given opportunities for professional development – what Bennis, Benne and Chin (1969) described as a ‘normative-re-educative’ strategy of change. This was why TGAT put teacher involvement at the centre of its proposals. To be fair, the Standards and Effectiveness Unit of the DfEE has already demonstrated its awareness of this issue, if only implicitly. It has turned its attention to providing advice to teachers on best practice in teaching and learning, which, as the primary process in which schools are engaged, must be the base on which further improvement needs to be grounded. Although still controversial, the literacy and numeracy strategies were the first manifestations of this. More recently the DfEE has funded a unit, set up at the London Institute of Education, to conduct reviews of research to inform policy and practice. It has also published a review of evidence on thinking skills (McGuinness, 1999), to be followed by development work, and it has convened a working group on formative assessment in response to the influential review of research evidence carried out by Black and Wiliam (1998a, 1998b) (see below).

None of this indicates that the elaborate testing regime now established will be dismantled, at least in the short term, but we can hope for some shifts in policy that will give greater prominence to developing a more positive relationship between assessment and learning. A further boost to this may come from the appointment, from September 2000, of Professor David Hargreaves as Chief Executive of the Qualifications and Curriculum Authority, who said in his first public statement that he would like to see formative assessment at the heart of a reappraisal of testing and examination systems in England.

The way forward: giving priority to formative assessment

One of the problems for TGAT was that some of its thinking was simply ahead of its time, and the mistakes that it certainly made came to colour overall judgement of its value. Few lay-people fully understood what was meant by formative assessment, or how important this was for raising standards by improving learning. Earlier initiatives, such as records of achievement in the 1970s and 1980s, had established an ‘emancipatory’ assessment discourse (Broadfoot, 1999) among educators who developed and promoted these ideas. However, in guidance from government agencies, a discourse of ‘performativity’ (Lyotard, 1984) prevailed.

Following the introduction of National Curriculum assessment, the general view seemed to be that if teachers know the levels at which their students are performing they will have all the information they need to decide next steps. Thus summative results were expected to fulfil the formative purpose. This was not what TGAT had in mind because the group consistently argued that it was not the results, but the evidence on which judgements are based, that provide the key to improvement. Summative assessment requires an overall judgement that of necessity irons out inconsistencies in the evidence of performance on different occasions and in different circumstances. In formative assessment, however, these inconsistencies become the focus of interest (Simpson, 1990) because they indicate where problems in students’ learning occur. These differences in approach to the evidence do not imply that formative and summative assessment require entirely separate systems. Indeed, they need to be linked through reference to the same criteria and standards. Even the assessed activities can be common; it is simply that the evidence is treated in different ways for the different purposes (Black, 1995; Harlen and James, 1997; Wiliam, 1999). Thus, as TGAT argued, evidence that is analysed first for the purpose of discovering a student’s strengths and weaknesses and areas for improvement, can then be summed up to provide an overall judgement of attainment level. But it has to be this way round; a teacher cannot recapture the detail if it has already been lost in a categorical judgement.

These arguments are fairly subtle and there is little to indicate that policy-makers understood them. It needed more than argument – it needed evidence – even to begin to convince them. An opportunity to supply this evidence came in 1997 when the Assessment Reform Group (then called the Assessment Policy Task Group of the British Educational Research Association) secured funds from the Nuffield Foundation to carry out a review of recent research on the impact of assessment on children’s learning. The group commissioned Paul Black and Dylan Wiliam, who used it as an opportunity to look especially at the evidence base for the effectiveness of formative assessment. Appreciating policy-makers’ concern for results, they paid particular attention to quasi-experimental studies that produced measures of ‘effect’. At the close of their work their conclusion was unequivocal:

The research reported here shows conclusively that formative assessment does improve learning. The gains in achievement appear to be quite considerable, and . . . among the largest ever reported for educational interventions. As an illustration of just how big these gains are, an effect size of 0.7, if it could be achieved on a nationwide scale, would be equivalent to raising the mathematics attainment score of an ‘average’ country like England, New Zealand or the United States into the ‘top five’ after the Pacific rim countries of Singapore, Korea, Japan and Hong Kong.

If this point is accepted, then the second move is for teachers in schools to be provoked and supported in trying to establish new practices in formative assessment, there being extensive evidence to show that present levels of practice in this aspect of teaching are low, and that the level of resources devoted to its support, at least in the UK since 1988, has been almost negligible.

(Black and Wiliam, 1998a: 61)

Most of the research studies reviewed were intervention projects, which trained teachers or others to use a range of practical tactics in the classroom and evaluated the results. No single, simple formula for success emerged but all the tactics in some way fulfilled Sadler’s (1989) three indispensable conditions for improvement in learning: that students need to understand the criteria and standards being aimed for; that they should be able to assess their own actual or current performance and compare it with these standards; and that they should be able to engage in action to close the gap. Thus the projects worked with teachers on, variously: communicating criteria in student-speak; discussing examples of assessed work; developing questioning and written and oral feedback using narrative comments not grades; introducing peer and self-assessment; and developing repertoires of strategies for improvement.

The findings of Black and Wiliam’s review were published in an academic journal (Black and Wiliam, 1998a) but they were also condensed into a short pamphlet for practitioners and policy-makers (Black and Wiliam, 1998b). This was launched at a press conference and a seminar for policy-makers across the UK. Initial reactions were mixed. Teachers’ associations were enthusiastic, as were most policy-makers, although some thought that they were doing formative assessment already, mistaking their own rhetoric for reality. The press largely misunderstood what was being advocated and thought that Black and Wiliam were proposing that teachers should abandon marking students’ work. Gradually, however, the messages percolated, and the many invitations to Black and Wiliam and members of the Assessment Reform Group to talk to groups of teachers and advisers suggested that practitioners welcomed ideas which are grounded in evidence but essentially practical, and which restore teachers’ sense of professionalism.

There was still some way to go with policy-makers, who continued not to see the contradictions between current policy and what was being advocated. So the Assessment Reform Group (1999) produced a follow-up pamphlet which tackled some of the policy issues more directly. By this time it had adopted the phrase ‘assessment for learning’ as less open to misinterpretation than ‘formative assessment’. A draft of this pamphlet was presented at a research seminar at the DfEE to which representatives of the Standards and Effectiveness Unit, OFSTED, QCA and the Teacher Training Agency (TTA) were invited. There were still those who had difficulty perceiving assessment as wider than measurement, but others were interested in how they might take the ideas on board. For example, OFSTED attempted to incorporate aspects of assessment for learning into its new framework for inspection, the QCA carried out a small survey on current practice with teachers and headteachers and, as mentioned earlier, the DfEE set up a working party on formative assessment.

It is still too early to say whether these initiatives will have real impact. Undoubtedly, the temptation will be simply to add formative assessment to existing assessment policy, which is still dominated by demands for information that can be used to monitor and evaluate the system. As the Assessment Reform Group tried to point out, this will not be satisfactory because the tensions between these purposes need to be resolved. Although it will not be appropriate to go back to the TGAT report, any further review of assessment and testing will need to do what TGAT attempted, i.e. examine carefully what each kind of assessment (formative, diagnostic, summative and evaluative) can contribute to educational improvement and precisely how they relate to one another. We need a coherent system in England and we are still a long way from achieving it. It also needs to be more economical of resources, both money and people’s time, and considerably less stressful. If nothing is done, our children risk drowning in measurement and teachers will lack the strength to save them.

Conclusion

Of necessity, this account has focused upon England and, in so far as England has such a unique (and peculiar) history, other countries may have little to learn from this experience. Even Wales, whose assessment policy has been closely aligned with England’s, is beginning to shift away (Daugherty, 2000). However, there are a number of specific points to be drawn from England’s story that may have wider relevance. I frame them here as questions, rather than statements, to indicate their provisional status.

  1. However superficially successful an assessment policy has been in raising performance on tests and examinations, is there a point at which the system becomes so stressful for students and teachers that it becomes counter-productive? There is evidence that England has reached this point: there is a crisis in teacher recruitment, despite the recent introduction of additional financial incentives for trainee teachers, and high numbers of experienced teachers are leaving the profession early. The increased ‘performance’ orientation of students is also likely to undermine the goal of encouraging positive attitudes to lifelong learning.
  2. Is not the combination of a prescriptive National Curriculum and an elaborate statutory assessment regime simply a case of overkill, which is ultimately wasteful of resources? No other country has both. It is surely possible to achieve a satisfactory measure of transparency in schools, which will satisfy public accountability, by developing standards for curriculum and learning and monitoring achievement through audited self-evaluation (MacBeath, 1999). This would put learning, not simply performance, back in the frame at both individual and school level.
  3. If assessment is a lever for change in schools, should more attention be paid to the models of change that underpin this assumption? In particular should the limits of coercive strategies be recognized and should attention now turn to developing powerful approaches to formative assessment as a central dimension of effective pedagogy?
  4. If lifelong learning in a rapidly changing knowledge society is an important goal, should more effort and resources be put into new modes of assessment that will support the development of the kinds of knowledge and skills that economic, social and family life in the twenty-first century will demand?
  5. Is the meritocratic vision adequate as a means of promoting social justice?

While more people from disadvantaged groups may find a place in the sun, what happens to those left out in the cold? A basic tenet of the movement for ‘emancipatory’ assessment or ‘assessment for learning’ is that all people can learn and that all should be helped to do so. This is neither an argument for ‘levelling down’ nor simply for greater representation of children from lower socio-economic groups among the elite. It is an argument for enabling all to make the kind of progress undreamed of in the past. Giving everyone support, not simply ‘access’, is what is really required to fulfil individual, institutional, national and global aspirations.

References

Assessment Reform Group (1999) Assessment for Learning: Beyond the Black Box. Cambridge: University of Cambridge School of Education.

Bennis, W. G., Benne, K. D. and Chin, R. (1969) The Planning of Change (2nd edn). New York: Holt, Rinehart & Winston.

Black, P. (1995) ‘Can teachers use assessment to improve learning?’. British Journal of Curriculum and Assessment 5(2): 7–11.

Black, P. J. (1997) ‘Whatever happened to TGAT?’. In Cullingford, C. (ed.) Assessment versus Evaluation. London: Cassell: 24–50.

Black, P. and Wiliam, D. (1998a) ‘Assessment and classroom learning’. Assessment in Education: Principles, Policy and Practice 5(1): 7–75.

Black, P. and Wiliam, D. (1998b) Inside the Black Box: Raising Standards through Classroom Assessment. London: King’s College.

Broadfoot, P. M. (1996) Education, Assessment and Society. Buckingham: Open University Press.

Broadfoot, P. (1998) ‘Categories, standards and instrumentalism: theorizing the changing discourse of assessment policy in English primary education’. Paper presented at the annual conference of the American Educational Research Association, San Diego, California, April.

Broadfoot, P. (1999) ‘Empowerment or performativity? English assessment policy in the late twentieth century’. Paper presented at the British Educational Research Association Annual Conference, University of Sussex, 2–5 September.

Daugherty, R. (1995) National Curriculum Assessment: A Review of Policy 1987–1994. London: Falmer Press.

Daugherty, R. (2000) ‘National Curriculum assessment policies in Wales: administrative devolution or indigenous policy development?’. Welsh Journal of Education, December.

Farrow, L. (1999) ‘An investigation into concepts of care in a middle school’. Unpublished M.Ed. thesis, University of Cambridge Faculty of Education.

Gipps, C. and Clarke, S. (1996) Monitoring Consistency in Teacher Assessment and the Impact of SCAA’s Guidance Materials at Key Stages 1, 2 and 3. London: SCAA.

Gramsci, A. (1971) Selections from the Prison Notebooks, ed. and trans. by Q. Hoare and G. Nowell-Smith. London: Lawrence & Wishart.

Harlen, W. and James, M. (1997) ‘Assessment and learning: differences and relationships between formative and summative assessment’. Assessment in Education: Principles, Policy and Practice 4(3): 365–79.

Hutton, W. (1995) The State We’re In. London: Jonathan Cape.

James, M. (1994) ‘Experience of quality assurance at Key Stage 1’. In Harlen, W. (ed.) Enhancing Quality in Assessment. London: Paul Chapman: 116–38.

James, M. and Conner, C. (1993) ‘Are reliability and validity achievable in National Curriculum assessment? Some reflections on moderation at Key Stage 1 in 1992’. The Curriculum Journal 4(1): 5–19.

James, M. and Gipps, C. (1998) ‘Broadening the basis of assessment to prevent the narrowing of learning’. The Curriculum Journal 9(3): 285–97.

Linn, R. L. (2000) ‘Assessments and accountability’. Educational Researcher 29(2): 4–16.

Lyotard, J. (1984) The Post-Modern Condition. Manchester: Manchester University Press.

MacBeath, J. (1999) Schools Must Speak for Themselves. London: Routledge.

McGuinness, C. (1999) From Thinking Skills to Thinking Classrooms: A Review and Evaluation of Approaches for Developing Pupils’ Thinking, Research Report RR115. London: DfEE.

Nuttall, D. L. (1984) ‘Doomsday or a new dawn? The prospects for a common system of examining at 16+’. In Broadfoot, P. (ed.) Selection, Certification and Control. Lewes: Falmer Press.

Nuttall, D. L. (1988) ‘National Assessment: “Complacency or Misrepresentation”?’. Lecture given as part of a series on the Education Reform Bill at the University of London Institute of Education, March.

Pollard, A. and Triggs, P. (2000) Policy, Practice and Pupil Experience. London: Continuum.

Rudduck, J., Chaplain, R. and Wallace, G. (eds) (1996) School Improvement. What Can Pupils Tell Us?. London: David Fulton.

Sadler, D. R. (1989) ‘Formative assessment and the design of instructional systems’. Instructional Science 18: 119–44.

Simpson, M. (1990) ‘Why criterion-referenced assessment is unlikely to improve learning’. The Curriculum Journal 1(2): 171–83.

Slater, J. (2000) ‘Generation stress’. The Times Educational Supplement, 5 May: 24.

Stobart, G. (1991) ‘GCSE meets Key Stage 4: something had to give’. Cambridge Journal of Education 21 (2): 177–87.

Taylor, T. (1995) ‘Movers and shakers: high politics and the origins of the National Curriculum’. The Curriculum Journal 6(2): 161–84.

Thatcher, M. (1993) The Downing Street Years. London: HarperCollins.

Torrance, H. and Pryor, J. (1998) Investigating Formative Assessment: Teaching, Learning and Assessment in the Classroom. Buckingham: Open University Press.

Whetton, C. (1999) ‘Attempting to find the true cost of assessment systems’. Paper presented at the IAEA conference, Bled, Slovenia.

Wiliam, D. (1999) ‘“There is no alternative”: mitigating the tension between formative and summative functions of assessment’. Paper presented at the eighth conference of the European Association for Research on Learning and Instruction, Gothenburg, August.
