|
Testing and Assessment: A
response from The Mathematical Association
This response represents an amalgamation of
individual and group responses, moderated by the Teaching Committee of
the Association.
General Issues
-
A centrally run system of testing and
assessment is important for the credibility of qualifications,
and for the moderation of teacher assessment. Educational
institutions also need to be accountable to stakeholders and funders
(eg the taxpayer), and a national system of assessment is one
(though only one) measure of satisfactory performance. It is
therefore important for individuals, at least at critical point(s),
for end users of education such as employers, and for stakeholders
in the education system.
-
Other systems have
been well explored, eg in the Tomlinson report.
-
Many of our members believe they now have
evidence that ‘high stakes assessment’ is distorting the curriculum
and reducing creativity: in particular, the committee is
referred to the small-scale evidence submitted to QCA and others by
the MA, ATM and NANAMIC: Impact of Assessment on Learning and
Teaching Mathematics (front page at www.m-a.org.uk).
-
QCA’s effectiveness
with respect to mathematics has been reduced by the disbandment of
the maths team; this situation has recently been addressed by the
appointment of a maths specialist within the leading team. We are
concerned about the apparent lack of accountability to, or even
cognisance of, the professional community: witness the recent drives
to push through 2-tier GCSE, or 3-6-9-12 etc GCE Maths
qualifications at the behest of exam boards and in the face of
deep-seated unease among the community.
-
We also have grave concerns about the development
of some current initiatives devolved to within the exam boards:
we see little evidence that they have the necessary depth or breadth
of expertise for the development of significant changes in
mathematics education, and already see signs that assessments are
driving the new initiatives (development of Functional Maths
qualifications, of changes to GCSE Mathematics, and of a second
Mathematics GCSE), rather than being driven by sound mathematics
education philosophy and research.
Our evidence
suggests that we are increasingly seeing the growth in value of what
we can assess, rather than the assessment of what we value: this is a
profoundly worrying trend which must be reversed if we are to build our
capacity for good mathematics education in our classrooms. We feel
it is exacerbated by the existence of a number of commercially-driven
exam boards; we deplore the increasing availability of board-endorsed or
marketed study materials which undoubtedly compromise the integrity of
the system, and in particular we query the need for more than one exam
board operating any given qualification. (We have more detailed
proposals on this point, should they be thought useful.)
National Key Stage Tests
-
How effective are they?
In mathematics, the Key Stage tests in themselves are good
assessments comprising well-developed questions which probe
understanding; that is, when used well, they support good mathematics
learning. Broadly speaking, they rank students correctly and are
useful as diagnostic tools for pinpointing areas of relative
strength and weakness. In other words, they have the potential to be
used as good instruments for assessment for learning.
-
Do
they adequately reflect levels of performance?
At Key Stages 1, 2 and 3 the tests increasingly inflate the levels
‘achieved’, even without coaching immediately prior to the test. For
example, on a level 4-6 paper at Key Stage 3 the marks are broadly
divided equally between the 3 levels concerned, yet it is only necessary
to answer most of the level 4 and 5 questions correctly to be
‘awarded’ a level 6. This causes misunderstanding among all
concerned, and is a nonsense.
-
Changes over time:
The IPPR report, “Assessment and Testing”1,
presents evidence to suggest that improvements in National
Curriculum Levels overstate underlying improvements in attainment.
“Although the two are not directly comparable, improvements in TIMSS
(Trends in International Mathematics and Science Study)
are thus much less impressive than the measured improvements in key
stage test results. The Statistics Commission considered these issues
in 2005 and concluded that: ‘The Commission believes that it has been
established that (a) the improvement in Key Stage 2 test scores between
1995 and 2000 substantially overstates the improvement in standards in
English primary schools over that period, but (b) there was nevertheless
some rise in standards.’ (Statistics Commission 2005: 4). Looking at
the secondary phase, the percentages of pupils attaining the benchmark
at Key Stage 3 and Key Stage 4 have continued to rise although progress
on international attainment measures has stalled. Evidence from TIMSS
for Key Stage 3 (Year 9) does not show any significant change in
performance between 1995 and 2003 (Ruddock et al 2004). Analysis of the
international study PISA (Programme for International Student
Assessment) shows that for a given score at Key Stage 3 or Key Stage 4,
pupils attained on average a higher PISA score in 2000 than in 2003
(Micklewright and Schnepf 2006).” 2
-
Cause and Effect:
We know of no evidence showing that the introduction of high stakes
testing has in itself raised standards: our members are overwhelmingly
of the opinion that where progress has been made, particularly with
non-specialist or inexperienced teachers, it has been supported by
the use of the National Strategies, especially where these have been
applied intelligently.
-
Coaching for the test,
now occupying inflated teaching time and effort in almost all
schools for which we have information at each Key Stage, is not
constructive: short term ‘teaching how to’ is no substitute for
longterm teaching of understanding and relationship within and
beyond mathematics as part of a broad and balanced curriculum. It is
interesting that in Wales, where testing is no longer ‘high stakes’,
apparent (but, according to work samples, not real) attainment has
decreased, but healthy practice, such as cross-phase moderation, is
now being adopted.
-
Such testing is marginally effective in
concentrating students’ minds, but in terms of longterm learning the
current practices are destructive, and counterproductive in terms of
sustained attainment. Neither are they effective in holding schools
accountable for performance: feeder schools with very similar
results can produce students with widely varying usable mathematics
skills and understanding by the following September. In general only
the most confident and competent schools or teachers are able to
withstand the perceived pressure to warp teaching according to the
tests. In general, teacher assessment appears to have little value
to stakeholders, or even to leadership teams in schools.
-
We
fail to see how the proposals in ‘Making Good Progress’ would
address these concerns: at worst, the difficulties will be
exacerbated, with a greater proportion of time devoted to coaching
for tests rather than building for longterm robustness and fluency.
It will be possible to hothouse students to ‘achieve’ at higher
levels because only a comparatively small range of skills is being
tested at any one time, but this is well known by effective teachers
to be counterproductive in the medium term. Further, recent work by
Dylan Wiliam has cast doubt on the accuracy of results at KS2 and
KS3, estimating that about 32% of KS2 results and 43% of KS3 results
are at least one level out. 3
-
A move
to single-level tests would strongly encourage a move to
single-level teaching in order to prepare students for them. This
has not been common since the early days of the National Curriculum
due to the fragmentation of learning it can lead to. Such a move
would be counter to the proposed changes to KS3 and 4 Programmes of
Study which allow for an increase, rather than a reduction, in
curricular freedom and personalisation; it would undermine existing
good practice.
-
Increasing the frequency of high-stakes testing
for schools will increase the pressure on students to perform. In
our view, it is unethical to put children under such pressure when
the results are of more importance to the school than they are to
individual students.
-
We do not believe in any case that it is possible
reliably to discriminate at a certain level with a single test:
‘levelness’ is an amalgam of skills, concepts and knowledge, and the
variation with which students ‘achieve’ a given level at present
suggests they each develop in different directions at different
speeds. ‘Levels’ in a given area of mathematics are far more
constructively employed as discriminators for progression, that is,
for teacher and student assessment for learning. Our concern is not
with single-level tests in themselves but with the way in which they
are currently used. There have been examples of single-level tests
that have worked reasonably well, for example, GAIM and the MEI
National Curriculum scheme for GCSE, where they are used ‘en
passant’ as part of the development of a broad palette.
-
We see no argument to keep these tests as ‘high
stakes’ assessment in terms of league tables: Northern Ireland and
Scotland have well-respected systems without resorting to such, and
in Wales the system is developing well without them. As in-school
components of formative assessment, alongside teacher assessment,
they are valuable. Until GCSE we see no need for blanket formal
high-stakes summative assessment.
-
We feel the levels are broadly age-appropriate,
given that ‘the average’ should mean perhaps 60% can achieve
meaningfully (and sustainably) at or above that level. But
achievement should mean broad ‘mastery’: at present, critical,
typically more challenging, areas for understanding and progression
are skimped at every stage in the race to ‘achieve’ more highly; for
example, notions of proportionality, and a robust fluency with
numbers such that algebra is but a trivial generalisation of
understanding, are often under-developed.
Testing and Assessment at
16 and after:
o
Is testing and assessment in
‘summative’ tests fit for purpose? GCEs
are not good indicators for successful progression in cognate subjects:
the correlation to degree class is low. Considered as school-leaving
certificates giving evidence of learners’ achievements and strengths,
GCEs tend to focus on a very narrow skill range and so give only a partial
picture. At GCSE especially, grades are given at such low mark
thresholds that they can hardly be said to celebrate learners’
achievements, nor, since they do not require mastery, are they of
consistent use to end-users.
o
Additionally, particularly ‘high
stakes’ qualifications such as Mathematics GCSE are now even skewing
provision in the secondary and college curricula, with increasing
resources concentrated on borderline C/D students at the expense of
those more or less able, and often for short term gain rather than
confident mastery of subject material: in other words, coaching for the
test rather than teaching for longterm understanding of fundamentals.
o
Are the changes to coursework due to
come into effect in 2009 reasonable?
Changes to coursework (and other GCSE changes in mathematics) have been
rushed in without sufficient trialling of their effects on teaching and
learning: there is no indication that replacement papers will assess
effectively those skills engendered by proper coursework, although in
the best classrooms there will now be more time available to properly
develop those skills.
o
Is holding formal summative tests at
ages 16, 17 and 18 imposing too great a burden on students?
Too much of the year is taken up with examinations and direct
preparation for them. The move to reduce the number of modules will help
by cutting the number of assessments and reducing the use of the January
sitting. (Just testing at 16, 17 and 18 would be a step forward: at the
moment, many students are tested at 14.5, 15, 15.5 and 16 (for modular
GCSEs), then at 16.5, 17, 17.5 and 18 for GCEs.)
o
If so, what changes should be made?
At GCSE there are already too many changes
in the pipeline for teachers easily to be able to make good use of them
for students: these, and the change to 4 modules at GCE, must first be
given time to embed. We must avoid the current destructive practice of
making a set of changes then planning the next set before the previous
ones have been implemented or evaluated. Longterm, efforts must be made
to reduce the present fragmented and time-consuming burden of external
summative assessment.
o
To what extent is frequent, modular
assessment altering both the scope of teaching and the style of
teaching? In the worst cases, it has
allowed many teachers to teach in a dull and boring way, using the stick
of impending examinations to motivate learners rather than inspiring and
enthusing them. It has introduced a narrow focus on imparting to
learners mark-winning behaviours rather than teaching them a coherent
understanding of the subject. Links between different parts of the
subject are not examined because the topics are in different modules,
and as they are not examined they are not taught. It has contributed
to the paucity of teaching multi-stage problem-solving and the synthesis
and communication of arguments. The widespread perception is that best
practice in teaching and learning is mutually exclusive with optimising
module results.
o
How does the national assessment
system interact with university entrance?
There is a tension between how far the system is there to reflect
achievement to date, and how far it is to act as a selection instrument
for progression, where progression can be in a wide variety of
directions. In particular, GCE has long since ceased to be primarily a
selection instrument for universities. Attempts at bolt-ons, like A*
grades or ‘stretch and challenge’, are unlikely to be effective in
meeting that use, although they may serve to undermine other uses. In
mathematics especially, it is difficult to see how one could set a paper
which was accessible across the population now undertaking GCE studies
yet provided opportunity for the most able in the subject to shine.
o
What does it mean for a national
system of testing and assessment that universities are setting entrance
tests as individual institutions? The
increase in the number of these tests makes it clear that national
qualifications are not
meeting the needs of at least some end-users. Some suggest that a more
constructive development might be a coordinated set of papers in each
subject, rather than students being tested repeatedly by each of their
five university choices. Perhaps we should see the role of GCE as being
part of a leaving certificate which wraps up and attests the learner’s
achievement at school/college and use separate instruments for assessing
suitability for progression to higher education in particular subjects,
though of course this adds to the ‘burden of assessment’. If we did this
in maths, we would probably need a suite of papers of increasing
difficulty: what would excite Anglia Ruskin in a candidate would not be
the same as what would make Trinity, Cambridge sit up and take note,
although GCEs probably remain appropriate for selection for some
institutions. The key thing would be that students would have to sit at most
one extra set of papers and it would suffice for all their university
applications in a subject (and they would do this at a sensible time of
year). GCE is being asked to do an additional job from the one for which
it was introduced; in making a fairish job of serving that new role it
has ceased to fulfil its original role as well.
J.Golding
May 2007, for The Mathematical Association
1: Assessment and Testing: Making space for
teaching and learning, IPPR, December 2006
2: Assessment and Testing: Making space for
teaching and learning, IPPR, December 2006
3: ‘The Reliability of Assessments’ in Assessment and Learning, P
Black and D Wiliam, Gardner J (ed), London: Sage, 2006
30.05.07 |