Not just a number
Standing on the deserted platform of Market Rasen station some ten years ago, Carol FitzGibbon and I had an extended conversation about the government's attitude to value-added. (If you knew Market Rasen station, you'd appreciate the full poignancy of this. A train once famously ground to a halt a quarter of a mile beyond the station and then reversed shamefacedly back to collect its passengers. The driver had forgotten the station was there.)
Carol was at that time a member of a working party set up by the Department of Education to consider value-added and was experiencing the full weight of ministerial - and hence civil service - scepticism. What, after all, was wrong with raw-score league tables?
She believed that government could (and should) set up a value-added service for schools at least as useful as the ALIS and YELLIS services for which she was then responsible.
She also believed that attempting to reduce the performance of a whole school to a single number was a futile quest: too many complexities and internal variations would make the resulting number almost meaningless.
Ten years on and history has vindicated both viewpoints. The government was persuaded that raw-score tables were failing to tell the whole story and introduced its value-added tables, with the crude median-centred methodology still in use today.
Unsurprisingly, it was pilloried in all quarters. The data was supposed to be centred on 100, but the majority of schools obtained scores below this figure. (This was the ultimate statistical achievement: presenting the majority of schools as below average.)
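How can most schools fall below "average"? When a distribution of scores is skewed to the right, a minority of high scorers pulls the mean upward, so the majority sit below it. A minimal sketch of the effect (illustrative numbers only, not the actual tables' calculation):

```python
import random

random.seed(1)

# Simulate 1,000 school scores from a right-skewed distribution,
# then rescale so the *mean* sits at 100.
raw = [random.lognormvariate(0, 0.5) for _ in range(1000)]
mean = sum(raw) / len(raw)
scores = [100 + (r - mean) for r in raw]

# With right skew, well over half the schools land below the "average".
below = sum(1 for s in scores if s < 100)
print(f"{below} of {len(scores)} schools score below 100")
```

The point is not the particular distribution but that "below average" carries no stigma at all once the distribution is skewed: it is a property of the arithmetic, not of the schools.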
By failing to standardise the data, the tables managed to place schools in the wrong order with large schools tending to be nearer to the norm figure than smaller schools.
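The size effect is easy to demonstrate. A school's score is an average over its pupils, and averages over small cohorts scatter far more widely than averages over large ones. A hedged simulation, assuming every school is genuinely average:

```python
import random
import statistics

random.seed(0)

def school_score(n_pupils):
    # Each pupil's value-added residual: mean 0, sd 1 (illustrative units).
    return statistics.mean(random.gauss(0, 1) for _ in range(n_pupils))

# 500 small schools and 500 large schools, all with identical
# underlying effectiveness...
small = [school_score(30) for _ in range(500)]
large = [school_score(300) for _ in range(500)]

# ...yet the small schools' scores scatter much further from the norm,
# so unstandardised tables push them towards both extremes.
print("sd of small-school scores:", statistics.stdev(small))
print("sd of large-school scores:", statistics.stdev(large))
```

The spread shrinks roughly with the square root of cohort size, which is why a league table that ignores cohort size systematically misplaces schools.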
Most tellingly, the tables were excoriated by the Royal Statistical Society for failing to indicate which scores were statistically significant, so no conclusions could be drawn from them at all. (Not that this deterred national - and more lethally, local - newspapers from publishing wholly misleading rank orders of schools.)
Another startling observation was that schools with the most favoured intakes tended to obtain the best value-added scores. No surprise there, you might think - but one of the main purposes of value-added had been to allow fair comparisons between schools.
Enter the Fischer Family Trust and the popularisation of contextual value-added. By introducing other factors to the input scores driving the value-added calculation, FFT has been able in part to resolve this anomaly.
Each factor is weighted according to how closely it correlates with pupil results. The additional factors included in input scores at pupil level are gender, age, ethnicity, mobility and SEN stage.
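The underlying idea can be sketched in a few lines. This is not FFT's actual model or its coefficients - just an illustration of the principle: predict results from prior attainment first, then ask whether a contextual factor (gender, here) still explains systematic differences in what is left over.

```python
import random
import statistics

random.seed(2)

n = 2000
prior = [random.gauss(27, 4) for _ in range(n)]   # prior attainment points
girl = [random.random() < 0.5 for _ in range(n)]
# Simulated results: prior attainment dominates; gender adds a little.
result = [0.9 * p + (1.5 if g else 0.0) + random.gauss(0, 3)
          for p, g in zip(prior, girl)]

# Step 1: predict results from prior attainment alone (simple regression).
mp, mr = statistics.mean(prior), statistics.mean(result)
slope = (sum((p - mp) * (r - mr) for p, r in zip(prior, result))
         / sum((p - mp) ** 2 for p in prior))
intercept = mr - slope * mp
residual = [r - (intercept + slope * p) for p, r in zip(prior, result)]

# Step 2: a factor earns a place (and a weight) in the model if the
# residuals still differ systematically across its categories.
girls_res = statistics.mean(r for r, g in zip(residual, girl) if g)
boys_res = statistics.mean(r for r, g in zip(residual, girl) if not g)
print(f"mean residual, girls: {girls_res:+.2f}")
print(f"mean residual, boys:  {boys_res:+.2f}")
```

In a contextual model that gap would be absorbed into the prediction, so a school is no longer penalised or flattered simply for the balance of its intake.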
The analyses clearly show which results are statistically significant - though results are still presented in a form which tempts the reader to invalid conclusions.
More controversially, at whole school level it includes as a factor the average prior attainment of the whole school cohort - as well as the spread of prior attainment of the whole cohort - thereby seeking to make valid comparisons between schools whatever the balance of their intakes.
FFT analyses also include a social deprivation factor at whole school level. Unfortunately, the data used is free school meal eligibility, the use of which was largely discredited in research commissioned by Ofsted in 2000.
Moreover, this factor seems to affect schools inconsistently, making FFT data almost unusable for some schools in rural areas.
Despite these limitations, some of the LEAs who bought into FFT have tended to use its calculated scores as though sent down from Mount Sinai. Many have also abandoned their own equally valuable value-added analyses.
In addition, early assessment procedures for school improvement partners imbued FFT analyses with almost mystical significance. SIPs are still provided with 'FFT supplementary data' - wholly unnecessary now that they have the CVA Panda.
Panda vs Fischer
Which brings us to the CVA Panda itself, published for the first time last year, and scheduled to provide the value-added scores for next year's league tables.
It uses the methodology pioneered by the Fischer Family Trust and retains most of its strengths. However, it includes two additional factors at pupil level: first language and an in-care measure.
Though it continues to use free school meal data, this factor is applied at pupil level, where it is far less significant, and is moderated by another, arguably more robust, social deprivation factor - the IDACI, or Income Deprivation Affecting Children Index.
It is hard to dispute that the CVA Panda is currently the best measure we have. But this in turn engenders a danger that it will be treated with even more veneration than the Fischer Family Trust data.
Many of its most avid users are not blessed with the statistical insight to interpret it validly. And because the calculation on which it is based is iterative, it is almost impossible to reach inside the black box and challenge it.
And it does still have some real weaknesses. As an accountability measure, its biggest failing lies in 'garbage in, garbage out'. That is, it measures pupils' prior attainment by their SATs results.
If we have little faith in the validity of the SATs results (because, for instance, of the effects of coaching or the suspected unreliability of both the tests and the marking) then the value-added scores become meaningless, however sophisticated the methodology.
For the more important uses of value-added - internal analyses of results within schools - the lack of flexibility is very limiting. For even the most elementary review, we need to look at specific subsets of our results.
How well did a specific group of mentored pupils perform? Were their results statistically significant? Which subjects have significant results? How does the whole school score change if certain pupils or subjects are removed? Which particular input factors are most affecting our score? Few of these answers are easily available.
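The first two of those questions, at least, need nothing more exotic than a mean and a standard error. A sketch of the kind of in-house check the national tables don't offer - the residuals here are invented for illustration (each pupil's actual result minus predicted result):

```python
import math
import statistics

# Hypothetical value-added residuals for ten mentored pupils.
residuals = [1.9, 2.3, -0.4, 3.1, 0.8, 2.6, 1.2, -1.1, 2.0, 1.6]

mean = statistics.mean(residuals)
se = statistics.stdev(residuals) / math.sqrt(len(residuals))
z = mean / se
print(f"group mean VA: {mean:+.2f}, z = {z:.2f}")
# |z| above roughly 2 suggests the gain is unlikely to be chance alone;
# below that, the honest answer is "can't tell".
print("significant at ~5% level" if abs(z) > 2 else "not significant")
```

Even this crude test is more than the published tables let a school do for an arbitrary subset of its own pupils.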
Even statistically significant data tells us nothing about cause and effect. All it can do is point us in the direction of the right questions.
Different analyses give different results, so we need as many value-added analyses as we can get. We use the ones we trust to help in our internal reviews. And we use the most flattering of the rest to resist the claimed certainties of the CVA Panda and potentially damaging conclusions by Ofsted.
The more sophisticated we make value-added, the more useful it can be for our internal reviews. But at the same time, the more it becomes an occult science.
However, one clear conclusion begins to emerge. Carol FitzGibbon, inspired by the ambience of Market Rasen station, was right. Attempting to put schools into an 'effectiveness rank order' driven by a single calculation is a futile and damaging exercise.
And, perhaps more profoundly, an accountability regime dependent on each school doing better than other schools is inherently self-defeating.
Tony Neal was head of De Aston School in Market Rasen, Lincolnshire until he retired in December. He was a long-time member of Council and association president in 2001-02. He is the ASCL representative to the GTC.
© 2017 Association of School and College Leaders