Written evidence submitted by David Blow

 

Submission to the Select Committee regarding the grading of exams - David Blow

 

Introduction and reason for submitting evidence - David Blow

I am currently Executive Head of SESSET (South East Surrey Schools Education Trust) which comprises 3 secondary comprehensive schools (The Ashcombe, Therfield and The Warwick). Until 2019, I was Head of The Ashcombe School.  For a number of years, I have worked very closely with ASCL on data matters, including presenting at its regular Leadership of Data conferences.  This has involved liaising with Ofqual, DfE, Ofsted and exam boards. 

 

I was invited to be a member of the Ofqual External Advisory Group, and agreed, signing a Non-Disclosure Agreement.  I am adhering to this, and am not disclosing any of the discussions from the meetings.  Reference is only made to widely available information, including the Ofqual Interim Technical Report, https://www.gov.uk/government/publications/awarding-gcse-as-a-levels-in-summer-2020-interim-report which naturally draws heavily on papers presented to the Group.

 

As part of my voluntary contribution to ASCL regarding the 2020 exam grading, I have worked very closely with ASCL and prepared documents and spreadsheets which were made freely available through the ASCL website (as I have done over time with other data matters such as transition matrices and Progress 8).

 

The reason for submitting evidence is to help bring some clarity to what happened, bringing together the technical aspects with the wide variety of words and understandings circulating throughout the period since lockdown.  I was involved through ASCL in unpicking the 2008 CVA crisis and then again in 2012 in establishing what had actually happened in the 2012 GCSE English grading crisis.  During this period, I worked with DfE to get the national subject transition matrices readily available and wrote the spreadsheets to enable schools to use them as a practical school improvement tool.  All of these experiences have given me insights and examples highly relevant to the current situation.

 

There is a Summary followed by a detailed Submission (with links to the documents referred to), and then there are Annexes with more information and evidence.

 

 

Submission to the Select Committee regarding the grading of exams - David Blow

Introduction and reason for submitting evidence - David Blow

Submission

Summary

2 incompatible objectives set from the beginning

why are these incompatible?

continued incompatible messaging

ASCL published two documents to set out principles and approach

CAGs v standardisation and "prediction"

Ofqual published Consultation document (3rd April announcement, then 15th April)

"Predicted grades"

Ofqual External Advisory Group on Exam Grading

ASCL webinar & ppts

Burning issues during the Ofqual consultation period (15th Apr - 25th May)

"Improving schools"  ("rapidly improving" / "trajectory")

Year 10 pupils getting grades

Technical points

Exam boards and Ofqual focus on grades and outcomes; schools focus on progress

Ofqual advice on calculating grades and "previous years" (15th May)

Both Objectives and approaches were being disseminated by exam boards and other organisations

ASCL publish step-by-step guidance to calculate grade distributions in line with Ofqual "previous years"  19 May 2020

Schools submit CAGs (12th June 2020)

June - FFT Datalab reveal extent of over-estimation (12th June)

June - Ofqual develop nuanced model

end of June - when actual CAGs arrive at Ofqual, the nuanced model has to be abandoned

Time to look at actual CAGs when confirming model very limited

Ofqual Summer Symposium

Outline of model and process

Comparison between 2019 grades and submitted CAGs

Equalities analysis

Dealing with small numbers of candidates entered in a subject at a centre

U grades

"Top grades at AL"

Overarching frustration

Aftermath following decision to use CAGs (late August)

Misunderstanding re AL VA grades compared with outcomes

Schools Week and other articles

Extracts from the articles are in Annex 2 - Articles

ASCL modest proposal, eventually rejected by Ofqual (late Aug \ Sept)

House of Commons Education Select Committee hearing (2nd Sept)

Annex 1 - FFT Datalab analysis of data

19th & 20th March - Awarding grades in 2020

12th June - 3 blogs on the grades proposed by schools 12 Jun 2020

13th August - A level results 2020 - why independent schools have done well out of this year's awarding process

Annex 2 - Articles

Daily Telegraph (Friday 21st August)

Mixed message from Williamson meant process was 'doomed to fail'

Schools Week  - Schools that followed advice to deflate grades must now be given appeal route (21st Aug)

Annex 3 - Education Select Committee hearing (2nd Sept)

Annex 4 - Analysis of changes in Progress 8 from one year to the next

Comparing P8 Maths and P8 English at a school level in 2018 and 2019

Change in P8 in English and Maths between 2018 and 2019

Spread of P8 in each year, and the difference and average

School-wide improvement from 2018 to 2019

Annex 5 - Comparing the awarded grades in June 2020 with other calculations

Overview

Information about the Ofqual and exam board standardisation model

"Previous years"

Comparison with the ASCL subject Transition Matrices approach & Toolkit

Overall steps for each subject

Infographic of ASCL approach

More about the calculations in the Ofqual standardisation model and the information being sent to centres on the Wednesday, the day before Results Day

How do the two different methodologies compare?

Particular differences between approaches

"Requirements for the calculation of results in summer 2020.pdf"

Prediction matrices

Comparable outcomes

Ofqual video explaining approach

Research in 2013-14 on the use of KS2 data in prediction matrices

r-squared for the GCSE subjects listed in the DfE Subject Transition Matrices for June 2019

 

 

 

Submission

 

Summary

This submission explains why the Centre Assessed Grades (CAGs) were over-estimated by schools, leaving the standardisation process unable to make much use of them and leading to a challenging situation, even though the modelling process was of high technical quality, for example creating no additional disadvantage to any particular groups relative to 2019.

 

There were two incompatible objectives set from the beginning.

20th March - Gavin Williamson announced:

Objective 1 ("individual grade"): "students are awarded a grade which fairly reflects the work that they have put in". Also, on 18th March in Parliament: "We will work with the sector and Ofqual to ensure that children get the qualifications that they need", and Boris Johnson on 18th March: "we will make sure that pupils get the qualifications they need and deserve for their academic career".

 

Objective 2 ("grade distribution"): "We will also aim to ensure that the distribution of grades follows a similar pattern to that in other years" ("comparable outcomes").

 

The submission explains why these are incompatible, and why Objective 1 is inherently inflationary, i.e. leads to over-estimation.

 

These incompatible objectives were then perpetuated by Ofqual and the exam boards. ASCL and others promoted Objective 2, giving detailed step-by-step instructions, with Ofqual publishing the necessary information to ensure quantitative accuracy.

 

Realising that there was likely to be significant over-estimation, Ofqual developed a model so that schools following Objective 2 would have the fine detail of the grading of their CAGs (e.g. where clustering meant that very similar candidates received the same grade) carried into the final grades, whilst this would not be possible where there was significant over-estimation.

 

But when the CAGs arrived at the end of June, this model had to be abandoned because of the degree of over-estimation, which, for very technical reasons, meant that the Objective 2 schools would actually have been disadvantaged. Because the exam boards needed the final details of the model in early July, there was no time to try any further refinements.

 

The submission shows the quality and thoroughness of the technical work by the teams at Ofqual and the exam boards, and how unfair it is that the "algorithm" has become a convenient punchbag.

 

One of the deep frustrations around the whole process was that potentially it could so easily have worked.  Without the perpetuated confusion over Objective 1 and Objective 2, if the majority rather than minority of schools had set constrained CAGs (i.e. two-thirds rather than one-third), then for the majority of students their awarded grades would have been close to their CAG.  And the nuanced grading around clustering would have removed some of the anomalies, as the nuanced grading would have carried through into the awarded grades because of the overall restraint.

 

 

 

2 incompatible objectives set from the beginning

20th March - Gavin Williamson  https://www.gov.uk/government/news/further-details-on-exams-and-grades-announced

Objective 1 ("individual grade"): "students are awarded a grade which fairly reflects the work that they have put in". Also, on 18th March in Parliament: "We will work with the sector and Ofqual to ensure that children get the qualifications that they need", and Boris Johnson on 18th March: "we will make sure that pupils get the qualifications they need and deserve for their academic career".

 

Objective 2 ("grade distribution"): "We will also aim to ensure that the distribution of grades follows a similar pattern to that in other years" ("comparable outcomes").

(Boris Johnson: https://www.gov.uk/government/speeches/pm-statement-on-coronavirus-18-march-2020)

 

why are these incompatible?

Imagine you are the teacher of a class of 30 pupils, and you are 75% confident for each pupil that they will get a grade 4 or above. So, in line with Objective 1 ("individual grade"), you give them all (100%) a CAG of 4 or above.

But, in practice, if the exams had gone ahead, there is always variation in exam performance on the day, so typically 75% of them would actually have gained a grade 4 or above, and 25% would not. So, for Objective 2 ("grade distribution"), only 75% of the class should get a CAG of 4 or above.
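
To make the arithmetic concrete, here is a minimal simulation sketch (in Python, with a hypothetical class) contrasting the two objectives; the 75% figure is the one from the illustration above.

    import random

    random.seed(1)

    PUPILS = 30
    P_PASS = 0.75     # teacher is 75% confident each pupil would achieve grade 4+
    TRIALS = 10_000   # simulated "exam days"

    # Objective 1 ("individual grade"): every pupil individually merits the
    # benefit of the doubt, so all 30 receive a CAG of 4 or above.
    objective1_passes = PUPILS  # 100% of the class

    # Objective 2 ("grade distribution"): simulate actual exam-day variation.
    total = 0
    for _ in range(TRIALS):
        total += sum(random.random() < P_PASS for _ in range(PUPILS))
    avg_passes = total / TRIALS

    print(f"Objective 1 awards 4+ to {objective1_passes}/30 pupils (100%)")
    print(f"Simulated exams award 4+ to about {avg_passes:.1f}/30 pupils (~75%)")
    # The gap between 100% and ~75% is exactly the inflation described above.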

 

I devised this illustration in May and shared it as widely as possible.

 

This example shows that Objective 1 is inherently inflationary and therefore in breach of Ofqual's remit to maintain standards over time (done through comparable outcomes).  This contradiction was never addressed by Ofqual as an organisation; indeed, in the aftermath, Ofqual exacerbated its own contradictory stance by trying to use the Head of Centre declaration as a "stick" against schools which had submitted realistic grades!

 

continued incompatible messaging

Ofqual continued this incompatible messaging in their consultation and announcements, and the exam boards continued it too: e.g. the Head of Centre declaration fits Objective 1, while the Guidance for Heads of Centre below fits Objective 2.

Summer_2020_Awarding_GCSEs_A_levels_-_Info_for_Heads_of_Centre_22MAY2020.pdf

https://www.gov.uk/government/publications/gcses-as-and-a-level-awarding-summer-2020

"previous results in your centre in this subject – these will vary according to a number of factors, including prior attainment of the students, but our data shows that for most centres any year-on-year variation in results for a given subject is normally quite small" (pp. 6-7)

See for example https://analytics.ofqual.gov.uk/apps/GCSE/CentreVariability/

[Image: Head of Centre declaration]

and the instruction to Heads of Centre about the Declaration (p.17) "In reviewing these centre assessment grades, the Head of Centre should consider how the distribution of centre assessment grades compares with grades achieved by the centre in previous years"  This is the only factor cited on p.17 out of the many listed on pp. 6-7, showing that Ofqual were giving it top priority (in spite of subsequently trying to disown it).

 

 

ASCL published two documents to set out principles and approach

Dave Thomson from FFT Datalab published a blog outlining the Transition Matrices approach on 19th & 20th March (see Annex 1): https://ffteducationdatalab.org.uk/2020/03/awarding-grades-in-2020/

ASCL then published two documents to set out principles and approach:

 

30th March - why it is ultimately fair to the whole cohort to follow Objective 2

https://www.ascl.org.uk/ASCL/media/ASCL/Help%20and%20advice/Leadership%20and%20governance/CV-Emerging-principles-and-guidance-regarding-teacher-assessed-frades,-summer-2020-30-March-2020.pdf

 

9th April - "how to do it"

https://www.ascl.org.uk/ASCL/media/ASCL/Help%20and%20advice/Leadership%20and%20governance/Coronavirus-Guidance-regarding-centre-assessed-grades-for-summer-2020.pdf

The approach: assign an objective mark using assessment data, which gives the rankings; calculate a grade distribution in line with previous years and the national transition matrices; and assign grades accordingly, making fine adjustments at grade boundaries.  ASCL called this "constrained CAGs", i.e. constrained so as to be non-inflationary overall, but using centres' fine-tuning (e.g. to deal with clustering of candidates).  This fine-tuning would mean that the right students would get the right grades, and we can now see that up to half of students would have benefitted at AL.
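
As an illustration of these steps, here is a minimal sketch (not ASCL's actual toolkit; the marks and target distribution are hypothetical) of ranking candidates by an objective mark and then assigning grades to match a non-inflationary target distribution:

    # A minimal sketch of "constrained CAGs": rank candidates by an objective
    # mark, then hand out grades so the overall distribution matches a target
    # derived from previous years / transition matrices.
    def constrained_cags(marks, target_counts):
        """marks: {candidate: mark}; target_counts: [(grade, count), ...]
        ordered from highest grade to lowest. Returns {candidate: grade}."""
        ranked = sorted(marks, key=marks.get, reverse=True)  # best mark first
        grades = {}
        i = 0
        for grade, count in target_counts:
            for cand in ranked[i:i + count]:
                grades[cand] = grade
            i += count
        return grades

    marks = {"A": 71, "B": 68, "C": 67, "D": 52, "E": 40, "F": 39}
    # Target in line with previous years: two 7s, three 5s, one 3 (hypothetical).
    target = [(7, 2), (5, 3), (3, 1)]
    print(constrained_cags(marks, target))
    # Teachers would then fine-tune at the boundaries, e.g. where C (67)
    # clusters just below B (68), before submitting.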

 

The document also gave a staged approach and timeline to help schools with their planning.

 

 

In practice, schools divided into two clusters: those responding to Objective 1 in a variety of ways (we now know, about two-thirds) and those responding to Objective 2 (we now know, about one-third). See Annex 1 - FFT analysis of school data.

 


 

Schools have become very familiar with using the DfE national Subject Transition Matrices as a very transparent and effective way of comparing subject performance at a school level with that at a national level, taking prior attainment into account.  It was this familiarity with a very sound way of constructing grade distributions which meant that schools would be able to deliver fair grading in line with comparable outcomes, if the "rules" were set correctly and explained clearly.  It would be critical to ensure that over-estimations were brought back into line without disadvantaging those which were already in line.

 

The concept of using a mark as the starting point, rather than starting with the grade, was a vital aspect of the ASCL approach.  Indeed, this careful thinking made it clear that the grades allocated by teachers were actually much less important than the ranking.  Unfortunately, this insight was not widely appreciated at the time, although people have slowly come to appreciate it as events have unfolded.

 

It was really unfortunate that so much of the explanatory material from Ofqual, the exam boards and others started from deciding on the grade for the student, and then the rankings within the grade, rather than from an objective mark (which cuts through much of the bias issue) leading to rankings, and then from a grade distribution to grades.

 

 

CAGs v standardisation and "prediction"

Ofqual published Consultation document (3rd April announcement, then 15th April)

Ofqual published initial statements on 3rd April, followed by the formal Consultation document on 15th April https://www.gov.uk/government/news/ofqual-seeks-views-on-gcse-and-a-level-grading-proposals-for-2020

 

The 2 incompatible objectives were embedded in the Ofqual Consultation:

"students’ grades will instead be based on evidence of their likely performance in the exams had they gone ahead….the grade the student would most likely have received had the exams taken place" p.6  [my emphasis]

 

"While teachers are well equipped to rank order their students, some centres will inevitably be slightly more generous and others slightly more severe than the average when they are determining centre assessment grades. As far as possible, such inconsistencies will be corrected by the exam boards when they standardise all centre assessment grades."  (p.7)

 

Ofqual made it clear that giving weight to CAGs would be unfair: differences in the standards applied by different centres would not be brought into line, and the results produced would most likely be too lenient overall.

"Alternatively, an approach placing more weight on statistical expectations could determine the most likely distribution of grades for each centre based on the previous performance of the centre and the prior attainment profile of this year’s students. It could then use the submitted rank order to assign grades to individual students in line with this expected grade distribution." [my emphasis] pp28-29

 

That is, the CAGs themselves do not figure in this statement!  So this was clear in print from Ofqual from 15th April, and yet, as we will see, it was not taken on board by many people and organisations, including Ofqual itself as an entire organisation.

 

"Firstly, such an approach would reflect research evidence about the likely accuracy of centre assessment grades versus rank orders. Research suggests that while around half of centre assessment grades are likely to be accurate, a third are likely to be too generous and a sixth too pessimistic" p.29

 

QUESTION - to what extent do you agree or disagree that using an approach to statistical standardisation which emphasises historical evidence of centre performance given the prior attainment of students is likely to be fairest for all students? (p.29)

 

Unfortunately, this section and question set up a false dichotomy between "inflationary CAGs" and a "calculated grade".  Although the standardisation model as described refers to a grade distribution, because it does not take account of the CAGs it is in practice calculating an available grade for each student by prescribing a fixed number of each grade to the centre.

 

From an early stage, ASCL were proposing a "best of both worlds" combination, which became known as "constrained CAGs".  This gave centres the flexibility to assign grades taking account of particular clusters of students or individual "outliers", whilst keeping key measures such as average score and %9-4 in line with the needs of comparable outcomes, i.e. non-inflationary.

 

"Predicted grades"

Various articles and bodies looked at the outcomes of "predicted grades" in other situations, e.g. a Schools Week article on 3rd April looking at predicted grades ("84% within one grade, 31% too positive, 29% too negative"), https://schoolsweek.co.uk/coronavirus-gcse-analysis-gives-confidence-to-teacher-assessment-plan/ and Ofqual's own consultation document.  However, these failed to take account of the purpose to which the grades were being put, and of whether the prediction had any impact on the awarded grade.  Where there is an incentive to over-predict because there may be a benefit to a student, as with UCAS predicted grades, a very different story emerges:

"UCAS - chapter-8-2019-end-of-cycle-report.pdf"

https://www.ucas.com/file/292726/download?token=wswAnzge

"In 2019, 21% (31,220) of accepted 18 year old applicants met or exceeded their predicted grades, a decrease of 3 percentage points." i.e. almost 80% of predictions were one or more grades above.

 

Ofqual External Advisory Group on Exam Grading

The purpose, terms of reference and membership of the Group were in Annex C of the Consultation document.  It met 5 times in all, with its first meeting on 2nd April ahead of the Ofqual announcement, and its last on Tuesday 11th August, the day before the A-level results were available for schools to download.  At that last meeting, it was made clear that advisory group members were able to speak freely about the approach being taken, to aid understanding (the Scottish U-turn had just taken place); members were unaware of what was to happen that evening with the Secretary of State's announcement.

 

 

ASCL webinar & ppts

Duncan Baldwin, Deputy Director of Policy at ASCL, gave a well-attended ASCL webinar on 29th April.  Duncan had worked tirelessly from the outset, thinking through the issues and liaising closely with Ofqual.  Publishing the two ASCL papers early (referred to above) had ensured that both the moral message - that grades in line with previous years were actually in the best interests of ALL students - and the practical "how to calculate" message were delivered.

 

He also used the analogy of a cancelled Olympic marathon and how to rank the runners (which would give the medallists) as a brilliant way of highlighting some of the issues being discussed at the time, especially around "improving schools", by putting them in a different context.  He also ran the ranking and grading exercise allocating GCSE grades to the leadership of ASCL, which showed why the grades themselves were essential in dealing with clustering and outliers.

He reiterated:

  Why is starting with a ‘mark’ helpful?

 

The video is available at https://vimeo.com/413110422 and the ppt slides at:

https://www.ascl.org.uk/ASCL/media/ASCL/Help%20and%20advice/Accountability/ASCL-centre-assessed-grades-Duncan-Baldwin.pdf

 

Potential bias became a significant theme during the process, so it is important to highlight how, from the outset, the ASCL approach cut through much of the debate by starting with an objective mark.  As we will see later, the model did not in fact disadvantage students from any particular group relative to 2019.  It is important to stress the vital difference between an "actual worsening from 2019" and a "relative disadvantage against other groups"; there was also a small number of situations where there was no alternative to giving the benefit of the doubt to the student, thus creating a relative advantage for those students in those infrequent cases.

 

Burning issues during the Ofqual consultation period (15th Apr - 25th May)

"Improving schools"  ("rapidly improving" / "trajectory")

This issue had been specifically raised as a question on p.29 of the Ofqual consultation.  The key issue was whether there was objective external evidence which could be used to decide whether a school's results in 2020 would have been higher than in 2019 (or indeed lower, but no-one raised this as an issue!).  This generated much emotion and debate, but no one was able to describe a process which could be applied nationally within the time and resource constraints.  So it was accepted by representative organisations that the results for 2020 should reflect some kind of average of previous years.  See Annex 4 for analysis of changes in Progress 8 from one year to the next; this shows that schools which made an improvement one year tended on average to show a decrease the following year, and that under 40 schools showed a jump of 0.5 in both P8 English and P8 Maths from 2018 to 2019.  A sketch of that analysis follows below.
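
For anyone wanting to reproduce this kind of analysis, a minimal sketch is below; the file name and column names are hypothetical stand-ins for data extracted from the DfE Performance Tables downloads.

    import pandas as pd

    # Hypothetical CSV: one row per school, with columns "p8_eng_2018",
    # "p8_eng_2019", "p8_mat_2018", "p8_mat_2019" taken from the DfE
    # Performance Tables downloads.
    df = pd.read_csv("p8_by_school.csv")

    df["eng_change"] = df["p8_eng_2019"] - df["p8_eng_2018"]
    df["mat_change"] = df["p8_mat_2019"] - df["p8_mat_2018"]

    # How many schools jumped by 0.5+ in BOTH P8 English and P8 Maths?
    big_jump = df[(df["eng_change"] >= 0.5) & (df["mat_change"] >= 0.5)]
    print(len(big_jump), "schools with a 0.5+ jump in both measures")

    # A negative correlation between a school's 2018 level and its
    # 2018-to-2019 change is the regression-to-the-mean effect described above.
    print(df["p8_eng_2018"].corr(df["eng_change"]))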

 

Year 10 pupils getting grades

Initially, the Consultation had proposed not giving Year 10 pupils (and others) grades because of the difficulty of getting reliable estimates, but this was reversed in the light of pressure from schools.

 

Technical points

Exam boards and Ofqual focus on grades and outcomes; schools focus on progress

In the last two decades, schools have made ever-increasing use of progress (Value Added) measures to see how pupils are progressing relative to their starting points, echoed at a national level by the move in 2016 to Progress 8 as the main accountability measure for schools.  This fits with the subject transition matrices approach.  The national information can be found in the Downloads from the DfE Performance Tables, which give the Progress 8 information for each school, as well as much other data.

https://www.compare-school-performance.service.gov.uk/schools-by-type?step=default&table=schools&region=all-england&for=secondary

 

Exam boards, on the other hand, are focussed on the outcome - the grade - as that represents the standard and ties in with their brief to maintain standards over time.  In practice, using prior attainment information at both GCSE (using KS2) and AL (using GCSE) helps to give statistical clarity.  Ofqual and exam boards also tend to focus on cumulative percentages, especially at "judgemental" grade boundaries (AL: A/B and E/U; GCSE: 7/6, 4/3 and 1/U, with 9/8 as a special case).  Ofqual Analytics (https://analytics.ofqual.gov.uk/apps/GCSE/CentreVariability/) is an excellent source of data comparing the outcomes from one year to the next.  These tend to show a fairly symmetrical distribution around zero, with similar numbers of schools recording increases in a given subject as recording decreases.

 

For more details, see Annex 5, which has the contents of a guidance document I prepared for ASCL, issued on 11th August just ahead of the A-level results, intended to help people get a sense of the two different approaches, why they are fundamentally the same even though they appear to operate differently, and why there might be small differences in particular circumstances.

 

Ofqual advice on calculating grades and "previous years" (15th May)

"Making grades as fair as they can be: advice for schools and colleges"

https://ofqual.blog.gov.uk/2020/05/15/making-grades-as-fair-as-they-can-be-advice-for-schools-and-colleges/

"using the centre’s historical results … For GCSEs, it will consider data from 2018 and 2019"

 

ASCL had continued to liaise closely with Ofqual, as it was recognised that it was in everyone's interest for the CAGs to be not too far from the standardised grades actually awarded.  Given the timeline for centres to finalise the calculations for grades, and that half-term ran from 25th to 29th May, schools needed to know at least a week ahead of half-term what would count as "previous years".  However, the full Consultation Response was taking time to draw together and would not be published in time for schools to use in their calculations, so Ofqual agreed to write a blog giving the key factual information, which was published on 15th May.

 

This shows how Ofqual were giving their full support to enable (but not require) schools to calculate their grades in line with the standardisation model.

 

In all of this, it was fully accepted that Ofqual could not require schools to calculate grades in a particular way, but they could give the necessary key information about how the standardisation model would work so that schools could emulate the calculations using information available to them, i.e. the DfE subject Transition Matrices, the KS2 prior attainment and the VA from "previous years".  The expectation was that if schools submitted realistic estimates, then the fine detail of the grading of individual students could be used to benefit those students whilst keeping the overall figures non-inflationary.  The expectation was also that schools submitting generous estimates would be standardised back, without disadvantaging those schools which had submitted realistic grades, but that process would necessarily lose any detail.  Thus individual students in schools which had submitted realistic grades would benefit, and therefore there was a tangible incentive to submit realistic grades.
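
To show what "emulating the calculations" could look like, here is a minimal sketch of estimating a subject's expected grade distribution from a national transition matrix and the school's prior attainment profile; all figures are hypothetical.

    # A sketch of the "previous years" style calculation from information
    # available to schools: a national subject transition matrix (rows = KS2
    # prior attainment bands, columns = GCSE grades 9-1, values =
    # probabilities) and the school's count of 2020 entrants in each band.
    def expected_distribution(matrix, band_counts):
        """Returns expected number of pupils at each grade for this subject."""
        grades = [9, 8, 7, 6, 5, 4, 3, 2, 1]
        expected = dict.fromkeys(grades, 0.0)
        for band, n in band_counts.items():
            for grade, p in zip(grades, matrix[band]):
                expected[grade] += n * p
        return expected

    matrix = {  # P(grade | KS2 band); hypothetical rows, each summing to 1
        "high":   [0.15, 0.20, 0.25, 0.18, 0.12, 0.07, 0.02, 0.01, 0.00],
        "middle": [0.01, 0.04, 0.10, 0.18, 0.25, 0.22, 0.12, 0.06, 0.02],
        "low":    [0.00, 0.00, 0.02, 0.05, 0.12, 0.20, 0.25, 0.22, 0.14],
    }
    band_counts = {"high": 40, "middle": 90, "low": 30}  # this year's cohort
    print(expected_distribution(matrix, band_counts))
    # A school would then blend this with its own historical VA before fixing
    # the grade distribution used for constrained CAGs.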

 

Both Objectives and approaches were being disseminated by exam boards and other organisations

During this period, the exam boards and other organisations were issuing material promoting both Objectives: 

Objective 1: "individual grades", or even talking about "Grade Characteristics" as if GCSE grades were criterion-related (example from Pearson)

On the right is the electronic signature from AQA as an example of this "We'll be making sure students still get the grades they deserve this summer"

 

Objective 2: ("grade distribution") Equally, they promoted the ASCL approach - see example from OCR

 

This was unfortunate, as it was important to promote "constrained CAGs" instead; otherwise there would be an outcry in the summer when the exam board grades came in below the CAGs submitted according to Objective 1.

 

The message was generally recognised, but the action tended to be to add information about the Objective 2 approach rather than to step back from Objective 1, because there was too much organisational momentum behind disseminating material focussed on student-level grades rather than the grade distribution.

 

 

 

ASCL publish step-by-step guidance to calculate grade distributions in line with Ofqual "previous years"  19 May 2020

Centre-assessed grades 
Ofqual’s blog on making grades as fair as they can be published on 15 May, gives more insight into the model which will be used to standardise centre-assessed grades (CAG). The key pieces of information are: 

These principles are consistent with the guidance  ASCL released on the CAG process which emphasises the importance of using evidence to construct a mark in order to create the rank order and understand where candidates are clustered. 
 
Many schools are using the ASCL data toolkit to understand the performance of individual subjects. This is hosted at ascl.smidreport.com Colleagues at SMID have kindly agreed to make this available to schools free of charge during the CAG process. They have adapted the toolkit to allow schools to model their results so that they are in line with Ofqual’s parameters whilst retaining flexibility to match their students. They have also streamlined the process to allow more schools to use the system. 
 
Supporting ASCL Technical Guidance is available here. The deadline for submitting grades to the exam boards is 12 June.   

 

Schools submit CAGs (12th June 2020)

CAGs had to be submitted by schools to the exam boards by 12th June, although in the few cases where a centre had a subject with some candidates entered with one board and some with another, the deadline was 19th June to allow for collation between the boards.  This meant that the complete data needed to be collated by the exam boards and sent to Ofqual in the last week of June.  Only then would Ofqual have a chance to look at the actual CAGs, which meant there was a very tight window before the full details of the modelling needed to be signed off in order to give the exam boards time to do all the coding and checking for the awarding.  In the meantime, though, once the deadline of 12th June was reached, FFT were able to publish…

 

June - FFT Datalab reveal extent of over-estimation (12th June)

https://ffteducationdatalab.org.uk/2020/06/gcse-results-2020-a-look-at-the-grades-proposed-by-schools  

Looking at data from around 2,000 schools: "the average grade proposed for 2020 [CAG] is higher than the average grade awarded last year [2019]. In most subjects, the difference is between 0.3 and 0.6 grades".  The percentage at grade 4 or above would increase from 71% to 81% in English language.

 

These findings were widely covered in the media e.g. https://inews.co.uk/news/education/gcse-a-level-exams-2020-millions-proposed-grades-cut-generous-predictions-england-450236

https://schoolsweek.co.uk/schools-score-pupils-nearly-a-full-grade-higher-in-some-subjects-study-suggests/

 

This meant that the full extent and seriousness of the situation were widely known by the middle of June. 

As Schools Week put it in their report:

5. The exams regulator now has a ‘hugely complex’ task

So what happens now?

If these grades are similar to those submitted, Datalab said it’s likely that Ofqual and the exam boards will have to apply statistical moderation to bring them down.

But they add this will be a “hugely complex task, the likes of which have never been done before”. “Without any objective evidence on the reliability of grading at each school, the most difficult part will be finding a way of doing this fairly for pupils in schools which submitted lower results, when some other schools will have submitted somewhat higher results,” the report added.

 

This information confirmed the warnings given in May about what would happen with the perpetuation of Objective 1 ("individual grade") approaches.

 

The figures, summarised in the table below, also give an indication of the proportion of schools submitting realistic grades in line with Objective 2 (1,903 mainstream secondary schools included, out of around 3,100 overall):

 

change in percentage points    no. schools        %    cumul %
-17.5 to -12.5                           3      0.2%       0.2%
-12.5 to -7.5                           16      0.8%       1.0%
-7.5 to -2.5                           106      5.6%       6.6%
-2.5 to 2.5                            527     27.7%      34.3%
2.5 to 7.5                             642     33.7%      68.0%
7.5 to 12.5                            377     19.8%      87.8%
12.5 to 17.5                           153      8.0%      95.8%
17.5 to 22.5                            46      2.4%      98.3%
22.5 to 27.5                            21      1.1%      99.4%
27.5 and above                          12      0.6%     100.0%
Total                                 1903    100.0%

 

 

Looking at the cumulative percentage figures, you can see that one-third of schools submitted figures which were close to or below last year's.  Typically, as you can see from Ofqual Analytics (where the columns run 0 to 2.5, etc.), one-third of schools fall in the -2.5 to 2.5 band; the problem arose from the enormous skew upwards.
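
A short sketch reproducing the % and cumulative % columns of the table above from the raw school counts:

    # Recompute the percentage and cumulative percentage columns of the table.
    bands = ["-17.5 to -12.5", "-12.5 to -7.5", "-7.5 to -2.5", "-2.5 to 2.5",
             "2.5 to 7.5", "7.5 to 12.5", "12.5 to 17.5", "17.5 to 22.5",
             "22.5 to 27.5", "27.5 and above"]
    counts = [3, 16, 106, 527, 642, 377, 153, 46, 21, 12]

    total = sum(counts)   # 1,903 schools
    running = 0
    for band, n in zip(bands, counts):
        running += n
        print(f"{band:>15}: {n:4d}  {100*n/total:5.1f}%  {100*running/total:6.1f}%")
    # The first four bands (up to +2.5 points) account for ~34.3% of schools:
    # the one-third who submitted figures close to or below 2019.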

 

 

June - Ofqual develop nuanced model

Chapter 6 in the Ofqual Interim Technical Report  (published 13th Aug) https://www.gov.uk/government/publications/awarding-gcse-as-a-levels-in-summer-2020-interim-report

describes the various models which were developed in May and June once agreement had (correctly) been reached to use "meso-standardisation", where centre-level statistical estimates are used to standardise each centre.  "Macro-standardisation" (operating at a high level) would have seen centres which submitted realistic grades penalised along with those submitting generous ones in order to meet comparable outcomes, while "micro-standardisation" (where estimates are formed from the characteristics of individual students) would not have been accurate enough, given the weak correlations in many cases, or indeed the absence of prior attainment information.  These two were correctly rejected.

 

Extensive testing of the various models took place (described in Chapter 7), and "Direct Centre-level Performance (DCP)" was selected as being the most accurate and resilient.  It is also the most similar in principle to the transition matrices approach and therefore most likely to be in line with school estimates.

 

The technical team at Ofqual and the exam boards took this over-estimation into account for their modelling - see Annex M in the Report

"In the absence of authentic CAG data at the time of development, and to understand the impact of CAGs with different profiles, it was necessary to simulate these data. For the purposes of simulation here, a value of [increase of] 0.3 [grades] have been applied to all centres." (p. 296) [using the figure from FFT Datalab above]

 

In order to give the fairest model, whereby the nuances in the grading from schools submitting realistic grades could be preserved, they developed a nuanced model to use CAGs where possible, within the constraint that key measures (e.g. % grade 4+) remained fixed - see Annex M.
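
The Annex M quotation above describes simulating generous CAGs by applying a 0.3-grade uplift to all centres ahead of receiving real data.  One plausible way to mimic that, purely as an illustration (this is not Ofqual's actual code), is to bump a matching fraction of candidates up one grade:

    import random

    random.seed(42)

    def inflate(grades, mean_uplift=0.3):
        """Apply an average uplift of `mean_uplift` grades to a centre's grade
        list by bumping a matching fraction of candidates up one grade.
        Purely illustrative of the Annex M idea of simulating generous CAGs."""
        return [g + 1 if random.random() < mean_uplift else g for g in grades]

    historical = [4, 4, 5, 5, 5, 6, 6, 7, 3, 4, 5, 6]   # hypothetical 9-1 grades
    simulated_cags = inflate(historical)

    hist_mean = sum(historical) / len(historical)
    cag_mean = sum(simulated_cags) / len(simulated_cags)
    print(f"uplift: {cag_mean - hist_mean:+.2f} grades")  # ~ +0.3 on average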

 

end of June - when actual CAGs arrive at Ofqual, the nuanced model has to be abandoned

When the actual CAGs arrived at the end of June, they were fed into the nuanced model; but, because of the degree of over-estimation, students at the Objective 2 schools would actually have been disadvantaged, so the nuanced model had to be abandoned.  Because the exam boards needed the final details of the model in early July, there was no time to try any further refinements. 

"However, following completion of the testing described here, provisional CAG data became available to better understand the impact of applying the approach using real data" p. 308

"Importantly, … it should be noted that this effect [inadvertent severity] … would only impact on those … whose CAGs were nearest to the statistical prediction [i.e. Objective 2 schools]. Those with far more generous CAGs [i.e. Objective 1 schools] would not be disadvantaged" p. 309

 

Time to look at actual CAGs when confirming model very limited

As referred to above, there was actually a very tight window between Ofqual receiving the actual CAGs (the last week of June) and needing to sign off the standardisation process.  The "Requirements" were issued to the exam boards and published (with the exception of Annex E, the precise details of the standardisation process) on 7th July:

https://www.gov.uk/government/publications/requirements-for-the-calculation-of-results-in-summer-2020

 

Ofqual Summer Symposium

https://www.gov.uk/government/publications/awarding-qualifications-in-summer-2020#summer-symposium

 

This was postponed from 2nd July to 21st July.  It was an important moment and opportunity for Ofqual to quash some of the rumours and mis-reporting, and to give the facts about the model and likely outcomes relative to 2019, as well as the difficulties arising from the wide range of over-estimation and likely consequences.

 

Outline of model and process

The animation in slide 18 gave a good picture of the approach from a grade distribution perspective: using the rankings (and NOT starting from the CAGs) to generate grades, which were then tweaked using the CAGs to get the final grade.  This is in essence the ASCL approach.

[Image: Ofqual Summer Symposium slides (amended 21st July 2020), slide 18]

It was ironic that in the bitter aftermath some in Ofqual were still attempting to justify the Objective 1 approach, and indeed to attack schools which had followed Objective 2, when the ASCL approach was embodied in Ofqual's own slide.  The accompanying video and animation, probably produced a couple of weeks before, still focussed on starting with the grades, so you can see the variation in understanding within Ofqual as an organisation.  This was recognised by Roger Taylor in his evidence to the Select Committee (referenced below in more detail): "What we now realise is…".  It was deeply disappointing that the organisation did not then put that realisation into practice and act accordingly on behalf of the students, irrespective of which approach their centre had taken.

 

 

Comparison between 2019 grades and submitted CAGs

Slide 12 gave figures comparing the cumulative percentages of candidates at each grade for 2019 and for the submitted CAGs.  This method of presentation, although standard for exam boards, led to some confusion, and it is also very hard to see the impact at each grade.  I used the JCQ published data to fill in the gaps in the GCSE 2019 figures, and produced two graphics, one for AL and one for GCSE.

 

[Images: slide 12 AL graphic; slide 12 GCSE graphic]

In both cases these are cumulative percentage graphs, but the percentage figures for each grade are calculated and presented.  So, for example, 7.8% at AL were awarded an A* in 2019, whereas there were 13.8% A* CAGs.  The centre column then shows the difference between the two using hatched shading, where the pair of colours comes from the colours of the grades involved.  So 7.8% is the percentage of candidates getting A* in both 2019 and CAG, but 6% of candidates were also given a CAG of A* and yet would end up with A.  That in turn means that 17.7% - 6% = 11.7% of candidates would have gained an A both in 2019 and as a CAG in 2020.  This proportional over-estimation is starkest at grade B, where, of the 27.2% given B as a CAG, only 13.8% (about half) should have received one to be comparable with 2019.

 

Adding up the overlap grades, only just over half of students "should" get the CAG they were awarded if comparable with 2019.  This highlights the degree of over-estimation and the resulting challenges.  (This interpretation is accurate, but the very small percentage of students who went in the opposite direction, or where there were two grades of change, will cause the precise figures to be very slightly different.)  The sketch below shows the overlap calculation.
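
The overlap figures can be computed mechanically from the two distributions: line the candidates up by rank and intersect each grade band.  In the sketch below, the 2019 A* and A shares and the CAG A* and B shares are those quoted above; the other shares are placeholders for illustration.

    # Sketch of the "overlap" behind the slide 12 graphics: for each grade,
    # how much of the 2019 band and the CAG band coincide when candidates are
    # lined up by rank.
    def overlap_by_grade(pct_2019, pct_cag):
        """Both inputs: {grade: percentage of candidates}, highest grade first.
        Returns {grade: percentage receiving that grade under BOTH distributions}."""
        overlap, c19_lo, ccag_lo = {}, 0.0, 0.0
        for g in pct_2019:
            c19_hi, ccag_hi = c19_lo + pct_2019[g], ccag_lo + pct_cag[g]
            # candidates in grade g under both = intersection of the two bands
            overlap[g] = max(0.0, min(c19_hi, ccag_hi) - max(c19_lo, ccag_lo))
            c19_lo, ccag_lo = c19_hi, ccag_hi
        return overlap

    pct_2019 = {"A*": 7.8, "A": 17.7, "B": 26.5, "C": 24.0}   # A*/A from text
    pct_cag  = {"A*": 13.8, "A": 24.0, "B": 27.2, "C": 20.0}  # A*/B from text
    print(overlap_by_grade(pct_2019, pct_cag))
    # e.g. A* overlap = 7.8 (the top 7.8% get A* either way); the 6.0 points
    # between 7.8% and 13.8% are CAG A* candidates who would have been awarded A.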

 

There is a similar story at GCSE, where it is even starker at grade 4.  Of the 16% awarded a CAG of grade 4, only 6.4% "should" get it, i.e. 40%, under half.

 

slide 13

The vast majority of centres have submitted optimistic Centre Assessment Grades; they will include:

Impossible without exams to distinguish between these three groups of centres.

 

This slide accurately summarises the different categories of centres, and the impossibility of distinguishing between them on the information available.

 

Equalities analysis

Slides 21-26 show clearly that "differences in outcomes for students with and without particular protected characteristics and of differing SES (socio-economic status) are similar to those seen in previous years".  This was the intention from the outset, that the differences should remain similar to those in previous years.  It is unfortunate that the public comparison has tended to be between CAGs and 2019, where the reality is that the CAGs had much variation in them, potentially systematic, so it is not surprising if there were differences.

 

Dealing with small numbers of candidates entered in a subject at a centre

This was a topic which was given much thought, but in the end the right decision was made for the sake of the candidates in each situation.  It too has led to much confusion and misreporting.  Note that this is NOT about "small centres".  It covers the very specific situation where, for a particular subject at a centre, there is a "small" number of candidates: the centre itself could have quite a large number of candidates but in a few subjects (often Music, German, Latin, etc.) have just a few.  See the FFT analysis, "A level results 2020: Why independent schools have done well out of this year's awarding process":

https://ffteducationdatalab.org.uk/2020/08/a-level-results-2020-why-independent-schools-have-done-well-out-of-this-years-awarding-process/

 

The methodology, testing and options are described in the Interim Technical Report in section 8.4, "Centres with a small entry in a subject".  The approach of using the CAGs for very small entries (fewer than 5), and then a taper for entries between 5 and 15, was the only fair one left, and the numbers used seemed not unreasonable in practice; a sketch of such a taper follows below.  People are often unaware that there is not a strong correlation between prior attainment and outcome at either GCSE or AL, with substantial variation - see the table at the end of Annex 5 on "Comparing the awarded grades in June 2020 with other calculations".
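
A minimal sketch of the taper, assuming a simple linear blend between the two thresholds (an assumption on my part; the precise weighting is set out in section 8.4 of the Interim Technical Report and may differ):

    # Assumed linear taper: entries below 5 use the CAG entirely; between
    # 5 and 15 the weight on the statistical prediction grows with entry
    # size; 15 and above is fully statistical.
    def cag_weight(n_entries, lo=5, hi=15):
        """Fraction of the final distribution taken from CAGs for a subject
        with n_entries candidates at a centre (assumed linear taper)."""
        if n_entries <= lo:
            return 1.0
        if n_entries >= hi:
            return 0.0
        return (hi - n_entries) / (hi - lo)

    for n in (3, 5, 8, 12, 15, 40):
        print(n, cag_weight(n))   # 1.0, 1.0, 0.7, 0.3, 0.0, 0.0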

 

U grades

It is incredibly difficult to estimate whether a pupil's most likely grade would be a U.  Candidates getting U grades often do so for reasons other than teaching, ability, etc.; quite often those grades are a function of the lives they are living, and it is often a matter of chance on the day whether they turn up and actually sit the exam.  In this sense I believe the CAG process is fairer to those students.  Ofqual should have allowed an increase in the pass rate at grade 1 or higher and a reduction in the number of U grades.

 

"Top grades at AL"

In retrospect, it would also have been worth having a specific focus on "top grades" at AL, especially where they were critical for admission to universities which had made very demanding offers in the expectation that a certain percentage would fail to get the grades.  For example, in 2019 Cambridge (https://www.undergraduate.study.cam.ac.uk/apply/statistics) made a total of 3,560 "home" offers, of which 2,672 (75%) met the offer.  Given that these are likely to be statistically exceptional candidates, especially from smaller state schools, it is unreasonable to expect the model to be able to assign the grade accurately.

 

 

Overarching frustration

One of the deep frustrations around the whole process was that potentially it could so easily have worked.  Without the perpetuated confusion over Objective 1 and Objective 2, if the majority rather than minority of schools had set constrained CAGs (i.e. two-thirds rather than one-third), then for the majority of students their awarded grades would have been close to their CAG.  And the nuanced grading around clustering would have removed some of the anomalies, as the nuanced grading would have carried through into the awarded grades because of the overall restraint.

 

 

Aftermath following decision to use CAGs (late August)

The sequence of events from the evening of Tuesday 11th August onwards has been well-documented, and so I am not covering it in this submission.  I will just pick out some of the key issues during that period relating to the more technical aspects already highlighted in this submission.

 

Misunderstanding re AL VA grades compared with outcomes

As you can see in Annex 1 - FFT Analysis, ALL types of centre on average gained higher grades than in 2019, because the agreed leniency in using CAGs for very small entries, and other "benefit of the doubt to the learner" decisions, led to an overall rise.  Because Sixth Form Colleges tend to be larger, they were less likely to benefit than centres with smaller numbers, but they were still higher than in 2019.

 

There was a very challenging issue in using GCSE prior attainment information for AL in June 2020 which was not present for GCSE using KS2.  This arose from the change in the point scoring system in different subjects from 2017 to 2019, alongside the transition from A*-G to 9-1.  The DfE published Level 3 VA transition matrices for AL in 2019 using GCSE results from 2017, when just English Language, English Literature and Maths had moved to the 9-1 scale.  The other subjects, still on A*-G, were scored on an 8.5 to 1 scale with a non-linear link.  However, by June 2018 nearly all subjects had transitioned to 9-1, leading to a rise in GCSE points, but in a way for which there was no overall conversion formula.  The only way to make an adjustment would be at an individual student level, with their individual subject results.
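
For illustration, here is a sketch of the per-student rescoring this implies.  The legacy points mapping below is my reading of the DfE interim points for A*-G subjects and should be treated as an assumption; the key point is that no single formula converts a centre's 2017 average to a 2018/19 one - only per-student, per-subject rescoring works.

    # Assumed DfE interim points for legacy A*-G GCSE grades (the non-linear
    # 8.5-to-1 scale mentioned above); treat this mapping as an assumption.
    LEGACY_POINTS = {"A*": 8.5, "A": 7.0, "B": 5.5, "C": 4.0,
                     "D": 3.0, "E": 2.0, "F": 1.5, "G": 1.0}

    def student_avg_points(results):
        """results: {subject: grade}, grades either ints (9-1) or legacy letters."""
        pts = [g if isinstance(g, (int, float)) else LEGACY_POINTS[g]
               for g in results.values()]
        return sum(pts) / len(pts)

    # A 2017-style student mixing reformed (9-1) English/Maths with legacy letters:
    print(student_avg_points({"Eng": 5, "Maths": 6, "Hist": "B", "Fr": "C"}))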

 

Different MIS and analysis providers coped with this in different ways.

 

Significantly, on 16th August the Sixth Form Colleges Association published a paper, "SFCA-briefing-on-three-years-analysis-final-version1.pdf", which seemed to imply that for Sixth Form Colleges VA had dropped in virtually all subjects.  It is hard to reconcile this with the outcome figures (e.g. in the FFT analysis), which show that there was actually an increase in outcomes for Sixth Form Colleges, albeit smaller than for other, smaller centres.  Perhaps an adjustment was not made for the change in GCSE scoring from 2017 to 2018?

 

 

Schools Week and other articles

At the final meeting of the Advisory Group, it had been made clear that advisory group members were able to speak freely about the approach being taken, to aid understanding.  I felt, in the light of what had transpired with the move to CAGs and the consequent relative disadvantage for students in schools which had submitted realistic grades, that it was necessary to try to bring some clarity to the situation, and to push for steps to undo any actual disadvantage to candidates, whilst recognising that politically it was impossible for a wider re-submission to be allowed.  I have also sought to be meticulous in not referring to actual discussions in the Group, and in only citing public documents, especially the Interim Technical Report.

 

Extracts from the articles are in Annex 2 - Articles

Friday 21st August - article in Daily Telegraph - "Mixed message from Williamson meant process was 'doomed to fail'"

Friday 21st August - Schools Week - Schools that followed advice to deflate grades must now be given appeal route - Opinion Piece (David Blow)

The decision to use centre assessed grades means many schools are now suddenly in an unfair position and coming under attack from parents and students. Why?

 

Because they were the schools which heeded the call from education secretary Gavin Williamson to ensure that distribution of grades followed a similar pattern to that in other years, i.e. following the calculated grades model. These schools are now in an unfair position in comparison with others, which instead focussed on individual pupil grades – leading to those youngsters to get higher grades.

 

There were two incompatible objectives set from the beginning….

 

Friday 21st August - Schools Week - Government facing exams challenge over schools ‘advised’ to deflate grades

At the same time, Blow – who sat on the Ofqual’s external advisory group on exam grading – has said the government must now allow an appeals route for such schools. He said those schools are now “in an unfair position in comparison with others, which instead focussed on individual pupil grades – leading to those youngsters to get higher grades. “In the interests of fairness to the students, centres whose CAGs were in line with the calculated grades should be allowed to re-submit their CAGs without having to include unrepresentative previous years’ performance in the calculation.” He said it would be “straightforward” for the schools to evidence this by “re-running earlier calculations”

 

Wednesday 26th August - Schools Week - ASCL writes to Ofqual over deflated grades’ fears

 

ASCL modest proposal, eventually rejected by Ofqual (late Aug \ Sept)

"The decision to use CAGs meant that Objective 2 schools were suddenly in an unfair position relative to Objective 1 schools, especially where they deflated their CAGs to take account of the previous years' performance so as to benefit from the more nuanced grading.

 

In the interests of fairness to the students, centres whose CAGs were in line with the calculated grades should be allowed to re-submit their CAGs without having to include unrepresentative previous years' performance in the calculation.

 

This will not affect large numbers, and will have a very small impact on overall national grades, especially when compared with the big jumps seen in the national figures for June 2020 using CAGs. "

 

House of Commons Education Select Committee hearing (2nd Sept)

https://committees.parliament.uk/event/1755/formal-meeting-oral-evidence-session/

https://committees.parliament.uk/oralevidence/790/pdf/

 

Senior representatives from Ofqual were questioned by members of the Committee.  Some of the key exchanges are in Annex 3.  These include the implicit public recognition by Ofqual that there were two incompatible objectives, demonstrated using a variant of the example I gave to show clearly the difference between the focus on individual grades and that on the grade distribution.  The differences and unfairness were raised by Apsana Begum, with an answer by Julie Swan.  Roger Taylor, when challenged by the Chair, said that there was a fast-track process for getting changes, although this was rejected by exam boards when schools tried to use it.

 

 


Annex 1 - FFT Datalab analysis of data

FFT Datalab provide an invaluable source of data, distributed freely into the public domain.  As researchers, they have access to the National Pupil Database, and so can do detailed analysis at an anonymised pupil level.  Through FFT, there is a data service, used by around two-thirds of secondary schools, which provided vital information as the process evolved.

 

19th & 20th March - Awarding grades in 2020

https://ffteducationdatalab.org.uk/2020/03/awarding-grades-in-2020/

 

But can there be any quality control in the system?

Ofqual already has the answer - the much unloved approach of comparable outcomes. This assumes that if the prior attainment (at Key Stage 2 for GCSE, GCSE for A-level) of pupils entering a subject doesn’t change, then the grades awarded won’t change either.

Ofqual could produce an indicative range of grades for each school to award in each subject. This would be based on three things, taking GCSE as an example:

  1. Last year’s Key Stage 2 to GCSE transition matrices
  2. The list of pupils entered in a subject, together with their Key Stage 2 results
  3. Historic Key Stage 2 to GCSE value added data

 

None of this is perfect. No system of awarding is perfect, even exams can result in some pupils being awarded the “wrong” grade.

There will be winners and losers. This method may not work well for schools with small cohorts. Some groups of pupils may be disadvantaged. Schools entering a subject for the first time will have indicative grades based on the national transition rates. Departments that have genuinely been improving in 2020 may not have that improvement recognized.

But this could be one of the least unfair options given where we are, particularly (as some have suggested) if pupils have the option to resit in November (assuming life is returning to normality by then).

 

12th June - 3 blogs on the grades proposed by schools 12 Jun 2020

 

1. GCSE results 2020: A look at the grades proposed by schools

https://ffteducationdatalab.org.uk/2020/06/gcse-results-2020-a-look-at-the-grades-proposed-by-schools/

 

Between 28 April and 1 June, FFT ran a statistical moderation service which allowed schools to submit preliminary centre assessment grades they were proposing for their pupils. In return they received reports which compared the spread of grades in each subject to historical attainment figures and progress data.

In this blogpost, we’ll take a look at some of the main findings from the service, based on the data of more than 1,900 schools – over half of all state secondaries in England – which had submitted results when the service ended on 1 June.

 

Well, at the top level, this year’s teacher-assessed grades are higher than those awarded in 2019 exams. In every subject we’ve looked at, the average grade proposed for 2020 is higher than the average grade awarded last year. In most subjects, the difference is between 0.3 and 0.6 grades.

Starting with the subjects that almost all pupils sit, the average of all the teacher-assessed grades in English language comes out as 5.1 – that is, a little above a grade 5. That compares to an average grade of 4.7 last year. For English literature, a slightly smaller increase in average grade is seen, from 4.8 last year to 5.0 this year, while in maths the average proposed grade for 2020 is 5.0, compared to 4.7 for 2019.

Looked at another way, were these proposed grades to be confirmed, the share of pupils awarded a grade 4 or above would increase from 71.4% to 80.8% in English language, from 73.7% to 79.0% in English literature, and from 72.5% to 77.6% in maths.

 

 

2. GCSE results 2020: The relative severity of proposed grades

https://ffteducationdatalab.org.uk/2020/06/gcse-results-2020-the-relative-severity-of-proposed-grades/

 

 

3. GCSE results 2020: Proposed grades and variability in results

https://ffteducationdatalab.org.uk/2020/06/gcse-results-2020-proposed-grades-and-centre-variability/

 

We might wonder what the aggregate effect on schools’ results would be if the proposed grades we looked at were awarded this summer. The following charts shows how much each school’s results would change by between 2019 and 2020 if that were the case on a subject-by-subject basis.

 

 

But if these grades are anything like those submitted to the exam boards, then Ofqual will have a tricky task bringing this year’s results in line with previous years’. Without any objective evidence on the reliability of grading at each school, the most difficult part will be finding a way of doing this fairly for pupils in schools which submitted lower results, when some other schools will have submitted somewhat higher results.

 

 

13th August - A level results 2020 - why independent schools have done well out of this year's awarding process

https://ffteducationdatalab.org.uk/2020/08/a-level-results-2020-why-independent-schools-have-done-well-out-of-this-years-awarding-process/

 

How outcomes have changed at different centre types.

The charts below show changes in attainment from 2019 to 2020 broken down in this way, at grades A*-A, and A*-C, respectively.