Education for 11-16 Year Olds Committee
Corrected oral evidence: Education for 11-16 Year Olds
Thursday 30 March 2023
11.05 am
Members present: Lord Johnson of Marylebone (The Chair); Lord Aberdare; Lord Baker of Dorking; Baroness Blower; Baroness Evans of Bowes Park; Baroness Garden of Frognal; Lord Knight of Weymouth; Lord Mair; Lord Storey; Lord Watson of Invergowrie.
Evidence Session No. 2
Heard in Public
Questions 14 - 26
Witnesses
I: Dr Michelle Meadows, Associate Professor of Educational Assessment, Department of Education, University of Oxford; Sharon Hague, Senior Vice President, Pearson School Qualifications; Tim Oates CBE, Group Director of Assessment Research and Development, Cambridge University Press and Assessment; Gavin Busuttil-Reynaud, Director of Operations, AlphaPlus.
Dr Michelle Meadows, Sharon Hague, Tim Oates and Gavin Busuttil-Reynaud.
Q14 The Chair: Good morning. Welcome to this evidence session of the Committee on Education for 11-16 Year Olds. I welcome our four panellists and thank them very much for joining us today.
Before we get going, I will say that a transcript of the meeting will be taken and published on the committee’s website. You will have an opportunity to make corrections to that transcript should that be necessary.
I will start by quickly asking you each to introduce yourselves, not to make an opening statement as such but just to describe your role at your organisation. We will start with you, please, Sharon.
Sharon Hague: Good morning, everyone. I am managing director of school qualifications and assessment at Pearson. Pearson is the world’s largest education business, supporting over 160 million learners of all ages across 200 different countries.
I am responsible for the development and delivery of assessments, qualifications and teaching resources for schools globally. In the UK, this includes our suite of GCSEs and international GCSEs, which are delivered to over 1 million learners per year. Outside of the UK, I am responsible for delivering high-stakes assessments in the USA, Australia, Egypt and a number of other countries.
Gavin Busuttil-Reynaud: Good morning. I am from AlphaPlus. We are part of AQA Education. I am the director of operations and I am also project director for national assessments in Wales.
AlphaPlus is effectively a projects-based business that works across the assessment landscape. We work in schools and with professional bodies and in the vocational sector. We work in the UK and we work internationally, but we have two major statutory assessment programmes that are for online digital adaptive assessments covering years 2 through 9 in schools in Wales and Scotland, which are pertinent to the discussions today.
Tim Oates: Good morning. I am head of research at Cambridge University Press and Assessment. It is a non-teaching department of Cambridge University and contains three awarding bodies, one of which is domestic, OCR. Another is international. We work in about 140 countries. That gives us the privilege of working across different education systems and we will perhaps examine some international comparisons of qualifications this morning.
Personally, I have been involved in reform of qualifications assessment and the curriculum since the 1990s. I have been through successive reviews and changes in curriculum and assessment. In 2010 I was privileged to lead the expert panel advising the 2010 Coalition Government on national assessment and the national curriculum.
Dr Michelle Meadows: Good morning, everybody. I am associate professor of educational assessment at Oxford University. My research focuses on qualification design and the washback on the education system of any particular design.
Q15 The Chair: Superb. Thank you very much. I will kick things off by asking about GCSEs. Are they fit for purpose, particularly now when pupils must stay in education or training until the age of 18? I will ask each of you, please, to give an answer to that, starting with Sharon.
Sharon Hague: Pearson recently conducted a future of qualifications and assessment inquiry where we surveyed over 6,000 students, teachers and parents. We involved politicians and a whole range of stakeholders including employers.
We found from that inquiry, in answer to your question, that broadly, yes, GCSEs are fit for purpose but there is a strong desire for evolution within the current system rather than revolution. Some 75% of parents and students valued qualifications including GCSEs because they provide an externally validated, independent assessment of students’ performance. With over 50% of students moving institution at the age of 16, these qualifications are important in supporting students and families to make the right choices for their progression pathway but also for institutions in their selection processes.
That is not to say there is not room for improvement. There was a strong appetite to consider, for example, different forms of assessment. The last set of reforms very much favoured terminal, end-of-course, written examinations. Among teachers, over 70% responded and said that they would like to see some forms of continual assessment where appropriate and that it could be effective.
There was a desire to explore the potential for using technology and also a desire for more choice in the range of subjects that are on offer. In particular, six in 10 teachers would like to be able to offer more vocational subjects. There were also concerns expressed about continual resitting post-16 and a desire to look at alternative, valid qualifications for the assessment of English and maths post-16.
Broadly, the answer is yes, but with some real opportunities to evolve the current system to continue to make it even more effective.
Gavin Busuttil-Reynaud: AQA is the largest provider of GCSEs and so I consulted with my wider team within AQA on this and they came up with the same clear point and the statistics. Only 38% of learners stay in the same institution at 16 and everyone else moves to a different environment. Clearly, the need for some form of evidence of achievement and performance is absolutely necessary at 16 and GCSEs fulfil this purpose extremely well. Therefore, from that point of view, it is certainly fit for that purpose.
The question that came to my mind when I read this is which purpose you have in mind. One of the immense pressures that is placed upon GCSEs is that they carry many purposes, from the purpose an individual learner uses it for in their progression, as well as the accountability and performance measures of the institutions they are in, and then some additional gatekeeping or hurdling such as we see in English and maths around the grade 4 and grade 5 boundary as a minimum standard threshold that then washes through into further education policy.
I am particularly interested in that because, for example, in the formative assessment work that we do in Wales, we test learners on exactly the same scale on an adaptive bank in every year of their education up until the start of their GCSEs and so we know exactly their learning trajectory and overall performance. We have a multitude of measurement points. That is pretty accurate and it would be relatively hard to cheat. It would be easy, for example, to extend that into a straightforward assessment of their capability and whether they meet that grade 4/5 threshold boundary and remove the backwash pressure from GCSEs.
It is quite a subtle question. You have to dig into it to see what is important. I spoke to the chair of a MAT who said, “Yes, GCSEs are fit for purpose. Please do not make kneejerk changes to the system because the system could not cope with that kind of change at this point.”
Tim Oates: The questions that have been framed by the committee are well aimed. It is good to be able to provide targeted responses to them.
On reflection, if I were to disagree with any of those statements that have been made so far or are coming, I will, but I agree with many of the points that have been made and I will not repeat the facts and figures about the system that have been given so far. They are all correct. I will try to add to what has been said. I will answer the question by also giving a different question: can I imagine a world without GCSEs? I will come on to that in a moment.
I want to build on what has been said by asking how we should judge this question of whether GCSEs are fit for purpose. What kind of criteria should we use to reach some kind of conclusion about fitness for purpose? All qualifications need to meet criteria on validity, reliability, utility and impact. That is the area you are in when you ask this fitness for purpose question.
On validity, let us look at GCSEs. They match the national curriculum. They deliver the national curriculum in key stage 4. They are the main instrument. The national curriculum is general in key stage 4 and key stage 3, compared with the detail in the primary national curriculum. It was a deliberate decision in 2010 to use GCSEs as the major structuring instrument for the curriculum in key stages 3 and 4. I am simply stating what the decision was. GCSEs are the guiding instrument for curriculum content.
They include skills, contrary to what we hear from many people. You have to analyse stuff, you have to write stuff, you have to apply things and you have to undertake high-level problem solving. That is what you need to do to do well on a GCSE. They require diligence, concentration and reading for understanding. If you do not go through those processes, you will not do well.
They possess high predictive validity. They predict performance at successive stages in education. They are reliable inasmuch as they are some of the most researched qualifications in the world and they have massive back-up of extraordinary technical expertise from the regulator, where Michelle worked previously, and in the boards. There are issues around grade classification and that has been pointed out in the research literature. We could certainly look at that. Again, reform and technical improvement is necessary.
On utility, they are useful in progression decisions. They are low cost, interestingly enough. It is about £45 for a single GCSE and about £120 for an A-level. I looked at the kinds of examinations and tests we had 100 years ago. They benchmarked for inflation at about £400 to £500 each. GCSEs are quite cheap compared with the technical quality that is delivered. They carry a broad range of purposes and they provide capstone assessments for things when pupils are not studying after 16.
On impact, they have positive washback in some regards: the domain to be learned, the depth of treatment. They have negative washback in instrumental and narrow learning in how they are delivered in some schools.
Briefly, can I imagine a world without GCSEs? This is the whole abolition argument that appears in the press frequently. I can imagine a world without GCSEs, but the whole landscape would look very different without them. That is important. Perhaps I do not need to think about a world without GCSEs in the abstract because I can look at other nations that do not have them and to our own past when we did not have them.
There are a couple of things here. When Keith Joseph originally conceptualised GCSEs in 1985, they were implemented in different ways. We had 100% coursework GCSEs, 100% exams and everything in between. We later had modular. We have a lot of evidence on the deficits and benefits of these different models and we can draw on that experience in reflecting on it. I will not go into that in detail now.
Looking around the world, we have done this because we want to contradict this statement—Lord Watson and I discussed this recently—that only we have high-stakes exams at 16. We just asked what the best performing systems are in the world. There are 21 of them. All of them have high-stakes assessment at 16 and 14 of 21 have exams at 16. In refining what we do, certainly some of those jurisdictions such as Estonia have exams in fewer subjects and we will come on to that in a moment.
I will finish in just a second. There are two points I want to make on whether I can imagine a system without GCSEs and, if we do not have them, whether the whole landscape would look different. We need to understand that education is a complex, interrelated system. Content, pedagogy, assessment and accountability all fit together and GCSEs are a part of that complex ecosystem. We have GCSEs. Therefore, we have only three to four A-levels. Therefore, we have three-year degrees. That relationship and that ecosystem is very important. Therefore, if we undertake reform, we need to think about the way these things are impacted.
I will finish and say that it is important to have continuous improvement, technical review and improvement of GCSEs. In the course of the discussion this morning, I am sure we will go on to what the detail of those refinements could be.
Dr Michelle Meadows: You have asked quite a specific question, which is about fitness for purpose. This incarnation of GCSEs was designed with the specific purposes that the Government laid down in mind and so they include, for example, the reliable indication of attainment—not the valid indication, but the reliable. There was an emphasis there. That word was quite specifically chosen. There was also, as we have already discussed, the use of GCSE outcomes in school performance measures.
Those purposes that were set down lead to a particular design. They lead to examination rather than teacher assessment, for example. They lead to linearity rather than modularity because, with modular examinations, it is much more difficult to ensure comparable routes through.
Therefore, on fitness for purpose and whether they are fit for these purposes, absolutely. They were designed very much with them in mind. Could we imagine a different set of purposes and a different emphasis? Yes, and that would lead to a different kind of design.
Q16 The Chair: They were interesting answers. Thank you all for those. I will come in a minute to the members of the committee who have indicated they want to come in, Lord Knight and Lord Baker.
Perhaps you could give us a bit more of a feel for some of the refinements that you imagine might be possible. Also, Dr Meadows, in response to what you were saying about imagining different purposes from those the Government have prescribed, what could those usefully be? Perhaps we will start with you to continue that last thread of thought that you closed with.
Dr Michelle Meadows: Use of outcomes in performance tables means that there has to be a high emphasis on examination, but let us imagine a world where perhaps we had a different accountability framework, for example, where outcomes played a much smaller role, a wider quality framework, or perhaps we use other indicators of attainment. Then you can imagine that we would be open to more innovative and different forms of assessment. We might have an emphasis on authenticity of assessment, more related perhaps to future experiences. We might have more emphasis on validity and therefore we might be interested in teacher assessment and so on. It would open up new opportunities.
On modularity, for example, which Tim alluded to, if you speak to teachers about modularity, often they will tell you that it works well for certain kinds of students and that perhaps it is more motivating to have feedback along the way. Actually, the research evidence using hard data rather than qualitative opinion does not seem to support the inference that some groups would do better.
The point is that, with different purposes, we could revisit some of these decisions. It is all about what you want to prioritise. If you want to prioritise high reliability and high comparability between different routes through to a qualification, it leads you in a particular way. A different set of priorities around motivation and engagement would lead us elsewhere.
The Chair: Great. Thank you. Could I ask the other three panellists to comment on whether we have the right purposes and, if not, what they should be?
Tim Oates: It is good that we have opened up the discussion about purpose. Everything that Michelle has outlined points us again to this issue about a complex set of relationships between different aspects of the systems. The assessments that we decide on as a nation have an enormous washback effect into pedagogy and the behaviour of institutions. The accountability arrangements frame this. Those have been elaborated. They were elaborated significantly in 2010. Previously there were specific and narrow performance targets and those had quite adverse consequences in the behaviour of schools. We have to see it as a complex ecosystem in the way that both Michelle and I have emphasised.
On improvement, Chair, we will come to the issue of digital assessment and so on, but let us stay with purposes. I want to come on to Sweden in just a moment. It is an important case study.
If we think about the purposes, the utility and impact of qualifications, what GCSEs currently support in purposes and the effects they have in schools as an instrument of public policy, they are a clear specification of programme content. They often have high-quality linked learning materials. They provide a statement of standards. They give an indication of the depth of treatment of content. They are a support to progression and the decisions of selectors. They provide a motivation for learners. We know that from the research. They provide quality assurance of education for the state, for parents and for others.
These are important functions and they are important purposes. At the moment, GCSEs are pivotal in providing those functions to the state, to individuals and to parents. If you change them or remove GCSEs—and I am open to reform and evidence-driven technical improvement—you need to make sure that those purposes and functions are supported. The risk in some elements of previous reform is that you change something and you suddenly find that one of those things is not being delivered.
In Sweden that was absolutely the case. They put all their eggs in the basket of competition between schools delivering every purpose for education and the improvement of educational quality. Over a 10-year period—and it is there in the official reports from institutions like the Economic Institute for Sweden and in the research reports from others—they relied on teacher assessment. They had a significant increase in the grade outcomes. They went up. Everything was looking wonderful. There was an almost mirror decline in the international surveys of educational attainment. That is a serious problem to get into. They found that they had relied on teacher assessment without a whole series of other safeguards in place. Sweden is now looking at introducing more testing, not removing it.
This idea that you have complex functions and you have to make sure various instruments support those functions carefully and in a measured way is the takeout from this bit of the discussion.
Gavin Busuttil-Reynaud: I find it quite interesting when people make sweeping statements about what GCSEs do or do not do and how they are or are not examined. For example, only 42% of all the GCSEs AQA offer have only written assessment. A whole range of oral examinations, test-based, controlled assessments, multiple-choice tests, coursework, portfolio evidence and so on are used across the range of GCSEs.
Part of the problem is that people forget that GCSEs are compensatory assessments and that the outcome that learners receive at the moment of all of their qualifications is a single grade. It is boiled down to one number and everything is seen as part of that. For example, when many of the skills aspects became covered specifically by non-examined assessment, to remove the incentive to game the system, many elements of non-exam assessment were removed from grading. It was a requirement that it was done but it did not contribute to the grade. People have no visibility that it is being done and they have no idea of the capability of certain skills that a learner might have because it does not appear anywhere in the transcript or outcome that is achieved.
There might be something to examine there about the suite of qualifications that people sit and the transcript they receive. It is highly instructive that when you look at things like high-quality automotive apprenticeships, for example, developed in association with employers, you get a transcript that gives you different scores on different scales for your ability to work in a team, your ability to work safely in an industrial environment, which is a pass/fail threshold, and your academic understanding of the underpinning theories that are applied in the industry you are in. They are all different axes of measurement. You do not get just a single grade as a result of achieving that apprenticeship. A lot of the skills that people talk about would not sit comfortably within a classic grading environment, but that is not to mean they are not taught or delivered in excellently performing schools.
Sharon Hague: As the other members of the panel have described, it is incredibly complex. I have run high-stakes assessments for nearly 25 years and before that I was a teacher for 10 years. If you change one part of the system, it has knock-on effects elsewhere. I have a couple of things for you to consider.
For me, it is quite simple. We want the curriculum and the qualifications to inspire a love of learning in young people. We want to give them the knowledge and the skills that will prepare them to function in society and to add to our economy. In looking at GCSEs as they currently are, we found from our inquiry that there are some opportunities to consider. As Gavin said, there are other assessment models within the GCSEs currently. We can consider whether it is appropriate that we make a more valid assessment of young people’s knowledge and skills by considering other alternative assessment models within the existing GCSEs.
There are opportunities within the existing framework to look at the content of some particular subjects. For example, Pearson is currently working with a wide group of stakeholders and experts to look at the current content within the design and technology qualification at GCSE and to look at whether we can review the content to introduce more elements of sustainability and of design thinking skills within the content of that qualification. Is there the opportunity to bring data science and informatics into GCSE and A-level maths? There are those sorts of opportunities within the existing framework.
We heard from schools and teachers that they would like some more flexibility and choice in the offer that they give their young people at 16. I am sure we will come on to it, but there is also great opportunity in the use of technology and building the digital skills of young people, which we are all aware they will need to be successful in the future.
Q17 Lord Knight of Weymouth: Before I ask my question, this is the first time I will have spoken in public session and so I have to read out my relevant interests. I am chair of the board of trustees at the E-ACT multi-academy trust. I am director and co-owner of Suklaa Ltd, which is an education consultancy, and there is a list of clients in my register of interests here at Parliament. I am a non-executive director of Century-Tech, of Macat International and of Education Ventures Research Ltd. I am a visiting professor at the Institute of Education for University College London. I am currently chairing inquiries into apprenticeships for Engineering UK and into Ofsted for the National Education Union. In May I become chair of the Council of British International Schools.
Michelle, I come to you principally because of your previous role as a deputy at Ofqual, the regulator. I am interested in the extent to which, when you are given guidance and instruction by the department, Ofqual has to be mindful of safeguarding against grade inflation. This was prior to Covid, which sent everything a bit wild.
Dr Michelle Meadows: Absolutely, Ofqual’s key remit is to ensure that any changes in grade outcomes are reflected in a change in performance and an underlying change in students’ attainment.
Lord Knight of Weymouth: If grade inflation is a pivotal issue, how much of the way that we grade GCSEs—and A-levels, for that matter—as public examinations is norm-referenced and how much of it is criteria-based referencing?
Dr Michelle Meadows: That is a fantastic question, not least because at the moment, as an academic, I am doing a piece of work for Qualifications Wales on exactly this issue.
Norm referencing is a term that is used to describe the system by some people. It is absolutely not norm-referenced. The terminology we would probably use for how grades work in normal times, not pandemic, is attainment referencing. What is meant by that is a mix of statistical information but, importantly, senior examiners looking at the student work—with script scrutiny. It is that combination of statistics that tell us about the profile of the entry for a qualification and also examiners looking at work.
The way in which Ofqual regulates is that if outcomes change from what is statistically predicted, exam boards need to flag that to the regulator and they need to provide evidence as to why. One reason why might well be that the examiners have seen a change in performance. That happens quite infrequently and so I do not want to be disingenuous about this, but there is that opportunity. The idea that it is simply statistically determined is not true. The time when it might be statistically determined is in the very first years of a new qualification that—
Lord Baker of Dorking: May I suggest answers are a bit shorter? Otherwise, we will never get through.
Lord Knight of Weymouth: Lord Baker, can I just finish my questioning? If I get a 4 this year, say, is it the same standard as if I got a 4 last year?
Dr Michelle Meadows: That is absolutely what the system seeks to achieve. The underlying attainment for a grade 4 is kept consistent over time. The only time when an exception is made is when, for example, there has been upheaval in the system and, therefore, the benefit of the doubt is given to candidates.
Lord Knight of Weymouth: The charge made, which is where I am going to, is that this system is about hurdling and about filtering people for post-16 and for university qualifications and that certain people, inevitably disadvantaged people, are destined to fail because a certain proportion has to fail to prevent grade inflation. How do you respond to that criticism?
Dr Michelle Meadows: I did a specific piece of work to look at how grade boundaries had changed over time. If what you were saying there was correct and the approach to awarding grades was suppressing true improvements in performance, you would expect to see over time, on average, that grade boundaries would increase year on year.
We saw that in the data as soon as a qualification was reformed—in the early years. Then it levelled off and for GCSE the grade boundaries slightly declined, suggesting that candidates are being given a little bit of the benefit of the doubt, if anything. Through data analysis, I suggest that the charge that it is capping outcomes is not supported.
The Chair: Tim Oates wanted to come in.
Tim Oates: I will respect Lord Baker’s frustration and be quick. Also implicit in your question, Jim, is whether inflation is pernicious. The answer is, yes, it is. Therefore, you need mechanisms for recognising whether it is there and you need mechanisms for suppressing it. It is important at A-level because people in different years apply to institutions and are compared one with another and if you have inflation, that is a problem. It is pernicious, for example, in the case of Sweden, where if you have a 1% inflation element in each year, over 10 years you have a significant decline of national educational outcomes.
Lord Knight of Weymouth: What if the quality of teaching and the quality of schooling is improving? Would you not see that?
Tim Oates: That is a different question. I defer to Michelle on this. The system has to be capable of responding to an overall improvement in education, just as it needs to be able to detect whether there has been a decline.
The Chair: Jim, are you done?
Lord Knight of Weymouth: I am done. I could go on for ever.
Q18 Lord Baker of Dorking: I think I am right in describing you all as believers in the status quo in both the curriculum and assessment in that—
Tim Oates: No.
Lord Baker of Dorking: Wait just a moment. I will ask you some questions. Are you aware that there have been eight reports over the last year that have all concluded that the curriculum and the assessment system are not fit for purpose? Did you read the first one by Sarah Fletcher of the HMC? Are you aware of her recommendations? She said there is no critical thinking, no creativity and no imagination in GCSEs, which is absolutely right. Did you read the report of the Select Committee on Youth Unemployment of this House? There were 88 recommendations. Do you remember any of those? Do you remember any of them? Silence supreme.
Gavin Busuttil-Reynaud: No, I was waiting for the Chair’s guidance.
Lord Baker of Dorking: There were 88 recommendations that severely criticised the curriculum and said it should be Progress 5 and not Progress 8, the assessment system is not appropriate and there should be a complete change of concentrating apprenticeships on 16 to 24 year-olds. These are all fundamental questions. Did you read the Institute of Government’s report on schools? Do you remember what that said? It said that in fact—
The Chair: Lord Baker, let us give the panel a chance to come in on some of those questions.
Lord Baker of Dorking: All right, fine. I will come back. The last one was Tony Blair’s report. Did you all read that as well? You talked about the purpose of education, Mr Oates. Do you think the purpose of education is to try to find a job for students when they leave school?
The Chair: Who are you directing that to in particular?
Lord Baker of Dorking: Mr Oates.
Tim Oates: I will take that last question first and of course you asked a series of questions about whether we have reviewed the reports and so I will come on to—
Lord Baker of Dorking: I would like that question answered now, please.
Tim Oates: Absolutely. I was just saying I will take them in that order. Education has a variety of purposes. We describe it very much as having a threefold focus. It is important for individuals, important for society and important for the economy.
It is right that we have an education system that has a general route and a vocational route. It is right that the general route feeds people into the economy at the appropriate level and allows them to study things that are good for society and for them as individuals. Likewise, it is important that we have a good, high-quality vocational route that provides the skills, knowledge and understanding that we need for the economy. At the moment we have a well-developed general education route and a far less developed vocational education route compared with our international competitors.
Lord Baker of Dorking: The three of you have not mentioned vocation or skills. You mentioned skills but the rest of you did not at all. If you are interested in youth unemployment, what is the level of youth unemployment at the moment, Mr Oates?
Tim Oates: I do not know a precise figure for youth unemployment but we do—
Lord Baker of Dorking: I am amazed if you are an expert on curriculum assessment that you cannot know what the level of youth unemployment is. I will tell you what it is.
Tim Oates: I did not finish, but never mind.
Lord Baker of Dorking: It is 9%. In the north-east of England, it is 20%. In towns like Stoke-on-Trent and Sandwell, there is 20% youth unemployment. This is the curriculum that you are defending and saying is marvellous, the no-skills curriculum. Let me ask you, if you talk of skills, when industrialists talk to us, they say they want employability skills. What would you describe as employability skills?
Tim Oates: There are a lot of questions in there. I will try to take them in turn.
On the lack of knowledge of the precise level of youth unemployment, I have been particularly concerned with the number of young people who have moved into state support. We have been particularly concerned about the number of young children who have vanished from social statistics altogether. We track those in a great deal of detail because we are concerned about the adverse impact of the pandemic.
I am concerned to improve the vocational route and have compared vocational education and training systems across the world for the last 20 years and have a great deal to contribute on how we can improve our vocational system.
On defending the current national curriculum, the purposes of education at primary and lower secondary level are important. We have improved our international standing for numeracy and literacy since 2010. We buck the international trend with that improvement.
Lord Baker of Dorking: With great respect, Mr Oates, the number of disadvantaged children today is about 300,000. It was the same number in 2010.
The Chair: Sharon wants to come in.
Sharon Hague: I am slightly nervous about coming into this conversation, but I will.
As part of our inquiry, we talked to employers and they told us that only 20% believed students were well prepared with the right skills and knowledge for work, only 30% believed that students were fully prepared in digital literacy and 22% believed that students had problem-solving and critical thinking skills. It starts to home in and be a little bit more specific about the skills that might be missing currently among young people.
I am certainly not defending the status quo. It is a recognition that to completely reform the system puts an incredible amount of pressure on teachers. It causes a great deal of anxiety for students and parents and is incredibly costly in terms of the structural changes. It is something to be thoughtful about. Having been through a couple of major reforms over the course of my career, I know that you cannot overestimate the level of support that teachers, students and parents need for a dramatic change in the system, hence our recommendation to evolve.
There are opportunities to evolve some of the subject content. We heard from teachers and school leaders that they would like more choice, including being able to offer more vocational subjects and qualifications. There are opportunities to introduce new disciplines that reflect more closely what students are required to do. Perhaps we should look at a digital curriculum. Do we understand what digital skills children should have at different stages of their learning journey? We have to be mindful of not adding more and more into the system. Perhaps a way of giving schools some more choice or updating within the current framework is a way we could proceed that creates less risk and turbulence in a system that—
Lord Baker of Dorking: Just before winding up, you mentioned data skills. Are you aware that since 2016 the amount of teaching of computing and data skills in the 11-16 schools has dropped by 43%? You are nodding. You do know it?
Tim Oates: Yes, we do.
Lord Baker of Dorking: That is a disgrace. Your wonderful curriculum does not allow for data skills or vocational skills of any sort in it, quite frankly.
Tim Oates: I would also like to return briefly to this issue of whether we have reviewed all the available reports. I am fortunate. I have a large research group at Cambridge. We have maintained for many years the register of change, which looks at change in the education system in all aspects of policy and qualifications. That is available to the committee. We have maintained that since 2000. With the reports that have been produced over the last three years, members of my team review each and every report and produce syntheses of them. That can, again, be available to the committee.
Lord Baker of Dorking: I have one last question to each, a short question—
The Chair: Hold on, Lord Baker. Let Tim finish on the first one and then Sharon Hague wants to come in.
Tim Oates: I will finish on computing. Computing science was included in the national curriculum. It has a specification. There is a computing science GCSE. We have looked—
Lord Baker of Dorking: Only 10% take GCSE computing science.
Tim Oates: I know the statistics. We presented them and discussed them with the Government only this week. Simon Peyton Jones, formerly of Microsoft, and I have presented a series of strategies to turn that situation around. We are aware of the statistics. We have analysed them in detail. We think the situation needs remedying and we have proposed some remedies.
Q19 Lord Baker of Dorking: I have one simple question. A yes or no from each of you, please. Is it more important to master French irregular verbs or to learn coding?
Dr Michelle Meadows: I will go for coding.
Tim Oates: It is not necessarily a choice in the system. You can do both.
Gavin Busuttil-Reynaud: Either, depending on your interests, I hope.
Sharon Hague: Could I make one comment, if you will allow me, on GCSE computer science? I hear the comments that you make and in fact Pearson is the only awarding organisation to provide an onscreen assessment in computer science where children are coding in the examination and being assessed on the quality of their coding. We are not the most popular. We need to put other things into the system to support the aims that we are trying to achieve, not least the kind of support for schools and training.
In answer to your question, it depends on the individual child and the progression pathway that they intend to pursue, and so having choice in the system and schools being able to work with families to choose what is most appropriate for their young people is important.
Lord Baker of Dorking: Currently, they do not have a choice.
Tim Oates: Yes, there is choice.
Q20 Lord Watson of Invergowrie: I have two questions. One is specific and the other is more general.
Tim, in your statement you asked the question what criteria should be used and you listed four, one of which was reliability. I am concerned about the reliability of GCSE grades. When Dame Glenys Stacey gave evidence to the Education Select Committee, as you will be aware, last year, she said that exam grades are reliable one grade either way. That might not matter if it is between 1 and 2, but if it is between 4 and 5 it does matter. That goes to the question of reliability.
Michelle, when you gave evidence to the Select Committee, I am sorry to throw your words back at you but you said that “96% of GCSE grades are accurate plus or minus one grade”. My question is to you specifically at this point. Does that imply that 4% of GCSE grades were two or more grades adrift? On last year’s figures, that is about 200,000 grades. What does that say for reliability?
Dr Michelle Meadows: Any measurement, even of a simple thing, has a level of unreliability in it. Even when we try to measure things we can see, there will be a level of unreliability in it. When you try to measure somebody’s attainment in a complex area such as English literature, to get 100% reliability would be technically pretty much impossible without the most extraordinarily long assessments and perhaps multiple assessments—with different views of that student’s ability. It is important that people do not put too much weight on any individual grade. Of course, once you get to an individual student’s suite of GCSEs—perhaps they will have done eight GCSEs in some cases—and you look at the whole, you probably have a reliable indication.
I know, unfortunately, that a lot of weight is placed on particular GCSEs for progression, maths and English being the obvious ones. In maths that is less problematic because the assessment in maths is generally highly reliable. In English that is problematic. This is not a failure of our GCSE system. This is the reality of assessment. It is the same around the world. There is no easy fix, I am afraid. It is how we use the grades that needs to change rather than creating a system of lengthy assessments.
Tim Oates: Chair, I will return to your point about areas of refinement and reform and where they should be focused. You have raised an area where considerable research says we should look in detail at how we grade our qualifications.
You are right, there is error in measurement in exactly the way that Michelle described. You have emphasised that the consequences of that error are concentrated around certain grade boundaries. It is crucial for progression when you decide on A-level choices and so on. Chair, there are alternatives to our current system in how we grade qualifications. Some have argued that we should have many more grades; others argue that we should have fewer. They all have consequences, but a technical debate should be had in that area that could improve our qualifications.
Sharon Hague: I have a couple of things to add. In support of what Michelle said, the Ofqual research referred specifically to subjects where examiners could have legitimately awarded different marks and the impact that that could potentially have on grades, if you assess choreography, for example, or creative writing.
As an exam board representative, we put an incredible amount of investment into assuring the reliability of marking. We have expert markers who mark for us every year. They are trained teachers and many of them have more than 10 years of experience in examining. The scripts are marked anonymously. Individual items are marked by different examiners to try to reduce any issues that we might find around reliability. There are training and ongoing checks. I can even see what time of day or night scripts are being marked. We can continuously monitor and assure.
One option is to reduce all our assessments to multiple choice, which would give a high degree of reliability. Also, fewer than 4% of GCSE grades were challenged last year. There are processes in place for schools to do that.
Lord Watson of Invergowrie: There are costs involved.
Sharon Hague: Well, not if we change the grade, but less than 1% of results were changed as a result of that challenge process.
In my work with the US system, interestingly—and this will feed into what we will come on to around technology—automated marking is used extensively. You may have one human marker and the scripts are marked once by a human and then, secondly, using AI. You can identify anomalies. You might have it marked twice by AI and then identify differences and use human scorers to check. The use of technology in the system is not just thinking about the actual test taking. There are opportunities there to continue to improve.
Lord Watson of Invergowrie: I will not go into it now, but part of the Times Education Commission’s report last year said that a lot of employers have lost faith in GCSEs, and indeed in A-levels, and they now do their own assessments. The question of grades must undermine that confidence. The point you have made about AI is interesting. We cannot pursue it now, but that might assuage quite a few concerns.
My broader question relates to the Pearson report to which you referred earlier and the seven recommendations. To quote two of them, “Shift wholesale curriculum and qualification reform to a model of continuous evidence-based improvement”, and, “Create greater diversity and representation in the curriculum that reflects young people’s lives to better engage them in learning”. That impacts on Lord Baker’s point about skills.
We are talking about assessment here, but I have not heard the term EBacc mentioned at all. For me, that hits a lot of the problem spots that restrict a lot of young people’s ability. Could you each give an impression of the EBacc as far as it narrows the curriculum and what the effects are? Whatever the grading system, whatever GCSEs we have—and we could have any number; I think I am right in saying that Pearson offers about 40 GCSEs in total—
Sharon Hague: Seventy-seven.
Lord Watson of Invergowrie: It is a big number. But it does not matter, if young people are restricted in which ones they can follow because of the pressures put on schools by EBacc. Could you each give me your impression of EBacc and what effect it has on the current system?
Gavin Busuttil-Reynaud: I am not an expert on the EBacc, but effectively it is part of the accountability framework. This is not the point of GCSE design as such. The accountability framework has changed over time. It started out as a blunt five A* to C, which drove one set of behaviours, and it has been refined over time. It is currently in the EBacc. It does not relate to the design of a GCSE as such at the moment.
I do a lot of education work in Wales, where they have the Welsh Baccalaureate, which is a slightly different beast. It is interesting that even within that—and they are revolutionising their curriculum and revising it so that it is cross-cutting, uses thematic inquiry and has a lot of emphasis on skills building—at the heart of it they will still have a framework of GCSEs to demonstrate command of knowledge as well as other qualifications that they plan to develop to demonstrate the skills that they hold. They are certainly not planning on throwing out their whole existing system.
That is what I was talking about earlier. As an engineer, I see a need for caution. You innovate and improve the working machinery you have. You do not switch it off and get a new one. EBacc was essentially that, to my mind. It was an attempt to tweak and improve the system. I am sure Tim will be able to talk extensively about its performance and whether there could be further improvements.
Tim Oates: Yes, although I will be quick. We have to think about what is actually a legal requirement on schools. Academies are not obliged to teach the precise detail of the national curriculum, but they are still governed by the general law on education. They are obliged to teach a broad and balanced curriculum.
The EBacc remains discretionary. It is a target. Schools by and large have only a proportion of their young people entered for the EBacc and meeting that criterion of a particular combination of GCSEs. There is often a percentage outside that criterion. A particular school that I am closely connected with has a relatively low EBacc entry, but it has exceedingly high value added. This is important for the community. It is an effective school in the outcomes that are achieved for young people.
This points to an important thing. Accountability arrangements have been elaborated since 2010. Prior to that, as I alluded to earlier, the more unitary measures like five grades A* to C in GCSE were driving quite adverse behaviour in schools. The elaborated accountability arrangements that we have now are much more sensitive to schools meeting their kids’ needs. Some have their needs met by having a combination of qualifications that are not within the EBacc specification. This is important.
The elaboration of accountability measures that you hinted at, Lord Baker, whether it is Progress 5 or Progress 8, are discussions that we can have. The accountability measures can be readily changed should adverse washback effects into curriculum and the behaviour of schools be detected. That is a good feature of our system.
Q21 Baroness Garden of Frognal: Thank you very much indeed. I worked for City & Guilds for over 20 years on vocational qualifications and I came across Tim on many occasions then. Of course City & Guilds was all about assessment. We mainly accredited adults. We left it to BTEC to do the schools, but we did at one stage have a schools programme that I was in charge of called CPVE. It was wonderful. I would go round schools and I would see these little kids absolutely engaged in motor vehicles, care, catering or whatever it happened to be, discovering that learning was relevant and fun. These kids would never pass any GCSEs. I was delighted to hear Sharon say that the important thing was to inspire the love of learning. GCSEs do not do that now.
I also was once upon a time a linguist. I happen to love French irregular verbs, Lord Baker, but it is much more important that people can speak languages. Languages are disappearing from the curriculum. If we want to be an international power, we have to be able to speak other people’s languages.
How can you bring back into the curriculum things that kids will enjoy and find relevant? Music, art and drama have disappeared from many of our state schools. How can you square that with the current GCSE requirements? Tim, we have crossed swords on many occasions in the past, in a friendly way.
Tim Oates: I was going to say, yes, we were not crossing swords. We were discussing important matters.
The problems that you have identified and outlined around balance in the curriculum and the focus on certain subjects are not a consequence of the types of assessments we use. I have been talking over the last week about the language that is often used in schools for the decisions they make about the balance and the breadth in their curriculum. They often say, “We are prevented from doing this”, or, “We are forced to do this”, and often that language is inaccurate. It is not a legal requirement on them. I have interviewed schools where heads and middle leaders say, “The EBacc is forcing us down a particular route. We have to drop these subjects. We have to close these departments”. I have interviewed staff in identical schools and identical circumstances who say, “We enter for the EBacc but we are not forced to narrow our curriculum. We are not forced to do certain subjects”.
We have to think about the system in the way in which we described it in the first round of questions. It is a complex system in which accountability, the messaging from the Government and other organisations, assessment, the national curriculum, inspection, funding and teacher shortages all have a role. They all combine.
The things that you were describing as problems in the system have better remedies than changing our GCSEs or changing the mode of assessment. It is an important public policy point that I make here.
Baroness Garden of Frognal: The fact is that music, art, drama and languages are dropping from most school syllabuses. How do you reintroduce those things?
Gavin Busuttil-Reynaud: I have been working on a GCSE review panel in Qualifications Wales. I have colleagues who were working on the language panel and there is exactly the same issue about encouraging learning. They are looking at innovating the design and what they reward so that, for example, you can choose to try to achieve a conversational level in three different languages because you are interested in their communications value rather than trying to become expert in a single language, its grammar and so on. They are going back to effectively a unitised and credit-based approach that values the love of learning and the value of languages. That comes down to that kind of behaviour happening only if people have an outcome that has some kind of transcript and description that allows for a multiplicity of different things.
One problem is saying that we value all these various things like certain skills and then trying to make them all GCSEs and putting them all on the same scale. Many of these things are not scaled. If you are an employer and you look at people’s group-working attributes, it is not a scale of where you sit in terms of better and worse. Employers ask people to sit various psychometric and aptitude tests because they want to know someone’s characteristics. They are not good or bad. They are attributes of who they are and how they behave. You want to put teams together that will be able to perform well. You could not just stick them into a graded scale. There is a problem here that cannot all be solved by GCSEs.
Sharon Hague: From my perspective, it is about making sure that the curriculum and the content that young people are learning is modern, relevant and engaging.
You mentioned design and technology just then and the decline in numbers. The team at Pearson spotted that that was happening to design and technology. We looked at the content and we talked to experts and stakeholders. We are making some recommendations around how that might evolve to become more relevant to young people so that it includes sustainability, design thinking and the skills and knowledge that they will need to be successful.
There are opportunities to look at the range of subjects covered and make sure that it reflects the needs of employment and society, so that young people feel motivated and engaged to study. There are perhaps vocational subjects too, enabling students to have that choice at 14 to 16. At Pearson we have a whole raft of high-quality level 2 qualifications that, again, would give young people the opportunity to sample different industries.
I know I keep coming back to it but, on the use of technology, we did a trial last year where we offered our international GCSE on screen. Over 75% of the children who came out of that exam said, “That was great; I felt I did well”. They came out motivated and engaged after having taken the assessment using technology. I am not saying that that is necessarily appropriate for everything, but it reflects more their day-to-day experience.
Finally, make sure that assessments and the content studied are inclusive so that there are no unintended barriers for children from disadvantaged backgrounds and no assumptions made about cultural capital that they may not have. Those are my suggestions.
Q22 Lord Storey: I need to quickly come back to Tim on the EBacc. The Government have set targets for the number of schools they want to have the EBacc. When you have the mechanism of the educational state driving this and when you have huge budgetary pressures where schools say, “If I do an EBacc, I can get rid of some of my creative subjects and save money”, that is the problem and that is the issue. That is not my question. I just make that comment.
I want to ask a quick question to Gavin. He does not have to give me the answer now. You talked about written examinations, but we gather evidence in all sorts of other ways through portfolios and oral examinations. What percentage is that? My experience of going into secondary schools is that it is very small indeed. Perhaps you could write to me and let me know.
Here is my big question. We are probably the most tested nation in the world from primary right through to secondary. Of course, during the coalition time, the big country that we waved the flag for was Finland. Finland is far better than we are on the PISA tables in reading and maths and science and, more importantly, better in pupil well-being. They do not have these high-stakes—I hate that term—examinations.
Imagine a world where you individually are the benevolent Secretary of State for Education. With all your commercial experience of selling products, with all your academic experience, as Secretary of State, what system of assessment would you bring in? Would you trust teachers, perhaps, in that system? Michelle?
Dr Michelle Meadows: Thank you for my new role.
Lord Storey: You will be there for five years at least and so do not worry.
Dr Michelle Meadows: I would not start with qualifications and assessment. We spend far too much time changing what comes at the end of education and not thinking enough about the inputs. I would not start here at all.
I would have to be there a long time because, as we talked about a few times, assessment systems are culturally embedded. They are context-driven and complex. It is about how they integrate with the whole of the education system. I would not begin with qualifications. I would begin with professionalising and supporting teachers, giving them the confidence, changing the way in which we judge quality so that it is more rounded. Then I would come around to assessments that would give me far more opportunities with the techniques I might use.
Lord Storey: You might have got the job. Sharon?
Sharon Hague: I totally support Michelle’s comments. The outcomes of the qualifications are the output of two years or more of study. I agree completely that that is not the first place to start. You then have to look at what you are assessing. I would expect the system to have perhaps more of a mix of different assessment methodologies, because the nature of what you are assessing needs to be valid as well as reliable if you are delivering it at scale.
Gavin Busuttil-Reynaud: The single most important recommendation that I read in all the various reports that were on the reading list for this event was to create a single, unique digital identity for each learner at the start of their education and make everything they produce as part of education attachable to that identity so that they can take it with them and demonstrate what they can do.
My answer is that I would like to see a broad and rich variety of assessment and I would not want it all to be a terminal assessment at GCSE. I would like them to be able to demonstrate and show everything that they can do as they have gone through. I would like the graded and the non-graded. The Duke of Edinburgh’s Award, which lots of young people take to demonstrate the breadth of their skills and what they can do, is not graded. You either get it or you do not. Grading itself works well in some situations and not in others. We need that record so that people can take things with them all the way through and it will also give a longitudinal picture of the learner.
The other point was made, importantly, about the forgotten third of learners and people who have precious little opportunity when they leave. We are longitudinally testing people—and I know people say we are testing too much—in Wales between years 2 and 9. They are entirely formative assessments to provide individualised feedback to those learners, but we see the problems of those who are not literate and numerate at GCSE. The seeds of that are planted long before GCSE. We know who they are. We cannot fix it with any amount of tinkering to the GCSE system. We see that the whole cohort makes the same progression right through primary school and they get to secondary and that bottom 25% stops dead.
Tim Oates: I will come in a second to what the system should look like or how we can improve it.
People say, “Never press Tim’s Finnish button”, because I have examined the Finnish system for many years. There are a couple of insights from it. Far more formal tests are used in Finnish primary education than in English primary schools. The reasons why are important and salutary in what you have just said about formative assessment. Finnish teachers use these formal tests. They say, “We do not much like doing it, but we know it is important because it enables us to detect children who are at risk of falling behind”. They are used for the formative purpose that you describe and for the purposes of achieving equity. That is important because it says that these highly dependable tests have an important role. It tells us that we should attend to utility and impact and use assessments in the right way. That is important from the Finnish lesson.
The second and final bit of insight from Finland is that entry to upper secondary in Finland is highly competitive and the high-stakes teacher grades in the urban areas really count. There is a great deal of parental, student and community anxiety about the accuracy of the grades that are given to Finnish students. The lower secondary to upper secondary progression issue is important in Finland and important in the transfer from primary to secondary. That is important for the record.
On educational nirvana and what we could see, you probably want to come on to the technical revisions that we could make to assessment in this country, particularly using the new technologies. There are a lot of assets around that, which we will come on to.
The Chair: Lord Knight is about to steer us straight on to that and then I will come to Lord Aberdare and Baroness Evans.
Q23 Lord Knight of Weymouth: The question we have been waiting to come on to is when we will move away from everything being done in large sports halls on tiny desks with paper and pen.
I am interested. Assuming we have played around with purpose so that we are more interested in validity than reliability and we are perhaps adjusting a few things around accountability to free up the way that you do assessment, what is the opportunity of technology? What is it both in formative and in summative assessment terms? How does it differentiate between academic, technical and creative subjects?
Sharon Hague: Thank you. First, use of technology in assessment, as you will be aware, is already here. We deliver hundreds of thousands of BTEC assessments on screen every year. We have the Pearson Test of English, which is a fully automated assessment of spoken English, written English, reading and listening skills. It is taken in a secure environment, marked completely using technology and the candidates receive their results within 48 hours. I talked about the pilot and our international GCSE in English, to which we had a fantastic response particularly from the students, which was heartening. I have talked about our GCSE in computer science. We have had more than 4,000 students take online mocks. In the professional world we have Pearson VUE, which is a network of secure sites where a whole raft of professional tests are taken by adults.
Lord Knight of Weymouth: You can do it and you do it. Is it just the regulator applying the handbrake and stopping you getting on with it?
Sharon Hague: I think there are a number of things. The other one I will mention, which is the interesting one, is Egypt, where we delivered more than 14 million high-stakes onscreen assessments. All of the young people were issued with iPads with all of the teaching and learning content and took the assessments on screen. It can be done and it can have real benefits.
When I look at other systems, for example in the US where I have mentioned we are delivering a lot of high-stakes onscreen assessments, there is a choice made available, so it is not a sudden “Let’s do everything on screen”. It is a transition. However, the US has a much higher proportion of schools that have devices per child and the children can take those home. Children in the very small number of households that do not have access to the internet in the US are also given digital cards so they can access data. That is a big area for us to consider as a UK system.
We need to learn as well if we are applying it to GCSEs because, of course, there will be risks. We have a low appetite for risk in making changes to the assessment and qualifications system, so we would need to make sure that we manage that transition smoothly. We would need to support schools and teachers as well because it creates different things that have to happen in schools in terms of administering, but ultimately it could provide fantastic flexibility and move us away from all students necessarily having to take a written exam in an exam hall at the same time and give schools and students much more flexibility about when they are taking assessments.
Lord Knight of Weymouth: Thank you. Gavin, this is obviously an area where you have done a lot of work.
Gavin Busuttil-Reynaud: Exactly. The low-hanging fruit for technology is around efficiency and it is one area that I recommend we should investigate. In our knowledge-rich curriculum we do lots of testing of knowledge. We should do that as multiple-choice questions on screen, but I argue that it should not be a sessional test at the end of a course. It should be a test-when-ready model, to get away from the inherent unfairness of what happens if you have a bad day and all the rest of it, so you come in a good frame of mind, sit down and take a test. That is the model that places will move to as we use more formative assessment as people work through schools. There are massive advantages for technology in assessment because there is a lower threshold on the burden of proof of 100% accuracy and 100% reliability, which is the weight that comes to bear on a single-time sessional assessment in GCSE.
Lord Knight of Weymouth: We had single-level tests that I think when I was the Minister got knocked on the head back in the noughties. There were issues attached to that with the amount of pressure it applied to teachers and the system. Do you think technology allows us to lessen those and essentially do more formative assessments that encapsulate, in a more granular way, the way you are talking about in Wales?
The Chair: In answering that question about formative, can you perhaps say whether ChatGPT will be an obstacle to the use of more formative assessment? Will students have an opportunity to interfere with that?
Gavin Busuttil-Reynaud: You can interfere with formative assessment, but the point you have to get across, particularly from the formative point of view, is why would you bother cheating? I have lots of conversations with teachers about it. Well, we have adaptive assessment, so if you cheat, you will get harder questions and you will have to cheat better and better. All you will do is get a report at the end that suggests things that you appear to be able to do well and things that you should probably work on. That will be patently rubbish if you got someone else to do your homework or you cheated.
If we look at things like ChatGPT that are generating open input, it is quite interesting and I would turn it around. I would generate a ChatGPT answer to a question and then tell someone to critique that response. That is a very high-order skill that you are testing there, so it is about building that kind of technology into your testing. Once you go beyond efficiency, I think there are some very interesting opportunities in technology for assessment to assess the skills that at the moment do not happen. We talk a lot about work skills, the classic things. If you are in a work environment, the first thing you do is produce something, give it to someone else and ask for criticism and feedback. They probably have a discussion with you. You reflect on that, you produce a revision, you then send a second version out and so on. That is all about setting up a structure for how you would do something.
QCA, Edexcel and Goldsmiths University did all of this in GCSE between 2005 and 2009. It was the E-scape project. It was technology-enabled so you had this structure and this framework for doing this in a craft design technology setting. You used tech devices to take pictures of what you were doing and show what you had shared and it recorded who gave what feedback and so on. It generated a naturally occurring evidence portfolio, which went through and then it was graded using a comparative pairs judgment system using work from Alastair Pollitt that put it on a scale.
That is the kind of thing that is the modern, new skills—well, the skills have been around for ages, but it is the modern approach to an assessment that I think is excellent. It only happens when you have technology because that is the only way you can separate out who has done what. I think there are huge opportunities for things like that going forward.
The Chair: Thank you. We may need slightly shorter answers because I still have a number of people to come in. Who has not come in yet? Tim, have you come in on this? Very briefly, please.
Tim Oates: I will be quick. We have looked at the history of cheating in examinations. There was very interesting cheating in the Chinese civil service exams in the fifth century. You have to make sure that you are getting the response from the person in whom you are interested. That has obtained throughout the history of assessment; it will obtain forever. There are issues of authenticity and making sure that you are getting inside the person’s head, inside the person’s performance. We are in a strange time with generative AI.
Research shows that the speed of writing and the speed of reading are extremely important for your educational progression. We are taking the foot off the accelerator with writing. As we move to digital assessment, we are not emphasising the acquisition of keyboarding skills, so kids are being held back from being able to express what they think and what they know. The AQA work that is going on at the moment on the implementation of digital assessment in schools shows that school readiness is a very important issue: the availability of technology and the management of that technology.
I observed a lesson in a school recently that was technology enhanced; every kid had an iPad, the smart board showed each and every child’s working simultaneously. It is literally the best lesson I have ever observed and I have observed hundreds of lessons around the world. That lesson was focusing kids on questions. Those questions were very similar to the questions that appear in a GCSE. They probed the kids’ understanding. They were being used in the right way in probing the kids’ understanding and supporting them where they had misconceptions.
The Chair: I should stop you there because I need to bring in other members. Michelle, I will come back to you, if I can.
Q24 Lord Aberdare: I have been having great difficulty getting my mind around all of this, but I must say that some of the answers to Lord Storey’s questions were very helpful. I will target my questions mainly at Gavin and Sharon.
We have heard a lot about the complexity and interconnectedness of the system. We have heard about the importance of an evolutionary approach, which I very much agree with, and we heard from Michelle that we should start at the beginning rather than the end, which I also agree with. My question is: how are we going to get from here to there and how are we, as a committee, going to come up with some thoughts about a desirable way of doing that?
Sharon said at one point that the answer to one of Lord Baker’s questions depended on individual children and the pathways that were appropriate for them. It seems to me that the current system, including the assessment elements of it, works pretty well for academically minded children and works pretty well for academic pathways. It does not work—as I think Gavin said—for a significant body of more vocationally directed children and it does not work for employers. You have come up with some very good ideas about the range. I think you called it a rich variety of assessments, some graded and some not graded. Of course this all feeds into school accountability as well. It is quite easy to do it with GCSEs; that is probably one of the advantages.
How do we get from where we are now with a GCSE-focused system to one that allows that rich variety of assessment, graded and non-graded, and its use to assess the results that the schools have achieved for the whole range of their pupils, not just those who are going on to university?
Gavin Busuttil-Reynaud: Taking lessons from Wales, the single most important thing is that the state has to do things that have to happen at a systemic level. The Welsh Government organised for the high-speed digital connection of all of their schools, funded that and made sure it happened. They also provided funding through the local authorities for devices and it was basically mandated that there would be onscreen assessment. It was the schools’ responsibility to make sure that they had made provision for devices and training of teachers so it was going to happen. They now have a sort of digital inspection framework for schools whereby they look from starting at just their physical infrastructure of cabling and provision of internet, on through the sophistication of the set-up of their network, the capability of the technical staff that they have, the devices they use and so on. That has to happen as an enabler. The reason why we all use computing devices in business is because it is mission critical. It has to be made mission critical and mandated for schools and that has to be a state activity.
The second key enabler is providing the persistent single digital identity that cuts across multiple systems and multiple platforms that the Government provide. They provide that out to the different providers that are working with those learners within the digital space; you have access to the same active directory. You know who each learner is and that is persistent and follows those pupils from their entry to nursery all the way through to their completion of further education. I think once you have those enablers in place, you have the basics to decide to innovate pretty much wherever and however you choose to do so. I think the natural place is then for the use of formative and IT-based assessment but, more importantly, the whole education journey to be digitised through the course of someone’s schooling.
Sharon Hague: There are a couple of additional things to be considered, which we have alluded to in the discussion. One is making sure that young people have the digital skills that they need and that those develop and build throughout their education—so perhaps a recommendation to articulate not just devices and the digital infrastructure that is needed to support a move towards using more technology but what young people need to be able to access and perhaps identifying at different ages and stages what skills they would expect them to deliver, being mindful, of course, of the pressure on schools and adding more and more to a school’s list of things that it has to deliver.
There are processes in place for individual subjects to modernise them. The rules and guidelines around general qualifications, particularly about what should be contained within a particular subject, sit with DfE and the high-level decisions about assessment structure sit with DfE, a combination of DfE and Ofqual, but there are processes in place for individual exam boards or exam boards collectively to identify opportunities to modernise the content and build some of the skills or change the assessment mode. That is certainly incumbent on us and I have given examples of where we are taking a lead on that.
The other area is how we encourage and facilitate the whole system, that choice for schools, because ultimately this will enable greater personalisation.
Q25 Baroness Evans of Bowes Park: I want to go back. A number of you have mentioned international approaches, lessons from Sweden, mistakes we do not want to make, innovations that you are doing in Egypt and other countries. Are there any other international experiences, comparators, in any sense around assessment, obviously understanding systems are very different, that we should be thinking about? We do not want to reinvent the wheel but equally with all your experience looking outwardly it would be helpful to see if there are any other examples that you would like to point to that we should be mindful of.
The Chair: Michelle Meadows is keen to come in on this.
Dr Michelle Meadows: I will reference a piece of work that I was involved in while I was at Ofqual, which was looking internationally at how systems have moved to digital assessment. We interviewed colleagues from New Zealand, Israel and Finland about how they had gone about it. They had all done it somewhat differently. For example, New Zealand had a gentle opt-in approach, so if schools felt ready they could begin to take their high-stakes tests online. Finland had quite a different approach where they began with the teaching and learning, looking at how digital devices were used in teaching and learning and then looking at how the assessment would build on that.
They all did it differently but the take-home points from that work, across all the systems that we looked at, were, first, you have to have political will to make this happen, not least because things will go wrong. You have to have an appropriate risk appetite for this kind of change and, of course, because things will go wrong you have to have great disaster recovery plans and contingency plans. Of course you do.
The other thing is massive communication efforts, massive engagement, not just with schools but with parents and stakeholders beyond the education system. It is a big piece of work and it is not about any one individual exam board. I do not think any one individual exam board can make this move for general qualifications, GCSEs and A-levels. It has to be a system-wide move that knits well with the rest of the system.
Tim Oates: On commitment to the status quo versus well-evidenced appropriate reform, the history of GCSE, and I hope the future history of assessment in this country, is one of constant change, driven by high-quality evaluation. It is one of the characteristics of the English setting. We have a fantastic base of individuals and researchers in boards and in academia who are capable of critical, highly instrumental evaluation of our assessment. That is a good feature of our nation.
Technology enables certain things that we cannot do now with paper. We are very interested in the process of accumulation of evidence over time. We do that with medics during medical training, but there we can afford it because the outcomes are so critical that we are prepared to pay for multiple assessment of the same thing using different techniques over time, to be absolutely assured of the outcome. Digital allows certain things but, to come back to the issue of grade classification or grade misclassification, part of the problems that get concentrated in grades are because we have only one person judging one student’s outcomes. With digital we can not only assess a greater variety of types of things but we can allow a single thing to be judged by multiple different judges and that can decrease lack of reliability and improve the dependability of assessment. That can only be done through digital transformation of assessment.
Michelle mentioned the process that we should use for that change and other nations like Singapore—again, international comparisons—try out so many things, but they are not afraid to say, “We have tried this and it does not work”. That is really important, that different approach to change, which your report so nicely highlighted.
On the system and international comparisons, the APPG here this week heard a story from Norway about the way in which the Norwegian system operates. What was not said in that is how assessment operates for the equivalent of A-level in Norway. If you are a pupil, you do not know until the day before the exams whether you will be one of the 20% who are told the day before that you will be taking an exam in a subject; it is determined by lottery. Is it not interesting? It happens in that nation and parents and kids comply. We can learn a lot from these international comparisons about what we need to do at system level to adopt appropriate assessments.
Q26 Lord Baker of Dorking: I will go back to the question you were asked a few moments ago about what effect ChatGPT will have on all of us. Mr Musk wants it stopped, he is so worried about it, but I do not think we can stop it and I think it will have a profound effect upon all our lives and assessment. Already, large companies are using artificial intelligence to build up a history of potential employees. They will use ChatGPT. It will become much more intense, much more complicated. They will find out whether they scored a goal in the games they played, all sorts of funny things that we do not know. ChatGPT will give them that information. If it starts in the primary school, you will be able to build up a picture of a child, what books have they read, what games have they played, how do they do maths, how do they do English and so on, what places have they visited. You will be able to build up a whole picture for each child, probably within the next two or three years, and that is a form of assessment at the end of the day. Do you have any ideas on this? It is very early days but I think that will happen.
Sharon Hague: I think it presents some risks in the short term to our current models of assessment. Just this week the Joint Council for Qualifications issued additional guidance for teachers and school leaders on how to assure themselves that the work that was being submitted for assessment is the candidate’s own.
In the longer term, it offers great opportunity. It is not so long ago that we were discussing how we would accommodate digital photography in the assessment of a photography GCSE or A-level. We are able, as a system, to evolve and take into account developments in technology. We have talked about test taking but I do not think we should underestimate the opportunities in how we operate the assessments. The AI used in the US also identifies potential malpractice and safeguarding issues in candidates’ work. We can improve the quality of marking in the longer term, as I have described. We can automate manual checks. We can potentially get results back to students much faster. Of course there will be challenges to work through, but it presents also great opportunity for the future.
The Chair: Does anyone else want to come in on ChatGPT?
Tim Oates: We similarly have issued a statement about the ethics and the technical requirements for AI within education and the impact of the changes that are occurring. It has huge educational assets. We are very supportive of this issue of accumulation of evidence over time. Everything we have said today has been framed as “There will be a downside and there will be an upside”, and I want to emphasise that too.
I absolutely advocate the use of high-quality questions throughout education right the way through, and hence the kind of technology that we have developed to make test papers available in the course of key stage 3 and key stage 4, a mocks service, is so important. It prepares kids for examinations and makes them familiar with the kinds of questions they will experience in high-stakes examinations. That is good and questions are important because they probe thinking and they stimulate thinking. It is clear that we will create digital environments in which these questions will be massively available and will support learning and assessment.
Where are there some downsides? There is a risk in a “surveillance society” where people feel that absolutely everything they do is being monitored for high-stakes purposes. That is quite an issue. People need to feel able to reveal their misconceptions, to make mistakes, because so frequently we know from high-quality education that that is where the learning occurs. Again, as we begin to move to these digitally supported systems—and we are exploring all of them across the board, the kinds of practical approaches that Sharon has outlined—we must closely monitor the impact that it has on the behaviour of teachers, on their workload, and on the behaviour of kids and their sense of where they are in learning.
Dr Michelle Meadows: Just quickly on the accumulation of evidence, partly why we can have this debate about GCSEs is because the evidence around GCSE is so transparent. We know so much about the reliability and validity because it is published. We explore it; we open it up for criticism. I worry about the accumulation of evidence where we do not know an awful lot about the quality of that evidence. I think that is worth bearing in mind.
The Chair: We are now pretty much out of time. I thank our four witnesses very much for coming in and giving us their time. It has been a fascinating session and we will take great notice of your contributions. Thank you very much.