Written Evidence Submitted by Dr Alvin Wilby, FRAeS, Visiting Professor, Complex Socio-Technical Systems, University of Bristol

(C190050)

 

Personal statement

I am a physicist / mathematician by background, with extensive experience in systems engineering.

I have recently retired from a CTO role in a major industrial organisation involved in Aerospace, Defence and Transportation.

I am continuing to support the University of Bristol, as a visiting professor, on topics related to the safety of autonomous systems.

By chance, I was asked informally to review the CovidSim modelling that was instrumental in setting the UK “Lockdown” policy.

From an engineering perspective, there seem to be serious deficiencies in the quality of the modelling – both in terms of the modelling assumptions and the inability to prove that the software correctly implements the model.

This potentially means that policy makers will believe the model to have greater utility for supporting policy decisions than is justified.

This note makes some suggestions for improving the evaluation of such models, to ensure future policy decisions are as robust as possible, given the inevitable uncertainties in data and epidemiology.

The work conducted (detailed report available on request) was unfunded and is independent of any of the existing Covid modelling groups.

 


Summary

The CovidSim model developed by Neil Ferguson’s team at Imperial College was influential in determining UK government policy (“Lockdown”) in response to COVID-19[1] (“Report 9”).

The UK government approach has been to follow scientific advice, as channelled through organisations such as SAGE[2].

But of course scientific advice, particularly in the early stages of a pandemic, when fundamental characteristics of the disease are not well understood, will inevitably encompass a spectrum of opinions rather than a single “source of truth” (validated evidence).

Policy makers, particularly without a technical background, are therefore faced with a profoundly difficult challenge in deciding on a course of action.

There is a risk that mathematical sophistication is, mistakenly, assumed to correlate with predictive power.

In fact, the CovidSim model suffers from a number of deficiencies, detailed in the Evidence section below.

The consequence of these deficiencies is that the predictive power of CovidSim is no better than a very simple SIR[3] model, for which the importance of key uncertainties (reproduction rate etc) would be more evident.

It seems likely, therefore, that policy makers had an inadequate understanding of the limitations of the modelling.

From a safety critical engineering perspective (and I would argue that life or death policy decisions affecting millions of people are, in some sense, safety critical) I believe that decision support models should not be scientific research tools, but should be well engineered and validated as “fit for purpose”.

I recommend that modelling used to support future policy decisions with life and death outcomes (or potentially catastrophic economic outcomes) should be developed, reviewed and accredited along the lines set out under “Considerations” below.

Evidence

i) Software Complexity

Industrial software intended for safety critical applications is developed and tested following well defined standards and processes.

Whilst these are tailored for specific sectors (e.g. industrial[5], automotive[6], rail[7], aviation[8]) they follow common principles:

       A “Safety Integrity Level” (SIL) is defined – how “safe” does the software need to be (considering the likelihood and gravity of unsafe events)

       Software development and test processes/tools are tailored to the SIL (so the software is no more onerous/expensive to develop than is justifiable)

       The organisations and individuals involved are formally accredited – to ensure they are suitably organised, qualified and experienced for the task

       Independent scrutiny (typically from a regulatory body) is applied throughout the development cycle (planning, development, test, maintenance)

In the UK, the law also requires the risks associated with production software to be “As Low As Reasonably Practicable” (ALARP[9]). In broad terms, if investing in further safety would save lives for a cost of less than a few £M, you should invest in improving safety.

The CovidSim model predicted a potential fatality count (Report 9) for the UK of approximately 510,000 people. So we want to know that this is not unreasonably pessimistic (there are cost and mortality issues associated with “Lockdown”[10]) or too optimistic.

If the consequences of getting something wrong are measured in many thousands of lives, we should consider adopting some safety critical principles, even if they are “lighter weight” to reduce cost and increase speed of development.

In an academic research environment, there will generally not be this level of rigour, because the software is used to “play” with concepts and produce academic papers, not to provide to third parties to use in a safety critical environment (although some larger academic organisations adopt more structured approaches[11]).

The CovidSim model did not appear to follow any formal development process, is written in C++ without adopting a recognised safety oriented coding standard (e.g. MISRA C++[12]), and has modules with cyclomatic complexity[13] that are “off the chart”:

Figure 1 – Cyclomatic Complexity vs Number of Lines of Code for CovidSim modules (Log Scale)

 

An informal guide to interpreting cyclomatic complexity[14] is given in Table 1 below (my emphasis).

Table 1 Cyclomatic Code Complexity

Complexity      What it means

1-10            Structured and well written code that is easily testable.

10-20           Fairly complex code that could be a challenge to test. Depending on what you are doing these sorts of values are still acceptable if they're done for a good reason.

20-40           Very complex code that is hard to test. You should look at refactoring this, breaking it down into smaller methods, or using a design pattern.

>40             Crazy code, that is not at all testable and nearly impossible to maintain or extend. Something is really wrong here and needs to be scrutinised further.
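McCabe’s measure counts the linearly independent paths through a function; for a single function it equals the number of decision points plus one. A rough proxy can be sketched by counting branch keywords, though this is purely illustrative (keyword counting over-counts matches inside strings and comments; real tools parse the code properly):

```python
import re

# Rough proxy for McCabe cyclomatic complexity: decision points + 1.
# Keyword counting over-counts matches inside strings and comments,
# so this is illustrative only -- not a substitute for a real tool.
DECISION_PATTERN = re.compile(r"\b(if|elif|for|while|case|catch)\b|&&|\|\|")

def rough_cyclomatic_complexity(source: str) -> int:
    """Count branch keywords/operators in the source text and add one."""
    return 1 + len(DECISION_PATTERN.findall(source))

sample = """
def classify(score):
    if score > 40:
        return "needs scrutiny"
    elif score > 20:
        return "very complex"
    elif score > 10:
        return "fairly complex"
    return "simple"
"""
print(rough_cyclomatic_complexity(sample))  # 1 if + 2 elif -> 4
```

Even this crude count makes the point of Table 1 concrete: each additional branch multiplies the number of paths that must be exercised to test the code thoroughly.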

From an engineering perspective, we would not assume CovidSim is of acceptable software quality and it would certainly fail accreditation against any of the safety standards referenced previously.

Of course, CovidSim was not the only model used – but it seems that SAGE was faced with the task of trying to “take an average” across several models of unknown integrity, without really understanding their limitations.

Going forward, the committee might wish to consider whether it is appropriate to establish principles and standards for modelling that will be used for critical decisions affecting the health and livelihoods of the UK Population.

Recognising that things may develop quickly, developing robust generic models, and modelling capabilities, could be considered as essential contingency planning.

Contingent capability could draw on industrial organisations, as well as academic organisations, to gain access to best in class software development capabilities and data sources.

ii) “Distance Kernel”

At its simplest, SIR modelling assumes a homogeneous “herd” of individuals that are in one of three states:

       Susceptible

       Infected

       Recovered (removed from the population of susceptible people)

A mortality element can be added without changing the fundamental model (e.g. a percentage of those infected will die).
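The three-state model with a mortality fraction can be written in a few lines. The sketch below integrates the standard SIR equations with a forward Euler step; the parameter values (R0, infectious period, infection fatality ratio, population) are illustrative assumptions, not fitted values:

```python
# Minimal SIR model with a simple mortality fraction.
# Parameter values are illustrative only, not fitted to COVID-19 data.

def run_sir(population=66_000_000, initial_infected=100,
            r0=2.4, infectious_days=7.0, ifr=0.009,
            days=365, dt=0.25):
    """Integrate the SIR equations with a forward Euler step.

    gamma is the recovery rate (1 / infectious period);
    beta = r0 * gamma is the transmission rate.
    A fixed infection fatality ratio (ifr) is applied to removals.
    """
    gamma = 1.0 / infectious_days
    beta = r0 * gamma
    s, i, r, deaths = population - initial_infected, float(initial_infected), 0.0, 0.0
    history = []
    for step in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        removals = gamma * i * dt
        s -= new_infections
        i += new_infections - removals
        r += removals
        deaths += ifr * removals
        history.append((step * dt, s, i, r, deaths))
    return history

history = run_sir()
print(f"Final deaths: {history[-1][4]:,.0f}")
```

Note how directly the output scales with the assumed reproduction number and infection fatality ratio: the key uncertainties are immediately visible, which is precisely the virtue of the simple model.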

A distance kernel attempts to introduce some dependence on physical distance between individuals.

An early version of the CovidSim approach was used for modelling the Foot & Mouth outbreak, leading to the policy decision to adopt a mass culling approach.

With this disease, it is known that transmission can occur via aerosols, so some sort of distance function, where proximity implies a higher degree of risk, seems plausible, although the modelling did attract some scientific criticism[15].

The CovidSim model has provision for several different kernel functions - power, exponential, Gaussian, Step, power exponential. However, there is no evidence that these are validated against observed human traffic patterns.
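For illustration, each of the kernel families named above maps a separation distance to a relative contact weight. The functional forms and parameter values below are my own assumptions for the sketch, not the actual CovidSim implementations:

```python
import math

# Illustrative distance-kernel families of the kinds named above.
# Functional forms and parameters are assumptions for illustration;
# they are not taken from the CovidSim source.

def power_kernel(d, scale=4.0, alpha=3.0):
    """Contact weight falls off as a power of distance."""
    return (1.0 + d / scale) ** -alpha

def exponential_kernel(d, scale=4.0):
    return math.exp(-d / scale)

def gaussian_kernel(d, scale=4.0):
    return math.exp(-(d / scale) ** 2)

def step_kernel(d, cutoff=4.0):
    """All contacts equally weighted within a cutoff, none beyond it."""
    return 1.0 if d <= cutoff else 0.0

# The kernels agree at zero distance but diverge sharply in their tails --
# which is exactly where commuting and travel patterns matter most.
for d in (0.0, 2.0, 10.0, 50.0):
    print(d, power_kernel(d), exponential_kernel(d),
          gaussian_kernel(d), step_kernel(d))
```

The choice of curve matters little unless it reflects how people actually move: a rail commuter’s effective “distance” to London is small whichever shape is chosen.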

At a national scale, the geographic separation of population centres can be contrasted with, for example, the interconnectedness of the rail network:


Figure 2 – Geographic and Network Perspectives on the distance between population centres

So the geographic view implies a significant distance between e.g. London and Birmingham or Glasgow, but from a transport perspective, we see they are closely linked.

At a local level, similar criticisms apply. If I were to interact with neighbours in my street, that would most probably be by accident in the local supermarket, some distance away. Friends and family are scattered around the UK. My office, prior to retirement, was 50 miles away…

In the CovidSim approach, modelling parameters appear to be set at a national scale, then (with random variations) mapped onto approximately 10 million local “cells” containing a few hundred to several thousand people.

So the modelling of the distances between cells might look very detailed, but actually has no real understanding of human traffic patterns.

It seems unlikely that modelling at this scale can be useful without drawing on sources of real time data, of which suitable anonymised mobile phone data would be most valuable.

By way of illustration, the image below shows results from a scientific study of a cholera outbreak, using phone data to understand population movements in response to the outbreak[16]:


Figure 3 – Mobile phone tracking of population movement during a disaster (Ref 16)

The authors note that:

“These results suggest that the speed and accuracy of estimates of population movements during disasters and infectious disease outbreaks may be revolutionized in areas with high mobile phone coverage.”

The committee might wish to consider the utility of modelling at different scales (national, regional etc) and the data sources required if such modelling is to have adequate predictive capabilities.

As well as mobile phone data, CCTV and Automatic Number Plate Recognition can all contribute to a detailed understanding of the actual “pattern of life” (recognising, of course, the trade off with civil liberties).

iii) Stochastic or “Monte Carlo” Modelling

The CovidSim model purports to add fidelity by modelling at a fine grain level, where the properties of each geographic “cell” vary randomly according to defined probability distributions.

As the parameters vary randomly, the results of running the model can vary. Typically, the model is run several times, with different random number “seeds”, and some sort of “ensemble average” is taken as the most likely prediction.

The early stages of the model are “calibrated” against observations.
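The seed-and-average procedure described above can be sketched as follows. Here `run_epidemic` is a hypothetical stand-in for a single stochastic model run, with seeded lognormal noise standing in for the per-cell random parameter draws:

```python
import random
import statistics

# Sketch of the seed-and-ensemble-average procedure described above.
# run_epidemic is a hypothetical stand-in for one stochastic model run:
# it just perturbs an assumed mean outcome with seeded random noise.

def run_epidemic(seed, mean_deaths=500_000.0):
    rng = random.Random(seed)
    # Lognormal variation around the mean stands in for the per-cell
    # stochastic parameter draws of a real fine-grained model.
    return mean_deaths * rng.lognormvariate(0.0, 0.1)

def ensemble_estimate(n_runs=50):
    """Run the model with different seeds; report mean and spread."""
    results = [run_epidemic(seed) for seed in range(n_runs)]
    return statistics.mean(results), statistics.stdev(results)

mean, spread = ensemble_estimate()
print(f"ensemble mean {mean:,.0f}, spread {spread:,.0f}")
```

By construction, the ensemble mean converges on whatever mean was built into the random draws; the stochastic machinery adds spread around the assumptions, not new predictive information.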

There are a number of problems with this approach:

       Observations may not be accurate – e.g. while there is no robust testing regime, or if there are significant numbers of people who are mildly infected or asymptomatic, the numbers could be very wrong (e.g. if you assumed R0 = 2.4 and it was nearer 1.2, you would over-estimate the death toll by a factor of 3)

       The average of a set of random trials will be close to the answer you would get using mean numbers and a simple model – so have you gained any meaningful fidelity? In particular, if the fine grained detail simply reflects unfounded assumptions (e.g. that a distance kernel applies) then Monte Carlo is just a complicated way of playing back your assumptions – it doesn’t add any meaningful predictive capability.

       The model may contain errors – but if you have enough parameters to tweak, you will always be able to calibrate it, so it looks like a convincing fit to observations. It has not necessarily helped your understanding of the phenomenology involved.

These points are illustrated by the figure below, which calibrates a trivial Excel model (red line) to match the death rate curve in Report 9 (black line).

Figure 4 – Trivial Excel model (red line, see Appendix) calibrated against the Report 9 death rate curve (black line)


Considerations

Some of the modelling capabilities available to support policy makers in the early stages of the Covid pandemic were not fit for purpose.

Recognising that there was a very short timescale for decisions, and therefore no time to properly validate or accredit the models, the committee may wish to consider how to build a robust national contingent modelling capability, suitable for future pandemics or other national emergencies.

The experiences of organisations such as NAFEMS[17], working to professionalise engineering simulation, may bring relevant insights.

The committee may wish to consider:

       How to ensure that the limitations, relevance and integrity of scientific models are better understood by policy makers.

       How to use “What If” modelling to expose critical assumptions and data dependencies (often, it will be more important to improve data quality[18], rather than simply running more models using the same basic assumptions).

       How academic capabilities can be augmented by industrial capabilities in software development, modelling and data analytics.

       Whether to develop a framework of standards/good practice for model development.

       Whether to establish a framework for independent review and accreditation of critical models.

The committee may wish to consult The Royal Society regarding their RAMP[19] initiative, which should provide valuable “lessons learnt”.

Existing trade associations (e.g. TechUK[20]) and innovation networks (e.g. the Digital Catapult[21]) could facilitate the development of frameworks, in order to support the development of an enduring national capability.

 

 

 

 

 

W A Wilby


Appendix               Excel Model

 

[Screenshots of the Excel model]

 

 

(8 July 2020)


 


[1] “Report 9” – Neil M Ferguson, Daniel Laydon, Gemma Nedjati-Gilani et al. Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College London (16-03-2020), doi:https://doi.org/10.25561/77482

[2] https://www.gov.uk/government/groups/scientific-advisory-group-for-emergencies-sage

[3] Susceptible, Infected, Removed (or Recovered)

[4] “Keep It Simple, Stupid” The KISS principle states that most systems work best if they are kept simple rather than made complicated; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided (https://en.wikipedia.org/wiki/KISS_principle).

 

[5] IEC 61508 Functional Safety of Electrical/Electronic/Programmable Electronic Safety-related Systems

[6] ISO 26262 Road vehicles – Functional safety

[7] IEC 62279 Railway applications - Communication, signalling and processing systems - Software for railway control and protection systems

[8] DO-178C Software Considerations in Airborne Systems and Equipment Certification

[9] https://www.hse.gov.uk/risk/theory/alarpglance.htm

[10] e.g. J-value analysis: Thomas, P. Nanotechnology Perceptions 16 (2020) 16–40

[11] Washbrook, A. et al. (2018) Continuous software quality analysis for the ATLAS experiment. Journal of Physics: Conference Series 1085, 032047. doi:10.1088/1742-6596/1085/3/032047

[12] https://www.misra.org.uk/Activities/MISRAC/tabid/171/Default.aspx

[13] T. J. McCabe, "A Complexity Measure," IEEE Transactions on Software Engineering, vol. SE-2, no. 4, pp. 308-320, Dec. 1976, doi: 10.1109/TSE.1976.233837

[14] https://dev.to/designpuddle/coding-concepts---cyclomatic-complexity-3blk

[15] Kitching RP, Thrusfield MV, Taylor NM. Use and abuse of mathematical models: an illustration from the 2001 foot and mouth disease epidemic in the United Kingdom. Rev Sci Tech. 2006;25(1):293‐311. doi:10.20506/rst.25.1.1665

[16] Bengtsson, L., Lu, X., Thorson, A., Garfield, R., von Schreeb, J. Improved Response to Disasters and Outbreaks by Tracking Population Movements with Mobile Phone Network Data: A Post-Earthquake Geospatial Study in Haiti. PLoS Medicine (30 August 2011). doi:10.1371/journal.pmed.1001083

[17] NAFEMS is the International Association for the Engineering Modelling, Analysis and Simulation Community.

[18] Lourenco, J., Paton, R., Ghafari, M., Kraemer, M., Thompson, C., Simmonds, P., Klenerman, P., Gupta, S. Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic. medRxiv 2020.03.24.20042291; doi:10.1101/2020.03.24.20042291

[19] https://royalsociety.org/topics-policy/Health%20and%20wellbeing/ramp/

[20] https://www.techuk.org/

[21] https://www.digicatapult.org.uk/