Written Evidence Submitted by Sense about Science

(RRE0089)

Summary

The declaration of a reproducibility or replication ‘crisis’ has not, over the past decade, been the vehicle for change that some imagined. It has not been responsible for substantial improvements in the quality and transparency of research or in the wider public and policy understanding of how reliable the results of research are.

The improvements that have occurred have been targeted and specific, such as increasing the publication of clinical trial results by universities and companies, which has been addressed by our AllTrials campaign, by this committee and more recently by others. Or the introduction of academic rewards for collaboration and other alternatives to the volume of research output, as recently seen at Utrecht University in the Netherlands. Some important targeted innovations remain frustratingly overlooked alongside hand wringing about the ‘crisis’, such as Registered Reports, where journals accept manuscripts based on protocols rather than results.

Why is this? On the one hand the talk of crisis has proven too narrow as a basis for addressing the wider research culture problems that constantly undermine the reliability and transparency of research, most notably a system that rewards output volume over quality and collaboration. On the other, it has proven too broad for action, masking the specific phenomena that make it hard to reproduce results and which require meaningful, precise measures by named parties.

‘Replication crisis’ was a great klaxon ten years ago. It urged researchers to self-reflection. But its continued use risks becoming a counsel of despair: too big a problem to address particular research practices or to help the public, reporters and decision makers understand issues of quality and reliability, and too small a lens to tackle the systemic problem of the research reward system.

The committee could use this opportunity to rectify the shortcomings of the reproducibility focus, and instead refocus efforts on specific priorities for improving and understanding quality and reliability; and also to acknowledge that much research does not bear reproducing.

Background

Sense about Science is an independent charity that promotes the public interest in sound science and evidence. We equip people to ask good questions, we equip researchers to answer them in human, and we work with both to achieve transparency about evidence and a constructive environment to discuss research.

We have a strong track record over two decades in engaging wider society with issues of the reliability of research findings, and in encouraging transparent, publicly assessable practices by researchers. These include:

The first public guide to peer review, ‘I Don’t Know What to Believe’[1], and the Quality and Peer Review programme, as part of which Sense about Science convenes an international series of expert-led workshops for early career researchers, equipping them with the skills to initiate discussions about research quality.
AllTrials [2], the global campaign to increase clinical trials transparency in the interests of patients, which achieved international regulation and compliance on the publication of all clinical trial results [3] and on which we previously gave extensive written and oral evidence at the request of this committee.
The world’s first public guide to quality and reliability in data science [4].

This submission draws on our experiences and reflects our concern that while talk of the reproducibility crisis has focussed attention on the conduct and publication of research, it has not necessarily enabled addressing it and may in fact have made it less clear what needs to be done.

The issues in academia that have led to the reproducibility crisis

Improvements in research quality and transparency is a more useful discussion than reproducibility. For example, two Artificial Intelligence programmes trained on the same biased data will produce the same biased results. Or as we saw over the 2000s, repeated Hormone Replacement Therapy studies that did not randomise through a clinical trial were perfectly able to reproduce the same errors in selection bias.

Let’s more usefully talk about quality and reliability:

Transparency in protocols, analyses and data

Where there is insufficient transparency about methods and protocols, there is not enough information for other researchers or for users of the research to assess the reliability of a research output, so the question of reproducibility is moot. At the start of the AllTrials campaign in 2013, it was estimated that less than 50% of clinical trial results on the medicines in use were being published [5]. Without openness about data, methods and protocols, clinicians and regulators could not conclude that evidence from a clinical trial was robust enough to support the introduction of a new drug, which led to deaths from dangerous medicines [6], wasted resources and the ethically dubious practice of enlisting fresh volunteers in repeating clinical trials that had previously failed [7]. AllTrials recognised that this represented a series of failures by universities, pharmaceutical companies, regulators and funders and moved rapidly (and quite successfully) to identify distinct, targeted actions.

Training is needed in what constitutes high quality research

The Committee’s 2018 report on Research Integrity was clear that information on the availability and quality of training is patchy [8]. The lack of training in good-quality research has real-world consequences, such as aggressive promotion during the pandemic of unreviewed research published on pre-print servers [9].

This occurs in a context of rewards and incentives that pit quality and reproducibility against the need to “publish or perish". A 2020 study of researchers in the UK by the Wellcome Trust [10] showed that:

43% of respondents agreed that their institution/workplace places more value on meeting metrics [eg high production of outputs or intense focus on journal impact factor] than it does on research quality.
23% of junior researchers have felt pressured by their supervisor to produce a particular result.
Only 46% of survey respondents agreed they had a clear understanding of what their workplace considered compromised research to be.

Lack of training contributes to poor practices in research and transparency [11]. This echoes the experience of our Quality and Peer Review programme, which trains 200-300 early career researchers a year through competitive entry. They tell us they don’t know what to make of issues such as misconduct, fraud and bias and would like to develop the confidence and skills to engage society in discussions about what makes good-quality research.

Knowledge among decision-makers on how to scrutinise data

Concern about the reproducibility crisis has co-existed with very little concern about scrutiny of the growing applications of data science and machine learning across research and society. The problems here are manifold:[12],[13]

data used to train a model or AI to answer a question may be different to what the data was collected for
excluded/missing variables and assumptions made in designing a model are not explained
the reliability of an AI, ie. how often it produces consistent, reproducible recommendations, is not understood by those involved in implementing those recommendations.

A model or AI’s recommendation can be reproducibly bad, which is a fundamental problem with the “reproducibility crisis”. Understanding where the data has come from and what assumptions have been made must precede questions about whether findings are reproducible or not.

What policies or schemes could have a positive impact on academia’s approach to reproducible research?

AllTrials has shown how a complex problem can be tackled with a multi-faceted approach involving multiple stakeholders to improve clinical trials transparency, in this case governments, funding bodies, pharmaceutical companies, national health agencies, research institutions and journals. The Committee played an important role in this “joined-up” approach when, in January 2019, it undertook to scrutinise universities that did not do enough to improve clinical trials transparency within six months. The result of this was that reporting among British universities and NHS trusts increased from 48.1% that month to 63.9% in June 2019 [14]. It is worth acknowledging progress as above, because our experience with AllTrials is that acknowledging good behaviour has an incentivising effect on others to improve their own standards. The Committee could investigate the idea of funding bodies making transparency of data and methods a condition of securing future funding, to reduce the risk of unreliable results being used to make life-changing decisions without due scrutiny of their quality.

The Committee could consider the case of Utrecht University, which in 2021 adopted its Recognition and Rewards Vision [15], to foster a collaborative culture, creating an open and honest environment that allows researchers to challenge, critique and speak out. In particular, the university’s model recognises openness in the sharing of data, as well as transparency, public engagement and accountability, and prioritises quality over quantity.

An initiative in publishing that is a rather obvious way to improve transparency, research methods and reporting is Registered Reports [16], where a protocol is submitted and peer-reviewed before the results are known. If the protocol is accepted, then the results are published provided the research follows the approved protocol, irrespective of what they are.

There is an urgent need to address the absence of systems for quality, reliability, transparency and scrutiny in data science, modelling and machine learning, much of which does not fall under the usual systems of scrutiny and peer review and for which there are few standards in operation. We have worked internationally with expert groups and user communities, including currently joint production with the EU Commission and Parliament of a guide to scrutinising reliability in use of models for decisions. These initiatives have produced three questions that might usefully replace reproducibility in the pursuit of reliable research:

Where did it come from?
What assumptions are being made?
Can it bear the weight being put on it?

(October 2021)

[1] Sense about Science (2005). Available at: https://senseaboutscience.org/activities/i-dont-know-what-to-believe/

[5] Ross J.S., et al. Trial Publication after Registration in ClinicalTrials.Gov: A Cross-Sectional Analysis. PLoS Med 6(9), e1000144 (2009). Available at: https://doi.org/10.1371/journal.pmed.1000144

[6] Cowley,, A.J., et al. The effect of lorcainide on arrhythmias and survival in patients with acute myocardial infarction: an example of publication bias. Int. J. Cardiol. 40 (2), 161-166 (1993). Available at: https://doi.org/10.1016/0167-5273(93)90279-P.

[7] Altman, D.G., and Moher, D. Declaration of Transparency for Each Research Article. BMJ, 347 (2013). Available at: https://doi.org/10.1136/bmj.f4796