Written Evidence Submitted by Dr James Andrew Smith and Jonas Sandbrink, University of Oxford
James Smith is a Senior Research Associate at the University of Oxford. He researches drug and device evaluation, is a Reproducible Research Oxford Fellow, and teaches open and reproducible science topics. Jonas Sandbrink is a Researcher at the Future of Humanity Institute at the University of Oxford. He researches the dual-use potential of life sciences research and biotechnology, as well as fast pandemic response countermeasures, including vaccine platforms. Both authors are investigating how open science interacts with misuse risks of biological research.
We anticipate that promoting open science will be proposed as an approach to improving reproducibility. In general, we support this and feel that it is likely to improve reproducibility. However, since we expect other respondents to address it, and the topic has been covered extensively in the literature, it is not the focus of this document. Instead, we largely focus on risks that may arise through the proliferation of open science, and call for those risks to be considered when policies in this area are implemented.
Research with dual-use potential is concerning
Certain research, particularly in the life sciences, has the potential to be misused. Some aspects of virology can increase risks from accidental misuse of pathogens, for example through laboratory escapes, or from deliberate misuse in the context of warfare or terrorism. The anthrax attacks and the historic large-scale biological weapons programme in the Soviet Union exemplify this. Controversial gain-of-function research, such as the demonstration of airborne transmission of H5N1, exacerbates concerns. Though we focus here on biological research, other research areas, such as artificial intelligence (AI), may also present misuse risk. AI can, for example, be used to create deepfake video or audio for malicious purposes.
Information from research, rather than its physical products, increasingly represents a significant risk. Advances in biotechnology mean that techniques previously requiring extensive technical expertise are available to amateur scientists. Mail-order DNA synthesis is now effectively available and not all DNA synthesis requests are screened for misuse potential; the public availability of various influenza genomes is therefore worrying. As biotechnology advances, there is a need to ensure that information with dual-use potential is shared responsibly.
Open science may increase dual-use risk
Several open science practices might increase risks of misuse of research with dual-use potential. Open data and code are now often requested or required on journal submission or by funders. Code can be a tool that can be misused, and data, such as pathogen genome sequences, can inform bad actors or be used to develop concerning models. For example, several machine learning models for viral engineering, along with the data used to train them, are publicly available. Genomes for viruses that have been eradicated are available in public databases. Recipe-like instructions for laboratory procedures, i.e. open materials, are becoming more common, reducing tacit knowledge requirements and facilitating more widespread reuse. A paper providing step-by-step instructions to engineer SARS-CoV-2 exemplifies this. Though in the past journals may have played a gatekeeper role in reducing publication of research with dual-use potential, the rapid rise of preprints challenges any model that relies on intervention at the publication stage.
Reproducibility and openness
There are several reasons that increased sharing is expected to improve reproducibility. It is particularly relevant to methods reproducibility, which is the provision of sufficient detail to enable study procedures to be exactly repeated, and which includes computational reproducibility, i.e. use of the same data and code to generate the same results. Open sharing of data, code and materials facilitates verification of this aspect of reproducibility: if information must be shared, it can be checked and issues identified.
Results reproducibility, i.e. obtaining the same results in an independent study, may also be influenced by increased sharing. Though there is no guarantee that sharing information will increase results reproducibility, it does facilitate its assessment. An independent study team may be able to determine more accurately what was done in the original study, thus reducing discrepancies attributable to changes in study conduct. Sharing here increases the ability to reuse the study materials.
A final point, applying to both aspects of reproducibility, is that increased accessibility may increase the likelihood that issues with reproducibility are detected or assessed, as more researchers review the information.
Reproducibility without full openness
As identified above, in certain cases it seems inadvisable to publicly share all aspects of research projects. In these cases, what can be done? Limiting public disclosure will inevitably reduce accessibility to research. However, reuse and verification can still plausibly be facilitated through other mechanisms.
One possible solution is to encourage the use of access-controlled repositories, as are used for sharing patient data with privacy concerns. Data and code could be deposited into those repositories when they have dual-use potential. Access could then be granted to parties on the basis of an ethical review process that examines the intended research use. Such access could be given to peer reviewers, who could assess computational reproducibility (i.e. verification), and to trusted parties attempting to reproduce results (i.e. reuse). If the research output is a tool, its use could be governed in this way.
This approach is preferable to data, code or materials being ‘available on request’ from authors, as there is no guarantee in that case that anything will in fact be made available, even under appropriate circumstances. Because it may be inadvisable for all researchers to be given access solely to assess, for example, computational reproducibility, some mechanism that provides a guarantee that such reproducibility has been assessed might be valuable. This could take the form of a dedicated peer review report on that topic or a ‘badge’ indicating that it was done.
We are not necessarily suggesting that this is the final solution: more research is needed to identify promising strategies to mitigate the risks presented by dual-use research, and, more generally, to find the appropriate balance between reducing dual-use concerns and facilitating reproducibility.
Role of various stakeholders
Funders and institutions need to encourage and incentivise responsible disclosure of worrying information rather than public sharing in all cases. Tools that facilitate this should be available and funders and institutions could play a role in assessing existing tools and developing new ones if needed; our impression is that few suitable tools are currently available. Funders and institutions must also consider the dual-use potential of research during funding decisions and ethical review. Researchers too must consider this; increased awareness of the possibility of misuse may be valuable, and education on this could be facilitated by institutions and funders. Governments can play a role in requiring other stakeholders to consider this critical issue. Assessment of risks of any proposed solutions to the reproducibility crisis is essential.
https://www.ndorms.ox.ac.uk/team/james-smith-1. James’s postdoctoral position is funded by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. Reach him at firstname.lastname@example.org
James has received funding for this project from EA Funds via the Long Term Future Fund.
For example: Munafò, Marcus R., et al. "A manifesto for reproducible science." Nature Human Behaviour 1.1 (2017): 1-9.
Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. "What does research reproducibility mean?" Science Translational Medicine 8.341 (2016): 341ps12.
 For an example of badges to incentivise open science practices, see here: https://www.cos.io/initiatives/badges