Written Evidence Submitted by Biomathematics & Statistics Scotland (BioSS)
Contributors: Helen Kettle, Nick Schurch, Mark Brewer
Biomathematics & Statistics Scotland (BioSS) is a mathematical and statistical consulting and research organisation comprising nearly forty experienced specialist academic staff. BioSS is a member of the SEFARI (Scottish Environment, Food and Agriculture Research Institutes) collective contributing research, consultancy, and training across agricultural, environmental and biological research organisations in Scotland and internationally. BioSS is recognised internationally for its collaborative work at the interface of the quantitative sciences, applied sciences and stakeholder policy, and for its independent statistical and modelling methodological research. As one of the leading European groups in these fields, we are keen to contribute to this call.
Statistics and modelling are core tools for modern science that underpin a great deal of the published scientific literature. At their core, statistical methods concern the quantification of the degree of certainty with which we can draw conclusions from data. This is intimately linked with the concepts of reproducibility (given a population, hypothesis, experimental design, experimenter, data, analysis plan, and code, you get the same parameter estimates in a new analysis), replicability (given a population, hypothesis, experimental design, and analysis plan you get consistent estimates when you repeat the experiment and subsequent analysis) and irreproducibility (the inability to recreate statistical results using fixed data and code).
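The distinction between reproducibility and replicability can be made concrete with a short simulation. This is our own minimal sketch, not taken from the submission: a hypothetical `run_experiment` function draws a sample from a fixed population (y = 2x + 1 plus noise) and estimates the regression coefficients. Fixed data and code (here, a fixed random seed) give bit-identical estimates; a repeated experiment gives consistent but not identical ones.

```python
# Minimal sketch (our illustration): reproducibility vs replicability.
import numpy as np

def run_experiment(seed):
    """Simulate one experiment: sample from a fixed population
    (y = 2x + 1 + noise) and estimate slope and intercept."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, size=200)
    y = 2.0 * x + 1.0 + rng.normal(0, 1, size=200)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares fit
    return slope, intercept

# Reproducible: identical data, code and seed give identical estimates.
assert run_experiment(seed=1) == run_experiment(seed=1)

# Replicable: a fresh sample from the same population gives consistent,
# but not bit-identical, estimates of the same underlying parameters.
slope_a, _ = run_experiment(seed=1)
slope_b, _ = run_experiment(seed=2)
assert slope_a != slope_b and abs(slope_a - slope_b) < 0.2
```

Publishing the seed, data, and code makes the first property checkable by anyone; the second can only be assessed by rerunning the experiment itself.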
A lack of critical statistical thinking throughout the scientific process is a large contributor to the reproducibility crisis across a broad range of applied sciences, and is often characterized by one or more of the following issues:
● a lack of appreciation of the difference between exploratory and confirmatory science;
● inappropriate or poor experimental designs that lack statistical power, or are strongly confounded, or which are fundamentally unable to test the hypothesis in question;
● inappropriate levels of replication;
● inappropriate selection/exclusion of data;
● a lack of appreciation of the assumptions inherent to the statistical methods being applied to data;
● a lack of understanding of the detailed meaning of critical statistical terms;
● arbitrarily chosen thresholds;
● an over-reliance on p-values and statistical ‘significance’ as labels of truth or correctness;
● inadequate exploration of uncertainty; and
● over-interpretation of results.
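The interaction between arbitrary thresholds and over-reliance on ‘significance’ is easy to demonstrate. The following is our own illustrative simulation, not from the submission: every ‘experiment’ compares two samples drawn from the same population, so every null hypothesis is true, yet roughly 5% of tests still clear the conventional p < 0.05 threshold.

```python
# Our illustrative simulation: under a true null hypothesis, the p < 0.05
# threshold alone guarantees a steady stream of false 'discoveries'.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests = 1000

false_positives = 0
for _ in range(n_tests):
    # Both groups come from the SAME population, so any 'significant'
    # difference is a false positive by construction.
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_tests} true-null tests were 'significant'")
# Roughly 50 of 1000 are expected to be flagged, by the definition of alpha.
```

A single ‘significant’ result is therefore weak evidence on its own; replication and honest uncertainty quantification are what separate discoveries from threshold artefacts.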
Other submissions will probably discuss these issues in the context of applying statistical methods in other scientific fields. What is less visible, however, is how the fields of statistics and mathematical modelling are impacted by the reproducibility crisis. We have addressed the individual areas raised in the call, in this context.
1 The breadth of the reproducibility crisis and what research areas it is most prevalent in
Research into statistical methodology and bioinformatics has a strong track record in open science. Many key methods are available through an excellent suite of free and open-source software (FOSS, e.g., GitHub, CRAN, Bioconductor, scipy, numpy, scikit-learn, Stan, WinBUGS, Octave) and open data repositories (e.g., Ensembl, NCBI, ENCODE, etc). These channels improve the reproducibility and replicability of these data and methods through requirements for metadata, packaging structure, documentation and code testing. This facilitates independent assessment of the reproducibility and replicability of the methods and data, and the work that makes use of them.
Despite these excellent resources, irreproducibility remains widespread across methodological statistics research, applied statistics consultancy, mathematical modelling, and (to a lesser degree) bioinformatics. Statistical research is still being published without the code and/or data used to generate the results being made openly available. Such studies are not easily reproducible. Even an assessment of their replicability is challenging, requiring both the skills and the time to re-implement the methodology. This issue is compounded by the widespread use of proprietary, closed-source software packages (e.g., SAS, STATA, SPSS, Minitab, MATLAB), or software that relies on closed-source proprietary libraries (e.g., IMSL Fortran Library). This impedes attempts to assess the reproducibility and replicability of published results that use these methods, and creates significant issues around equality of access and availability across the international scientific community.
Statisticians and bioinformaticians doing collaborative applied work or consultancy face considerable pressure to ‘produce results’, just like the scientists that work in these applied fields. This encourages a range of poor practices, including:
Opinions differ in the community as to which of these constitute poor practice, and which are more serious and should be considered malpractice, but all have significant negative impacts on the quality, reproducibility and replicability of the work and negatively impact the scientific literature.
2 The issues in academia that have led to the reproducibility crisis
As in other fields, the reproducibility crisis in statistics, maths modelling, and bioinformatics is a complex issue. In our opinion, three interrelated key drivers lie at the core of the problem:
2.1 The modern scientific funding and publication model
The way governments, charities and businesses fund scientific research is the most important driver of the reproducibility crisis. While funding for scientific research is necessarily limited and we fully appreciate the need for a competitive element in the process, there are issues with how this is applied to science as a collaborative, cooperative process. Most commonly, funders allocate resources by judging the ‘value for money’ of competing research proposals, and by assessing the ‘quality’ of the scientists applying. Stubbornly, publication in “high impact” journals remains the dominant marker of ‘success’ (despite efforts from some quarters to prioritize aspects of Open Science) and is the key metric used to compare researchers. This process encourages scientists to propose research that prioritises quantity over quality, to favour currently ‘hot’ research areas, and to make exaggerated claims (in terms of significance, impact, and novelty) for published work. This “publish-or-perish” environment incentivises researchers to retain control of key data or analysis methods, to try to gain a competitive advantage over competing researchers.
The situation is exacerbated by the private, for-profit scientific publishers who gatekeep much of the scientific literature. These publishers enjoy staggering profit margins, with much of their revenue originating from public sector science funding, including fees for making publicly funded scientific research publicly accessible via ‘open access’ papers. This model encourages journals to lower their standards of acceptance to fill ever-growing page counts across an ever-increasing number of journals. Importantly, most of these journals do not have strong policies that would help drive open and reproducible research.
2.2 Lack of researcher time and resources
Making scientific research reproducible and replicable requires a time investment from researchers. Researchers are under considerable time pressure, driven by the funding and publication models outlined in 2.1. Consequently, researchers rarely have funded time available to invest in ensuring the reproducibility of their work, and often deprioritize these issues in favour of additional new research work. The lack of explicit ring-fenced funding for reproducibility within many grants is coupled with a lack of valued publication routes for “negative” results which, in many cases, would be useful and informative contributions to the literature. The leadership of the Scottish Government's Rural and Environment Science and Analytical Services Division (RESAS) deserves recognition here for explicitly funding researchers' time for ensuring reproducible and replicable research in the upcoming 2022-2027 funding round.
2.3 Entrenched scientific leadership that is invested in the status quo.
Scientific leadership, across funders, research institutions and publishers, is dominated by individuals who have been, and continue to be, successful under the current funding and publishing paradigms. There is little incentive for these individuals to change these paradigms, and a strong incentive for them to preserve the status quo. This is changing slowly with initiatives such as the eLife journal, led by high-profile successful scientists, placing increasing emphasis on reproducible research (see the comment above on the forthcoming Scottish Government RESAS science programme).
3 Addressing the reproducibility crisis
3.1 The role of research funders, including public funding bodies
Research funders are key to addressing the reproducibility crisis. Funding bodies need to change their ethos to emphasise reproducibility, replicability, and quality of research over quantity and “high impact” publications (especially since what constitutes “high impact” is discipline-specific). In our opinion, funders should adopt a zero-tolerance policy towards irreproducible closed research, coupled with clear financial support for the additional time and infrastructure costs associated with doing open and reproducible science (including open access publication). This support should come with the explicit recognition that less research will be done for a given amount of funding, giving scientists the scope and resource to make their research reproducible and hence to favour quality over quantity. It should also include increased funding for collaborative and multidisciplinary research with a specific focus on statistical, modelling and bioinformatics collaborations. This will help to deliver robust and appropriate experimental design and statistical analysis, which avoids many of the pitfalls outlined previously. Funders should also recognize the value of alternatives to journal publication, such as pre-prints, reducing the amount of their funding that is captured by the scientific publication industry.
Within statistics, mathematical modelling and bioinformatics, this should include explicit long-term support for best-practice FOSS software development including the time required for code testing, documentation, packaging and publication. It should also fund extensive training across Open Science areas to engender long term culture change. Specific PhD courses on code-sharing and data management are necessary to equip the next generation of statisticians and modellers with the skills needed to embrace the culture of Open Science as a fundamental requirement of their work. This could form part of the syllabus of programmes such as the APTS training programme.
3.2 The role of research institutions and groups
Research institutes and groups have a vital twofold role to play in addressing the reproducibility crisis. In our opinion, the first (and arguably most important) role is to drive a culture change within the organization. This requires institutional management to shift expectations of staff productivity and performance, focussing on reproducibility and Open Science alongside publications and winning competitive funding. This is likely to require both hard requirements established through institutional policies (e.g., publication in open access journals, pre-printing, data sharing, open-source code publication, etc), and softer initiatives (e.g., reproducibility mentors/champions, Open Science community groups, reproducibility and replicability badges, and encouragement for widely collaborative research with outputs that become products of a wider community). Research institutes and groups also need to provide the infrastructure, training, awareness, and support to enable researchers to take the extra time and effort required for open science.
3.3 The role of individual researchers
Within many science disciplines there is considerable individual resistance to adopting Open Science research practices. It is commonplace for statisticians, mathematical modellers and bioinformaticians to work either alone or in small groups. This often results in strong feelings of personal ownership and allows researchers to justify choosing closed-source and/or proprietary software and methods. Coupled with the problems engendered by the current funding and publishing paradigm (see 2.1), this often results in a reluctance to share code, models, and methods freely. Encouragement from funders, publishers, research institutions and groups, and peer pressure can all incentivise change, but ultimately individual researchers are responsible for the quality of their research, including making it as reproducible as possible within their specific funding, time and infrastructure constraints. In our opinion, this requires researchers to shift their attitudes towards their work (which, we reiterate, is often publicly funded) from one of ownership to one of custodianship. We anticipate that such a shift would have broad benefits, including a focus on:
Availability of high-quality effective training for each of these aspects will be critical to individual adoption.
3.4 The role of publishers
Publishers have an important role to play in addressing the reproducibility crisis, beginning with recognizing that they operate in a field that serves a public good. Software bugs are extremely hard to avoid (particularly in academia, where code is currently not routinely independently tested), and results stated in papers should not be accepted for publication without access to both the data and the code behind them.
It is our opinion that all publishers should adopt policies that require all primary data (appropriately sanitized) and code to be published freely and openly alongside their publications. This should include replicable code for all data cleaning and manipulation, statistical analyses, and mathematical models. For mathematical modelling, model simulations should be treated as data and submitted alongside the publication. Publishers also need to reform peer review, a critical gatekeeper of scientific quality and rigor. Currently, peer review is poorly positioned to assess the reproducibility of scientific results. Even where data and code are provided, assessing reproducibility requires a significant investment of time, and journal editors are reluctant to require this from already time-poor volunteer reviewers. A critical positive step would be for publishers to commit to publishing peer reviews alongside publications (as is already the case for some journals, e.g., F1000). This transparency ensures readers can assess the quality of the review process, including any efforts to reproduce and/or replicate the work. It also has considerable additional benefits, including encouraging objective reviews, increasing dialogue around a paper and, where reviewers’ names are given, giving reviewers credit for their efforts. Publishers might also investigate the feasibility and benefits of:
These suggestions require academics to have sufficient, specifically funded, time to undertake these tasks, either funded directly by the journals, or with ring-fenced funds in grants specifically for contributing peer review to the scientific community.
3.5 The role of governments
In our opinion, governments' role in addressing the reproducibility crisis is to provide a clear, unambiguous statement on the need for reproducibility and replicability across the science sector, and to steer science funding bodies towards reproducible, replicable, Open Science research.
Governments are not one step removed from this issue. The reproducibility and openness of statistical analyses, mathematical modelling, and bioinformatics are directly relevant for scientifically informed policy decisions. This has been dramatically highlighted by the Imperial College pandemic modelling during the COVID-19 crisis. Briefly, in March 2020, Imperial College researchers published a report on their pandemic modelling that is credited with being responsible for the UK government's subsequent change in strategy from “Herd Immunity” to “Lockdown”. The report’s conclusions were based on closed-source model code that, despite being used to inform policy for more than a decade, had never been externally reviewed. When it was requested for review, it emerged that the model codebase was in a poor state, and it took a concerted effort by experienced software engineers before it could be released to the public for review. Even then, the subsequent reviews of the model code were negative; the code contained critical bugs, was poorly documented, had no testing, and the results in the report were not reproducible. These issues were widely reported on and were eventually raised by MPs (David Davis and Steve Baker), calling into question the government's response and its reliance on an unreviewed single model from a single group. Had the Imperial team’s code been published alongside the first results derived from it, those results would likely have been both more reproducible and more reliable for informing policy during the pandemic.
4 What policies or schemes could have a positive impact on academia’s approach to reproducible research
We have highlighted potential policy changes for relevant sectors in Sections 3.1, 3.2, 3.4, & 3.5. In addition, we would like to suggest establishing a national award scheme for Open Science, along the same lines as the Athena Swan, Stonewall, or Investors In People awards. Institutes and scientific organisations could apply for a grading for their Open Science status (e.g. bronze, silver, gold), representing a public statement reflecting their commitment to, and track record of, reproducible, replicable research. This could then be used in job adverts to attract high quality staff, and to help inform funding decisions for government or charity research funding calls.
5 How establishing a national committee on research integrity under UKRI could impact the reproducibility crisis
A national committee on research integrity could directly impact the reproducibility crisis by:
An important function would also be to provide visibility for instances where Open Science has a particularly positive impact on wider society, using these as examples to inspire others.
 Patil, Peng & Leek, 2016, doi: 10.1101/066803
 See, e.g., Makin & de Xivry, 2019, doi: 10.7554/eLife.48175; Underwood, 1997, doi: 10.2134/jeq1998.00472425002700010038x; Siegfried, 2010, doi: 10.1002/scin.5591770721; Diong, 2018, doi: 10.1371/journal.pone.0202121
 Collberg & Proebsting, 2016, doi: 10.1145/2812803
 Nature Editorial, 2011, doi: 10.1038/470305b
 Tijdink et al., 2013, doi: 10.1371/journal.pone.0073381
 e.g., Head et al., 2015, doi: 10.1371/journal.pbio.1002106; Martinson, 2005, doi: 10.1038/435737a; Smaldino & McElreath, 2016, doi: 10.1098/rsos.160384
 De Vries, 2006, doi: 10.1525/jer.2006.1.1.43
 Fidler & Wilcox, 2001, https://plato.stanford.edu/entries/scientific-reproducibility/; Baker, 2016, https://www.nature.com/articles/533452a
 Van Dijk, 2014, doi: 10.1016/j.cub.2014.04.039; van Wesel, 2016, doi: 10.1007/s11948-015-9638-0
 Grimes et al., 2018, doi: 10.1098/rsos.171511; Fanelli, 2010, doi: 10.1371/journal.pone.0010271
 Fortunato & Galassi, 2021, doi: 10.1098/rsta.2020.0079
 Kiai, 2019, doi: 10.1038/s41562-019-0741-0; Bornmann & Mutz, 2015, doi: 10.1002/asi.23329
 Kim et al., 2020, doi: 10.7717/peerj.9924; Samota & Davey, 2021, doi: 10.3389/frma.2021.678554
 Sholler et al., 2019, doi: 10.31235/osf.io/qr8c
 Rosenthal, 1979, doi: 10.1037/0033-2909.86.3.638; Franco et al., 2014, doi: 10.1126/science.1255484
 Van Dalen, 2021, doi: 10.1007/s11192-020-03786-x
 E.g., https://warwick.ac.uk/fac/sci/statistics/apts/
 Armeni, 2021, doi: 10.1093/scipol/scab039
 Rowhani-Farid et al., 2018, doi: 10.12688/f1000research.13477.2
 Fang & Casadevall, 2015, doi: 10.1128/IAI.02939-14
 Sholler et al., 2019, doi: 10.31235/osf.io/qr8c
 LeVeque, 2012, http://faculty.washington.edu/rjl/pubs/topten/topten.pdf
 Bravo et al., 2019, doi: 10.1038/s41467-018-08250-2; van Rooyen et al., 2010, doi: 10.1136/bmj.c5729; Smith, 1999, doi: 10.1136/bmj.318.7175.4; Ross-Hellauer et al., 2017, doi: 10.1371/journal.pone.0189311