Written Evidence Submitted by

The Centre for Genomic Pathogen Surveillance

(C190090)

Introduction

  1. The Centre for Genomic Pathogen Surveillance (CGPS) was established to inform pathogen control strategies on a local, national, and international scale.
  2. The CGPS is based at the Big Data Institute, University of Oxford. Through global partnerships with the US CDC, the European CDC, the World Health Organisation, the Food and Agriculture Organisation, the National Institutes of Health, and Public Health England, Wales and Scotland, the CGPS provides the data tools and infrastructure to help lead the fight against high-risk microbial pathogens, with a focus on antimicrobial resistance (AMR) – one of the biggest threats to global health security.
  3. The CGPS delivers a combination of structured population surveys and whole genome sequencing to generate high-quality, openly available surveillance data that identify high-risk clones. The CGPS develops software tools and technologies that make data interpretation accessible to all.

 

  1. The CGPS ensures data and tools are open access. Genomic data generated through research alliances are deposited in public repositories and shared via CGPS software platforms. Easy-to-use, intuitive software tools, including Epicollect (http://five.epicollect.net), Microreact (https://microreact.org), Pathogenwatch (https://pathogen.watch) and Data-flo (https://data-flo.io), make data integration, visualisation and interpretation accessible to all.

 

  1. The CGPS was pleased to be a partner in the UK consortium on COVID-19 genomics (COG-UK), which aims to link local sequencing centres with large-scale facilities across the UK and to apply real-time genomic epidemiology to improve our understanding of, and ability to respond to, the pandemic.

 

  1. Pathogen surveillance is a crucial part of early-stage and ongoing responses to COVID-19. The increased use of genomic methods and technology platforms has enabled real-time research and collaboration, has informed country policy commitments and action plans, and will create a strong framework for the use of genomic data methods and tools in future real-time epidemiology.

 

  1. Proactive information sharing among scientists has been enabled by CGPS tools and platforms, which were created specifically to house and share data about virulent pathogens. This real-time information has been made widely available and has been used by public health agencies.

 

  1. The UK research and collaboration ecosystem, supported by country policy commitments and action plans and by regional and international capacity and guidance, has created a strong framework for the use of genomic data sharing tools for pathogen surveillance during the COVID-19 pandemic.

 

Capturing high quality data during the crisis

The utility of Microreact and Data-flo in public health agencies

  1. Data analysis, phylogenetics and the delivery of insight for decision making are key components of the effort to end the pandemic. The CGPS team brings expertise in data handling, processing, linking and visualisation to enhance the delivery and interpretation of data using tools it created, including Microreact, Epicollect and Data-flo.

 

  1. The CGPS has worked with colleagues in Public Health Wales and Public Health Scotland to adopt Data-flo and Microreact. Post-installation support in the use of these applications has assisted their implementation, streamlining local processing of COVID-19-associated metadata and allowing genomic data to be visualised alongside descriptive data within multidisciplinary teams.

  1. Although the public versions of Microreact and Data-flo allow anybody to use these tools for free, governance rules mean that confidential, commercial, or personal data (e.g. patient records or care home names) may not be uploaded to these sites. However, local installations allow:
    1. Sensitive data to be kept within private networks such as NHSnet or public health agency computing infrastructure
    2. Access to these data to be controlled via local authentication services such as Active Directory or LDAP

 

  1. Through local use of these applications, it has been possible to collect data from multiple sources and then manipulate and aggregate them, allowing automated creation of COVID-19 metadata before upload to the consortium servers on the MRC CLIMB platform (http://climb.ac.uk), hosted at the Universities of Birmingham and Cardiff. The ability to combine private metadata with phylogenies derived from SARS-CoV-2 genome sequences within a local installation of Microreact has allowed public health officials to communicate highly visual messages to personnel throughout the public health hierarchy, at a level of detail and geographic resolution that would not previously have been possible.

 

  1. The CGPS also contributed to the COVID-19 effort by building two completely new web applications:
    1. Metadata Uploader: as part of the weekly collection of genomes and associated metadata, a login-protected website allows sequencing sites without staff who have programming skills to upload the metadata easily. Developed with CLIMB staff at the University of Birmingham, the website synchronises with the upload API hosted at CLIMB to ensure that the data are valid.
    2. Pangolin: Colleagues at The University of Edinburgh (Andrew Rambaut et al.) in the COG-UK consortium developed Pangolin (Phylogenetic Assignment of Named Global Outbreak LINeages), software used to assign lineages to SARS-CoV-2 sequences. For those familiar with the UNIX command line, installing this software is straightforward; however, for those who are unfamiliar with the command line or without access to a UNIX computer, the CGPS developed an open source web-based application allowing users to:
      1. Assign lineages to genome sequences of SARS-CoV-2
      2. View descriptive characteristics of the assigned lineage(s)
      3. View the placement of the lineage in a phylogeny of global samples
      4. View the temporal and geographic distribution of the assigned lineage(s)

 

  1. Efficient and effective data connectivity is vital in creating a consolidated UK-wide COVID-19 dataset. Efforts to reduce data fragmentation and improve data connectivity will revolutionise our scientific understanding of the disease.

 

Contribution of R&D in understanding the transmission and epidemiology of SARS-CoV-2

COVID-19 Genomics UK Consortium (COG-UK)

  1. COG-UK is an innovative decentralised partnership connecting leaders in genomics and public health across the interface of government, public health and academia. COG-UK brings together NHS organisations, the four UK Public Health Agencies, multiple university hubs and academic institutes, to deliver rapid and large-scale sequencing and data analysis of the SARS-CoV-2 virus.
  2. Supported by £20 million of funding from the Department of Health and Social Care, UK Research and Innovation (UKRI) and Wellcome, COG-UK is enabling decentralised sequencing by linking together multiple laboratories throughout the UK that are performing rapid and large-scale sequencing of the SARS-CoV-2 virus and generating the data used to understand its transmission and epidemiology.
  3. The true value of data comes from large aggregated datasets; combining datasets from different research cohorts and demographics can help to create a consolidated UK dataset.

Concluding remarks

  1. Collaborative, decentralised, open and transparent approaches to data generation and sharing enabled the swift launch of the COVID-19 Genomics UK Consortium (COG-UK) to provide rapid real-world evidence to inform public health interventions.
  2. The free and unrestricted sharing of SARS-CoV-2 genomic data has been enormously important for public health efforts.
  3. The UK is a global leader in data science research and has some of the richest health data in the world. Greater connectivity and secure, anonymised and streamlined data exchange between the health system and the research community have the potential to bring significant improvements to health research and public health.

(July 2020)