The Alan Turing Institute, Public Policy Programme
Submission to the Joint Committee on the Draft Online Safety Bill
Dr. Bertie Vidgen & Professor Helen Margetts
The Alan Turing Institute
The Alan Turing Institute is the UK’s national institute for data science and AI. Its mission is to make great leaps in data science and artificial intelligence research in order to change the world for the better. The Alan Turing Institute undertakes research which tackles some of the biggest challenges in science, society and the economy. For more information contact Dr. Bertie Vidgen.
The Online Safety Team is a new part of The Turing’s Public Policy Programme. We provide objective, evidence-driven insight into the technical, social, empirical and ethical aspects of online safety, supporting the work of public servants (particularly policymakers and regulators), informing civic discourse and extending academic knowledge. High-priority issues include online hate, harassment, extremism, misinformation and disinformation. We have three main workstreams.
We broadly welcome the shift from “online harms” to “online safety” in UK policymaking and regulation, as it emphasises what we want to encourage (safety) rather than what we want to reduce (harm). However, it raises the longstanding definitional problem of “what is a harm?” Currently, the term “harm” is used to describe both (a) online activity and content and (b) its negative effects, such as inflicting emotional or physical suffering. This is unclear and could cause real confusion. We advise that this single, conflated use of “harm” is replaced with three distinct terms: “hazard” for the online content or activity itself, “harm” for the negative effects it inflicts, and “risk factor” for characteristics which make harm more likely to be inflicted.
In some cases hazard and harm are very closely connected. For instance, each time Child Sexual Abuse Material is shared or viewed it directly violates the rights, privacy and dignity of the abused child, recreating the experience of abuse. In such cases, the hazard and the harm are effectively one and the same. In other cases, there is a far larger gap between the online activity and the harm. For instance, the mental health problems caused by bullying and harassment typically emerge from repeated interactions which take place over many weeks and months. The exact impact of each bullying interaction can be difficult to assess because its effects are insidious and cumulative. Unpicking these longer and more complex pathways from hazard to harm is crucial to fully understand the negative effects of hazardous content and activity.
Most platforms do not have the means or motivation to assess whether hazardous content and activity actually lead to harm being inflicted on users. Instead, they make assessments based on their likely impact. This leads to what we have previously described as a “harms paradox”: most online content which is taken down for being harmful may not actually have inflicted any harm, precisely because it was taken down in anticipation of harm being inflicted. There is no easy solution to this paradox: assessing the hazard of content is intrinsically difficult and often involves subjective judgements, but moderating only content that is proven to have inflicted harm is both undesirable and infeasible. It remains largely unclear how platforms make assessments about hazards, harms and risk factors (and even whether they use frameworks similar to this). The OSB should mandate platforms to make this information public, with heightened reporting expectations for Category 1 platforms.
A clear tension in the Draft OSB is that, in practice, implementing the adults’ and children’s risk assessments (OSB: Part 2, Chapter 2, Section 7, 9-10) will, given the resource constraints faced by platforms, involve prioritising some hazards over others, materially affecting which hazards people are protected against. This reality is not fully acknowledged in the Draft Bill. This is a problem because hazard prioritisation is intrinsically complex, and platforms may face conflicting pressures when making decisions, such as time-sensitivity, public perception, resource constraints, and the likelihood of harm being inflicted. Deciding which hazards to tackle will involve making numerous normative decisions which are likely to be contested and controversial. How, for example, should a platform compare hate speech against bullying? Or suicide and self-harm ideation against doxxing? We advise that platforms are mandated not only to explain how they define hazards and harms (see above), but also to explain the rationales behind their comparative judgements, i.e. hazard prioritisation, as well as any protocols for making decisions in exceptional periods of high pressure.
The law is an appropriate starting point for hazard prioritisation but, given its inconsistency and narrow scope, will not fully resolve this issue. This raises another point of tension in the Draft OSB: Codes of Practice are only provided for illegal terrorism and Child Sexual Exploitation and Abuse (OSB: Part 2, Chapter 4, Section 29, 2-3). We welcome both Codes but advise that similar Codes are developed for all hazards which can be punishable by law. The Carnegie Trust, working with a network of civil society and other organisations, has already developed an initial Code for hate crime.
Hazardous online content and activity is generally tackled by content moderation systems, implemented and managed by the host platform. The Draft OSB references a two-step process, stating that under the “duties to protect adults’ online safety” platforms have to ensure that they have clear and accessible Terms of Service, and apply them consistently (OSB: Part 2, Chapter 2, Section 11, 3). This is a high-level approximation of the content moderation process, which we translate into three steps: (1) set the policy, (2) identify content/activity that contravenes the policy and (3) handle the contravening content/activity. A weakness of the Draft OSB is focusing too much on how policies are set, rather than how they are implemented. We illustrate the importance of all three steps, and how they are interrelated, with a short discussion of how a hypothetical platform might handle online hate.
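The three steps can be sketched, in highly simplified form, as a pipeline. Every policy category, keyword and intervention below is an illustrative assumption for exposition, not any real platform’s system; in practice step 2 involves human review and machine-learning classifiers rather than keyword matching.

```python
# Illustrative sketch of the three-step moderation process.
# All rules and interventions are hypothetical examples.

# Step 1: set the policy -- categories of contravening content, each
# assigned a hazard level by the (hypothetical) platform.
POLICY = {
    "hate_speech": "high",
    "bullying": "medium",
    "spam": "low",
}

def identify(post):
    """Step 2: identify contravening content (here, a naive keyword
    check standing in for human- and AI-based detection)."""
    keywords = {"buy now!!!": "spam", "you are worthless": "bullying"}
    for keyword, category in keywords.items():
        if keyword in post.lower():
            return category
    return None

def handle(category):
    """Step 3: handle contravening content according to hazard level."""
    actions = {"high": "remove", "medium": "limit_sharing", "low": "downrank"}
    return actions[POLICY[category]]

post = "Buy now!!! Limited offer"
category = identify(post)
decision = handle(category) if category else "allow"
print(decision)  # -> downrank
```

The sketch makes the submission’s point concrete: the outcome a user experiences depends on all three steps, so regulation focused only on how the policy (step 1) is written misses how detection and handling shape what actually happens.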
2.1 Setting the policy
The platform first creates the policy. Outside of illegal content and activity, platforms set policies based on their social values and the proposition that they offer to users. For example, Gab, BitChute and Parler emphasise free speech and, accordingly, implement minimal content moderation. In practice, all platforms have to balance the tradeoff between (i) protecting users from harm and (ii) protecting users’ freedom of expression. This is essentially a decision about what degree of hazard is acceptable. For instance, prioritising protecting users from harm over protecting free expression means that even low hazard content will be considered unacceptable (see Figure 1). The spectrum of hazard tolerance should be reflected in the OSB, instead of treating online safety as a binary question of whether or not a platform is “protecting from harm”.
Figure 1, The hazard of different types of online hate
2.2 Identifying contravening content
Hazardous content and activity is typically identified by platforms through a mix of human- and AI-based approaches (see below). If the systems for identifying hazardous content are inadequate then the platform’s desired tradeoff between protecting from harm and protecting free expression will not be achieved, leading to one of two outcomes: over-moderation, where acceptable content is wrongly actioned and users’ freedom of expression is infringed (low precision); or under-moderation, where hazardous content is missed and users are left exposed to harm (low recall).
The Draft OSB gives comparatively little attention to the processes by which contravening content is identified. Ofcom should be given powers to interrogate the processes used by platforms, as we discuss below in relation to the use of technologies such as Artificial Intelligence.
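A toy numerical illustration (the hazard scores and labels below are invented for exposition) shows why identification quality cannot be separated from policy: wherever a platform sets its detection threshold, it trades wrongly actioning acceptable content against missing genuinely hazardous content.

```python
# Toy illustration: a detection system assigns each post a hazard score
# in [0, 1]. Moving the decision threshold trades over-moderation
# (acceptable posts actioned) against under-moderation (hazardous
# posts missed). Scores and ground-truth labels are invented.

posts = [
    (0.9, True),   # (model's hazard score, actually hazardous?)
    (0.7, True),
    (0.6, False),
    (0.4, True),
    (0.2, False),
    (0.1, False),
]

def failure_modes(threshold):
    """Count (acceptable posts actioned, hazardous posts missed)."""
    over = sum(1 for score, hazardous in posts
               if score >= threshold and not hazardous)
    under = sum(1 for score, hazardous in posts
                if score < threshold and hazardous)
    return over, under

print(failure_modes(0.3), failure_modes(0.8))  # -> (1, 0) (0, 2)
```

A lenient threshold (0.3) catches every hazardous post but wrongly actions one acceptable post; a strict threshold (0.8) actions no acceptable content but misses two hazardous posts. No threshold eliminates both failure modes, which is why the processes behind identification merit regulatory scrutiny alongside the written policy.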
2.3 Handling contravening content
Once contravening content has been identified it needs to be handled. In our previous work we identified 14 interventions which are known to be applied by platforms to hazardous content, including suspending content, banning the creators, limiting how many times content can be engaged with, and limiting how many times it can be shared. Each of these interventions applies a different amount of friction to constrain the content’s spread and impact.
The Draft OSB requires that platforms explain “how priority content that is harmful to adults is to be dealt with” (OSB: Part 2, Chapter 2, Section 11, 2). This needs to be better specified, and we advise that platforms are required to explain (a) which interventions they use to apply friction and (b) what criteria are used to justify different types of friction being applied. In principle, we advise that the degree of friction is based on the assessment of hazard, with content and activity assessed to be more hazardous being subjected to more friction -- although platforms may have other well-justified rationales. We also advise that Category 1 platforms should be mandated to assess how different interventions affect users’ freedom of expression and privacy, as part of their requirements under Part 2, Chapter 2, Section 12 of the Draft OSB.
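The principle that friction should increase with assessed hazard could be made publishable and auditable as an explicit mapping. The interventions and thresholds below are hypothetical; a platform would set and justify its own.

```python
# Hypothetical, publishable mapping from assessed hazard to friction.
# Interventions are ordered from least to most friction; the thresholds
# are illustrative and would be set (and justified) by each platform.

INTERVENTIONS = [           # (minimum hazard score, intervention)
    (0.00, "no action"),
    (0.25, "limit engagement"),
    (0.50, "limit sharing"),
    (0.75, "suspend content"),
    (0.95, "ban creator"),
]

def friction_for(hazard):
    """Return the intervention for an assessed hazard score in [0, 1];
    friction increases monotonically with hazard."""
    chosen = INTERVENTIONS[0][1]
    for threshold, intervention in INTERVENTIONS:
        if hazard >= threshold:
            chosen = intervention
    return chosen

print(friction_for(0.6))   # -> limit sharing
```

Publishing a table of this kind (even without the underlying scoring model) would let Ofcom and researchers check that the criteria in (b) are actually applied consistently, and how they bear on freedom of expression.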
Computational technologies, particularly Artificial Intelligence (AI), have been lauded as a way of achieving online safety because they are scalable, reproducible, improvable and fast. AI reduces the number of human touchpoints in the moderation process, minimising the volume of hazardous content that trust and safety professionals are exposed to -- a growing concern given that they routinely suffer from mental health problems and are subjected to poor working practices. AI is most often used to find hazardous content, and can also be used to automatically decide how content should be handled.
Our research identifies numerous problems when using simplistic or badly-designed AI to detect hazardous content, three of which we outline below.
As far as we can identify, the Draft OSB makes no mention of Artificial Intelligence, which we understand is so that the Bill does not go out-of-date and does not implicitly encourage use of a single technology. This is sensible but raises a clear problem given that how content is moderated matters, and human- and AI-based moderation systems are fundamentally different. To better understand the challenges of using AI (a very fast-moving area of research), Ofcom should be given specific powers to investigate and scrutinise the technologies that platforms use for content moderation, with a specific mandate covering AI-based systems.
Finally, we note that assessments of how AI is used in content moderation are by nature partial and incomplete, as almost all platforms do not open up their models for external evaluation. Most research focuses on proxies: either academic models, which are trained on open source datasets, or the Perspective API from Google’s Jigsaw. There are powerful commercial and logistical reasons for not fully opening up all AI for scrutiny. It constitutes IP, reflecting tens of millions of dollars of investment, and often relies on signals which cannot be publicly shared, such as users’ social network graphs. Nonetheless, if we do not identify ways of scrutinising the AI used by platforms, a key part of the content moderation process will remain opaque. Potential solutions which could be explored include the use of Trusted Research Environments (TREs), releasing models in a sandbox, and granting permissioned access to simplified versions of models.
 For a longer discussion of definitions, see The Turing’s Response to the Online Harms White Paper.
 See Scheuerman et al.’s 2021 paper, A Framework of Severity for Harmful Content Online. Our points draw on forthcoming work by Vidgen, Cowls and Margetts (2021).
 See The Alan Turing Institute’s Report on online hate regulation for Video Sharing Platforms, commissioned by Ofcom.
 See Turing, Report on online hate regulation for Video Sharing Platforms.
 See The Law Commission, Modernising Communication Offences: A final report
 See Carnegie Trust, Draft code of practice in respect of hate crime and wider legal harms
 For a longer discussion of challenges which may lead to lower precision or recall in hazardous content detection, see our previous paper on Challenges and Frontiers in Abusive Content Detection.
 See Turing, Report on online hate regulation for Video Sharing Platforms.
 In 2020 it was reported that Facebook agreed to a $52 million settlement with its moderators.
 For more discussion of challenges of abusive content detection see Learning From the Worst and Challenges and Frontiers.
 See CAD: Introducing the Contextual Abuse Dataset.
 See Hatemoji, lead authored by Hannah Rose Kirk.
 See HateCheck, lead authored by Paul Röttger.