Supplementary written evidence submitted by Prof. Kalina Bontcheva, University of Sheffield, UK

 

 

The past few years have heralded the age of ubiquitous mis- and disinformation (also known as “fake news”), which poses serious questions about the role of social media and the internet in modern democratic societies. Topics and examples abound, ranging from the UK “Brexit” referendum and the 2016 US presidential election to medical misinformation (e.g. miraculous cancer cures).

Social media now routinely reinforces its users’ confirmation bias, so that often little to no attention is paid to opposing views or critical reflections. Misinformation is often re-posted and shared thousands of times, and sometimes even jumps successfully into mainstream media. Debunks and corrections, on the other hand, receive comparatively little attention.

The reasons behind this phenomenon are complex and lie at the intersection of:

  1. Online propaganda and “fake news”: State-backed (e.g. Russia Today), ideology-driven (e.g. misogynistic or Islamophobic), or for-profit clickbait websites and social media accounts are all engaged in spreading misinformation, often with the intent to deepen social division and/or influence key political outcomes. However, they should not be regarded as the sole source of online disinformation[1].
  2. Post-truth politics, where politicians, parties and governments frame key political issues in propaganda, instead of facts. Misleading claims are repeated, even when proven untrue by media or independent fact checkers. This has a highly corrosive effect on public trust.
  3. Partisan media: Today’s highly competitive online media landscape has resulted in poorer quality journalism and declining opinion diversity, with misinformation, bias and factual inaccuracies routinely creeping in. Many outlets also resort to highly partisan reporting of key political events, which, when amplified through social media echo chambers, can have acrimonious and divisive effects[2].
  4. Polarised crowds: As more and more citizens turn to online sources as their primary source of news, the social media platforms and their advertising and content recommendation algorithms have enabled the creation of partisan camps and polarised crowds, characterised by flame wars and biased content sharing, which in turn reinforce users’ prior beliefs (typically referred to as confirmation bias).

 

In this written evidence, I will elaborate further on questions I was asked as a witness by members of the Digital, Culture, Media and Sport Committee (DCMS), as part of their inquiry into fake news. The full transcript of my oral evidence is available here. Some of the headings below refer to question numbers from the transcript, where I was asked to provide further evidence in writing.

Q27: Potential Impact of Social Media Misinformation and Algorithms on Voting Behaviour

Paul Farrelly: “I am not sure yet what evidence there is, in terms of definitive studies, as to whether it [online misinformation] actually changes voting behaviour.”

A small-scale experiment by the Guardian exposed 10 US voters (five on each side) to alternative Facebook news feeds[3]. Only one participant changed their mind about how they would vote. Some found their confirmation bias too hard to overcome, while others became acutely aware of being the target of abuse, racism, and misogyny. A few started empathising with voters holding opposing views. They also became aware that opposing views abound on Facebook, but that the platform filters them out.

Q30: Russian Involvement in the Referendum

We analysed the accounts that Twitter identified to the US Congress in autumn 2017 as being associated with Russia, together with the other 45 accounts that we found with BuzzFeed. We looked at tweets posted by these accounts in the month before the referendum and did not find a great deal of activity when compared to the overall number of tweets on the referendum; in other words, neither the Russia-linked ads nor the Twitter accounts had a major influence.

There were 3,200 tweets in our datasets coming from those accounts, and 830 of them (about 26%) came from the 45 new accounts that we identified. One important caveat is that those 45 new accounts were tweeting in German, so even though they are present in the data, those 830 tweets are unlikely to have had a significant impact on British voters.

The accounts that tweeted on 23 June were quite different from those that tweeted before or after, with virtually all tweets posted in German. Their behaviour was also very different: mostly retweets on referendum day by a tight network of anti-Merkel accounts, often posted within seconds of each other. These findings are in line with those of Prof. Cram from the University of Edinburgh, as reported in the Guardian[4].
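To illustrate the kind of coordination signal described above, here is a minimal Python sketch (over hypothetical records; this is not our exact pipeline) that flags pairs of accounts retweeting the same tweet within seconds of each other:

    from collections import defaultdict
    from datetime import datetime

    # Hypothetical simplified records: (user, original_tweet_id, timestamp),
    # one per retweet observed in the collection.
    retweets = [
        ("acct_a", "tweet42", datetime(2016, 6, 23, 10, 0, 1)),
        ("acct_b", "tweet42", datetime(2016, 6, 23, 10, 0, 3)),
        ("acct_c", "tweet42", datetime(2016, 6, 23, 10, 0, 4)),
    ]

    WINDOW_SECONDS = 5  # assumed window; "within seconds of each other"

    # Group retweets by the tweet being amplified.
    by_tweet = defaultdict(list)
    for user, tweet_id, ts in retweets:
        by_tweet[tweet_id].append((ts, user))

    # Flag consecutive retweets of the same tweet that fall within the window.
    for tweet_id, events in by_tweet.items():
        events.sort()
        for (t1, u1), (t2, u2) in zip(events, events[1:]):
            if (t2 - t1).total_seconds() <= WINDOW_SECONDS:
                print(f"{u1} and {u2} retweeted {tweet_id} within "
                      f"{(t2 - t1).total_seconds():.0f}s")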

 

Journalists from BuzzFeed UK and our Sheffield team used the retweet network to identify another 45 suspicious accounts, which Twitter subsequently suspended. Like the accounts identified by Twitter itself, these newly discovered accounts were largely ineffective in skewing public debate: they attracted very few likes and retweets, with the most successful message in the sample gaining just 15 retweets.

An important distinction needs to be made between Russia-influenced accounts that used advertising, on the one hand, and the Russia-related bots found by Twitter and other researchers, on the other.

The Twitter sockpuppet/bot accounts generally pretended to be authentic people (mostly American, some German) and did not resort to advertising, but instead tried to go viral or gain prominence through interactions. An example of one such successful account/cyborg is Jenn_Abrams. The following reports give details on how the account duped mainstream media:

http://amp.thedailybeast.com/jenna-abrams-russias-clown-troll-princess-duped-the-mainstream-media-and-the-world

              “…and illustrates how Russian talking points can seep into American mainstream media without even a single dollar spent on advertising.”

https://www.theguardian.com/technology/shortcuts/2017/nov/03/jenna-abrams-the-trump-loving-twitter-star-who-never-really-existed

http://money.cnn.com/2017/11/17/media/new-jenna-abrams-account-twitter-russia/index.html

A related question concerns the influence of Russia-sponsored media and its Twitter posts. Here we consider the Russia Today promoted tweets (RT published which ones they are here: https://www.rt.com/uk/413249-big-reveal-twitter-lists-just/); the three pre-referendum promoted tweets attracted just 53 likes and 52 retweets between them.

 

We analysed all tweets posted in the month before 23 June 2016 that were either authored by Russia Today or Sputnik, or were retweets of these. This gives an indication of how much activity and engagement there was around these accounts. To put these numbers in context, we also include the equivalent statistics for the two main pro-leave and pro-remain Twitter accounts:

 

 

Account | Original tweets | Retweeted by others | Retweets by this account | Replies by account | Total tweets
@RT_com (General Russia Today) | 39 | 2,080 | 62 | 0 | 2,181
@RTUKnews | 78 | 2,547 | 28 | 1 | 2,654
@SputnikInt | 148 | 1,810 | 3 | 2 | 1,963
@SputnikNewsUK | 87 | 206 | 8 | 4 | 305
TOTAL | 352 | 6,643 | 101 | 7 | 7,103

@Vote_leave | 2,313 | 231,243 | 1,399 | 11 | 234,966
@StrongerIn | 2,462 | 132,201 | 910 | 7 | 135,580
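For readers wishing to reproduce this kind of breakdown, the following is a minimal Python sketch over hypothetical, simplified tweet records (real Twitter API objects have different field names) showing how such per-account statistics can be aggregated:

    from collections import Counter

    # Hypothetical simplified records, one per tweet in the collection.
    tweets = [
        {"author": "@RTUKnews", "retweet_of": None, "is_reply": False},
        {"author": "@someuser", "retweet_of": "@RTUKnews", "is_reply": False},
    ]

    accounts = {"@RT_com", "@RTUKnews", "@SputnikInt", "@SputnikNewsUK"}
    stats = {a: Counter() for a in accounts}

    for t in tweets:
        if t["author"] in accounts:
            if t["retweet_of"]:                 # retweets made by the account
                stats[t["author"]]["retweets by this account"] += 1
            elif t["is_reply"]:                 # replies sent by the account
                stats[t["author"]]["replies by account"] += 1
            else:                               # original tweets
                stats[t["author"]]["original tweets"] += 1
        elif t["retweet_of"] in accounts:       # others retweeting the account
            stats[t["retweet_of"]]["retweeted by others"] += 1

    for account, c in stats.items():
        print(account, dict(c), "total:", sum(c.values()))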

 

We also analysed which accounts retweeted RT_com and RTUKnews the most in our dataset. The top one, with 75 retweets of Russia Today tweets, was a self-declared US-based account that retweets Alex Jones from Infowars, RT_com, China Xinhua News, Al Jazeera, and an Iranian news account. This account (still live) joined in February 2009 and, as of 15 December 2017, had 1.09 million tweets - an average of more than 300 tweets per day, indicating a highly automated account. It has more than 4,000 followers, but follows only 33 accounts. Among the next most active retweeters are a deleted account and a suspended account, as well as two accounts that both stopped tweeting on 18 September 2016.

 

For the two Sputnik accounts, the top retweeter made 65 retweets. It declares itself as Ireland-based; has 63.7k tweets and 19.6k likes, with many self-authored tweets; was last active on 2 May 2017; was created in May 2015; and averages 87 tweets a day (which possibly indicates an automated account). It also retweeted Russia Today 15 times. The next two Sputnik retweeters (61 and 59 retweets respectively) are accounts with high average post-per-day rates (350 and 1,000 respectively) and over 11k and 2k followers respectively. Lastly, four of the top 10 accounts have been suspended or deleted.
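The average posting rate used above as an automation indicator is straightforward to compute. A minimal sketch, using the figures quoted for the top Russia Today retweeter (the exact join day is assumed, since only the month is known):

    from datetime import date

    def avg_posts_per_day(total_tweets: int, created: date, observed: date) -> float:
        """Lifetime average posting rate - a crude automation indicator."""
        days_active = max((observed - created).days, 1)
        return total_tweets / days_active

    # Joined February 2009 (1 February assumed); 1.09 million tweets
    # by 15 December 2017.
    rate = avg_posts_per_day(1_090_000, date(2009, 2, 1), date(2017, 12, 15))
    print(f"{rate:.0f} tweets/day")  # roughly 337, far above plausible human activity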

Impact of Russia-linked Misinformation vs Impact of False Claims Made By Politicians During the Referendum Campaign

 

A House of Commons Treasury Committee report published in May 2016[5] states that: “The public debate is being poorly served by inconsistent, unqualified and, in some cases, misleading claims and counter-claims. Members of both the ‘leave’ and ‘remain’ camps are making such claims. Another aim of this report is to assess the accuracy of some of these claims.”

 

In our research, we analysed the number of Twitter posts around some of these disputed claims, firstly to understand their resonance with voters, and secondly to compare their volume to that of the Russia-related tweets discussed above.

 

A study[6] of the news coverage of the EU referendum campaign established that the economy was the most covered issue, in particular the Remain claim that Brexit would cost households £4,300 per year by 2030 and the Leave campaign’s claim that the EU cost the UK £350 million each week. We therefore focused on these two key claims and analysed tweets about them.

 

With respect to the disputed £4,300 claim[6] (made by the Chancellor of the Exchequer), we identified 2,404 posts (tweets, retweets, replies) in our dataset referring to this claim.

 

For the disputed £350 million a week claim[6], there are 32,755 pre-referendum posts (tweets, retweets, replies) in our dataset. This is 4.6 times the 7,103 posts related to Russia Today and Sputnik, and 10.2 times the 3,200 tweets by the Russia-linked accounts suspended by Twitter.

 

In particular, there are more than 1,500 tweets from different voters with one of these wordings:

 

I am with @Vote_leave because we should stop sending £350 million per week to Brussels, and spend our money on our NHS instead.

 

I just voted to leave the EU by postal vote! Stop sending our tax money to Europe, spend it on the NHS instead! #VoteLeave #EUreferendum

 

Many of those tweets have themselves received over a hundred likes and retweets each.
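As a minimal sketch of how near-identical wordings of this kind can be counted (assuming simple text normalisation; our actual methodology may differ in detail):

    import re
    from collections import Counter

    def normalise(text: str) -> str:
        """Lower-case, strip URLs and punctuation, and collapse whitespace so
        that copy-pasted variants of the same wording map to a single key."""
        text = re.sub(r"https?://\S+", "", text.lower())
        text = re.sub(r"[^a-z0-9#@£ ]+", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    # Two sample variants of the same wording (one with a trailing link).
    tweet_texts = [
        "I am with @Vote_leave because we should stop sending £350 million "
        "per week to Brussels, and spend our money on our NHS instead.",
        "I am with @vote_leave because we should stop sending £350 million "
        "per week to Brussels, and spend our money on our NHS instead! "
        "https://t.co/abc",
    ]

    counts = Counter(normalise(t) for t in tweet_texts)
    for wording, n in counts.most_common(5):
        print(n, wording[:80])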

 

This false claim is regarded by the media as one of the key claims behind the success of Vote Leave[7].

 

So, returning to Q27 on the likely impact of misinformation on voting behaviour: it was not possible for us to quantify this from such tweets alone. A potentially useful indicator comes from an Ipsos MORI poll[8] published on 22 June 2016, which showed that for 9% of respondents the NHS was the most important issue in the campaign.

 

 

 

In conclusion, while it is important to quantify the potential impact of Russian misinformation, we should also consider the much wider range of misinformation that was posted on Twitter and Facebook during the referendum and its likely overall impact.

 

We should also study not just fake news sites and the social platforms that were used to disseminate misinformation, but also the role and impact of Facebook-based algorithms for micro-targeting adverts that have been developed by private third parties[9].

 

A related question is the role played by hyperpartisan and mainstream media sites during the referendum campaign. This is the subject of our latest study, with key findings available here[10].

Q21: High Automation Accounts in Our Brexit Tweet Dataset

Q21 Chair: Sure. Based on the work you have done, what proportion of accounts that are regularly active on Twitter are likely to be fake accounts?

 

While it is hard to quantify all the different kinds of fake accounts, we know already that a study by City University identified 13,493 suspected bot accounts, of which Twitter found only 1% to be linked to Russia[11]. In our referendum tweet dataset there are tweets by 1,808,031 users in total, which makes the City bot accounts only 0.74% of the total.

 

If we consider, in particular, Twitter accounts that posted more than 50 times a day (considered high-automation accounts by researchers), then there are only 457 such users in the month leading up to the referendum on 23 June 2016.

The most prolific were "ivoteleave" and "ivotestay", both now suspended, which were similar in usage pattern. There were also many accounts that did not really post much about Brexit but were using the hashtags in order to gain attention for commercial reasons.

We also analysed the leaning of these 457 high-automation accounts and identified 361 as pro-leave (with 1,048,919 tweets), 39 as pro-remain (156,331 tweets), and the remaining 57 as undecided.
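A minimal Python sketch of this kind of analysis follows (over hypothetical simplified records; in practice the leaning labels would come from content and hashtag analysis):

    from collections import Counter, defaultdict

    # Hypothetical simplified records, one per tweet: (user, day, leaning).
    tweets = [
        ("ivoteleave", "2016-06-20", "leave"),
        ("ivoteleave", "2016-06-20", "leave"),
        ("ivotestay", "2016-06-20", "remain"),
    ]

    THRESHOLD = 50  # tweets per day; the "high automation" cut-off used above

    # Count each user's posts per day and flag users exceeding the threshold.
    daily = Counter((user, day) for user, day, _ in tweets)
    high_automation = {user for (user, day), n in daily.items() if n > THRESHOLD}

    # Tally the dominant leaning among each flagged account's tweets.
    by_user = defaultdict(Counter)
    for user, _, leaning in tweets:
        if user in high_automation:
            by_user[user][leaning] += 1

    leanings = Counter(c.most_common(1)[0][0] for c in by_user.values())
    print(leanings)  # in our dataset: 361 pro-leave, 39 pro-remain, 57 undecided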

Conclusion: Socio-Technical Solutions to the “Fake News” Problem

Please allow me to conclude by answering several questions on how we can address the “fake news” problem through new technology, policies, and other actions.

 

Promote Fact Checking Efforts

In order to counter subjectivity, post-truth politics, disinformation, and propaganda, many media and non-partisan institutions worldwide have started fact-checking initiatives - 114 in total, according to Poynter[12]. These mostly focus on exposing disinformation in political discourse, but generally aim at encouraging people to pursue accuracy and veracity of information (e.g. PolitiFact, FullFact.org, Snopes). A study by the American Press Institute has shown that even politically literate consumers benefit from fact-checking, as they increase their knowledge of the subject.[13]

Professional fact checking is a time-consuming process that cannot cover a significant proportion of the claims being propagated via social media channels.

There are two ways to lower the overheads, and I believe both are worth pursuing: 1) create a national fact-checking initiative that promotes collaboration between different media organisations, journalists, and independent fact checkers, such as Full Fact; 2) fund the creation of automation tools for analysing disinformation, to support the human effort.

Fund open-source research on automatic methods for disinformation detection

As already discussed in my oral evidence, there is emerging technology for veracity checking and verification of social media content (going beyond image/video forensics). This includes tools developed in several European projects (e.g. PHEME, REVEAL, and InVID), tools assisting crowdsourced verification (e.g. CheckDesk, Veri.ly), citizen journalism (e.g. Citizen Desk), and repositories of checked facts/rumours (e.g. Emergent, FactCheck). However, many of these tools require further improvements to the underlying algorithms in order to achieve accuracy comparable to that of email spam filter technology. I argue that they should be made open source, to enable open science and verifiable algorithms, as well as to enable companies and platforms to experiment easily with the new technology.

     It is also important to invest in establishing ethical protocols and research methodologies, since social media content raises a number of privacy, ethical, and legal challenges.

 

Dangers and pitfalls of relying purely on automated tools for disinformation detection

Many researchers, including myself, are researching automated methods based on machine learning algorithms to identify disinformation on social media platforms automatically. Given the extremely large volume of social media posts, the key questions are: can disinformation be identified in real time, and should such methods be adopted by the social media platforms themselves?

The very short answer is: Yes, in principle, but we are still far from solving many key socio-technical issues, so, when it comes to containing the spread of disinformation, we should be mindful of the problems which such technology could introduce:

        Non-trivial scalability: While some of our algorithms work in near real time on specific datasets, such as tweets about the Brexit referendum, applying them across all posts on all topics, as Twitter would need to do, is very far from trivial. To give a sense of the scale: prior to 23 June 2016 (referendum day) we had to process fewer than 50 Brexit-related tweets per second, which was doable. Twitter, however, would need to process more than 6,000 tweets per second, which is a serious software engineering, computational, and algorithmic challenge.

        Algorithms make mistakes, so while 90 per cent accuracy intuitively sounds very promising, we must not forget the errors: 10 per cent in this case, or double that for an algorithm with 80 per cent accuracy. At 6,000 tweets per second, this 10 per cent amounts to 600 wrongly labelled tweets per second, rising to 1,200 for the lower-accuracy algorithm. To make matters worse, automatic disinformation analysis often combines more than one algorithm: first to determine which story a post refers to, and second to judge whether it is likely true, false, or uncertain. Unfortunately, when algorithms are executed in a sequence, errors have a cumulative effect (a worked example follows this list).

        These mistakes can be very costly. Broadly speaking, algorithms make two kinds of errors: false negatives, where disinformation is wrongly labelled as true or bot accounts are wrongly identified as human; and false positives, where correct information is wrongly labelled as disinformation or genuine users are wrongly identified as bots. False negatives are a problem on social platforms because the high volume and velocity of social posts (e.g. 6,000 tweets per second on average) still leaves us with a lot of disinformation “in the wild”. If we draw an analogy with email spam: even though most of it is filtered out automatically, we still receive a significant proportion of spam messages. False positives, on the other hand, pose an even more significant problem, as falsely removing genuine messages is effectively censorship through artificial intelligence. Facebook, for example, has a growing problem with some users having their accounts wrongly suspended.
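The following back-of-the-envelope sketch (in Python) makes the cumulative error effect concrete, assuming the figures used above: 6,000 tweets per second and two chained classifiers, each 90 per cent accurate.

    # Back-of-the-envelope arithmetic using the figures from the text:
    # 6,000 tweets/second and two chained classifiers (story matching,
    # then veracity), each assumed to be 90% accurate.
    tweets_per_second = 6_000
    stage_accuracies = [0.90, 0.90]

    pipeline_accuracy = 1.0
    for acc in stage_accuracies:
        pipeline_accuracy *= acc  # errors compound across chained stages

    errors_per_second = tweets_per_second * (1 - pipeline_accuracy)
    print(f"pipeline accuracy: {pipeline_accuracy:.0%}")             # 81%
    print(f"wrongly labelled: {errors_per_second:,.0f} tweets/sec")  # 1,140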

Therefore, I strongly believe that the best way forward is to implement human-in-the-loop solutions, where people are assisted by machine learning and AI methods but not replaced entirely, both because accuracy is still not high enough and to alleviate the danger of censoring information.

 

Establishing Cooperation and Data Exchange between Social Platforms and Scientists

Our latest work on analysing misinformation in tweets about the UK referendum[14],[15] highlighted yet again a very important issue: when it comes to social media and furthering our ability to understand its misuse and impact on society and democracy, the only way forward is for data scientists, political and social scientists, and journalists to work together alongside the big social media platforms and policy makers. I believe data scientists and journalists need to be given open access to the full set of public social media posts on key political events for research purposes (without compromising privacy and data protection laws), and to be able to work in collaboration with the platforms through research grants and shared funding (such as the Google Digital News Initiative). Such opportunities should be made available to all scientists on a competitive basis, rather than following an approach of exclusive release of data to one or two selected academic institutions.

 

When it comes to understanding online misinformation and its impact on society, there are still many outstanding questions that need to be researched. Most notable is studying the dynamics of the interaction between all these Twitter accounts over time, for which we need the complete archive of public tweets, images, and URL content shared, as well as profile data and friend/follower networks. This would help us quantify better (amongst other things) what kinds of tweets and messages resulted in misinformation-spreading accounts gaining followers and retweets, how human-like the behaviour of the successful ones was, and whether and how they were connected to the alternative media ecosystem.

 

In summary, the intersection of automated accounts, political propaganda, and misinformation is a key area in need of further investigation, but one for which scientists often lack the much-needed data, while the data keepers lack the necessary transparency, the motivation to investigate these issues, and the willingness to create open and unbiased algorithms.

Policy Decisions around Preserving Important Social Media Content for Future Studies

Governments and policy makers are in a position to help establish this much-needed cooperation between social platforms and scientists, to promote the definition of policies for ethical, privacy-preserving research and data analytics over social media data, and to ensure the archiving and preservation of social media content of key historical value.

For instance, given the ongoing debate on the scale and influence of Russian propaganda on election and referendum outcomes, it would have been invaluable to have Twitter archives made available to researchers under strict access and code-of-practice criteria, so that these questions could be studied in more depth. Unfortunately, this is not currently possible, as Twitter has suspended all Russia-linked accounts and bots, together with all their content and social network information. Similar issues arise when trying to study online abuse of and from politicians, as posts and accounts are again suspended at a very high rate.

Related to this is the challenge of open and repeatable science on social media data: many of the posts in current datasets available for training and evaluating machine learning algorithms have been deleted or are no longer available. As a result, algorithms do not have sufficient data to improve, and neither can scientists easily determine whether a new method really outperforms the state of the art.

Promoting Media Literacy and Critical Thinking for Citizens

The ability to recognise spin, bias, and mis- and disinformation is a core element of media literacy. Given children’s extensive exposure to online and social media, there are also initiatives[16] aimed specifically at school children, starting from as young as 11 years old. There are also online educational resources on media literacy and fake news[17],[18] that could act as a useful starting point for national media literacy initiatives.

 

Increasingly, media literacy and critical thinking are seen as key tools in fighting the effects of online disinformation and propaganda techniques[19],[20]. Many of the existing programmes today are delivered by NGOs in a face-to-face group setting. The next challenge is how to roll these out at scale, and also online, in order to reach a wide audience across all social and age groups.

 

Establish/revise and enforce a national code of practice for politicians and media outlets

As already discussed above, disinformation and biased reporting are not just the preserve of fake news sites, state-driven propaganda sites, and social media accounts. A significant amount also comes from the media and from factually incorrect statements by prominent politicians.

One high-profile example is the false claim regarding immigrants from Turkey, which appeared on the front page of a major UK newspaper[21].

We analysed the referendum-related complaints upheld by the press regulator (IPSO) and found a total of 7 upheld breaches against the Sunday Express, The Sun, the Daily Mail, and the Daily Express, as well as 7 cases of mediated breaches by the Sunday Express, The Sun, and the Express online.

The impact of widely known and influential claims made by politicians from both sides of the referendum campaign was already discussed above.

Therefore, an effective way to combat deliberate online falsehoods must address such cases too, as they are also widely shared through social media.

Governments and policy makers could help again by establishing new, or strengthening existing, codes of practice for political parties and press standards, as well as by ensuring that they are adhered to.

These need to be supplemented with transparency in political advertising on social platforms and a review process for political advertising, in order to eliminate or significantly reduce the promotion of misinformation through advertising. These measures would also help reduce the impact of all the other kinds of disinformation already discussed above (i.e. fake news sites, Russian propaganda, etc.).

Appendix: Information about Our Dataset of UK Referendum Tweets

 

We have collected around 17.5 million tweets up to and including 23 June 2016 (EU referendum day). The highest volume was 2 million tweets on 23 June (only 3,300 lost due to rate limiting), with just over 1.5 million posted while the polls were open. Of the 2 million, 57% were retweets and 5% were replies. 22 June had the second-highest volume, with 1.3 million tweets.

 

The 17.5 million tweets were authored by just over 2 million distinct Twitter users (2,016,896).

 

Some of our studies focused on a subset of these, covering the month up to and including 23 June. Within that period, there were just over 13.2 million tweets, of which 4.5 million were original tweets (4,594,948), 7.7 million were retweets (7,767,726), and 850 thousand were replies (858,492). These were sent by just over 1.8 million distinct users.

 

Tweets were collected if they contained one of the following keywords and hashtags:

 

votein, yestoeu, leaveeu, beleave, EU referendum, voteremain, bremain, no2eu, betteroffout, strongerin, euref, betteroffin, eureferendum, yes2eu, voteleave, voteout, notoeu, eureform, ukineu, britainout, brexit, leadnotleave
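As an illustration of the filtering criterion only (not of the actual collection infrastructure), matching on this keyword list amounts to a case-insensitive substring filter, sketched below in Python:

    KEYWORDS = {
        "votein", "yestoeu", "leaveeu", "beleave", "eu referendum", "voteremain",
        "bremain", "no2eu", "betteroffout", "strongerin", "euref", "betteroffin",
        "eureferendum", "yes2eu", "voteleave", "voteout", "notoeu", "eureform",
        "ukineu", "britainout", "brexit", "leadnotleave",
    }

    def matches(tweet_text: str) -> bool:
        """True if the tweet mentions any collection keyword or hashtag."""
        text = tweet_text.lower()
        return any(kw in text for kw in KEYWORDS)

    assert matches("Polls open tomorrow! #Brexit #EURef")
    assert not matches("Totally unrelated tweet about football")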

 

We continued tweet collection post-referendum too, amassing over 93 million tweets between 24 June 2016 and 31 March 2017, when Article 50 was triggered. A further 102 million tweets have been collected between 31 March 2017 and 18 December 2017.

 

So far, the bulk of our analysis has focused on the pre-referendum tweets, although we plan to continue our research on the post-referendum period too.

 

March 2018


[1] https://www.nytimes.com/2018/01/25/opinion/russian-trolls-fake-news.html

[2] M. Moore and G. Ramsay. UK media coverage of the 2016 EU Referendum campaign. https://www.kcl.ac.uk/sspp/policy-institute/CMCP/UK-media-coverage-of-the-2016-EU-Referendum-campaign.pdf

[3] https://www.theguardian.com/us-news/2016/nov/16/facebook-bias-bubble-us-election-conservative-liberal-news-feed

[4] https://www.theguardian.com/world/2017/nov/14/how-400-russia-run-fake-accounts-posted-bogus-brexit-tweets

[5]The economic and financial costs and benefits of the UK’s EU membership. May 2016. https://publications.parliament.uk/pa/cm201617/cmselect/cmtreasy/122/122.pdf 

[6] https://www.kcl.ac.uk/sspp/policy-institute/CMCP/UK-media-coverage-of-the-2016-EU-Referendum-campaign.pdf

[7] http://www.telegraph.co.uk/news/0/eu-referendum-claims-won-brexit-fact-checked/

[8] https://www.ipsos.com/sites/default/files/migrations/en-uk/files/Assets/Docs/Polls/issues-index-june-2016-charts.pdf

[9] https://www.theguardian.com/technology/2017/may/07/the-great-british-brexit-robbery-hijacked-democracy

[10] https://gate4ugc.blogspot.co.uk/2018/03/brexit-disinformation-role-news-media.html

[11] https://www.parliament.uk/documents/commons-committees/culture-media-and-sport/180119%20Nick%20pickles%20Twitter%20to%20Chair.pdf

[12] https://www.poynter.org/2017/there-are-now-114-fact-checking-initiatives-in-47-countries/450477/

[13] American Press Institute. “New studies on political fact-checking: Growing, influential; but less popular among GOP”, 2015. https://www.americanpressinstitute.org/fact-checking-project/new-research-on-political-fact-checking-growing-and-influential-but-partisanship-is-a-factor/

[14] https://www.buzzfeed.com/jamesball/3-million-brexit-tweets-reveal-leave-voters-talked-about-imm

[15] https://www.buzzfeed.com/tomphillips/we-found-45-suspected-bot-accounts-sharing-pro-trump-pro

[16] https://lie-detectors.org/

[17] https://www.edutopia.org/blogs/tag/media-literacy?gclid=CLu42_mu6NQCFU1MDQod7NEG0Q

[18] https://www.nytimes.com/2017/01/19/learning/lesson-plans/evaluating-sources-in-a-post-truth-world-ideas-for-teaching-and-learning-about-fake-news.html

[19] “Defending and Ultimately Defeating Russia’s Disinformation Techniques”, Centre for European Policy Analysis, August 2016. https://cepa.ecms.pl/files/?id_plik=2713

[20] https://www.nytimes.com/2017/09/25/opinion/the-only-way-to-defend-against-russias-information-war.html

[21] Turkey poll findings were flawed – clarification, THE DAILY EXPRESS (Jun 19, 2016, 12:00AM), https://www.express.co.uk/news/clarifications-corrections/681097/Turkeypoll-findings-were-flawed-clarification; https://www.theguardian.com/commentisfree/2016/may/19/inaccurate-pro-brexit-infacts-investigation-media-reports-eu-referendum