Different social media circulate questionable information to different (quantifiable) extents
An insight from "The COVID-19 social media infodemic"
🧠 Idea
Different social media platforms attract and circulate questionable information to different (quantifiable) extents
Source
The COVID-19 social media infodemic, 2020, Matteo Cinelli et al., Scientific Reports.
Methodology
The authors
Gathered posts and comments[1] published between 1/1/20 and 14/2/20 that contained at least one word from a set of keywords related to Covid-19[2], from several social media platforms (Twitter[3], Reddit, YouTube and Gab[4]);
Considered only posts and comments containing at least one URL pointing outside the source platform;
Classified posts and comments by dividing the destination URLs into two categories (Questionable and Reliable sources) using a categorisation[5] from MediaBias/FactCheck;
Defined the amplification factor as the average number of reactions to a post;
Defined the coefficient of relative amplification as the ratio between the amplification factor of questionable sources and the amplification factor of reliable ones.
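The classification and the two metrics above can be sketched in a few lines of Python. The domain sets and the toy data here are hypothetical stand-ins for the MediaBias/FactCheck categorisation, not the actual lists used in the paper:

```python
from urllib.parse import urlparse

# Hypothetical domain sets standing in for the MediaBias/FactCheck categorisation
QUESTIONABLE_DOMAINS = {"fakenews.example"}
RELIABLE_DOMAINS = {"goodnews.example"}

def classify(url):
    """Return 'questionable', 'reliable', or None for an outbound URL."""
    domain = urlparse(url).netloc
    if domain in QUESTIONABLE_DOMAINS:
        return "questionable"
    if domain in RELIABLE_DOMAINS:
        return "reliable"
    return None  # domain not covered by the categorisation

def amplification_factor(posts):
    """Average number of reactions per post."""
    return sum(p["reactions"] for p in posts) / len(posts)

def relative_amplification(questionable, reliable):
    """Ratio of the amplification factors of questionable vs reliable sources."""
    return amplification_factor(questionable) / amplification_factor(reliable)

# Toy data: questionable posts here receive three times the reactions on average
q = [{"url": "https://fakenews.example/a", "reactions": 30}]
r = [{"url": "https://goodnews.example/b", "reactions": 10}]
print(relative_amplification(q, r))  # 3.0
```

A coefficient above 1 would mean the platform gives questionable sources more reactions per post than reliable ones; below 1, fewer.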
Results
The share of posts and comments based on questionable sources varied by platform:
38% on Gab
10% on Twitter
6.5% on YouTube
4.5% on Reddit
Reactions to questionable news displayed similar growth dynamics[6] to reactions to reliable news (although on a smaller scale) on all platforms except Gab, where the volume of reactions to questionable sources was about three times that of reactions to reliable ones;
The coefficient of relative amplification also varied by platform:
3.9 on Gab
0.97 on Twitter
0.55 on Reddit
0.35 on YouTube
This implies that Twitter is the most neutral platform (a coefficient close to 1 indicates neither amplification nor reduction of questionable sources), that YouTube and Reddit actively dampen the impact of questionable sources, and that Gab strongly amplifies them.
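This reading of the coefficient can be made explicit with a small helper. The ±0.1 neutrality band is my own illustrative threshold, not a value from the paper:

```python
# Hypothetical interpretation rule for the coefficient of relative amplification;
# the +/-0.1 neutrality band is an illustrative choice, not from the paper.
def interpret(coefficient, tolerance=0.1):
    if abs(coefficient - 1.0) <= tolerance:
        return "neutral"
    if coefficient > 1.0:
        return "amplifies questionable sources"
    return "reduces questionable sources"

# Coefficients reported in the paper
for platform, c in {"Gab": 3.9, "Twitter": 0.97, "Reddit": 0.55, "YouTube": 0.35}.items():
    print(f"{platform}: {interpret(c)}")
```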
🗣️ Reactions
The scientific community received the work positively (458 citations as of today);
Some cited the results as part of a larger body of evidence while advocating for government-backed fact-checkers in social media and stricter laws and regulations;
The result seems to have contributed to growing public distrust in the ability of governments and corporations to reduce online misinformation;
In response, many advocated addressing the issue through self-education in digital literacy and critical engagement with information sources;
Some reacted with greater scepticism towards information on social media in general
🤔 (my) Considerations
Robustness of the result
People generally already agree that the design, algorithms and audience of a social media platform significantly affect how information flows through it. However, it was nice to see a quantification of this concept using a simple and large-scale methodology.
The truth problem
However, the result clearly depends on the evaluation of information sources by MediaBias/FactCheck (MBFC), which the authors take as ground truth for information quality. Regardless of my personal opinion, I can imagine that someone sceptical of this assumption would not trust the paper's conclusions.
Regardless of the reliability of MBFC, this points to a larger problem: our society seems increasingly to lack the ability to establish a common ground truth on which to build a shared discourse. As official Western institutions remain shy about the topic, perhaps fearing a slippery slope from fact-checking to propaganda, civil society seems to approach the problem from many atomised perspectives that fractally reflect the very differences underlying the absence-of-truth problem in the first place.
Handling hot potatoes
Given these circumstances, social media platforms seem to have no choice but to face and manage the far-reaching consequences of the incentives that their design and algorithms create for online interaction and information sharing.
In what is becoming a complex ethical and political conundrum, platforms have probably realised that they cannot avoid taking a stance, whether it is to remain neutral or to actively censor disinformation, which brings with it many questions about how to evolve and enforce policies.
As Jack Dorsey mentioned on The Joe Rogan Experience in February 2019:
[…] when people open Twitter, what are we incentivizing? […] What does the like button incentivize? What does the retweet incentivize? What does the number of followers, and making that number big and bold, incentivize? I'm not sure if we should incentivize anything, but we need to understand what that is. And I think right now we do incentivize a lot of echo chambers, because we don't make it easy for people to follow interests and topics, it's only accounts; we incentivize a lot of outrage […] because of the sum of the dynamics in the service not allowing a lot of nuance in conversation earlier on
[1] The data collection yielded around 1.3M posts and 7.5M comments across the platforms, with the vast majority of posts coming from Twitter (1.1M) and of comments from YouTube (7M).
[2] Keywords such as coronavirus, pandemic, corona outbreak, china, Wuhan, nCoV, IamNotAVirus, coronavirus_update, coronavirus_transmission, coronavirus news, coronavirus outbreak.
[3] Due to API limitations, the authors were able to collect Twitter data only starting from 27/1/20.
[4] "Gab is an American alt-tech social networking service known for its far-right userbase" (Wikipedia)
[5] The authors defined as a questionable information source a news outlet to which MBFC attributed one or more of the following characteristics: extreme bias, consistent promotion of propaganda/conspiracies, poor or no sourcing to credible information, information not supported by evidence, or a complete lack of transparency and/or fake news. Reliable information sources were defined negatively, as those that do not show any of the aforementioned characteristics. This procedure yielded 800 questionable outlets and 1837 reliable ones.
[6] The authors showed this by demonstrating that the relationship between the cumulative number of posts and the cumulative number of reactions was linear with a coefficient below 1 for both questionable and reliable sources, meaning that the difference in outreach between popular and unpopular posts and comments was similar regardless of the reliability of the source.
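The linear-fit argument in the last footnote can be sketched with an ordinary least-squares fit on toy cumulative series. The numbers below are invented for illustration and are not the paper's data:

```python
import numpy as np

# Toy cumulative series standing in for one source category's posts and reactions
cumulative_posts = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
cumulative_reactions = np.array([8.0, 17.0, 25.0, 33.0, 41.0])

# A linear relationship with slope below 1: reactions grow proportionally with
# posts, so popular and unpopular content spread in a similar way regardless
# of source reliability
slope, intercept = np.polyfit(cumulative_posts, cumulative_reactions, 1)
print(round(slope, 2))  # 0.82
```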