Collecting data for social media analysis regarding national security – methods and implications

FFI-Report 2022
This publication is only available in Norwegian

About the publication

Report number: 22/00793

ISBN: 978-82-464-3380-6

Format: PDF document

Size: 2.2 MB

Language: Norwegian

Gard-Inge Rosvold, Arild Bergh
Recent national threat assessments have highlighted state actors' increasing use of social media to disseminate disinformation and conduct influence operations intended to damage democratic countries. Non-governmental actors' use of social media to spread misinformation in connection with crises such as the COVID-19 pandemic has also increased significantly. Authorities responsible for national security will therefore need to analyse data from social media to build situational awareness as part of a larger threat picture. Investigations by the Norwegian Defence Research Establishment (FFI) into social media-based influence operations and disinformation have highlighted the demand for flexible analyses to meet this need. At the same time, further research on social media as a part of hybrid threats is needed. Both operational analyses and research require access to relevant data from social media.

This report describes how to collect data from social media. The target groups are: i) those who require data analyses (here called the customer), and ii) those who, through development or administration of databases, are responsible for the technical aspects of data collection (here called the supplier). The report may also be of interest to others who work with social media and hybrid threats. The focus is on the technical and practical aspects of data collection from social media. It is beyond the scope of this report to discuss specific disinformation and influence operations issues such as actors or approaches.

For the customer target group, practical considerations are explored. The questions one wants answered by analysing data from social media will affect the amount of data to be collected. Costs, in terms of time and money, must be balanced against the level of detail an analysis requires. It is particularly important to understand the relational nature of social media. Using friends' friends and their social media posts as a source of information makes the data volume grow more steeply than one may assume, a so-called exponential increase. It is not possible to give a simple answer as to which trade-offs to make; instead, the report highlights the issues and illustrates some factors to consider.

For the supplier target group, the report provides a detailed description of how data collection software can be developed. The findings discussed are based on a prototype data collector for YouTube that was designed to explore issues relevant to social media data collection. YouTube's programming interface (API) is explored, and the relational character of social media and its implications for data collection are discussed from a developer's perspective. The effect of social media quotas on data downloads and overall collection strategies is considered. Finally, the report examines the possibilities of transferring the approaches used in the YouTube prototype to other social media, and provides suggestions for real-time data collection strategies.
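
The growth that comes from following friends' friends, and its interaction with download quotas, can be illustrated with a rough calculation. The sketch below is not taken from the report or from its YouTube prototype. The function names and all figures (100 connections per account, 2 requests per account, a quota of 10,000 requests per day) are hypothetical values chosen for illustration, and the estimate is an upper bound because it assumes that no two accounts share connections.

```python
def accounts_within_hops(avg_connections: int, hops: int) -> int:
    """Upper-bound count of accounts reachable within `hops` steps of one seed.

    Assumption: every account has exactly `avg_connections` connections
    and no connection is shared, so each hop multiplies the count.
    """
    return sum(avg_connections ** h for h in range(hops + 1))


def days_to_collect(accounts: int, requests_per_account: int, daily_quota: int) -> int:
    """Whole days of quota needed, assuming each request costs one quota unit."""
    total_requests = accounts * requests_per_account
    return -(-total_requests // daily_quota)  # ceiling division


if __name__ == "__main__":
    # All parameter values below are hypothetical.
    for hops in range(4):
        n = accounts_within_hops(avg_connections=100, hops=hops)
        d = days_to_collect(n, requests_per_account=2, daily_quota=10_000)
        print(f"hops={hops}: up to {n:,} accounts, about {d:,} day(s) of quota")
```

Under these assumed figures, going from two hops to three raises the estimate from roughly ten thousand accounts to roughly one million, and the collection time from days to months. This is the kind of trade-off between level of detail and cost that the customer has to weigh.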
