

Int J Public Health, 08 April 2021

Can Big Data Be Used to Monitor the Mental Health Consequences of COVID-19?

Nicola Julia Aebi1,2*, David De Ridder3,4, Carlos Ochoa3,5, Dusan Petrovic6,7, Marta Fadda8, Suzanne Elayan9, Martin Sykora9, Milo Puhan10, John A. Naslund11†, Stephen J. Mooney12† and Oliver Gruebner10,13†
  • 1Swiss Tropical and Public Health Institute, Basel, Switzerland
  • 2University of Basel, Basel, Switzerland
  • 3University of Geneva, Faculty of Medicine, Institute of Global Health, Geneva, Switzerland
  • 4École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  • 5Institute for Environmental Sciences, University of Geneva, Geneva, Switzerland
  • 6Department of Epidemiology and Health Systems (DESS), University Center for General Medicine and Public Health (UNISANTE), Lausanne, Switzerland
  • 7Centre for Environment and Health, School of Public Health, Department of Epidemiology and Biostatistics, Imperial College London, London, United Kingdom
  • 8University of Lugano, Faculty of Biomedical Sciences, Lugano, Switzerland
  • 9Centre for Information Management, Loughborough University, Leicestershire, United Kingdom
  • 10University of Zurich, Epidemiology, Biostatistics and Prevention Institute, Zurich, Switzerland
  • 11Harvard Medical School, Boston, MA, United States
  • 12University of Washington, Department of Epidemiology, Seattle, WA, United States
  • 13University of Zurich, Department of Geography, Zurich, Switzerland


The COVID-19 pandemic has profound mental health consequences [1]. Yet opportunities to monitor and mitigate mental health problems in this context remain scarce [2]. At the same time, nearly half of the world's population (49%) now uses social media, and digital tools such as natural language processing have improved considerably, particularly for mental health applications [3]. Using these tools, researchers have identified and monitored signs of mental illness reflected in social media data, including stress, loneliness, depression, and post-traumatic stress [4]. Such approaches, part of a growing field called digital epidemiology, could help identify populations in need of mental health support during the current pandemic. More specifically, sentiment analysis of content posted on popular social media platforms, combined with the detection of spatiotemporal changes in disease incidence, could provide decision makers and public health experts with critical information to supplement traditional epidemiological data sources and to inform the implementation of targeted mental health interventions [5–7].
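
To make spatiotemporal sentiment monitoring more concrete, the Python sketch below computes, for each region and ISO calendar week, the share of posts that contain at least one term from a small distress word list. This keyword match is a deliberately crude stand-in for sentiment analysis, not the method of any study cited here; the `DISTRESS_TERMS` lexicon, the `Post` record, and the assumption that every post already carries a region label are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical mini-lexicon for illustration only; real studies would rely on
# validated lexicons or trained classifiers.
DISTRESS_TERMS = {"lonely", "anxious", "hopeless", "stressed", "scared"}


@dataclass
class Post:
    text: str
    region: str  # assumed: each post has already been geocoded to a region
    day: date


def is_distressed(text: str) -> bool:
    """Flag a post if any lower-cased word matches the distress lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(token in DISTRESS_TERMS for token in tokens)


def weekly_distress_share(posts):
    """Return {(region, iso_year, iso_week): share of flagged posts}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [flagged, total]
    for post in posts:
        iso_year, iso_week, _ = post.day.isocalendar()
        key = (post.region, iso_year, iso_week)
        counts[key][1] += 1
        if is_distressed(post.text):
            counts[key][0] += 1
    return {key: flagged / total for key, (flagged, total) in counts.items()}


if __name__ == "__main__":
    sample = [
        Post("Feeling so lonely in lockdown", "Basel", date(2020, 4, 6)),
        Post("Nice walk by the river today", "Basel", date(2020, 4, 7)),
        Post("Anxious about losing my job", "Zurich", date(2020, 4, 8)),
    ]
    print(weekly_distress_share(sample))
```

For the three sample posts, all in ISO week 15 of 2020, the share is 0.5 for Basel and 1.0 for Zurich. A plain lexicon match ignores negation ("not lonely") and sarcasm, which is one of several reasons such keyword counts are at best a rough proxy for emotional distress.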

Ethical and Legal Concerns of Big Data

Despite the promise of Big Data, it is important to acknowledge that these digital epidemiologic approaches also raise ethical and legal concerns, particularly with regard to consent, privacy expectations, data protection, and security. Social media users posting publicly may not have consented to being part of a research study, and those suffering from mental illness may not have intended their posts to reveal their health status. People may have shared information on social media while in a temporarily vulnerable state of mind, e.g., during a crisis or a disease outbreak. In such cases, they may not realize that what they share can be collected and analyzed by third parties, whether for relief efforts, marketing, or research. Being identified as mentally ill might lead to stigma in private life and at work, become a source of discrimination, and affect access to and use of healthcare services. These ethical issues are compounded by potential legal issues, including regulations on the security and protection of the data and the malicious use of sensitive, health-related data by third parties. Methods such as de-identification and anonymization can help protect data and privacy by removing personal identifiers, and geo-masking or aggregation of spatial data can be applied to obscure precise geographic locations [8].
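
A minimal Python sketch of such protective steps, under simplifying assumptions: user IDs are replaced by a keyed hash (pseudonymization, which is weaker than full anonymization), @mentions and URLs are stripped from the text, and point coordinates are aggregated to grid cells of 0.1 degree. The key value, the grid size, and the record field names (`user_id`, `text`, `lat`, `lon`) are hypothetical placeholders. Grid aggregation is a simpler technique than the network-based street masking described in [8].

```python
import hashlib
import hmac
import math
import re

# Placeholder key for illustration; a real key would be generated randomly and
# stored separately from the research data.
SECRET_KEY = b"replace-with-a-securely-stored-key"

# Matches @mentions and http(s) links, which can point back to individuals.
MENTION_OR_URL = re.compile(r"@\w+|https?://\S+")


def pseudonymize_user(user_id: str) -> str:
    """Replace a user ID with a keyed SHA-256 hash (pseudonymization only)."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def scrub_text(text: str) -> str:
    """Drop @mentions and URLs, then collapse the leftover whitespace."""
    return " ".join(MENTION_OR_URL.sub(" ", text).split())


def aggregate_location(lat: float, lon: float, cells_per_degree: int = 10):
    """Map a point to integer grid-cell indices (0.1 degree cells by default)."""
    return (math.floor(lat * cells_per_degree), math.floor(lon * cells_per_degree))


def deidentify(record: dict) -> dict:
    """Return a copy of a hypothetical post record without direct identifiers."""
    return {
        "user": pseudonymize_user(record["user_id"]),
        "text": scrub_text(record["text"]),
        "cell": aggregate_location(record["lat"], record["lon"]),
    }
```

A keyed hash (HMAC) is used instead of a plain hash because common user names could otherwise be re-identified simply by hashing candidate names; even so, the free text itself may still contain identifying details.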

Methodological Concerns of Big Data

Research or interventions based on Big Data are subject to validity concerns. The theory underlying formal statistics typically assumes random sampling [9], but social media users, for example, may not be representative of the general population in terms of demographics or socioeconomic factors. Analyzing these data without accounting for this potential non-representativeness may result in selection bias and low internal and external validity [10]. Furthermore, when Big Data lack key covariates, it may be difficult to account for confounding factors (e.g., sex, socioeconomic determinants, ethnicity). An additional important challenge concerns the assessment of the mental health outcome itself. While advanced sentiment analysis can serve as a proxy for emotional distress in the digital sphere, this approach precludes any formal assessment of actual mental health outcomes and may lead to distorted conclusions. Big Data are also prone to p-hacking (manipulating analyses to achieve statistical significance) and HARKing (hypothesizing after the results are known), especially when the data contain many variables. Hence, a pre-registered analysis plan adds credibility. This plan should specify an adjusted significance level, because with very large samples even very small effects can become statistically significant, and testing many variables increases the chance of spurious findings. Finally, because these data are observational, claims of causality cannot be made, and results have to be interpreted carefully. Overall, strict adherence to reporting guidelines is essential to address these methodological concerns.
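
The two statistical points about significance can be illustrated with a short Python sketch. It assumes a pooled two-proportion z-test and the Holm-Bonferroni procedure, which is only one of several possible multiple-testing corrections; the article does not prescribe a specific one. The first function shows how a 0.1 percentage-point difference becomes highly "significant" with ten million posts per group; the second adjusts for testing many hypotheses at once.

```python
import math


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: which hypotheses are rejected at family-wise level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank k-1) is compared with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


if __name__ == "__main__":
    # Same 10.0% vs 10.1% difference in distress-flagged posts, two sample sizes.
    print(two_proportion_p_value(100, 1_000, 101, 1_000))
    print(two_proportion_p_value(1_000_000, 10_000_000, 1_010_000, 10_000_000))
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

With 1,000 posts per group the difference gives p of about 0.94, whereas with 10 million per group p falls below 1e-10, although the effect is equally negligible; reporting effect sizes alongside p-values helps here. For the p-values [0.01, 0.04, 0.03, 0.005], all below 0.05 before adjustment, Holm's procedure rejects only the first and the last hypotheses.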

Strengths of Big Data

Despite these concerns, Big Data analysis may contribute to a more comprehensive understanding of the mental health consequences of the current COVID-19 crisis. Big Data are not only "long" (covering many individuals) but also "large", that is, they contain many variables that are already included or can be easily extracted [6]. The main strength of this approach, however, is the huge data volume available even across national borders and health care systems. Tens of millions of geo-referenced Twitter posts, for example, may be analyzed, substantially increasing the statistical power of spatial analyses linking mental health determinants, COVID-19 case counts or regulations, and the sentiments of social media users in those locations [10]. Big Data analyses could therefore help identify regional differences and establish correlations with other factors such as COVID-19 incidence rates, lockdown strictness or other policies aimed at containing the pandemic, or hospital overcrowding. Analysis of big social media data in combination with spatial epidemiological approaches may further identify geographic hotspots of increased symptoms of mental health problems over time [7]. This in turn could provide key operational information to help implement appropriate mental health support and prevention measures. Moreover, real-time monitoring of the mental health consequences of COVID-19 may help governments respond rapidly and appropriately to changes in mental health status. Unlike formal epidemiological studies, Big Data surveillance provides huge data volumes and wide geographic coverage at limited cost and in real time, making this approach an efficient use of resources. The main limitations are computational power, interpretability, and threats to generalizability.
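
As one illustration of how geographic hotspots might be flagged, the Python sketch below computes the Getis-Ord Gi* statistic, a commonly used local hotspot measure, from area-level values (for instance, a hypothetical weekly share of distress-flagged posts per area) and a neighbor list with binary weights in which each area counts as its own neighbor. The choice of Gi*, the binary contiguity weights, and the input format are assumptions for illustration; the article does not specify which spatial method should be used.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Getis-Ord Gi* z-scores using binary weights that include each area itself.

    values: list of area-level measures, indexed 0..n-1.
    neighbors: dict mapping an area index to a list of neighboring area indices.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum(v * v for v in values) / n - mean ** 2
    if variance <= 0:
        raise ValueError("Gi* is undefined when all values are equal")
    s = math.sqrt(variance)

    scores = []
    for i in range(n):
        # The "star" variant includes area i in its own neighborhood.
        neighborhood = set(neighbors.get(i, [])) | {i}
        w_sum = len(neighborhood)     # sum of 0/1 weights
        w_sq_sum = len(neighborhood)  # sum of squared 0/1 weights (identical here)
        if w_sum == n:
            # Neighborhood covers every area: the statistic is 0/0, so undefined.
            scores.append(float("nan"))
            continue
        local_sum = sum(values[j] for j in neighborhood)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


if __name__ == "__main__":
    # Six areas along a line; each area borders the one before and after it.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    shares = [0.1, 0.1, 0.1, 0.1, 0.4, 0.5]
    print([round(z, 2) for z in getis_ord_gi_star(shares, chain)])
```

In this toy chain, the last area obtains Gi* of about 2.20, above the conventional 1.96 cut-off. In real analyses many areas are tested at once and neighboring values are spatially dependent, so such cut-offs would need adjustment in line with the adjusted significance levels discussed in the methodological section.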

Conclusion

We recommend the use of Big Data approaches to monitor mental health in the general population, especially given the heightened anxiety and threats to mental wellbeing caused by the COVID-19 pandemic. These novel data sources may be leveraged to help deliver targeted support to specific populations, including those most susceptible to the impacts of the pandemic and its mental health consequences. Hence, Big Data hold potential to strengthen our mental health prevention systems in the context of a global public health crisis. Ethical and technical challenges will require careful and continued efforts to overcome, but these digital approaches can support multifaceted strategies that combine modern technologies with traditional approaches.

Author Contributions

NA, DR, DP, and CO wrote the manuscript. OG acquired funding. OG, SM and JN conceptualized and supervised the study. OG, SM, JN, MF, SE, MS, and MP reviewed and edited the manuscript. All authors contributed to the article and approved the submitted version.

Funding

This work was funded by the Swiss School of Public Health (SSPH+) (to OG) through a mandate for a PhD course on Big Data in Public Health 2020 and is a direct outcome of this online seminar.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

We thank Eva Furrer, Managing Director of the Center for Reproducible Science, University of Zurich for her thoughtful comments on the manuscript.

References

1. The Lancet Infectious Diseases. The intersection of COVID-19 and mental health. Lancet Infect Dis (2020). 20(11):1217. doi:10.1016/S1473-3099(20)30797-0

2. Taquet, M, Luciano, S, Geddes, JR, and Harrison, PJ. Bidirectional associations between COVID-19 and psychiatric disorder: retrospective cohort studies of 62 354 COVID-19 cases in the USA. Lancet Psychiatry (2020). 8:130–40. doi:10.1101/2020.08.14.20175190

3. Shatte, ABR, Hutchinson, DM, and Teague, SJ. Machine learning in mental health: a scoping review of methods and applications. Psychol Med (2019). 49:1426–48. doi:10.1017/S0033291719000151

4. Shaughnessy, K, Reyes, R, Shankardass, K, Sykora, M, Feick, R, Lawrence, H, et al. Using geolocated social media for ecological momentary assessments of emotion: innovative opportunities in psychology science and practice. Can Psychol (2017). 59:47–53. doi:10.1037/cap0000099

5. Naslund, JA, Gonsalves, PP, Gruebner, O, Pendse, SR, Smith, SL, Sharma, A, et al. Digital innovations for global mental health: opportunities for data science, task sharing, and early intervention. Curr Treat Options Psych (2019). 6:337–51. doi:10.1007/s40501-019-00186-8

6. Gruebner, O, Sykora, M, Lowe, SR, Shankardass, K, Galea, S, and Subramanian, SV. Big data opportunities for social behavioral and mental health research. Soc Sci Med (2017). 189:167–9. doi:10.1016/j.socscimed.2017.07.018

7. Gruebner, O, Sykora, M, Lowe, SR, Shankardass, K, Trinquart, L, Jackson, T, et al. Mental health surveillance after the terrorist attacks in Paris. Lancet (2016). 387(10034):2195–6. doi:10.1016/s0140-6736(16)30602-x

8. Swanlund, D, Schuurman, N, Zandbergen, P, and Brussoni, M. Street masking: a network-based geographic mask for easily protecting geoprivacy. Int J Health Geogr (2020). 19(1):26. doi:10.1186/s12942-020-00219-z

9. Mooney, SJ, and Garber, MD. Sampling and sampling frames in big data epidemiology. Curr Epidemiol Rep (2019). 6(1):14–22. doi:10.1007/s40471-019-0179-y

10. Mooney, SJ, and Pejaver, V. Big data in public health: terminology, machine learning, and privacy. Annu Rev Public Health (2018). 39(1):95–112. doi:10.1146/annurev-publhealth-040617-014208

Keywords: surveillance, digital epidemiology, spatial epidemiology, digital health geography, social media

Citation: Aebi NJ, De Ridder D, Ochoa C, Petrovic D, Fadda M, Elayan S, Sykora M, Puhan M, Naslund JA, Mooney SJ and Gruebner O (2021) Can Big Data Be Used to Monitor the Mental Health Consequences of COVID-19?. Int J Public Health 66:633451. doi: 10.3389/ijph.2021.633451

Received: 25 November 2020; Accepted: 02 March 2021;
Published: 08 April 2021.

Edited by:

Partnership Editorial Office, Frontiers Media SA, Switzerland

Copyright © 2021 Aebi, De Ridder, Ochoa, Petrovic, Fadda, Elayan, Sykora, Puhan, Naslund, Mooney and Gruebner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Nicola Julia Aebi

†These authors share last authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.