HINTS AND KINKS

Int. J. Public Health, 21 August 2025

Volume 70 - 2025 | https://doi.org/10.3389/ijph.2025.1608480

A Custom Keyword Tool for Improving the Quality of Social Media Monitoring on Vaccine Safety: A Proof of Concept

  • 1. Bucci-Hepworth Health Services, Pincourt, QC, Canada

  • 2. World Health Organization, Geneva, Switzerland

  • 3. Department of Translational Research and New Technologies in Medicine and Surgery, University of Pisa, Pisa, Italy

Background

Social media monitoring is one of several ways for health authorities to capture specific insights into population perceptions about vaccine safety [1]. Commercial and open-source tools are helpful not only for gathering data on socio-cultural, religious, and political trends, but also for detecting what is being said about vaccine safety and why populations are delaying or refusing vaccination [2]. Monitoring and tracking digital forums for target audiences and influencers, and identifying misinformation, provide additional understanding. However, despite the clear need, public health authorities globally are severely constrained in their capacity to address the overwhelming volume and complexity of misinformation [3]. To complicate matters, international, national, and corporate infodemic management policies are compelling information users to change the way they share information online and on social media platforms. Recent examples of how information users are adapting include the use of memes and increases in the amount and speed of information disseminated between platforms. This rapid evolution of misinformation tactics, combined with limited public health resources, including insufficient staffing and a lack of specialized digital capacity within many public health authorities, makes comprehensive oversight very challenging. As digital information environments become more complex, existing tools for social media monitoring need to adapt to meet the needs of public health authorities that may not have the resources to undertake comprehensive social media listening [4]. For health authorities to truly benefit from better quality social media intelligence, innovations must be developed that are not only needed but also accessible, adaptable and feasible to implement [5].

The Vaccine Safety Net (VSN), the World Health Organization’s (WHO) global network that facilitates access to trustworthy, science-based vaccine safety information, identified this challenge as an opportunity to contribute to a rapidly expanding area. The VSN consists of member websites seeking more effective ways of communicating through digital and social media analytics research [6]. This research involves identifying high-impact vaccine safety-related issues on social media for predicting and pre-bunking misinformation, developing and testing social media messages, and assessing their relevance and impact using a commercial platform with a social listening tool. Leveraging the VSN’s expertise in social media listening, we developed and tested a custom keyword filter designed for adaptable global implementation by public health authorities. This filter aims to address the significant challenge posed by the sheer volume and evolving nature of misinformation, which often overwhelm existing commercial and online generic filters and their ability to provide precise results. For example, many generic keyword filters rely on estimations and can over-filter (or under-filter) valuable social media content, failing to capture relevant information. Social media content is highly contextual, and generic keyword filters do not interpret the nuances of language (e.g., comedy, sarcasm, anger, etc.).

Objective

We sought to understand how to optimize social media searches on vaccine safety using a custom keyword filter for better quality search results. In this proof-of-concept project, a custom keyword filter was designed using Kim et al.’s [7] conceptual framework and tested using a commercial social listening platform and open-source artificial intelligence (AI) tools, with the intent of analyzing the quantity of relevant versus irrelevant mentions retrieved from vaccine safety searches on X® in Canada, the United States, Italy and the United Kingdom.

Unfiltered keywords can yield large amounts of irrelevant data [8]. Custom keyword filters are therefore a meaningful method for improving digital and social media monitoring practices in response to constantly evolving information environments. For additional accuracy, we chose to first create and test a keyword filter with vaccine-related keywords. A vaccine safety keyword filter was subsequently created and tested to distill the information. We added artificial intelligence (AI) derived keywords to the filter, which expanded the social media datasets.

Methods

The custom keyword filter involves three steps: 1) frequency screening; 2) sampling; and 3) search implementation.

Frequency Screening

A list of candidate vaccine and vaccine safety keywords was pooled in collaboration with VSN members from Canada, Italy, the United States and the United Kingdom. Candidate keywords were selected considering the native language of the targeted countries, media reports, published literature and epidemiological events. The list of candidate keywords was applied to a six (6) month retrospective scan of X® conversations between January and June 2023 using a commercial social media monitoring platform. We tracked keywords that peaked on X® and developed a history of trending candidate keywords for this period. We used this dataset to identify the frequency of candidate keywords. Candidate keywords that had less than 30% of mentions per month were discarded from the keyword list. This frequency threshold was selected through team consensus for this proof of concept. This initial triage was used to identify vaccine and vaccine safety keywords used more regularly in X® conversations.
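The frequency triage can be sketched in a few lines of Python. This is a minimal illustration, not the procedure run on the commercial platform. The text does not state what the 30% is measured against, so the sketch assumes a keyword's monthly mentions are compared with the most-mentioned candidate keyword of that month and then averaged over the scan period. The function name, the input shape and the example counts are all hypothetical.

```python
def screen_by_frequency(monthly_counts, threshold=0.30):
    """Split candidate keywords into kept and discarded lists.

    monthly_counts: {month: {keyword: mention_count}} (hypothetical shape).
    ASSUMPTION: frequency is expressed relative to the month's most-mentioned
    candidate keyword and averaged over months; the source does not specify
    the denominator behind its 30% threshold.
    """
    keywords = sorted({kw for counts in monthly_counts.values() for kw in counts})
    kept, discarded = [], []
    for kw in keywords:
        ratios = []
        for counts in monthly_counts.values():
            peak = max(counts.values(), default=0)
            ratios.append(counts.get(kw, 0) / peak if peak else 0.0)
        mean_ratio = sum(ratios) / len(ratios) if ratios else 0.0
        (kept if mean_ratio >= threshold else discarded).append(kw)
    return kept, discarded


# Illustrative counts only, not data from the study.
example_counts = {
    "2023-01": {"vaccine": 1000, "jabs": 400, "plandemic": 50},
    "2023-02": {"vaccine": 800, "jabs": 200, "plandemic": 40},
}
kept, discarded = screen_by_frequency(example_counts)
print("kept:", kept)
print("discarded:", discarded)
```

Keeping the threshold as a parameter makes it easy to rerun the triage if a team settles on a different consensus value.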

A data analyst was enlisted to assist with AI keyword identification. Additional vaccine and vaccine safety-related keywords were extracted from the original dataset using Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer 3 (GPT-3).

Sampling

The remaining candidate keywords were then screened for relevance. To do this, a real-time dataset of X® keyword mentions covering up to one (1) week was generated. We sampled two hundred (200) mentions for each candidate keyword, including reposts, using a commercial sample operator. Three team members, experts in vaccination and vaccine safety, reviewed each sample to determine relevance. We assessed the relevance of keywords by classifying their sensitivity and specificity to “vaccine-related” and then “vaccine safety-related” words. Keywords were considered relevant if they were used in the context of vaccination conversations or were implicit in meaning.
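A rough sketch of the sampling and review step follows. The sample size of 200 comes from the text; everything else is an assumption made for illustration. In particular, the source does not say how the three reviewers' judgments were combined, so a simple majority vote is assumed here, and the function names are hypothetical.

```python
import random

SAMPLE_SIZE = 200  # mentions sampled per candidate keyword, as stated in the text


def sample_mentions(mentions, k=SAMPLE_SIZE, seed=0):
    """Draw up to k mentions at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(mentions, min(k, len(mentions)))


def majority_relevant(votes):
    """ASSUMPTION: a mention is relevant if most reviewers marked it so."""
    return sum(votes) > len(votes) / 2


def relevance_share(reviewed):
    """Fraction of reviewed mentions judged relevant by majority vote.

    reviewed: list of per-mention vote lists, e.g. [True, True, False].
    """
    if not reviewed:
        return 0.0
    return sum(majority_relevant(votes) for votes in reviewed) / len(reviewed)


# Illustrative use with placeholder mention IDs and made-up votes.
sample = sample_mentions(list(range(500)))
share = relevance_share([[True, True, False], [False, False, True]])
print(len(sample), share)
```

Running the vote twice per mention, once for "vaccine-related" and once for "vaccine safety-related", would give the two counts reported per keyword.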

Keywords that returned no relevance were removed from the list but added to a “negative keyword list” for prospective Boolean searches. Duplicate posts within each keyword dataset were also removed. Other content in posts, such as emojis, hashtags and usernames, was not considered relevant. Links to websites, if included in posts, were used to clarify the context of the post. Replies were also excluded. Lastly, common words known as “stop-words” (e.g., the, she, he, it, are, etc.) were excluded.
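The exclusion rules above could be approximated as follows. This is a sketch under several assumptions: posts are represented as dictionaries with hypothetical `text` and `is_reply` fields, emoji are approximated by the Unicode "So" category, the stop-word set contains only the examples named in the text, and the negative keyword list is applied directly to the cleaned posts, whereas the source reserves it for prospective Boolean searches.

```python
import re
import unicodedata

STOP_WORDS = {"the", "she", "he", "it", "are"}  # only the examples given in the text


def clean_text(text):
    """Strip links, usernames, hashtags, emoji and stop-words from a post."""
    text = re.sub(r"https?://\S+", " ", text)  # links are read for context, not matched
    text = re.sub(r"[@#]\w+", " ", text)       # usernames and hashtags
    # Approximation: most emoji fall in Unicode category "So" (Symbol, other).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    tokens = re.findall(r"[\w'-]+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def filter_posts(posts, negative_keywords=()):
    """Drop replies, duplicates and posts containing a negative keyword.

    Only single-word negative keywords are matched in this sketch.
    """
    negatives = {kw.lower() for kw in negative_keywords}
    seen, kept = set(), []
    for post in posts:
        if post.get("is_reply"):
            continue
        cleaned = clean_text(post["text"])
        if cleaned in seen:
            continue
        seen.add(cleaned)
        if negatives & set(cleaned.split()):
            continue
        kept.append(cleaned)
    return kept


# Illustrative posts, not taken from the study dataset.
example_posts = [
    {"text": "The vaccine is safe \U0001F489 @who #vax https://t.co/x", "is_reply": False},
    {"text": "the vaccine is safe", "is_reply": False},
    {"text": "It are fine", "is_reply": True},
    {"text": "Pandemic news today", "is_reply": False},
]
print(filter_posts(example_posts, negative_keywords=["pandemic"]))
```

Deduplicating on the cleaned text rather than the raw text treats posts that differ only by tags or emoji as the same post, which is a design choice that may or may not suit a given dataset.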

Search Implementation

Keywords determined to be highly relevant were not used in a new prospective search on X®. This final step, intended to evaluate the quality of their search results, was outside the scope of this proof of concept.

Results

Thirty (30) vaccine and vaccine safety-related keywords were extracted for each country (see Figures 1, 2) using manual and AI methods. Vaccine-related keywords were used in three thousand one hundred and seventy-eight (3178) posts, while vaccine safety-related keywords were used in eight hundred and sixty-nine (869) posts (see Table 1). BERT and GPT-3 identified additional keywords, including combined terms not previously identified. Themes extracted and analysed from vaccine safety-related mentions include public skepticism about vaccine safety, particularly COVID-19 vaccines; polarization between vaccination perspectives; concerns about misinformation; and mistrust in government, influencers, and pharmaceutical companies.

FIGURE 1

Four bar charts titled "Word frequency safety related" for Canada, the United States, the United Kingdom, and Italy display the relative frequency of words such as "vaccine," "covid," "pfizer," and "mrna," among others. Each chart lists words along the y-axis, with frequency per post on the x-axis, indicating the most common terms related to safety discussions in each country.

Keyword visualization: Vaccine safety related posts (Canada & United States (top), United Kingdom & Italy (bottom)) (Switzerland, 2024).

FIGURE 2

Four bar charts display word frequencies in posts about vaccines, excluding safety-related terms, from Canada, the US, the UK, and Italy. Common words include "vaccine," "covid," and "people," with variations among the countries. Each chart shows words on the y-axis and frequency per post on the x-axis.

Keyword visualization: Vaccine related but not safety related posts (Canada & United States (top), United Kingdom & Italy (bottom)) (Switzerland, 2024).

TABLE 1

Country  Keyword filter     Vaccine-related posts (n)  Vaccine safety-related posts (n)
CAN      COVID-19 vaccine   199                        37
CAN      Coronavirus        53                         13
CAN      COVID vaccines     193                        41
CAN      Side effects       92                         76
CAN      AstraZeneca        101                        58
CAN      Pfizer             142                        63
CAN      Pandemic           4                          0
CAN      Plandemic          32                         4
CAN      Total              816                        292
USA      COVID              13                         2
USA      COVID-19           45                         4
USA      Coronavirus        46                         6
USA      Vaccine            188                        42
USA      Vax                182                        28
USA      Vaxx               182                        25
USA      Total              656                        107
UK       Vaccine            117                        45
UK       Vaccine            39                         2
UK       COVID              29                         14
UK       Jabs               60                         12
UK       Coronavirus        18                         5
UK       Total              263                        78
IT       Vaccino            200                        72
IT       Vaccini            194                        76
IT       Vaccinato          184                        30
IT       Vaccinata          114                        24
IT       Vaccinati          190                        57
IT       Vaccinate          139                        36
IT       Vaccinare          196                        17
IT       Vaccinazion        200                        69
IT       Immunizzazione     44                         10
IT       Pfizer             91                         36
IT       COVID19            25                         9
IT       COVID-19           66                         28
IT       Total              1443                       392

Number of posts per country (Switzerland, 2024).
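As a reading aid for Table 1, the sketch below computes, for each country, the percentage of vaccine-related posts that were also vaccine safety-related, using only the country totals as reported. It assumes that safety-related posts are counted within the vaccine-related posts; the ratio itself is an illustrative derived figure and is not reported in the article.

```python
# Country totals as reported in Table 1:
# (vaccine-related posts, vaccine safety-related posts)
TABLE_1_TOTALS = {
    "CAN": (816, 292),
    "USA": (656, 107),
    "UK": (263, 78),
    "IT": (1443, 392),
}


def safety_share(vaccine_related, safety_related):
    """Percentage of vaccine-related posts that are safety-related.

    ASSUMPTION: safety-related posts are a subset of vaccine-related posts.
    """
    return round(100 * safety_related / vaccine_related, 1)


shares = {country: safety_share(v, s) for country, (v, s) in TABLE_1_TOTALS.items()}
overall_vaccine = sum(v for v, _ in TABLE_1_TOTALS.values())
overall_safety = sum(s for _, s in TABLE_1_TOTALS.values())

for country, pct in shares.items():
    print(f"{country}: {pct}%")
print(f"All countries: {safety_share(overall_vaccine, overall_safety)}%")
```

The summed country totals reproduce the 3178 and 869 posts given in the Results text.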

Conclusion

Our objective was to develop a custom keyword filter that produces better quality social media intelligence, is easy for public health authorities to use, and is versatile enough to support their vaccine safety communication strategies. We found that a custom keyword filter that uses both manual and AI methods for extracting social media mentions and performing content analysis of vaccine and vaccine safety-related posts yielded quality data. The strategy for our proof-of-concept study used a commercial platform for testing and keyword refinement. We anticipate that custom keyword filters may be used with other commercial and freely available keyword filters for more precise results. This is an advantage over using commercial or online filters alone. That said, keyword filter refinement is a time-consuming process, as are data analysis and interpretation. Despite these shortcomings, social media monitoring innovations are needed to keep pace with changing information environments. While we did not test the keyword filter for sensitivity, we found the filter to meet our expectations for specificity. New tools need to focus on improving the relevance of outputs. AI offers other avenues for filtering candidate keywords, but we are still learning about its limitations. Our proof-of-concept project contributes to a rapidly evolving area and provides new insights on how public health authorities can use adaptable keyword filter tools, in addition to commercial tools, to improve their capacity to respond to online misinformation.

Statements

Author Contributions

LB, SL and FG wrote the manuscript. All authors contributed to the article and approved the submitted version.

Funding

The author(s) declare that no financial support was received for the research and/or publication of this article.

Acknowledgments

We want to thank Alberto E. Tozzi, Ian Roe, Craig Thompson, Charlotte Moser, and the VSN research working group, Kristen De Graaf, Eve Dubé, Tina Purnat, Elisabeth Wilhelm, for their contribution to the development of the implementation and evaluation framework, and to Isabelle Sahinovic, Tala Ghalayini, Brian Yau, and Cécile Macé for their support. We also want to thank Susan Cheatham for her work on keyword extraction and analysis.

Conflict of Interest

The authors declare that they do not have any conflicts of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

References

  • 1

    Zang S, Zhang X, Xing Y, Chen J, Lin L, Hou Z. Applications of Social Media and Digital Technologies in COVID-19 Vaccination: Scoping Review. J Med Internet Res (2023) 25:e40057. 10.2196/40057

  • 2

    Hou Z, Tong Y, Du F, Lu L, Zhao S, Yu K, et al. Assessing COVID-19 Vaccine Hesitancy, Confidence, and Public Engagement: A Global Social Listening Study. J Med Internet Res (2021) 23(6):e27632. 10.2196/27632

  • 3

    Abuhaloob L, Purnat T, Tabche C, Atwan Z, Dubois E, Rawaf S. Management of Infodemics in Outbreaks or Health Crises: A Systematic Review. Front Public Health (2024) 15(12):1343902. 10.3389/fpubh.2024.1343902

  • 4

    Purnat T, Vacca P, Czerniak C, Ball S, Burzo S, Zecchin T, et al. Infodemic Signal Detection during the COVID-19 Pandemic: Development of a Methodology for Identifying Potential Information Voids in Online Conversations. JMIR Infodemiology (2021) 1(1):e30971. 10.2196/30971

  • 5

    Gesualdo F, Bucci LM, Rizzo C, Tozzi AE. Digital Tools, Multidisciplinarity and Innovation for Communicating Vaccine Safety in the COVID-19 Era. Hum Vaccin Immunother (2022) 18(1):1865048. 10.1080/21645515.2020.1865048

  • 6

    Gesualdo F, Marino F, Mantero J, Spadoni A, Sambucini L, Quaglia G, et al. The Use of Web Analytics Combined with Other Data Streams for Tailoring Online Vaccine Safety Information at Global Level: The Vaccine Safety Net’s Web Analytics Project. Vaccine (2020) 38(41):6418–26. 10.1016/j.vaccine.2020.07.070

  • 7

    Kim Y, Huang J, Emery S. Garbage in, Garbage Out: Data Collection, Quality Assessment and Reporting Standards for Social Media Data Use in Health Research, Infodemiology and Digital Disease Detection. J Med Internet Res (2016) 18(2):e41. 10.2196/jmir.4738

  • 8

    Chen J, Cypher A, Drews C, Nichols J. CrowdE: Filtering Tweets for Direct Customer Engagements. Proc Int AAAI Conf Web Soc Media (2021) 7(1):51–60. 10.1609/icwsm.v7i1.14378

Summary

Keywords

misinformation related to health, social media monitoring, vaccine safety, vaccine acceptance, vaccine hesitancy

Citation

Bucci LM, Lamprianou S, Gesualdo F and Pal S (2025) A Custom Keyword Tool for Improving the Quality of Social Media Monitoring on Vaccine Safety: A Proof of Concept. Int. J. Public Health 70:1608480. doi: 10.3389/ijph.2025.1608480

Received

03 March 2025

Accepted

28 July 2025

Published

21 August 2025

Edited by

L. Suzanne Suggs, University of Italian Switzerland, Switzerland

Reviewed by

Dian Hu, University of Texas Health Science Center at Houston, United States

One reviewer who chose to remain anonymous

*Correspondence: Lucie Marisa Bucci,

†These authors have contributed equally to this work and share senior authorship

‡These authors share last authorship

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
