[Image: Cris Cantón/Getty Images]
Scientists have long recognized the importance of effectively communicating their work to the general public. This can be a daunting task, however, especially since most researchers aren’t trained in communication. In recent years, podcasts have become an increasingly valuable tool in disseminating scientific research—they are easily accessible and available on-demand, and they reach a diversity of audiences.
In an effort to better leverage this medium, researchers in Belgium decided to see if artificial intelligence (AI) was up to the task of generating podcasts about scientific papers (Eur. J. Cardiovasc. Nurs., doi: 10.1093/eurjcn/zvaf074). Generative AI tools can create a podcast in just minutes, but common issues like hallucinations (when AI presents false or misleading information as fact) can make the information untrustworthy. The team’s study therefore explored both the quality of AI-generated podcasts and their viability as a tool for science communication.
Podcast generation
Philip Moons of the University of Leuven and colleagues used a personalized AI research assistant called Google NotebookLM, a product of Google Labs, to create 10 podcasts. They chose NotebookLM because it connects the material it generates to specific sources, reducing hallucinations and increasing reliability. Each podcast covered a research article that had been published in the European Journal of Cardiovascular Nursing. The selected papers spanned a range of article types: five original research articles, two reviews, one patient perspective, one methods corner and one discussion paper. The topics of the articles were not a selection criterion.
The team leveraged the Audio Overview feature of the research assistant, introduced in September 2024, to make the podcasts. The feature generates a conversational discussion and summary of the source material and can tailor the resulting podcast to an intended audience or a certain focus based on prompts. For this study, the researchers uploaded the research papers to NotebookLM and as their only prompt, asked Audio Overview to create a “deep-dive conversation.” They used eight of the generated podcast episodes as-is, but found that two needed to be recreated with an additional time-limit prompt or with a more tailored focus for cardiovascular nurses. The episodes ranged in length from five to 17 minutes.
Quality and potential
Each podcast was then sent to the author of the study covered in that episode, who was asked to assess its quality without being told that the podcast was AI-generated. Quality was broken down into three categories: engagement, trustworthiness and AI detection. After listening to the podcast, the authors filled out a questionnaire and, three days later, participated in an interview for a mixed-method evaluation. The researchers reviewed the questionnaire responses and interviews to collect quantitative and qualitative data, which were combined to reach final conclusions.
In terms of engagement, the participants found that the podcasts described their research in “very simple, easy-to-understand terms,” and they liked the conversational format. Some thought that the hosts were professional podcasters with a background in nursing or medicine. While most participants were happy with the length and pace of the conversation, one thought only 80% of the material was relevant and the rest was just filler.
When assessing trustworthiness, the podcasts were generally deemed reliable. However, some aspects of the research topics were represented inaccurately, like heart failure management being confused for heart failure diagnosis, and some scientific terms were used or pronounced incorrectly. “It was striking how accurate the podcasts were in general,” Moons said. “Knowing that we are just at the beginning of this kind of AI-generated podcasts, the quality will become better over time, probably within the next few months.”
Some participants didn’t like how findings were occasionally hyped up with terms like “amazing” and “groundbreaking.” Also, the hosts spoke in American accents and never introduced themselves or gave context about who they were, which left the international group of listeners questioning their credibility.
Only half of the participants were surprised to find out that the podcasts were created with AI. The article states that the unsuspecting listeners were “’shocked,’ ‘amazed’ or even ‘having an existential crisis.’” Those who did suspect the episodes might be AI-generated were still surprised by what the tool could produce, but said they were tipped off by the absence of filler words like “um,” the lack of vocal disfluencies and moments when a host said something incoherent.
Looking to the future
The team ultimately concludes that podcasts like the ones it studied could be useful tools for sharing scientific information with a broader audience, but with some changes. Each episode should be carefully evaluated for accuracy and mistakes, preferably by the author of the study the podcast is discussing. The AI-generated nature of the podcast should also be made clear, and a reference to the original research should be included.
“If podcasts could be generated by AI, that could really be a game-changer,” said Moons. “Podcasts could be made with very little work, just by uploading the article and maybe a bit of prompting. This could be a sustainable model to get the message out to people who do not typically read scientific journals.”