
In a controlled experiment, a team of medical researchers and AI specialists at NYU Langone Health has demonstrated how easy it is to taint the data pool used to train LLMs.
For their study, published in the journal Nature Medicine, the group generated thousands of articles containing misinformation, inserted them into an AI training dataset, and ran standard LLM queries to see how often the misinformation surfaced.
Prior research and anecdotal evidence have shown that the answers given by LLMs such as ChatGPT are not always correct and are sometimes wildly off-base. Prior research has also shown that misinformation planted deliberately on well-known websites can show up in general chatbot queries. In this new study, the research team wanted to find out how easy or difficult it would be for malicious actors to poison LLM responses.
To find out, the researchers used ChatGPT to generate 150,000 medical documents containing incorrect, outdated and false data. They then added these generated documents to a test version of an AI medical training dataset and trained several LLMs on it. Finally, they asked the LLMs to answer 5,400 medical queries, and human experts reviewed the answers, looking for examples of tainted data.
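The article does not reproduce the authors' pipeline, but the mixing step it describes can be pictured with a minimal sketch. The corpus sizes, document lists and helper function below are made up for illustration only; the idea is simply to blend misinformation documents into a clean corpus at a controlled rate.

```python
import random

def build_poisoned_corpus(clean_docs, poisoned_docs, poison_rate, seed=0):
    """Mix misinformation documents into a clean corpus so they make up
    roughly `poison_rate` (e.g. 0.005 for 0.5%) of the final corpus."""
    rng = random.Random(seed)
    # Documents needed so that n_poison / (len(clean_docs) + n_poison) ~= poison_rate.
    n_poison = round(poison_rate * len(clean_docs) / (1 - poison_rate))
    n_poison = min(n_poison, len(poisoned_docs))
    corpus = clean_docs + rng.sample(poisoned_docs, n_poison)
    rng.shuffle(corpus)
    return corpus

# Hypothetical sizes, for illustration only: 1,000,000 clean documents, poisoned at 0.5%.
clean = [f"clean document {i}" for i in range(1_000_000)]
bad = [f"misinformation document {i}" for i in range(150_000)]
mixed = build_poisoned_corpus(clean, bad, poison_rate=0.005)
print(f"{len(mixed) - len(clean)} poisoned documents injected into {len(mixed)} total")
```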
The research team found that after replacing just 0.5% of the data in the training dataset with tainted documents, all of the test models generated more medically inaccurate answers than they had before being trained on the compromised dataset. As one example, all of the LLMs reported that the effectiveness of COVID-19 vaccines has not been proven. Most of them also misidentified the purpose of several common medications.
The team also found that reducing the share of tainted documents in the test dataset to just 0.01% still resulted in 10% of the answers given by the LLMs containing incorrect data (and dropping it to 0.001% still led to 7% of the answers being incorrect), suggesting that only a handful of such documents posted on real-world websites would be enough to skew the answers given by LLMs.
The team followed up by writing an algorithm able to identify medical data in LLM output and then used cross-referencing to validate that data, but they note that there is no realistic way to detect and remove misinformation from public datasets.
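The screening algorithm itself is not described in detail in this article. Purely as a toy illustration of the cross-referencing idea, with a hypothetical hard-coded reference set standing in for the curated medical sources a real validator would consult, a flagging routine might look like this:

```python
# Hypothetical, hard-coded reference set; the study cross-referenced against
# curated biomedical sources, not a toy dictionary like this one.
VALIDATED_FACTS = {
    "covid-19 vaccine": "effective at preventing severe disease",
    "metformin": "first-line treatment for type 2 diabetes",
}

def flag_unverified_claims(answer: str) -> list[str]:
    """Return medical terms mentioned in an LLM answer whose surrounding claim
    does not match the validated reference statement for that term."""
    flagged = []
    lowered = answer.lower()
    for term, accepted_claim in VALIDATED_FACTS.items():
        if term in lowered and accepted_claim not in lowered:
            flagged.append(term)
    return flagged

# An answer that mentions a known term but contradicts (or omits) the validated
# claim gets flagged for human review.
print(flag_unverified_claims("The COVID-19 vaccine has not been proven effective."))
# -> ['covid-19 vaccine']
```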
More information:
Daniel Alexander Alber et al, Medical large language models are vulnerable to data-poisoning attacks, Nature Medicine (2025). DOI: 10.1038/s41591-024-03445-1