Generative AI Diagnosis in Neurology Enhances Non-Specialist Accuracy

By João L. Carapinha

June 5, 2026


Generative AI is showing measurable diagnostic performance in neurology. In a new study, ChatGPT-4o identified the correct leading diagnosis in 65.5% of challenging polyneuropathy cases, statistically on par with non-specialist neurologists (63%) but behind specialists (74%). The model also outperformed non-specialists on two secondary measures: it included the correct diagnosis in its differential list more often and selected appropriate confirmatory tests at a higher rate.

AI Rivals Non-Specialists on Real Patient Data

Researchers analyzed 100 consecutive cases from two tertiary centers in Milan, converting them into standardized English summaries containing demographics, symptoms, exam findings, electrophysiology, and initial labs. Prompted with a zero-shot chain-of-thought instruction, ChatGPT-4o returned one leading diagnosis, two differentials, and one recommended test for each case. The same case summaries were given to 19 peripheral-nerve specialists and 17 non-specialists, and all answers were scored against diagnoses confirmed after at least 12 months of follow-up.
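For readers who want to see how the two headline measures relate, here is a minimal sketch of the scoring, not the study's own code. It assumes the "differential list" means the leading diagnosis plus the two differentials, and that answers are matched to the confirmed diagnosis by exact label. The Answer class, the score function, and the four toy cases are all hypothetical and do not come from the study data.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    """One respondent's answer for one case (hypothetical structure)."""
    leading: str          # the single leading diagnosis
    differentials: tuple  # the two differential diagnoses


def score(answers, confirmed):
    """Return (leading-diagnosis accuracy, differential-list inclusion rate).

    Assumption: the differential list is the leading diagnosis plus the
    two differentials, and labels are compared by exact string match.
    """
    n = len(confirmed)
    top1 = sum(a.leading == c for a, c in zip(answers, confirmed))
    in_list = sum(
        c in (a.leading, *a.differentials) for a, c in zip(answers, confirmed)
    )
    return top1 / n, in_list / n


# Toy cases for illustration only; these are not study data.
answers = [
    Answer("CIDP", ("diabetic neuropathy", "vasculitic neuropathy")),
    Answer("diabetic neuropathy", ("CIDP", "amyloid neuropathy")),
    Answer("amyloid neuropathy", ("CIDP", "toxic neuropathy")),
    Answer("toxic neuropathy", ("diabetic neuropathy", "CIDP")),
]
confirmed = ["CIDP", "CIDP", "vasculitic neuropathy", "toxic neuropathy"]

top1_rate, inclusion_rate = score(answers, confirmed)
print(f"leading diagnosis correct: {top1_rate:.0%}")
print(f"correct diagnosis in list: {inclusion_rate:.0%}")
```

In the toy data the second case shows why the two measures can diverge: the leading diagnosis is wrong, but the confirmed diagnosis still appears among the differentials.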

Where the Model Excels and Where It Stumbles

ChatGPT-4o was strongest at compiling differential lists, which were broader and more accurate than those of non-specialists, and its test selection was 15 percentage points better. Its leading-diagnosis sensitivity reached 57.3% with 72.7% precision. Most errors stemmed from overlooking provided clinical details or over-relying on laboratory values, while hallucinations accounted for roughly one-third of mistakes. After seeing the AI output, non-specialists changed their initial assessments in 21.8% of cases, producing measurable gains in accuracy, sensitivity, and F1-score.
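The F1-score mentioned above is the harmonic mean of precision and sensitivity. As a rough illustration, the sketch below applies that formula to the two figures reported for the model's leading diagnosis. The article does not state the model's F1, so the result is only an approximation: if the study averaged its metrics per diagnosis class, its own F1 could differ. The function name f1_score is chosen here for illustration.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * sensitivity / (precision + sensitivity)


# Figures reported in the article for the model's leading diagnosis.
# Assumption: both are single pooled values, so combining them is meaningful.
f1 = f1_score(precision=0.727, sensitivity=0.573)
print(f"approximate F1: {f1:.3f}")  # prints "approximate F1: 0.641"
```

Because the harmonic mean is pulled toward the lower of the two inputs, the result sits closer to the 57.3% sensitivity than to the 72.7% precision.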

Practical Value in Specialist-Scarce Settings

These findings suggest generative AI could help reduce unnecessary testing and speed up referrals in primary or secondary care where neurologist access is limited. Because polyneuropathy already consumes substantial healthcare resources and is often diagnosed only after long delays, using the model as a calibrated second opinion may shorten time-to-etiology and lower downstream costs. Specialist performance remained largely unchanged after reviewing the AI suggestions, which indicates that the greatest benefit lies in supporting less-experienced clinicians rather than replacing expert judgment.

This head-to-head evaluation offers a roadmap for integrating large language models into neurologic workflows, provided prospective trials confirm a real-world impact on patient outcomes and resource use.

