
Generative AI diagnosis is stepping into neurology with striking results. In a new study, ChatGPT-4o named the correct leading diagnosis in 65.5% of challenging polyneuropathy cases. That rate was statistically indistinguishable from non-specialist neurologists (63%) but below specialists (74%). The model also outperformed non-specialists on two secondary measures: it included the correct diagnosis in its differential list more often, and it selected an appropriate confirmatory test at a higher rate.
AI Rivals Non-Specialists on Real Patient Data
Researchers analyzed 100 consecutive cases from two tertiary centers in Milan. Each case was converted into a standardized English summary containing demographics, symptoms, examination findings, electrophysiology, and initial laboratory results. Prompted with a zero-shot chain-of-thought instruction, ChatGPT-4o returned one leading diagnosis, two differential diagnoses, and one recommended test for each case. The same summaries were given to 19 peripheral-nerve specialists and 17 non-specialists. All answers were scored against diagnoses confirmed after at least 12 months of follow-up.
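The answer format described above can be sketched in code. This is a minimal illustration, not the study's scoring method. It assumes each answer is stored as a leading diagnosis, two differentials, and one test. It also assumes a diagnosis counts as correct only on an exact, case-insensitive label match with the confirmed diagnosis. The `CaseAnswer` type, the `score` function, and the toy labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseAnswer:
    """Hypothetical record of one response: leading diagnosis, two differentials, one test."""
    leading: str
    differentials: tuple[str, str]
    test: str  # kept for completeness; test appropriateness is not scored here


def _norm(label: str) -> str:
    # Assumed matching rule: exact label match, ignoring case and surrounding spaces.
    return label.strip().lower()


def score(answers: list[CaseAnswer], confirmed: list[str]) -> tuple[float, float]:
    """Return (leading-diagnosis accuracy, share of cases whose confirmed
    diagnosis appears among the three proposed diagnoses)."""
    if not answers or len(answers) != len(confirmed):
        raise ValueError("need one confirmed diagnosis per answer")
    top1 = included = 0
    for ans, truth in zip(answers, confirmed):
        proposed = [_norm(ans.leading)] + [_norm(d) for d in ans.differentials]
        if proposed[0] == _norm(truth):
            top1 += 1  # leading diagnosis is correct
        if _norm(truth) in proposed:
            included += 1  # correct diagnosis appears anywhere in the list
    n = len(answers)
    return top1 / n, included / n


# Toy example with made-up labels (not study data).
answers = [
    CaseAnswer("CIDP", ("diabetic", "vasculitic"), "test A"),
    CaseAnswer("hereditary", ("CIDP", "amyloid"), "test B"),
]
confirmed = ["cidp", "Amyloid"]
print(score(answers, confirmed))  # (0.5, 1.0)
```

Real grading would rely on clinical adjudication rather than string matching. For that reason the recommended test is carried along in this sketch but not scored.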
Where the Model Excels and Where It Stumbles
ChatGPT-4o was particularly strong at compiling differential lists that were broader and more accurate than those of non-specialists. It also chose the appropriate test 15 percentage points more often. For the leading diagnosis, its sensitivity was 57.3% and its precision 72.7%. Most errors came from overlooking clinical details present in the summary or from over-relying on laboratory values. Hallucinations accounted for roughly one-third of mistakes. After seeing the AI output, non-specialists changed their initial assessment in 21.8% of cases, with measurable gains in accuracy, sensitivity, and F1-score.
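The F1-score mentioned above combines precision (P) and sensitivity, also called recall (R), as their harmonic mean: F1 = 2PR / (P + R). As a rough illustration only, plugging in the model's headline leading-diagnosis figures gives about 0.64. This assumes the two percentages can be combined directly. If the study averaged per-diagnosis scores (macro-averaging), its own F1 could differ somewhat. The `f1` helper is hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity), both given as fractions."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Headline leading-diagnosis figures for ChatGPT-4o: precision 72.7%, sensitivity 57.3%.
print(round(f1(0.727, 0.573), 3))  # 0.641
```

The harmonic mean is pulled toward the smaller of its two inputs. As a result, the value lands closer to the 57.3% sensitivity than to the 72.7% precision.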
Practical Value in Specialist-Scarce Settings
These findings suggest generative AI diagnosis could help reduce unnecessary testing and speed up referrals in primary or secondary care where neurologist access is limited. Because polyneuropathy already consumes substantial healthcare resources and often faces long diagnostic delays, embedding this technology as a calibrated second opinion may shorten time-to-etiology and lower downstream costs. Specialist performance remained largely unchanged after reviewing the AI suggestions, reinforcing that the greatest benefit lies in supporting less-experienced clinicians rather than replacing expert judgment.
This head-to-head evaluation of generative AI diagnosis offers a clear roadmap for integrating large language models into neurologic workflows, provided future prospective trials confirm real-world impact on patient outcomes and resource use.