Large Language Models in Evidence-Based Medicine

By João L. Carapinha

July 3, 2026

Large language models deliver rapid synthesis of medical literature, but according to research on fast information and slow evidence, they cannot independently generate validated evidence for clinical decisions. These tools sit at an intermediate stage in the data-information-evidence-practice hierarchy, gaining true evidentiary value only after rigorous human appraisal, methodological review, and contextual judgment.

Current fears of fabrication and publication overload represent an intensification of long-standing challenges rather than an entirely new threat. When correctly positioned, large language models strengthen evidence-based medicine by accelerating literature screening, mapping knowledge boundaries, and spotlighting genuine evidence gaps.

A Four-Level Knowledge Framework

The analysis applies a structured hierarchy that separates raw observations (data), interpreted findings (information), methodologically vetted results (evidence), and actionable clinical guidance (practice). This lens, grounded in established information science and evidence-based medicine principles, systematically identifies where large language models add value and where they fall short.

Performance Across Review Stages

Empirical testing shows large language models approaching human performance in title-and-abstract screening and achieving over 98 percent accuracy in structured data extraction. Accuracy declines to roughly 80 percent for PICO element identification, drops further in full-text screening, and reaches only 57-70 percent agreement with experts on risk-of-bias assessments. Retrieval-augmented systems improve citation reliability and answer 48 percent of real-world clinical queries with existing evidence, yet still cannot perform the critical synthesis steps of consistency evaluation, indirectness assessment, or confidence calibration.
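The agreement figures quoted for risk-of-bias assessment can be made concrete with a small calculation. The sketch below is illustrative only: it computes raw percent agreement between a model's judgments and an expert's, and adds Cohen's kappa as a chance-corrected companion statistic. Kappa is an addition here, not something the source reports. The function names and the sample labels are hypothetical and are not drawn from any study.

```python
from collections import Counter


def percent_agreement(model_labels, expert_labels):
    """Share of items where the model and the expert gave the same label."""
    if len(model_labels) != len(expert_labels) or not model_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    return matches / len(model_labels)


def cohens_kappa(model_labels, expert_labels):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(model_labels)
    observed = percent_agreement(model_labels, expert_labels)
    model_counts = Counter(model_labels)
    expert_counts = Counter(expert_labels)
    categories = model_counts.keys() | expert_counts.keys()
    # Chance agreement: product of each rater's marginal share per category.
    expected = sum(model_counts[c] * expert_counts[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Invented risk-of-bias judgments for ten hypothetical studies.
model = ["low", "high", "low", "some", "high", "low", "low", "high", "some", "low"]
expert = ["low", "high", "some", "some", "low", "low", "high", "high", "some", "low"]

print(f"percent agreement: {percent_agreement(model, expert):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(model, expert):.2f}")
```

Raw agreement can look respectable even when much of it would arise by chance, which is one reason a team might track a chance-corrected statistic alongside it before trusting automated appraisal.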

Guarding Evidentiary Integrity in HEOR

Health economics and outcomes research teams must therefore implement architectural guardrails that restrict large language models to data-to-information tasks while reserving appraisal and applicability decisions for humans. This disciplined governance protects the quality of economic evaluations that drive market access, pricing, and reimbursement. The same framework also enables systematic gap mapping to direct research investment toward high-value evidence generation, ensuring automation enhances rather than erodes the standards that underpin trustworthy healthcare policy.
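One way to picture such a guardrail is a routing rule keyed to the four knowledge levels. The following is a minimal sketch under stated assumptions, not a description of any real system: the task names, the level assigned to each task, and the assign_task function are all hypothetical, chosen only to show how data-to-information work could be routed to a model while everything above that boundary goes to a human reviewer.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four knowledge levels named in the article, in ascending order."""
    DATA = 1
    INFORMATION = 2
    EVIDENCE = 3
    PRACTICE = 4


# Hypothetical task registry: task name -> (input level, output level).
# The level assigned to each task is an assumption made for illustration.
TASKS = {
    "title_abstract_screening": (Level.DATA, Level.INFORMATION),
    "structured_data_extraction": (Level.DATA, Level.INFORMATION),
    "risk_of_bias_assessment": (Level.INFORMATION, Level.EVIDENCE),
    "applicability_decision": (Level.EVIDENCE, Level.PRACTICE),
}


def assign_task(task: str) -> str:
    """Route data-to-information tasks to the model, all others to a human."""
    if task not in TASKS:
        # Unknown tasks default to human review rather than automation.
        return "human"
    source, target = TASKS[task]
    if source == Level.DATA and target == Level.INFORMATION:
        return "llm"
    return "human"


for name in TASKS:
    print(f"{name}: {assign_task(name)}")
```

Defaulting unknown tasks to human review keeps the rule conservative: a new task type has to be classified explicitly before any of it is automated.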
