
Large language models synthesize medical literature rapidly, but research contrasting fast information with slow evidence argues that they cannot independently generate validated evidence for clinical decisions. These tools sit at an intermediate stage of the data-information-evidence-practice hierarchy: their outputs gain evidentiary value only after rigorous human appraisal, methodological review, and contextual judgment.
Current fears of fabrication and publication overload represent an intensification of long-standing challenges rather than an entirely new threat. When correctly positioned, large language models strengthen evidence-based medicine by accelerating literature screening, mapping knowledge boundaries, and spotlighting genuine evidence gaps.
A Four-Level Knowledge Framework
The analysis applies a structured hierarchy that separates raw observations (data), interpreted findings (information), methodologically vetted results (evidence), and actionable clinical guidance (practice). This lens, grounded in established information science and evidence-based medicine principles, systematically identifies where large language models add value and where they fall short.
Performance Across Review Stages
Empirical testing shows large language models approaching human performance in title-and-abstract screening and achieving over 98 percent accuracy in structured data extraction. Accuracy declines to roughly 80 percent for PICO element identification, drops further in full-text screening, and reaches only 57-70 percent agreement with experts on risk-of-bias assessments. Retrieval-augmented systems improve citation reliability and answer 48 percent of real-world clinical queries with existing evidence, yet still cannot perform the critical synthesis steps of consistency evaluation, indirectness assessment, or confidence calibration.
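To make the agreement figures above concrete, here is a minimal sketch of how raw percent agreement between model and expert ratings could be computed. The function name and the ten toy ratings are invented for illustration and are not data from the studies summarized here.

```python
def percent_agreement(model_labels, expert_labels):
    """Return the share of items (in percent) where both raters gave the same label."""
    if len(model_labels) != len(expert_labels):
        raise ValueError("Both raters must label the same number of items")
    if not model_labels:
        raise ValueError("At least one rated item is required")
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    return 100.0 * matches / len(model_labels)


# Toy risk-of-bias ratings for ten hypothetical studies (illustrative only).
model_ratings = ["low", "high", "unclear", "low", "high",
                 "low", "low", "unclear", "high", "low"]
expert_ratings = ["low", "high", "low", "low", "unclear",
                  "low", "high", "unclear", "high", "low"]

print(percent_agreement(model_ratings, expert_ratings))
```

Raw percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic such as Cohen's kappa is often reported alongside it.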
Guarding Evidentiary Integrity in HEOR
Health economics and outcomes research teams must therefore implement architectural guardrails that restrict large language models to data-to-information tasks while reserving appraisal and applicability decisions for humans. This disciplined governance protects the quality of economic evaluations that drive market access, pricing, and reimbursement. The same framework also enables systematic gap mapping to direct research investment toward high-value evidence generation, ensuring automation enhances rather than erodes the standards that underpin trustworthy healthcare policy.
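One way such a guardrail might look in practice is a simple routing rule keyed to the four knowledge levels. The sketch below is an assumption-laden illustration, not a prescribed implementation: the task names, their assigned levels, and the `route_task` function are all hypothetical.

```python
from enum import IntEnum


class KnowledgeLevel(IntEnum):
    """The four levels of the data-information-evidence-practice hierarchy."""
    DATA = 1
    INFORMATION = 2
    EVIDENCE = 3
    PRACTICE = 4


# Hypothetical catalogue: the knowledge level that each task's output belongs to.
TASK_OUTPUT_LEVEL = {
    "title_abstract_screening": KnowledgeLevel.INFORMATION,
    "structured_data_extraction": KnowledgeLevel.INFORMATION,
    "risk_of_bias_assessment": KnowledgeLevel.EVIDENCE,
    "reimbursement_recommendation": KnowledgeLevel.PRACTICE,
}


def route_task(task_name):
    """Send data-to-information tasks to the model; everything else goes to a human."""
    level = TASK_OUTPUT_LEVEL.get(task_name)
    if level is None:
        # Tasks missing from the catalogue default to human review.
        return "human"
    return "llm" if level <= KnowledgeLevel.INFORMATION else "human"


for task in ("structured_data_extraction", "risk_of_bias_assessment"):
    print(task, "->", route_task(task))
```

Defaulting unknown tasks to human review keeps the rule conservative: a task must be explicitly classified as data-to-information work before a model is allowed to handle it.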