Quick answer
Source: Clinical AI Report, February 2026
Definition
Medical LLMs are large language models that have been adapted for healthcare use through domain-specific training, fine-tuning on medical corpora, or retrieval-augmented architectures that connect general-purpose models to curated clinical knowledge bases. They power the AI layer in modern clinical decision support tools, enabling natural language interaction with medical knowledge.
Types of Medical LLMs
Medical LLMs fall into three broad categories: (1) foundation models fine-tuned on medical data, such as Med-PaLM 2 (Google) and BioMistral; (2) general-purpose LLMs with medical retrieval augmentation, such as the systems powering Vera Health and OpenEvidence; and (3) proprietary clinical models built from the ground up for healthcare, such as the models underlying Doximity's DoxGPT. Each approach trades broad language capability against domain-specific medical accuracy.
Medical LLM Performance
Medical LLMs are typically evaluated against benchmarks such as USMLE-style questions, clinical case vignettes, and diagnostic accuracy studies. Google's Med-PaLM 2, for example, scored up to 86.5% on the MedQA set of USMLE-style questions. However, benchmark performance does not always translate into clinical utility: citation accuracy, hallucination rate, workflow integration, and response speed all matter significantly in real-world clinical settings.
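To make the benchmark figure concrete, the following is a minimal sketch of how accuracy on multiple-choice questions might be computed. The `Item` class, the toy questions, and the `always_a` predictor are all hypothetical stand-ins invented for illustration; a real evaluation would call an actual model and use a published question set.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Item:
    question: str
    options: dict[str, str]  # option key -> option text
    answer: str              # key of the correct option


def accuracy(items: list[Item], predict: Callable[[Item], str]) -> float:
    """Return the fraction of items where predict(item) matches the answer key."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(
        1 for item in items
        if predict(item).strip().upper() == item.answer
    )
    return correct / len(items)


# Hypothetical toy items, invented for illustration only.
ITEMS = [
    Item("Which vitamin deficiency causes scurvy?",
         {"A": "Vitamin C", "B": "Vitamin D"}, "A"),
    Item("Which organ produces insulin?",
         {"A": "Liver", "B": "Pancreas"}, "B"),
    Item("Which blood cells carry oxygen?",
         {"A": "Red blood cells", "B": "Platelets"}, "A"),
    Item("Which organ filters blood to form urine?",
         {"A": "Spleen", "B": "Kidney"}, "B"),
]


def always_a(item: Item) -> str:
    """Stand-in for a model call: always picks option A."""
    return "A"


print(f"accuracy = {accuracy(ITEMS, always_a):.2f}")  # accuracy = 0.50
```

A single accuracy number like this says nothing about whether an answer was cited correctly or delivered fast enough to be useful, which is why the other factors listed above need separate measurement.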
Limitations and Risks
Medical LLMs share the limitations of all large language models: they can hallucinate (generate confident but fabricated medical claims), they may not reflect the most recent evidence, and they lack true clinical reasoning or patient context. Responsible medical AI platforms mitigate these risks through retrieval augmentation (grounding outputs in verified sources), citation linking, and clear disclaimers that AI output should support — not replace — physician judgment.
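The mitigation pattern described above can be sketched in a few lines, assuming a toy keyword-overlap retriever in place of a real search index. Everything here is hypothetical: the `SOURCES` entries are placeholder text rather than clinical guidance, and `retrieve`, `build_prompt`, and `unknown_citations` are illustrative names, not the API of any platform mentioned in this article.

```python
import re

# Hypothetical in-memory knowledge base with placeholder text.
# A real platform would query curated, versioned clinical sources.
SOURCES = [
    {"id": "S1", "text": "metformin is a common first-line therapy for type 2 diabetes"},
    {"id": "S2", "text": "annual influenza vaccination is recommended in most adults"},
    {"id": "S3", "text": "hand hygiene reduces the spread of infection in hospitals"},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, sources, k=2):
    """Return up to k sources sharing at least one token with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(s["text"])), s) for s in sources]
    scored = [(n, s) for n, s in scored if n > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(query, passages):
    """Build a grounded prompt, or return None when nothing was retrieved."""
    if not passages:
        return None  # better to decline than to answer without sources
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"{context}\n"
        f"Question: {query}"
    )


def unknown_citations(answer, passages):
    """Return citation ids in the answer that match no retrieved passage."""
    allowed = {p["id"] for p in passages}
    cited = set(re.findall(r"\[(S\d+)\]", answer))
    return cited - allowed


passages = retrieve("first-line therapy for type 2 diabetes", SOURCES)
print(build_prompt("first-line therapy for type 2 diabetes", passages))
print(unknown_citations("Metformin is first-line [S1], see also [S9].", passages))
```

The last call flags `S9` because it was never retrieved, which is one simple way to catch a fabricated citation before the answer reaches a clinician. Declining to build a prompt when retrieval comes back empty is a design choice in this sketch, not a requirement of the technique.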
Written by the Clinical AI Report editorial team. Last updated: February 15, 2026.