Summary
This study introduces MetaGP, a generative medical foundation model designed to support complex clinical decision-making using electronic health records (EHRs), medical literature, and imaging data. Unlike general-purpose medical language models, MetaGP was specifically developed to address unmet clinical challenges, particularly rare disease diagnosis and emergency condition identification, where data are sparse, time is critical, and diagnostic errors carry high risk.
The scientific novelty lies in the model’s demonstrated clinical utility in these difficult settings. In rare disease diagnosis tasks, MetaGP achieved an average diagnostic score of 1.57, substantially outperforming GPT-4 (0.93) and exceeding the performance of junior and mid-level general practitioners. In emergency scenarios, the model significantly enhanced physician performance, improving diagnostic accuracy by 53% for junior clinicians and 46% for mid-level clinicians, indicating its ability to function as a reliable decision-support system in high-stakes environments.
A key technical advance underpinning this performance is the scale and specificity of MetaGP’s medical pretraining. The model was trained on approximately 71.4 billion tokens, including 8 million EHRs, 5.4 million academic articles, and 15,731 medical textbooks. With this extensive domain-specific corpus, MetaGP produced fewer responses classified as unsafe or clinically inappropriate than general-purpose models, addressing a major barrier to the adoption of generative AI in healthcare.
Beyond text-based reasoning, MetaGP introduces an integrated multimodal architecture for medical report generation. By combining the language model with specialized vision encoders for 2D imaging (e.g., chest X-rays) and 3D volumetric data (e.g., CT scans), the system generates clinically coherent imaging reports. In benchmarking evaluations, MetaGP outperformed established multimodal medical models such as LLaVA-Med and Med-Flamingo across both imaging modalities.
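To make the routing idea in this paragraph concrete, the following is a minimal sketch of how a 2D or 3D scan could be dispatched to a modality-specific encoder before its features are handed to a language model. It is not the authors' implementation: the function names (encode_2d, encode_3d, image_features, build_prompt), the toy mean-intensity "encoders", and the `<img:...>` placeholder tokens are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def encode_2d(image: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for a 2D encoder (assumption): mean intensity per row."""
    return [sum(row) / len(row) for row in image]


def encode_3d(volume: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Toy stand-in for a 3D encoder (assumption): mean intensity per slice."""
    return [
        sum(sum(row) for row in sl) / sum(len(row) for row in sl)
        for sl in volume
    ]


def ndim(x) -> int:
    """Count the nesting depth of a non-empty list-of-lists input."""
    depth = 0
    while isinstance(x, (list, tuple)) and x:
        depth += 1
        x = x[0]
    return depth


def image_features(scan) -> List[float]:
    """Route a scan to the encoder that matches its dimensionality."""
    d = ndim(scan)
    if d == 2:
        return encode_2d(scan)
    if d == 3:
        return encode_3d(scan)
    raise ValueError(f"unsupported scan with {d} dimensions")


def build_prompt(features: List[float], instruction: str) -> str:
    """Prepend hypothetical image placeholder tokens to the text instruction."""
    tokens = " ".join(f"<img:{f:.2f}>" for f in features)
    return f"{tokens} {instruction}"
```

In a real system the encoders would be learned networks and their outputs would be projected into the language model's embedding space rather than formatted as text; the sketch only shows the dispatch by input dimensionality.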
The work establishes a clinically grounded, safety-aware medical foundation model that extends beyond generic question answering. By targeting rare conditions, emergency care, and multimodal clinical reporting, MetaGP demonstrates how generative AI can be adapted to real-world medical complexity while maintaining reliability and clinical relevance.