Summary
This study introduces a multimodal biological age estimation framework that integrates facial, tongue, and retinal fundus images to quantify systemic aging. Biological age represents the functional state of the body across multiple physiological systems, rather than simply reflecting chronological time. The technical novelty lies in the Transformer-based model’s ability to fuse these heterogeneous visual signals into a single aging metric.
The strength of this approach is demonstrated by its estimation accuracy. The multimodal fusion model achieved a mean absolute error (MAE) of 1.94 years, compared with 5.65 years for tongue images, 4.10 years for facial images, and 3.32 years for retinal images used alone. The fused error is therefore about 42% lower than that of the best single modality (retinal images). This result is consistent with the view that aging manifests across multiple organ systems and that integrating these signals yields a more precise measure. The accuracy also compares favorably with conventional deep-learning baselines, which typically report errors of around five years for similar tasks.
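For readers unfamiliar with the metric, the following is a minimal sketch of how a mean absolute error is computed. The function name and the toy ages are invented for illustration and are not taken from the paper; the sketch also assumes that predictions are scored against a known reference age, which this summary does not specify.

```python
def mean_absolute_error(predicted, reference):
    """Average of |predicted - reference| over all participants."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Toy values, invented for illustration only (not data from the study).
reference_age = [40.0, 52.0, 61.0, 35.0]
predicted_age = [42.0, 50.0, 62.0, 36.0]

mae = mean_absolute_error(predicted_age, reference_age)
print(f"MAE = {mae:.2f} years")
```

An MAE of 1.94 years means that, on average, the model's estimate differs from the reference age by just under two years in either direction.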
The clinical significance of biological age is supported by its association with chronic disease incidence and progression, measured using disease-free survival. Individuals with accelerated biological aging exhibited earlier onset and faster progression of chronic conditions, even when traditional clinical indicators remained within normal ranges. By linking non-invasive imaging to future disease risk trajectories, this work advances biological age from a descriptive marker to a scalable tool for early risk stratification and preventive healthcare.
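The disease-free survival comparison can be illustrated with a small sketch. It assumes that "accelerated aging" means a positive gap between estimated biological age and chronological age, and it uses a plain Kaplan-Meier estimator; both the grouping rule and the estimator are assumptions made here for illustration, since this summary does not describe the paper's statistical procedure. All records are invented toy values.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability)
    at each distinct time where at least one event occurred."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        onsets = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - onsets / at_risk
        curve.append((t, survival))
    return curve


# Toy records, invented for illustration only:
# (biological age, chronological age, follow-up years, disease onset observed)
records = [
    (55.0, 50.0, 2, True),
    (48.0, 50.0, 6, False),
    (63.0, 60.0, 3, True),
    (58.0, 60.0, 5, True),
    (47.0, 45.0, 4, False),
    (44.0, 45.0, 7, False),
]

# Hypothetical grouping rule: age gap > 0 counts as accelerated aging.
accelerated = [r for r in records if r[0] - r[1] > 0]
others = [r for r in records if r[0] - r[1] <= 0]

km_accelerated = kaplan_meier([r[2] for r in accelerated],
                              [r[3] for r in accelerated])
km_others = kaplan_meier([r[2] for r in others],
                         [r[3] for r in others])

print("accelerated:", km_accelerated)
print("others:     ", km_others)
```

In a real analysis the two curves would be compared with a formal test, and the age gap would usually be adjusted for covariates; this sketch only shows the shape of the calculation.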
Wang, J., Liu, B., Fang, S., Wang, G., Zhang, Y., & Chen, J. (2024). Accurate estimation of biological age and its application in disease prediction using a multimodal image Transformer system. Proceedings of the National Academy of Sciences, 121(2), Article e2308812120. https://doi.org/10.1073/pnas.2308812120