A new international study suggests most clinical artificial intelligence (AI) tools are not yet ready for safe, equitable use at the bedside.
In a scoping review published in Medical Research Archives, researchers including CUNY SPH Assistant Professor Karmen Williams and MS alum Ilse Siguachi examined 390 clinical AI and machine learning models published between March 2020 and December 2021. They assessed whether developers planned for model updating, followed best-practice development standards, and reported who was included in their data. Only 9% of studies mentioned how their models would be updated over time, even though performance is known to drift as clinical practice, populations, and health systems change.
The team found that 98% of models were still in the research phase and only 2% had been implemented in real-world care, revealing a large gap between technical innovation and clinical deployment. Just 12% of models followed established reporting or development standards, and 84% failed to report the racial or ethnic breakdown of the data used to train the algorithm, raising serious concerns about transparency and equity.
To address these gaps, the authors developed a simplified, six-item version of the CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist focused on applicability, reproducibility, and patient safety at the point of care. The tool is designed for frontline clinicians who may not have advanced data science training but need a quick way to assess whether an AI model has been rigorously designed, externally validated, and prepared for updating over its lifetime.
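To give a sense of how such a screening tool might work in practice, here is a minimal sketch in Python. The article does not list the paper's actual six items, so the questions below are hypothetical placeholders drawn from themes the article mentions (applicability, reproducibility, external validation, update planning, demographic reporting, and point-of-care safety); the function names and scoring scheme are likewise illustrative, not the authors' published instrument.

```python
# Illustrative only: the six items below are placeholders inferred from themes
# named in this article, not the paper's actual CHARMS-derived checklist.
CHECKLIST_ITEMS = [
    "Is the intended clinical setting and patient population clearly described?",
    "Are the data and code available for independent verification?",
    "Has the model been externally validated on an independent cohort?",
    "Is there a documented plan for updating the model as practice drifts?",
    "Is the racial/ethnic and gender composition of the training data reported?",
    "Are failure modes and safeguards at the point of care described?",
]

def screen_model(answers: list[bool]) -> tuple[int, list[str]]:
    """Score a model write-up against the six items and list the gaps."""
    if len(answers) != len(CHECKLIST_ITEMS):
        raise ValueError("expected one yes/no answer per checklist item")
    gaps = [item for item, ok in zip(CHECKLIST_ITEMS, answers) if not ok]
    return sum(answers), gaps

# Example: a model that reports everything except an update plan
# and the demographics of its training data.
score, gaps = screen_model([True, True, True, False, False, True])
print(f"checklist score: {score}/6")
for gap in gaps:
    print("missing:", gap)
```

Even a crude yes/no tally like this captures the spirit the authors describe: a clinician without data science training can flag, in minutes, whether a model's documentation is silent on validation, maintenance, or the populations it was built from.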
Using this modified checklist, the study found that models that proactively described how they would be updated tended to score higher on overall quality, suggesting that planning for model maintenance is a marker of more robust research. The authors also call for wider use of open data registries to support independent verification and for routine reporting of gender and ethnicity to avoid reinforcing existing health disparities.
The authors warn that as healthcare moves rapidly toward more autonomous, “agentic” AI systems, the absence of simple standards for evaluating clinical models poses growing risks for patients.
“Without practical tools to screen algorithms before deployment, health systems may unknowingly introduce AI that is poorly validated, not regularly updated, and blind to the needs of diverse populations,” says Dr. Williams.