AMBOSS Newsroom
Product Update

Ranked #1 in Stanford–Harvard NOHARM Study for clinical care safety: AMBOSS AI Mode (LiSA 1.0)

Published on
February 12, 2026
AMBOSS AI Mode is grounded in evidence and designed for trust.

Background

Clinicians are increasingly turning to AI models for clinical decision support: 67% of US physicians report using AI daily [1]. However, most evaluations of AI models rely on synthetic or exam-style benchmarks that fail to capture the reality of clinical uncertainty. Independent, specialist-validated benchmarks are needed to assess clinical safety and appropriateness in real-world care settings.

Methods

The NOHARM benchmark evaluated 31 AI systems on 100 real clinical cases, spanning primary care to specialist consultations across 10 medical specialties. The cases intentionally preserved incomplete information and real-world ambiguity.

Model outputs were evaluated for clinical appropriateness (benefit vs. harm) using 12,747 expert annotations provided by 29 board-certified physicians. Performance was assessed across safety and completeness dimensions.

Results

AMBOSS AI Mode (LiSA 1.0) ranked #1 overall for clinical safety among the 31 evaluated AI systems. It demonstrated consistently strong performance across realistic clinical scenarios and ranked in the top tier across all evaluated dimensions.

Interpretation

The performance of AMBOSS AI Mode (LiSA 1.0) in the benchmark reflects an approach to clinical AI that prioritizes:

  • Patient safety and positive impact on clinical care
  • Context-aware medical reasoning
  • Alignment with real-world clinical workflow

Conclusion

In an independent, specialist-validated evaluation using real clinical cases, AMBOSS AI Mode ranked highest overall among the 31 AI systems evaluated using the NOHARM benchmark. Benchmarks such as NOHARM provide essential evidence for responsible adoption of AI in clinical care. 

Funding & Disclosures

The NOHARM benchmark was developed by academic researchers at Stanford University School of Medicine, Harvard Medical School, and the ARISE AI Research Network. AMBOSS did not fund the study, and AMBOSS personnel were not involved in its design or analysis. AMBOSS employees contributed to this poster.

References

[1] Offcall 2025 Physicians’ AI Report. 2025-physicians-ai-report.offcall.com

[2] ARISE AI NOHARM Benchmark. bench.arise-ai.org