Ophthalmologist AI: Textual Question Answering Model

November 19, 2025 Jennifer Chen Health
Original source: ajmc.com

Summary of the Research on Foundational Models (FMs) in Ophthalmology

This research investigated the performance of several foundational models (FMs) — actually five: Claude 3.5 Sonnet, GPT-4o, Qwen2.5-Max, DeepSeek V3, and Gemini Advanced — on ophthalmology questions, comparing them to ophthalmology experts, trainees, and junior physicians. Here is a breakdown of the key findings:

Methodology:

* Question Source: Questions were sourced from a textbook used for the Fellowship of the Royal College of Ophthalmologists part 2 exam (360 questions total: 13 multimodal, 345 textual). An additional 27 multimodal questions were created, resulting in 40 images used for testing.
* FM Testing: Seven FMs were tested without customization, fine-tuning, or additional guidance. Questions were entered between September 2024 and March 2025.
* Human Evaluation: 10 physicians with varying experience in ophthalmology also evaluated the multimodal questions.

Key Results:

* Textual Questions:

  * Claude 3.5 Sonnet performed best, with an accuracy of 77.7%.
  * Other models: GPT-4o (69.9%), Qwen2.5-Max (69.3%), DeepSeek V3 (63.2%), Gemini Advanced (62.6%).
  * Claude 3.5 Sonnet performed comparably to ophthalmology experts (difference of 1.3%).
  * Trainees and unspecialized junior physicians performed significantly worse than Claude 3.5 Sonnet.
  * Claude 3.5 Sonnet also outperformed the mean candidate score and the official pass mark.

* Multimodal Questions:

  * GPT-4o had the highest accuracy (57.5%), followed by Claude 3.5 Sonnet (47.5%).
  * Ophthalmology experts scored 75.7%, trainees scored 71.3%, and FMs averaged 42%.
  * GPT-4o and Claude 3.5 Sonnet showed the highest agreement with physicians.
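
The accuracy figures above are simple to compare programmatically. The sketch below just tabulates and ranks the textual-question accuracies as reported in the article (the numbers come from the study summary here, not from any re-derivation):

```python
# Textual-question accuracies (percent) as reported in the study summary.
textual_accuracy = {
    "Claude 3.5 Sonnet": 77.7,
    "GPT-4o": 69.9,
    "Qwen2.5-Max": 69.3,
    "DeepSeek V3": 63.2,
    "Gemini Advanced": 62.6,
}

# Rank models from highest to lowest accuracy.
ranking = sorted(textual_accuracy, key=textual_accuracy.get, reverse=True)

for model in ranking:
    print(f"{model}: {textual_accuracy[model]:.1f}%")
```

Ranking confirms the ordering reported above, with Claude 3.5 Sonnet leading by roughly eight percentage points over the next-best model on textual questions.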

Limitations:

* The study acknowledges an unclear correlation between aptitude and exam performance.

In essence, the study demonstrates that FMs, especially Claude 3.5 Sonnet and GPT-4o, show promising potential in answering ophthalmology questions, even rivaling expert performance on textual questions. However, they still lag behind experts in multimodal question answering.
