A new study has shown that ChatGPT performs remarkably well in the field of ophthalmology

According to the Financial Times, a new study from the University of Cambridge’s clinical school of medicine has shown that GPT-4 performs almost as well as specialist doctors in ophthalmology assessments.

Researchers tested the large language models GPT-4, GPT-3.5, PaLM 2, and LLaMA with 87 multiple-choice questions. Five specialist ophthalmologists, three ophthalmology trainees, and two junior non-specialist doctors took the same test.

The questions covered topics ranging from light sensitivity to serious eye damage. Their answers were not publicly available, so the researchers believe the large language models had not been trained on them.

GPT-4 outperformed the trainees and junior doctors, answering 60 questions correctly. The junior doctors averaged 37 correct answers and the trainees 56; the specialist ophthalmologists, however, averaged 66.4 correct answers, showing that they remain ahead of the AI.

PaLM 2 answered 49 questions correctly and GPT-3.5 answered 42. LLaMA had the lowest score of the models tested, with 28 correct answers.

The researchers noted that their study included a limited number of questions, especially in certain categories, meaning real-world performance may vary. Large language models are also inherently prone to “hallucinations,” or making things up: misdiagnosing a cataract or a cancer, combined with generally low diagnostic accuracy, could have very dangerous consequences.
