GPT-4 performed close to the level of expert doctors in eye assessments

As large language models (LLMs) continue to advance, so do questions about how they can benefit society in fields such as medicine. A recent study from the University of Cambridge's School of Clinical Medicine found that OpenAI's GPT-4 performed nearly as well as experts in the field on an ophthalmology assessment, as first reported by the Financial Times.

In the study, published in PLOS Digital Health, researchers tested the LLM, its predecessor GPT-3.5, Google's PaLM 2 and Meta's LLaMA with 87 multiple-choice questions. Five expert ophthalmologists, three trainee ophthalmologists and two unspecialized junior doctors took the same mock exam. The questions came from a textbook used for testing trainees on everything from light sensitivity to lesions. The contents are not publicly available, so the researchers believe the LLMs could not have been trained on them previously. ChatGPT, equipped with GPT-4 or GPT-3.5, was given three chances to answer definitively; otherwise its response was marked as null.

GPT-4 scored higher than the trainees and junior doctors, getting 60 of the 87 questions right. While this was significantly higher than the junior doctors' average of 37 correct answers, it only narrowly beat the three trainees' average of 59.7. One expert ophthalmologist answered just 56 questions correctly, but the five experts averaged 66.4 correct answers, beating the machine. PaLM 2 scored 49, and GPT-3.5 scored 42. LLaMA scored the lowest at 28, falling below the junior doctors. Notably, these tests took place in mid-2023.

Although these results point to potential benefits, there are also a number of risks and concerns. The researchers noted that the study offered a limited number of questions, particularly in certain categories, meaning actual results may vary. LLMs also have a tendency to "hallucinate," or make things up. That is one thing when it involves a trivial fact, but falsely claiming a patient has cataracts or cancer is another matter. As with many uses of LLMs, the systems also lack nuance, creating further opportunities for inaccuracy.
