HYBRID EVENT: You can participate in person in Rome, Italy, or virtually from your home or workplace.

3rd Edition of

International Ophthalmology Conference

March 10-12, 2025 | Rome, Italy

IOC 2025

Evaluation of artificial intelligence chatbots’ responses regarding questions on common ophthalmic conditions

Speaker at International Ophthalmology Conference 2025: Simren Shah
Dean McGee Eye Institute, United States
Title: Evaluation of artificial intelligence chatbots’ responses regarding questions on common ophthalmic conditions

Abstract:

Purpose: While AI chatbots are increasingly used for patient education, their effectiveness in providing accurate, comprehensive, and understandable information about ophthalmologic conditions remains understudied. We performed an observational, cross-sectional study to evaluate the ability of five AI chatbots (ChatGPT 3.5, Bing Chat, Google Gemini, Perplexity AI, and YouChat) to educate patients on common ophthalmologic conditions by assessing the accuracy, quality, and comprehensiveness of their responses as rated by participants with varying levels of ophthalmic knowledge.
Methods: Fifteen participants were stratified by ophthalmic knowledge, ranging from college-educated adults to practicing ophthalmologists. Ten questions were submitted to each AI chatbot, and de-identified chatbot responses were sent to the respondents. Using a weighted scale, respondents evaluated the overall quality and five metrics of each chatbot’s response: scientific accuracy, comprehensiveness, balanced explanation, financial considerations, and understandability. Scores from 150 evaluations were averaged, and comparative statistics using mixed-effects models were performed to evaluate significant differences.
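For readers curious how a mixed-effects comparison of this kind might look in practice, below is a minimal sketch in Python using pandas and statsmodels. It is not the study's actual analysis code: the data here are synthetic, and the column names (respondent, chatbot, question, score) are hypothetical placeholders for the survey variables.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the survey data: 15 respondents rating each
# chatbot's answer to each of 10 questions on a 1-5 scale.
rng = np.random.default_rng(0)
chatbots = ["ChatGPT 3.5", "Bing Chat", "Google Gemini", "Perplexity AI", "YouChat"]
rows = []
for respondent in range(15):
    for question in range(10):
        for bot in chatbots:
            rows.append({"respondent": respondent,
                         "chatbot": bot,
                         "question": question,
                         "score": rng.integers(1, 6)})  # simulated rating
df = pd.DataFrame(rows)

# Mixed-effects model: chatbot as a fixed effect, with a random intercept
# per respondent to account for repeated ratings by the same person.
model = smf.mixedlm("score ~ C(chatbot)", data=df, groups=df["respondent"])
result = model.fit()
print(result.summary())  # coefficient tests indicate chatbot-level differences

Treating each respondent as a random-effects grouping variable is what lets a design like this compare chatbots while respecting that the 150 evaluations are not independent observations.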
Results: ChatGPT 3.5 received the highest overall quality score, while Bing Chat received the lowest (p<0.0001). No significant difference was found in scientific accuracy. ChatGPT 3.5 received the highest comprehensiveness (4.2; p=0.0002) and understandability scores (4.3; p=0.004), while Bing Chat received the lowest scores of 3.4 and 2.7, respectively. ChatGPT 3.5, Perplexity AI, and YouChat had higher scores for balanced explanation than Bing Chat (p<0.0001). For financial considerations, ChatGPT 3.5, Perplexity AI, and YouChat had higher scores than Bing Chat and Google Gemini (p<0.0001). Only ophthalmology residents, optometrists, and ophthalmologists could distinguish scientific accuracy among the chatbots.
Conclusion: Participants rated certain chatbots (e.g., ChatGPT 3.5) higher than others on several of the studied metrics for questions about common ophthalmologic diagnoses. However, because the quality of these responses varies across chatbots, eye care professionals remain an authoritative source for patient education.

Biography:

Simren Shah is currently a third-year undergraduate studying Biomedical Engineering at Johns Hopkins University. She conducts research in computational biology under the guidance of Dr. Pablo Iglesias, head of the Electrical and Computer Engineering Department at Johns Hopkins. She also collaborates with Dr. Kamran Riaz, Vice Chair for Clinical Research at the Dean McGee Eye Institute in Oklahoma, on several projects (including this one) that bridge biomedical engineering and clinical research.
