Title: Evaluating the efficacy of artificial intelligence as a patient education tool in paediatric ophthalmology: A cross-sectional comparative study
Abstract:
Background: As patients increasingly turn to digital platforms for medical information, the role of artificial intelligence (AI) tools such as ChatGPT and DeepSeek in patient education is expanding. However, concerns remain regarding the readability, accuracy, and overall quality of health content produced by AI models. This study is the first to compare AI-generated patient education materials with clinician-authored resources for parents of children with common ophthalmic conditions.
Methods: We evaluated the readability and content quality of AI-generated patient education materials for four common paediatric ophthalmic conditions (chalazion, blepharitis, amblyopia, and strabismus) against NHS-approved leaflets from Moorfields Eye Hospital. Responses from ChatGPT-3.5 and DeepSeek R1 were generated using structured prompts based on the NHS leaflet headings. All materials were assessed using four validated readability metrics (the Flesch-Kincaid Grade Level (FKGL), Flesch Reading Ease (FRE), Gunning Fog Score, and SMOG index), together with the Ensuring Quality Information for Patients (EQIP) tool for quality assessment. Statistical analysis was conducted using Analysis of Variance (ANOVA), with Cronbach's alpha used to determine inter-rater reliability.
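For readers unfamiliar with the Flesch metrics named above, both are simple functions of sentence length and word length. The sketch below computes FKGL and FRE from the standard published formulas; the vowel-group syllable counter is a naive assumption for illustration only, whereas validated readability tools use dictionary-based syllable counts.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: one syllable per contiguous vowel group.
    # Real readability tools use dictionary-based counts instead.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (FKGL, FRE) using the standard Flesch formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    return fkgl, fre
```

Shorter sentences and shorter words raise FRE (easier to read) and lower FKGL (fewer years of schooling required), which is why a leaflet written in plain English scores better on both than dense clinical prose.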
Results: NHS patient information leaflets were significantly more readable than those generated by AI models (mean FKGL: 4.1 vs ChatGPT: 7.5, DeepSeek: 9.2; p = 0.0011). Only human-generated leaflets met the recommended FRE threshold of 65 for the general population, with a mean FRE of 74.8 versus 57.7 for ChatGPT and 48.1 for DeepSeek (p = 0.0038). EQIP scores showed significant differences in information quality (p = 0.0041), with human-generated leaflets scoring highest (mean = 69.61, SD = 4.74), outperforming ChatGPT (mean = 58.26, SD = 3.35) and DeepSeek (mean = 56.75, SD = 4.73).
Conclusion: While AI tools offer accessible and timely health information, their outputs currently fall short of clinician-authored resources in readability and quality. These limitations, together with the absence of cited sources, pose potential risks for patients, particularly those with limited health or digital literacy. Until LLMs can reliably meet standards for clarity and accuracy, NHS-approved materials should remain the primary reference for patient education. Nonetheless, with refinement and human oversight, AI has the potential to support scalable, personalised, and interactive health communication in the future.

