Title: Evaluating the quality and readability of AI chatbot responses to frequently asked questions on basal cell carcinoma: Implications for patient education and digital health communication
Abstract:
Background:
Artificial intelligence (AI)-powered chatbots are increasingly used to deliver health information, providing real-time, accessible responses to patient queries. However, concerns persist about the clarity, accuracy, and credibility of AI-generated medical content, particularly in oncology and dermatology, where miscommunication can cause patient anxiety or inappropriate self-management. This study evaluated the quality and readability of responses from four AI chatbots, ChatGPT (OpenAI), Gemini (Google), Grok (xAI), and DeepSeek AI, when queried on frequently asked questions (FAQs) about basal cell carcinoma (BCC), the most common skin cancer.
Methods:
Eight public-facing questions were selected from authoritative health websites and Google Trends to reflect common BCC concerns (e.g., causes, symptoms, treatments, prognosis). Each chatbot was queried with the same eight questions. Responses were assessed by two independent clinical reviewers using a validated Global Quality Score (GQS; scale 0–25) that evaluated accuracy, comprehensiveness, and citation use. Readability was measured with the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL) to assess alignment with recommended health literacy levels.
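For context, both readability metrics are functions of average sentence length and average syllables per word; the standard formulas given below are the commonly used general definitions, not values or methods restated from this study.

\[
\text{FRES} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
\]

\[
\text{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
\]

Higher FRES values indicate easier text, whereas FKGL approximates the U.S. school grade level required to understand the material.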
Results:
Gemini outperformed the other chatbots in response quality (mean GQS = 18.13), largely because it consistently provided references, which the other models did not. However, all chatbots produced responses above the reading level recommended for public health materials (mean FKGL range: 7.8–9.9), with no significant differences in readability among models. Gemini's responses were also the most verbose, with the longest sentences and highest word counts.