Title: Evaluating ConvNeXt-Tiny for multi-class diabetic retinopathy detection: benchmarking a next-generation convolutional architecture against established CNN baselines
Abstract:
Background: Most deep learning systems for diabetic retinopathy (DR) screening rely on architectures such as ResNet and EfficientNet, with recent attention shifting towards transformer-based models. However, less is known about whether modernised convolutional backbones alone can deliver comparable performance for clinically relevant multi-class DR grading, without the additional complexity of hybrid or transformer-based designs. This study evaluates ConvNeXt-Tiny as a streamlined, next-generation convolutional architecture for DR detection.
Methods: We performed a retrospective benchmarking study using the publicly available EyePACS dataset of colour fundus photographs annotated across five DR severity levels. Following quality filtering, images were resized, intensity-normalised, and augmented with random flips, rotations, and colour perturbations. A stratified 70/15/15 split was used for training, validation, and testing. ConvNeXt-Tiny was trained end-to-end with standard optimisation and regularisation settings suitable for large-scale image classification. Model performance was evaluated using the macro-averaged area under the receiver operating characteristic curve (macro-AUC), macro F1-score, and overall accuracy. Results were contextualised against commonly reported EyePACS baselines for ResNet50 and EfficientNet-B0 from the literature.
Results: Existing studies on EyePACS typically report macro-AUCs of approximately 0.85–0.88 for ResNet50 and around 0.89 for EfficientNet-B0. In our experiments, ConvNeXt-Tiny achieved a macro-AUC of 0.92, a macro F1-score of 0.83, and an accuracy of 86.4%. Performance gains were most apparent at the lower and intermediate DR grades, particularly in separating no DR, mild, and moderate disease, suggesting improved discrimination beyond binary referable/non-referable thresholds. The architecture remained computationally efficient, making it amenable to deployment in large-scale screening workflows.
Conclusion: ConvNeXt-Tiny demonstrates that a modern convolutional architecture can exceed literature-reported results for traditional CNN baselines in five-class DR grading on EyePACS without resorting to more complex hybrid or transformer models. Future work should extend these findings through validation across larger and more diverse imaging settings, alongside rigorous assessment of model interpretability, calibration, and cost-effectiveness to enable safe and efficient clinical deployment.
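
To make the evaluation metrics named in the Methods concrete, the following is a minimal sketch of how macro-AUC, macro F1-score, and accuracy could be computed for five-class predictions. It is not the study's evaluation code: the abstract does not state the tooling or the averaging scheme, so the one-vs-rest, unweighted-mean definition of macro-AUC is an assumption, and the function names (`binary_auc`, `macro_auc`, `macro_f1`, `accuracy`) and the toy inputs are hypothetical. Plain Python is used only to keep the example dependency-free.

```python
def binary_auc(labels, scores):
    """AUC for one binary problem via the rank-sum (Mann-Whitney U) identity.

    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when a class is absent")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_auc(y_true, probs, n_classes):
    """Assumed definition: one-vs-rest AUC per class, then an unweighted mean."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [row[c] for row in probs]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


def macro_f1(y_true, y_pred, n_classes):
    """Per-class F1 = 2TP / (2TP + FP + FN), then an unweighted mean."""
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade equals the reference grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data only: six images, five DR grades (0 = no DR ... 4 = proliferative).
    y_true = [0, 1, 2, 3, 4, 2]
    probs = [
        [0.70, 0.15, 0.05, 0.05, 0.05],
        [0.30, 0.40, 0.20, 0.05, 0.05],
        [0.10, 0.30, 0.40, 0.10, 0.10],
        [0.05, 0.05, 0.20, 0.60, 0.10],
        [0.05, 0.05, 0.10, 0.20, 0.60],
        [0.10, 0.45, 0.35, 0.05, 0.05],
    ]
    # Predicted grade = index of the highest class probability.
    y_pred = [max(range(5), key=row.__getitem__) for row in probs]
    print("macro-AUC:", macro_auc(y_true, probs, 5))
    print("macro F1: ", macro_f1(y_true, y_pred, 5))
    print("accuracy: ", accuracy(y_true, y_pred))
```

Because macro averaging weights each grade equally, rare severe grades influence macro-AUC and macro F1 as much as the dominant no-DR class, which is why these metrics are reported alongside overall accuracy on an imbalanced dataset such as EyePACS.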

