Transfer Learning based Multi Class Common Eye Disease Classification Using Enhanced Vision Transformer
Publisher
ASTU
Abstract
Common retinal diseases such as diabetic retinopathy, glaucoma, age-related macular
degeneration, and cataracts are leading causes of vision loss and blindness globally. In
Ethiopia, their high prevalence, coupled with a limited number of ophthalmologists, poses a major
public health challenge. Early detection and diagnosis through fundus imaging can enable
prompt treatment and prevent vision loss. However, manual analysis is time-consuming and
prone to human error. While convolutional neural networks (CNNs) have shown promise in
retinal image analysis, they are limited in capturing global context in images. Vision
transformers have emerged as an alternative approach that leverages self-attention to model
global context. This study investigates the potential of a vision transformer model for
multi-class classification of common retinal diseases using fundus photography. A custom vision
transformer model is proposed that combines the standard architecture with convolutional
and classifier layers tailored to retinal images. The study employs transfer learning from
ImageNet pre-training. The model is trained on fundus datasets augmented through photometric
and geometric transformations to expand the data size and improve generalization.
A comparative assessment is conducted against CNNs such as ResNet50, VGG19, and
InceptionV3. The custom vision transformer model achieves an accuracy of 95% in classifying
normal retina and five disease classes, outperforming the baseline vision transformer (ViT),
which scored 87-91% accuracy, as well as the CNN models (highest accuracy of 94%). Precision,
recall, F1-score, and AUC also improve over both the baseline ViT and the CNNs.
Class activation mapping visualizations highlight the discriminative regions the model
focuses on, providing interpretability. The findings demonstrate the effectiveness of the
custom transfer-learning-based vision transformer model for retinal disease classification
and its stronger performance and generalization ability compared with convolutional neural
networks. This work helps establish vision transformers as a promising approach toward
developing automated screening systems to prevent vision impairment through early diagnosis
of retinal diseases using fundus imaging.
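
The accuracy, precision, recall, and F1-score mentioned in the abstract are standard multi-class metrics. As a minimal sketch of how such figures are typically derived from a confusion matrix, the example below computes them per class and as macro averages. The function name `per_class_metrics` and the toy 3-class counts are illustrative assumptions; they are not the study's code, classes, or results.

```python
def per_class_metrics(confusion):
    """Compute accuracy and per-class / macro precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precision, recall, f1 = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                           # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Macro averaging weights every class equally, regardless of size.
        "macro_precision": sum(precision) / n,
        "macro_recall": sum(recall) / n,
        "macro_f1": sum(f1) / n,
    }


# Made-up counts for three hypothetical classes (not data from the study).
toy_confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 9],
]
metrics = per_class_metrics(toy_confusion)
print(metrics)
```

Macro averaging is shown here because it keeps rare disease classes from being masked by a large normal class; whether the study used macro, micro, or weighted averaging is not stated in the abstract.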
