
Glaucoma Detection with Computer Vision

Introduction

Glaucoma is an eye disease, primarily caused by elevated intraocular pressure, that can lead to permanent vision loss. While the condition is serious, early detection can significantly limit and control the damage.

In the current landscape, various methods are available for diagnosing glaucoma. However, it's noteworthy that AI techniques, especially those involving computer vision, are not yet widely adopted in the medical community. This project serves as an initiative to demonstrate the potential of AI, coupled with computer vision, in the early detection of glaucoma.

The approach employed in this project leverages Vision Transformers (ViT) to generate a dense representation, which is then utilized as a vector for classification. Transfer Learning is applied by adding additional layers to the pre-trained ViT model. For more detailed information about the ViT architecture used in this work, refer to the following documentation: Vision Transformer (ViT).
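
As an illustration of this idea, the minimal sketch below shows how a pre-trained ViT from HuggingFace produces a dense representation for a single fundus image. The checkpoint name and file path are assumptions for illustration, not necessarily the ones used in this project.

```python
# Minimal sketch (assumed checkpoint and file name, not the project's exact code):
# obtain the dense ViT representation that the added classification layers consume.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

checkpoint = "google/vit-base-patch16-224-in21k"  # assumed pre-trained checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
backbone = ViTModel.from_pretrained(checkpoint)

image = Image.open("fundus_example.jpg").convert("RGB")  # hypothetical image file
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = backbone(**inputs)

# last_hidden_state has shape (batch, num_patches + 1, hidden_size); this dense
# representation is the vector input for the classification head.
features = outputs.last_hidden_state
print(features.shape)  # torch.Size([1, 197, 768]) for a 224x224 input
```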

Data information:
  1. The data used comes from the following Kaggle dataset.
  2. The dataset has three folders with images for training, validation, and testing. Each of those folders contains two subfolders, one for each category: 0 (glaucoma not present) and 1 (glaucoma present).
Data treatment
  • Data was loaded and transformed with the preprocessing required to apply ViT. This was done with an AutoImageProcessor object used as a transformation.
  • A data loader was created for each subset of data, and a collate function was applied to concatenate the feature and target tensors, as sketched below.
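
A minimal sketch of this pipeline is shown below; the folder path and checkpoint name are assumptions for illustration.

```python
# Minimal sketch of the data pipeline described above (paths and checkpoint
# are assumptions, not the project's exact values).
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

def vit_transform(image):
    # Apply the ViT preprocessing (resize + normalize) to a PIL image and
    # return a (3, 224, 224) pixel-value tensor.
    return processor(images=image, return_tensors="pt")["pixel_values"].squeeze(0)

def collate(batch):
    # Concatenate individual samples into one feature tensor and one target tensor.
    pixel_values = torch.stack([item[0] for item in batch])
    labels = torch.tensor([item[1] for item in batch], dtype=torch.float32)
    return pixel_values, labels

# One loader per subset; the 0/1 subfolder names map directly to the class labels.
train_set = ImageFolder("data/train", transform=vit_transform)  # hypothetical path
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, collate_fn=collate)
```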
Model
  • The model used was a pre-trained ViT model from the HuggingFace library.
  • The model was trained with a batch size of 32, 10 epochs, and a learning rate of 0.001.
  • The layers of the ViT were frozen, and three layers were added at the end: a flatten layer, a Linear layer with a ReLU activation function, and a second Linear layer with a sigmoid activation function.
  • The loss function for the final model was binary cross-entropy, and the optimizer was Adam. A sketch of this setup follows below.
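
The following sketch mirrors that setup: a frozen ViT backbone plus a Flatten layer, a Linear layer with ReLU, and a final Linear layer with sigmoid, trained with binary cross-entropy and Adam at a learning rate of 0.001. The hidden-layer size and checkpoint name are assumptions.

```python
# Minimal sketch of the transfer-learning model described above; the hidden
# size and checkpoint are assumptions, not the project's exact values.
import torch
import torch.nn as nn
from transformers import ViTModel

class GlaucomaClassifier(nn.Module):
    def __init__(self, checkpoint="google/vit-base-patch16-224-in21k", hidden_dim=128):
        super().__init__()
        self.backbone = ViTModel.from_pretrained(checkpoint)
        # Freeze every ViT parameter so only the new head is trained.
        for param in self.backbone.parameters():
            param.requires_grad = False
        cfg = self.backbone.config
        num_tokens = (cfg.image_size // cfg.patch_size) ** 2 + 1  # patches + [CLS]
        self.head = nn.Sequential(
            nn.Flatten(),                                  # 1) flatten token embeddings
            nn.Linear(num_tokens * cfg.hidden_size, hidden_dim),
            nn.ReLU(),                                     # 2) Linear + ReLU
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),                                  # 3) Linear + sigmoid
        )

    def forward(self, pixel_values):
        features = self.backbone(pixel_values=pixel_values).last_hidden_state
        return self.head(features).squeeze(-1)

model = GlaucomaClassifier()
criterion = nn.BCELoss()  # binary cross-entropy on the sigmoid output
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.001
)
```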
Results
  • Finally, when evaluated on the test set, the model attained an overall accuracy of 97%, indicating a successful training process. However, relying solely on accuracy can be misleading, especially with imbalanced classes. To address this, additional metrics such as recall and precision were evaluated for each predicted class.
  • For class 0 (no glaucoma), the model demonstrated a precision of 0.96 and a recall of 0.98. For class 1 (glaucoma), the model achieved a precision of 0.97 and a recall of 0.94. These results indicate the model's ability to generalize effectively across both classes (see the evaluation sketch after this list).
  • Public code is available in the following GitHub repo.
  • The public PyTorch model is also available in the following HuggingFace repo.
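
A minimal evaluation sketch is shown below, assuming a test_loader built as in the data sketch and the trained model from the model sketch; the 0.5 decision threshold and names are assumptions.

```python
# Minimal sketch of the test-set evaluation (per-class precision/recall plus
# overall accuracy); threshold and names are illustrative assumptions.
import torch
from sklearn.metrics import accuracy_score, classification_report

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for pixel_values, labels in test_loader:
        probs = model(pixel_values)
        preds = (probs >= 0.5).long()  # threshold the sigmoid output at 0.5
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.long().tolist())

print("Accuracy:", accuracy_score(all_labels, all_preds))
# Per-class precision and recall for class 0 (no glaucoma) and class 1 (glaucoma).
print(classification_report(all_labels, all_preds,
                            target_names=["no glaucoma", "glaucoma"]))
```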
