
Glaucoma Detection with Computer Vision

Introduction

Glaucoma is an eye disease, primarily caused by elevated intraocular pressure, that can lead to permanent vision loss. While the condition is serious, early detection can significantly limit and control the damage.

In the current landscape, various methods are available for diagnosing glaucoma. However, it's noteworthy that AI techniques, especially those involving computer vision, are not yet widely adopted in the medical community. This project serves as an initiative to demonstrate the potential of AI, coupled with computer vision, in the early detection of glaucoma.

The approach employed in this project leverages Vision Transformers (ViT) to generate a dense representation, which is then utilized as a vector for classification. Transfer Learning is applied by adding additional layers to the pre-trained ViT model. For more detailed information about the ViT architecture used in this work, refer to the following documentation: Vision Transformer (ViT).
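
As an illustration of this idea, the minimal sketch below shows how a pre-trained ViT from HuggingFace produces a dense representation for a single fundus image. The checkpoint name and file path are assumptions for illustration, not necessarily the ones used in this project.

```python
# Minimal sketch (assumed checkpoint and file name, not the project's exact code):
# obtain the dense ViT representation that the added classification layers consume.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

checkpoint = "google/vit-base-patch16-224-in21k"  # assumed pre-trained checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
backbone = ViTModel.from_pretrained(checkpoint)

image = Image.open("fundus_example.jpg").convert("RGB")  # hypothetical image file
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = backbone(**inputs)

# last_hidden_state has shape (batch, num_patches + 1, hidden_size); this dense
# representation is the vector input for the classification head.
features = outputs.last_hidden_state
print(features.shape)  # torch.Size([1, 197, 768]) for a 224x224 input
```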

Data information:
  1. The data used comes from the following Kaggle dataset.
  2. The dataset has three folders with images for training, validation, and testing. Each of those folders contains two subfolders, one for each category: 0 (glaucoma not present) and 1 (glaucoma present).
Data treatment
  • Data was loaded and transformed with the preprocessing required to apply ViT. This was done with an AutoImageProcessor object used as a transformation.
  • A data loader was created for each subset of data, and a collate function was applied to concatenate the feature and target tensors, as sketched below.
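
A minimal sketch of this pipeline is shown below; the folder path and checkpoint name are assumptions for illustration.

```python
# Minimal sketch of the data pipeline described above (paths and checkpoint
# are assumptions, not the project's exact values).
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

def vit_transform(image):
    # Apply the ViT preprocessing (resize + normalize) to a PIL image and
    # return a (3, 224, 224) pixel-value tensor.
    return processor(images=image, return_tensors="pt")["pixel_values"].squeeze(0)

def collate(batch):
    # Concatenate individual samples into one feature tensor and one target tensor.
    pixel_values = torch.stack([item[0] for item in batch])
    labels = torch.tensor([item[1] for item in batch], dtype=torch.float32)
    return pixel_values, labels

# One loader per subset; the 0/1 subfolder names map directly to the class labels.
train_set = ImageFolder("data/train", transform=vit_transform)  # hypothetical path
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, collate_fn=collate)
```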
Model
  • The model used was a pre-trained ViT model from the HuggingFace library.
  • The model was trained with a batch size of 32, 10 epochs, and a learning rate of 0.001.
  • The layers of the ViT were frozen, and three layers were added at the end: a flatten layer, a Linear layer with a ReLU activation function, and a second Linear layer with a sigmoid activation function.
  • The loss function for the final model was binary cross-entropy, and the optimizer was Adam. A sketch of this setup follows below.
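
The following sketch mirrors that setup: a frozen ViT backbone plus a Flatten layer, a Linear layer with ReLU, and a final Linear layer with sigmoid, trained with binary cross-entropy and Adam at a learning rate of 0.001. The hidden-layer size and checkpoint name are assumptions.

```python
# Minimal sketch of the transfer-learning model described above; the hidden
# size and checkpoint are assumptions, not the project's exact values.
import torch
import torch.nn as nn
from transformers import ViTModel

class GlaucomaClassifier(nn.Module):
    def __init__(self, checkpoint="google/vit-base-patch16-224-in21k", hidden_dim=128):
        super().__init__()
        self.backbone = ViTModel.from_pretrained(checkpoint)
        # Freeze every ViT parameter so only the new head is trained.
        for param in self.backbone.parameters():
            param.requires_grad = False
        cfg = self.backbone.config
        num_tokens = (cfg.image_size // cfg.patch_size) ** 2 + 1  # patches + [CLS]
        self.head = nn.Sequential(
            nn.Flatten(),                                  # 1) flatten token embeddings
            nn.Linear(num_tokens * cfg.hidden_size, hidden_dim),
            nn.ReLU(),                                     # 2) Linear + ReLU
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),                                  # 3) Linear + sigmoid
        )

    def forward(self, pixel_values):
        features = self.backbone(pixel_values=pixel_values).last_hidden_state
        return self.head(features).squeeze(-1)

model = GlaucomaClassifier()
criterion = nn.BCELoss()  # binary cross-entropy on the sigmoid output
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.001
)
```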
Results
  • Finally, when evaluated on the test set, the model attained an overall accuracy of 97%, indicating a successful training process. However, relying solely on accuracy can be misleading, especially with imbalanced classes. To address this, additional metrics such as recall and precision were evaluated for each predicted class.
  • For class 0 (no glaucoma), the model demonstrated a precision of 0.96 and a recall of 0.98. For class 1 (glaucoma), the model achieved a precision of 0.97 and a recall of 0.94. These results indicate the model's ability to generalize effectively across both classes (see the evaluation sketch after this list).
  • Public code is available in the following GitHub repo.
  • The public PyTorch model is also available in the following HuggingFace repo.
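
A minimal evaluation sketch is shown below, assuming a test_loader built as in the data sketch and the trained model from the model sketch; the 0.5 decision threshold and names are assumptions.

```python
# Minimal sketch of the test-set evaluation (per-class precision/recall plus
# overall accuracy); threshold and names are illustrative assumptions.
import torch
from sklearn.metrics import accuracy_score, classification_report

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for pixel_values, labels in test_loader:
        probs = model(pixel_values)
        preds = (probs >= 0.5).long()  # threshold the sigmoid output at 0.5
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.long().tolist())

print("Accuracy:", accuracy_score(all_labels, all_preds))
# Per-class precision and recall for class 0 (no glaucoma) and class 1 (glaucoma).
print(classification_report(all_labels, all_preds,
                            target_names=["no glaucoma", "glaucoma"]))
```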
