NLTK

Detection of Text Written by Artificial Intelligences (AIs)

Introduction

In today's context, AIs serve as ubiquitous assistants across various tasks. While they enhance human productivity in routine activities, there is a concern about their potential to diminish human creativity in generating new content.

In critical scenarios, such as academic essays, where understanding genuine human thoughts is paramount, the ability to distinguish whether an AI assistant was employed becomes crucial.

Discerning between text generated by AIs and humans poses a formidable challenge. However, this study aims to demonstrate that traditional Machine Learning (ML) models can effectively classify whether a given text was authored by AI or human.

Link to the app.

Data information:
  1. The data used come from the following Kaggle data set.
  2. The dataset contains roughly 400k samples, but only a random sample of 10% was used to train the models, because there were not enough computing resources available to transform the full data.
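
The random 10% subsample could be drawn along these lines. This is only a minimal sketch: the toy DataFrame stands in for the Kaggle data, and the column names `text` and `generated`, the helper `sample_fraction`, and the seed are assumptions for illustration, not details taken from the project.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle data; the column names are assumptions.
df = pd.DataFrame({
    "text": [f"sample essay number {i}" for i in range(1000)],
    "generated": [i % 2 for i in range(1000)],  # assumed encoding: 0 = human, 1 = AI
})


def sample_fraction(data: pd.DataFrame, frac: float = 0.10, seed: int = 42) -> pd.DataFrame:
    """Return a reproducible random sample containing `frac` of the rows."""
    return data.sample(frac=frac, random_state=seed).reset_index(drop=True)


subset = sample_fraction(df)
print(len(df), "->", len(subset))
```

Fixing the seed keeps the subsample reproducible, so later preprocessing and training runs work on the same rows.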
Data treatment
  • Special characters were removed from the text.
  • The text was then tokenized, and all tokens were lowercased to obtain a uniform vocabulary.
  • Next, stopwords and tokens shorter than 3 characters were removed.
  • Then, the tokens were stemmed to reduce the vocabulary size.
  • Finally, the TF-IDF matrix was computed from the cleaned data and reduced to 5 dimensions with Principal Component Analysis (PCA).
  • 70% of the data was used for training and 30% for testing.
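
The treatment steps above can be sketched as follows. This is a hedged illustration rather than the project's actual code: the toy corpus, the whitespace tokenizer, the use of scikit-learn's built-in English stopword list, and the Porter stemmer are all assumptions. The source also does not say whether the split happened before or after fitting TF-IDF and PCA; this sketch splits first and fits both on the training portion only.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.model_selection import train_test_split

stemmer = PorterStemmer()


def clean_text(text: str) -> str:
    """Remove special characters, lowercase, drop stopwords and short tokens, stem."""
    # Replace every character that is not a letter or whitespace with a space.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Lowercase and split on whitespace (assumption: the project may use an NLTK tokenizer).
    tokens = text.lower().split()
    # Keep tokens that are not stopwords and have at least 3 characters.
    tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS and len(t) >= 3]
    return " ".join(stemmer.stem(t) for t in tokens)


# Toy corpus and labels, invented for illustration (assumed encoding: 0 = human, 1 = AI).
texts = [
    "Honestly, my summer trip to the mountains was exhausting but fun!",
    "In conclusion, renewable energy offers numerous significant benefits.",
    "We missed the bus twice, so grandma drove us to school.",
    "Furthermore, technology plays a crucial role in modern education.",
    "My dog chewed the essay, which is why this one is shorter.",
    "Overall, teamwork fosters collaboration and enhances productivity.",
    "I think homework should be optional because kids need sleep.",
    "Additionally, urban planning contributes to sustainable communities.",
    "The cafeteria pizza tastes like cardboard, everybody knows that.",
    "Moreover, effective communication remains essential for success.",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

cleaned = [clean_text(t) for t in texts]

# 70% of the documents for training, 30% for testing.
train_txt, test_txt, y_train, y_test = train_test_split(
    cleaned, labels, test_size=0.3, random_state=42, stratify=labels
)

vectorizer = TfidfVectorizer()
pca = PCA(n_components=5, random_state=42)

# Densifying with .toarray() is only reasonable for a small example like this one.
X_train = pca.fit_transform(vectorizer.fit_transform(train_txt).toarray())
X_test = pca.transform(vectorizer.transform(test_txt).toarray())

print(X_train.shape, X_test.shape)
```

On a corpus of tens of thousands of documents the dense conversion would need a lot of memory, which may be related to the resource limits mentioned earlier.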
Model
  • Several models were built and compared to find the best performer: Random Forest, XGBoost, Logistic Regression, and K-NN classifiers.
  • All models were logged with MLflow, and the results were saved in a DagsHub repository.
  • The metrics used to evaluate the models' performance were accuracy, precision, recall, and F1-score.
  • The architecture used to deploy this model is shown in the following picture:
    [Figure: Architecture used]
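
The model comparison described in this section could look roughly like the sketch below. It is an illustration under stated assumptions: the features are synthetic 5-dimensional data standing in for the PCA components, the hyperparameters are arbitrary, XGBoost is left out to avoid an extra dependency, and MLflow logging is omitted. The helper `evaluate` and choosing the best model by F1-score are also assumptions; the source does not say which metric decided the winner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 5 PCA components (assumption, not the real data).
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Candidate models; hyperparameters are illustrative defaults.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
}


def evaluate(model, X_tr, y_tr, X_te, y_te):
    """Fit the model and return the four metrics on the test split."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred, zero_division=0),
        "recall": recall_score(y_te, pred, zero_division=0),
        "f1": f1_score(y_te, pred, zero_division=0),
    }


results = {
    name: evaluate(model, X_train, y_train, X_test, y_test)
    for name, model in models.items()
}

# Pick the model with the highest F1-score (assumed selection criterion).
best = max(results, key=lambda name: results[name]["f1"])

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
print("best by F1:", best)
```

Reporting precision and recall alongside accuracy matters here because the cost of flagging a human-written essay as AI-generated differs from the cost of missing an AI-generated one.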
Results

Sebastián Sarasti


© Sebastián Sarasti Zambonino. All Rights Reserved.


Edited by Sebastián Sarasti and Angel Bastidas