FNets

Stack Overflow Questions Quality Rating

Introduction

Today, the programming community boasts numerous sites, forums, and applications dedicated to discussing programming issues, bugs, or problems. However, promptly rating these discussions may prove challenging.

In cases where only a small group collaborates (e.g., people working with new technologies or in specialized fields), obtaining quality answers can be difficult. Furthermore, assessing an answer's quality may be slow or entirely neglected because of the small number of experts in the field.

To reduce the time needed to rate an answer, a machine learning model has been developed. This model predicts answer quality, reducing the time required to determine whether an answer is useful.

Data information
  1. The data used in this study is sourced from the following Kaggle dataset.
  2. The dataset comprises approximately 60,000 samples, categorized into three labels: HQ (High-quality posts without any edits), LQ_EDIT (Low-quality posts with a negative score and multiple community edits, though they remain open after these changes), and LQ_CLOSE (Low-quality posts closed by the community without any edits).
Data treatment
  • Special characters were removed from the text using regular expressions.
  • Following that, the three categories were encoded as labels suitable for the model.
  • The text was then tokenized to produce the input tokens the model requires.
  • Subsequently, the data was saved in a dataset object to facilitate its use with PyTorch models.
  • This process was applied to both the training and test datasets.
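The treatment steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the exact regular expression, the integer ids assigned to each label, the `QuestionDataset` class name, and the `tokenize` callable are all assumptions. In the real pipeline the tokenizer would be the one that matches the FNet weights, and the class would typically subclass `torch.utils.data.Dataset`.

```python
import re

# Label names come from the dataset description; the integer ids are an assumption.
LABEL_TO_ID = {"HQ": 0, "LQ_EDIT": 1, "LQ_CLOSE": 2}


def clean_text(text: str) -> str:
    """Replace special characters with spaces, then collapse repeated whitespace."""
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)  # assumed definition of "special"
    return re.sub(r"\s+", " ", text).strip()


class QuestionDataset:
    """Map-style dataset sketch: it defines __len__ and __getitem__,
    the two methods a PyTorch DataLoader needs from a map-style dataset."""

    def __init__(self, texts, labels, tokenize):
        self.texts = [clean_text(t) for t in texts]
        self.labels = [LABEL_TO_ID[label] for label in labels]
        self.tokenize = tokenize  # hypothetical callable: str -> list of token ids

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return {
            "input_ids": self.tokenize(self.texts[idx]),
            "label": self.labels[idx],
        }
```

The same class would be instantiated twice, once with the training split and once with the test split, so that both go through identical cleaning and label encoding.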
Model
  • A model was constructed using transfer learning based on the FNet architecture. FNet is a neural network that replaces the self-attention mechanism of transformer architectures with Fourier transforms.
  • The weights of the FNet architecture were obtained from the Hugging Face repository.
  • The final hidden state of FNet was flattened and passed through a Sequential stack of layers that narrows down to 3 output neurons, each representing the probability for a label.
  • The architecture used to build this model is shown in the following figure.
[Figure: Architecture used]
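To make the idea of replacing self-attention with Fourier transforms more concrete, here is a toy sketch of the mixing step. It assumes the formulation from the FNet design, where a discrete Fourier transform is applied along the hidden dimension and then along the sequence dimension, and only the real part is kept. The function names `dft` and `fourier_mix` are illustrative; the pretrained Hugging Face model uses an optimized FFT rather than this naive pure-Python version.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform of a 1-D sequence (O(n^2))."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fourier_mix(x):
    """Mix tokens without attention.

    x is a list of rows with shape (seq_len, hidden). A DFT is applied along
    the hidden dimension, then along the sequence dimension, and the real
    part of the result is returned with the same shape as x.
    """
    rows = [dft(row) for row in x]                 # transform each token vector
    cols = [dft(list(col)) for col in zip(*rows)]  # transform across tokens
    return [
        [cols[h][s].real for h in range(len(cols))]
        for s in range(len(x))
    ]
```

Because this step has no learnable parameters, every output position depends on every input token at a fixed cost, which is what lets it stand in for self-attention as a token mixer.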
Results

Sebastián Sarasti


© Sebastián Sarasti Zambonino. All Rights Reserved.

Edited by Sebastián Sarasti and Angel Bastidas