Application of AutoML Technique for Predicting Academic Performance of Students (Turkish title: Öğrencilerin Akademik Performanslarının Tahmin Edilmesi için AutoML Tekniğinin Uygulanması)


Aghalarova S., Keser S.

El-Cezeri Journal of Science and Engineering, vol.9, no.2, pp.394-412, 2022 (Scopus)

  • Publication Type: Article
  • Volume: 9 Issue: 2
  • Publication Date: 2022
  • Doi Number: 10.31202/ecjse.946505
  • Journal Name: El-Cezeri Journal of Science and Engineering
  • Journal Indexes: Scopus
  • Page Numbers: pp.394-412
  • Keywords: AutoML, Educational Data Mining, Machine Learning, Prediction Student Academic Performance
  • Eskisehir Osmangazi University Affiliated: Yes

Abstract

© 2022, TUBITAK. All rights reserved.

Educational Data Mining is the development of data mining methods that facilitate the analysis of large amounts of data obtained from various educational sources. Its application areas include providing feedback to educators, recommending courses to students, identifying undesirable student behavior, and predicting students' academic performance. Building the right models in these areas can improve the quality of education, so selecting suitable machine learning algorithms to build accurate models is highly important for educators and data scientists. In this study, Automatic Machine Learning (AutoML) is used to find the best model for predicting students' academic performance on the dataset under study. With AutoML, the best model can be found without manually handling difficult tasks such as data preprocessing, model selection, and hyper-parameter optimization. The Distributed Random Forest algorithm is determined to be the best algorithm for the real-world dataset, and its hyper-parameters are then optimized using grid search. In the experiments, the Distributed Random Forest algorithm with default hyper-parameters obtained accuracy and F-score values of 77.50% and 80.01%, respectively. With the optimal hyper-parameters found by grid search, the accuracy and F-score values are 82.30% and 82.50%, respectively. The proposed AutoML method is also compared with traditional machine learning algorithms, including KNN and SVM, and achieves higher results than both.
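The grid search and the two reported metrics can be illustrated with a minimal, dependency-free sketch. This is not the authors' pipeline: the function names (`accuracy`, `f1_score`, `grid_search`), the single `threshold` hyper-parameter, and the toy grades and labels are all assumptions made for illustration, and the F-score is shown in its binary form, which may differ from the averaging used in the study.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F-score: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy data: prior grades and pass (1) / fail (0) labels.
grades = [35, 45, 55, 65, 75, 52]
labels = [0, 0, 1, 1, 1, 0]


def evaluate(params):
    """Score a toy rule: predict 'pass' when grade >= threshold."""
    preds = [1 if g >= params["threshold"] else 0 for g in grades]
    return f1_score(labels, preds)


best_params, best_score = grid_search({"threshold": [40, 50, 60]}, evaluate)
print(best_params, round(best_score, 3))
```

In practice the `evaluate` callback would train a model with the candidate hyper-parameters and score it on held-out data; the exhaustive loop over `itertools.product` is what distinguishes grid search from random or adaptive search strategies.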