Application of AutoML Technique for Predicting Academic Performance of Students / Öğrencilerin Akademik Performanslarının Tahmin Edilmesi için AutoML Tekniğinin Uygulanması

Aghalarova, Sevda; Keser, Sinem

doi:10.31202/ecjse.946505

Aghalarova S., Keser S.

El-Cezeri Journal of Science and Engineering, vol. 9, no. 2, pp. 394-412, 2022 (Scopus)

Publication Type: Article / Full Article
Volume: 9 Issue: 2
Publication Date: 2022
DOI: 10.31202/ecjse.946505
Journal Name: El-Cezeri Journal of Science and Engineering
Indexes Covering the Journal: Scopus
Pages: pp. 394-412
Keywords: AutoML, Educational Data Mining, Machine Learning, Prediction Student Academic Performance
Affiliated with Eskişehir Osmangazi University: Yes

Abstract

© 2022, TUBITAK. All rights reserved. Educational Data Mining is the development of data mining methods to facilitate the analysis of large amounts of data obtained from various educational sources. Its application areas include providing feedback to educators, suggesting courses to students, identifying undesirable student behavior, and predicting the academic performance of students. Building the right models in these areas can improve the quality of education. Selecting suitable machine learning algorithms to build accurate models is therefore highly important for educators and data scientists. In this study, the Automatic Machine Learning (AutoML) method is used to find the best model for predicting students' academic performance on the dataset at hand. With AutoML, the best model can be found without manually handling difficult tasks such as data preprocessing, model selection, and hyper-parameter optimization. The Distributed Random Forest algorithm is determined to be the best algorithm for the real-world dataset, and its hyper-parameters are then optimized using grid search. In the experiments, the Distributed Random Forest algorithm with default hyper-parameters obtained accuracy and F-score values of 77.50% and 80.01%, respectively. With the optimal hyper-parameters found by grid search, the accuracy and F-score values are 82.30% and 82.50%, respectively. The proposed AutoML method is also compared with traditional machine learning algorithms, including KNN and SVM, and achieves higher results than both.
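The abstract reports accuracy and F-score and mentions grid search over hyper-parameters. The following is a minimal illustrative sketch of those two ideas in plain Python. It is not the authors' code and does not use their AutoML toolchain or the Distributed Random Forest algorithm: the toy data, the stand-in `predict` rule, and the hyper-parameter names `weight` and `threshold` are all hypothetical and chosen only to keep the example self-contained.

```python
from itertools import product


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:  # ties keep the first combination found
            best_params, best_score = params, score
    return best_params, best_score


# Made-up validation data: (homework, exam) scores with pass (1) / fail (0) labels.
X_val = [(40, 30), (80, 70), (55, 65), (90, 40), (30, 85), (60, 50)]
y_val = [0, 1, 1, 1, 0, 0]


def predict(x, weight, threshold):
    """Hypothetical stand-in model: weighted score compared against a threshold."""
    homework, exam = x
    return int(weight * homework + (1 - weight) * exam >= threshold)


def evaluate(params):
    """Score one hyper-parameter combination by validation accuracy."""
    y_pred = [predict(x, **params) for x in X_val]
    return accuracy(y_val, y_pred)


best_params, best_score = grid_search(
    {"weight": [0.25, 0.5, 0.75], "threshold": [50, 60, 70]}, evaluate
)
print(best_params, best_score)
```

In practice the `evaluate` callback would train and validate a real model for each combination, and the search would typically score on held-out or cross-validated data rather than a single tiny validation set.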