A Comparison of Data Imputation Methods Utilizing Machine Learning for a New IoT System Platform


KALAY S., ÇİNAR E., SARIÇİÇEK İ.

2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), Türkiye, 17 May 2022

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/codit55151.2022.9804113
  • Country of Publication: Türkiye
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

IoT systems are widely used in manufacturing, and the volume of sensor data in these systems is significant. In real-life scenarios, missing sensor data can cause problems, especially for data-driven machine learning (ML) models, so the gaps should be handled before such models are employed. The common practices are to remove the missing data completely or to apply simple arithmetic operations. However, the literature offers more sophisticated approaches that can be applied to these real-time IoT systems while taking the native data characteristics into account. This study compares the performance of six regression-based ML algorithms as missing data imputation methods: Support Vector Regression (SVR), Decision Tree Regression (DTR), Ridge Regression, K-Nearest Neighbors Regression (KNN), MissForest (MF), and XGBoost Regression (XGB). Missing data in different positions and proportions are created using experimentally collected time-series sensor data from a newly developed IoT system platform. The initial work based on the ML models is presented on these datasets, together with an overview of the IoT system architecture. The average RMSE and R² values of the six ML models showed that Ridge Regression outperformed the other ML models for missing data imputation.
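The evaluation idea described in the abstract (mask known values, impute them with a regression model, then score the imputed values with RMSE and R²) can be illustrated with a minimal sketch. This is not the authors' pipeline: the two sensor channels are synthetic, the single-feature ridge fit and its `alpha` value are assumptions chosen for brevity, and all function names are hypothetical. A mean-fill baseline stands in for the "simple arithmetic operations" mentioned above.

```python
import math
import random


def make_sensors(n=200, seed=0):
    """Two synthetic, correlated sensor channels (illustrative data only)."""
    rng = random.Random(seed)
    x = [20.0 + 5.0 * math.sin(2 * math.pi * t / 50) + rng.gauss(0, 0.2)
         for t in range(n)]
    y = [2.0 * xi + 1.0 + rng.gauss(0, 0.3) for xi in x]
    return x, y


def pick_missing(n, proportion, seed=1):
    """Randomly choose which positions of the target channel to hide."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), int(round(n * proportion))))


def fit_ridge_1d(xs, ys, alpha=1.0):
    """Closed-form ridge fit for one feature; the intercept is not penalized."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    w = sxy / (sxx + alpha)
    return w, my - w * mx


def rmse(true, pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def r2(true, pred):
    mean_t = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def evaluate(proportion=0.2):
    """Hide a share of y, impute it from x, and score against the hidden truth."""
    x, y = make_sensors()
    missing = pick_missing(len(y), proportion)
    observed = [i for i in range(len(y)) if i not in missing]
    gaps = sorted(missing)

    # Train only on rows where the target channel is observed.
    w, b = fit_ridge_1d([x[i] for i in observed], [y[i] for i in observed])

    truth = [y[i] for i in gaps]
    ridge_pred = [w * x[i] + b for i in gaps]
    mean_fill = sum(y[i] for i in observed) / len(observed)
    mean_pred = [mean_fill] * len(gaps)

    return {
        "ridge_rmse": rmse(truth, ridge_pred),
        "ridge_r2": r2(truth, ridge_pred),
        "mean_rmse": rmse(truth, mean_pred),
        "mean_r2": r2(truth, mean_pred),
    }


if __name__ == "__main__":
    # Repeat the experiment for several missing-data proportions.
    for p in (0.1, 0.2, 0.4):
        print(p, evaluate(p))
```

Because the hidden values are known, both metrics are computed only on the imputed positions, which matches the way the abstract describes creating missing data artificially. In practice, a library implementation with several feature columns would likely replace the one-feature fit used here.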