A gradient boosting-based mortality prediction model for COVID-19 patients


Keser S., KESKİN K.

Neural Computing and Applications, vol.35, no.33, pp.23997-24013, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 35 Issue: 33
  • Publication Date: 2023
  • DOI: 10.1007/s00521-023-08997-w
  • Journal Name: Neural Computing and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, Index Islamicus, INSPEC, zbMATH
  • Page Numbers: pp.23997-24013
  • Keywords: Clustering-based under-sampling, COVID-19, Gradient-based boosting machines, Machine learning, Random under-sampling, SMOTE
  • Eskişehir Osmangazi University Affiliated: Yes

Abstract

The COVID-19 pandemic has been a global public health concern since March 11, 2020. Healthcare systems have struggled to meet patients’ growing needs for diagnosis, treatment, and care, making advanced intelligence and computing technologies indispensable. Artificial intelligence techniques in particular have become essential for identifying and triaging patients, predicting disease severity, and detecting outcomes. The aim of this paper is to propose a gradient boosting-based model to predict the mortality of COVID-19 patients and to improve prediction accuracy by incorporating resampling strategies. A real COVID-19 dataset that includes patients’ travel, health, geographical, and demographic information is obtained from a public repository. The dataset is class-imbalanced, and several resampling strategies, namely the synthetic minority oversampling technique (SMOTE), random under-sampling, and clustering-based under-sampling, are applied to address the imbalanced class distribution. Then, gradient boosting machines (GBMs), namely extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost), are analyzed in terms of accuracy and computational time. The random search method is used to find the optimal hyper-parameters for the algorithms. A stacking-based hybrid model that combines the XGBoost, LightGBM, and CatBoost algorithms is used for comparison in the experiments. The experiments also investigate the factors that can influence the mortality of COVID-19 patients.
It is found that the age of the patient, whether the patient was from Wuhan, and the number of days between first noticing symptoms and visiting the hospital affect mortality. Over- and under-sampling approaches are used to mitigate the class imbalance. XGBoost, LightGBM, and CatBoost are analyzed in terms of various performance metrics to determine the most suitable GBM for the proposed system. The experimental results reveal that the stacking-based hybrid model performs well with the balanced dataset produced by SMOTE, while CatBoost produces superior results for datasets balanced with random under-sampling and clustering-based under-sampling. The study also emphasizes the importance of addressing the imbalanced class distribution in the dataset and of incorporating resampling strategies to improve prediction accuracy. The promising results confirm the success of the proposed system in predicting the mortality of COVID-19 patients.
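
The two simplest resampling ideas named in the abstract, random under-sampling and SMOTE-style over-sampling, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `random_under_sample` and `smote_like_over_sample`, the binary-label assumption, the neighbour count `k`, and the toy rows (hypothetical age and days-to-hospital values) are all assumptions made for illustration only.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_like_over_sample(X, y, k=3, seed=0):
    """Add synthetic minority rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core idea of SMOTE).
    Assumes binary labels and at least two minority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_new = max(counts.values()) - counts[minority]
    pts = [x for x, lab in zip(X, y) if lab == minority]
    if len(pts) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        base = rng.choice(pts)
        # Nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            (p for p in pts if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_out.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Made-up toy rows: [age, days from symptom onset to hospital visit].
X = [[34, 2], [45, 1], [29, 3], [51, 2], [38, 4], [42, 1],
     [70, 9], [82, 7], [66, 8]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # 1 = minority class

X_under, y_under = random_under_sample(X, y)
X_over, y_over = smote_like_over_sample(X, y, k=2)
print(Counter(y_under), Counter(y_over))
```

Under-sampling discards majority rows, so it is cheap but loses information; the SMOTE-style variant keeps every original row and adds interpolated minority rows instead. In practice a maintained library implementation would normally be preferred over a hand-written one like this.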