Machine Learning-Driven Anemia Diagnosis: A Comparative Study Using Blood Biomarkers from Complete Blood Count Data


Xhepaliu A., Adar N., Leka M.

Communications in Computer and Information Science, vol. 2669, no. 1, pp. 403-422, 2025 (Scopus)

Abstract

Anemia is a prevalent hematological disorder characterized by reduced hemoglobin concentration or red blood cell count. Manual diagnosis of anemia types such as iron deficiency anemia, thalassemia minor, and anemia of chronic disease typically requires specialized testing and expert interpretation, which can be time-consuming and costly. In this work, we develop and compare machine learning models that classify anemia from routine blood test biomarkers. We use a large clinical dataset of 9,711 patients containing complete blood count values such as HGB, HCT, and RBC indices, along with ferritin. After preprocessing and feature selection based on XGBoost feature importance, we retain six key features, including HGB, MCV, and ferritin. We then train five classification models (Random Forest, Decision Tree, K-Nearest Neighbors, Logistic Regression, and a Multilayer Perceptron (MLP) neural network) and evaluate them using accuracy, precision, recall, and F1 score. The Random Forest ensemble achieved the highest accuracy and weighted F1 score on the test set. Although it trailed slightly in performance, Logistic Regression emerged as the most cost-efficient model owing to its minimal computational requirements and straightforward interpretability, which are key advantages in clinical settings where resource efficiency and transparency are critical. These findings demonstrate that machine learning applied to basic blood biomarkers can provide accurate, scalable, and accessible support for anemia detection and classification.
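The abstract ranks models by accuracy and weighted F1 score. The paper's evaluation code is not reproduced here, so the following is only a minimal pure-Python sketch of how these metrics are commonly computed for a multi-class problem. Per-class precision, recall, and F1 are computed one-vs-rest and then averaged with weights proportional to each class's support (its count in the true labels). The function name `classification_metrics`, the class labels `IDA`, `THAL`, and `ACD`, and the toy predictions are all hypothetical. A library routine such as scikit-learn's `precision_recall_fscore_support(..., average="weighted")` is expected to give the same values on this input.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy plus support-weighted precision, recall, F1.

    Each class is scored one-vs-rest; class scores are averaged with weight
    (number of true samples of that class) / (total samples). Undefined
    ratios (0/0) are treated as 0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    support = Counter(y_true)  # true count per class = weight numerator
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = count - tp
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        w = count / n
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy, made-up labels for three hypothetical anemia classes.
y_true = ["IDA", "IDA", "IDA", "THAL", "THAL", "ACD"]
y_pred = ["IDA", "IDA", "THAL", "THAL", "IDA", "IDA"]
metrics = classification_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in metrics.items()})
# prints {'accuracy': 0.5, 'precision': 0.417, 'recall': 0.5, 'f1': 0.452}
```

One property worth noting is that support-weighted recall always equals accuracy, because each class contributes (n_c / n) · (TP_c / n_c) = TP_c / n. Weighted precision and weighted F1 therefore carry the extra information when the test set is class-imbalanced.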