Area under the ROC curve estimation based on ranked set sampling via genetic algorithm

Gürer, Özge; Kılıç, Adil; Güven, Gamze; Şenoğlu, Birdal

doi:10.25259/jksus_1138_2025

JOURNAL OF KING SAUD UNIVERSITY SCIENCE, vol. 38, no. 4, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 38, Issue: 4
Publication Date: 2026
DOI Number: 10.25259/jksus_1138_2025
Journal Name: JOURNAL OF KING SAUD UNIVERSITY SCIENCE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, zbMATH, Academic Search Ultimate (EBSCO)
Keywords: Genetic algorithm, Maximum likelihood, Monte Carlo simulation, Ranked set sampling, Receiver operating characteristic curve
Eskişehir Osmangazi University Affiliation: Yes

Abstract

In this study, the problem of estimating the area under the receiver operating characteristic (ROC) curve (AUC), a widely used accuracy index in medical diagnosis, is addressed under non-normality. Instead of applying transformations such as Box-Cox, the original data are used, with test scores assumed to follow the generalized logistic (GL) distribution, which can model positively skewed, negatively skewed, and symmetric data. To select the sampling units, ranked set sampling (RSS) is used instead of conventional simple random sampling (SRS) because of its known advantage in improving estimator efficiency. In the estimation phase, genetic algorithm (GA)-based maximum likelihood (ML) estimation is used, since the likelihood equations involve nonlinear functions of the distribution parameters and must be solved numerically. Unlike the classical GA, which searches a fixed space, the proposed approach uses a data-driven search space as an efficient alternative. The performance of the proposed AUC estimators is assessed in terms of bias, efficiency, and robustness through an extensive Monte Carlo simulation study, which also evaluates the RSS-based AUC estimators under imperfect ranking. Finally, the proposed methodology is applied to a diabetes data set to demonstrate its practical implementation.
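The abstract does not write out the estimand. As a minimal sketch, assume the standard definition AUC = P(X < Y), where X and Y are the test scores of non-diseased and diseased subjects, and assume the type I GL parameterization with location mu, scale sigma and shape b (the paper's exact parameterization may differ). Here b_1 and b_2 denote the shapes of X and Y.

```latex
F(x;\mu,\sigma,b) = \left[1 + e^{-(x-\mu)/\sigma}\right]^{-b}, \qquad
f(x;\mu,\sigma,b) = \frac{b}{\sigma}\,
  \frac{e^{-(x-\mu)/\sigma}}{\left[1 + e^{-(x-\mu)/\sigma}\right]^{b+1}},
  \qquad \sigma > 0,\; b > 0

\mathrm{AUC} = P(X < Y) = \int_{-\infty}^{\infty} F_X(y)\, f_Y(y)\, dy
             = \int_{0}^{1} F_X\!\left(F_Y^{-1}(u)\right) du

% Special case: \mu_X = \mu_Y and \sigma_X = \sigma_Y, so F_X = G^{b_1}, F_Y = G^{b_2},
% with G the logistic CDF; substituting u = G(y):
\mathrm{AUC} = \int_{0}^{1} u^{b_1}\, b_2\, u^{b_2 - 1}\, du = \frac{b_2}{b_1 + b_2}
```

A plug-in estimator would substitute ML estimates of (mu, sigma, b) for each group into the integral. The closed form in the special case is only a sanity check (b_1 = b_2 gives 1/2), not a result reported in the paper.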
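The RSS design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: balanced RSS with set size `m` and `r` cycles, GL data drawn by inverse-CDF sampling from the assumed type I form F(x) = [1 + exp(-(x - mu)/sigma)]^(-b), and imperfect ranking modelled by ranking each set on `x + N(0, tau^2)`. That is one common judgment-error model, chosen here for illustration; the paper's imperfect-ranking mechanism may differ. All function names are hypothetical.

```python
import math
import random
import statistics


def gl_draw(rng, mu=0.0, sigma=1.0, b=1.0):
    """Inverse-CDF draw from the type I GL distribution (assumed form)
    F(x) = (1 + exp(-(x - mu)/sigma))^(-b)."""
    p = rng.random()
    while p == 0.0:  # the quantile is undefined at p = 0
        p = rng.random()
    # p^(-1/b) - 1 computed as expm1(-log(p)/b) to stay accurate near p = 1
    return mu - sigma * math.log(math.expm1(-math.log(p) / b))


def srs_sample(rng, n, mu=0.0, sigma=1.0, b=1.0):
    """Simple random sample of size n."""
    return [gl_draw(rng, mu, sigma, b) for _ in range(n)]


def rss_sample(rng, m, r, mu=0.0, sigma=1.0, b=1.0, tau=0.0):
    """Balanced ranked set sample of size m * r.

    In each of r cycles, m sets of m units are drawn; from the i-th set only
    the unit judged i-th smallest is measured. tau = 0 gives perfect ranking;
    tau > 0 ranks on x + N(0, tau^2), an assumed judgment-error model.
    """
    sample = []
    for _ in range(r):
        for i in range(m):
            units = srs_sample(rng, m, mu, sigma, b)
            # sorted() evaluates the key once per unit, so each unit gets one noise draw
            judged = sorted(units, key=lambda x: x + rng.gauss(0.0, tau))
            sample.append(judged[i])
    return sample


def mc_variance_of_mean(draw_sample, reps):
    """Monte Carlo variance of the sample mean over `reps` replicated samples."""
    means = [statistics.fmean(draw_sample()) for _ in range(reps)]
    return statistics.pvariance(means)


if __name__ == "__main__":
    rng = random.Random(42)
    # Equal total sample size n = 6: RSS with m = 3, r = 2 versus SRS with n = 6.
    v_rss = mc_variance_of_mean(lambda: rss_sample(rng, 3, 2, b=2.0), 5000)
    v_srs = mc_variance_of_mean(lambda: srs_sample(rng, 6, b=2.0), 5000)
    print("estimated relative efficiency of RSS mean:", v_srs / v_rss)
```

Running the script prints a Monte Carlo estimate of Var(SRS mean) / Var(RSS mean) at equal sample size. Values above 1 illustrate the efficiency advantage the abstract attributes to RSS, shown here for the sample mean rather than for the AUC estimator.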
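The GA-based ML step can be sketched as a real-coded GA that maximizes the GL log-likelihood, followed by a plug-in AUC. This is a generic sketch, not the authors' algorithm. The operators (tournament selection, arithmetic crossover, Gaussian mutation, elitism), the GA settings, and especially the bounds in the hypothetical `data_driven_bounds` are assumptions, since the abstract does not describe how its data-driven search space is built. The log-likelihood treats observations as i.i.d. For RSS data, a likelihood based on order-statistic densities could be substituted for `gl_loglik`.

```python
import math
import random
import statistics


def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def gl_cdf(x, mu, sigma, b):
    """Type I GL CDF (assumed form): (1 + exp(-(x - mu)/sigma))^(-b)."""
    return math.exp(-b * softplus(-(x - mu) / sigma))


def gl_quantile(p, mu, sigma, b):
    """Inverse of gl_cdf for 0 < p < 1."""
    return mu - sigma * math.log(math.expm1(-math.log(p) / b))


def gl_loglik(data, mu, sigma, b):
    """i.i.d. GL log-likelihood; -inf outside the parameter space."""
    if sigma <= 0.0 or b <= 0.0:
        return -math.inf
    total = len(data) * (math.log(b) - math.log(sigma))
    for x in data:
        z = (x - mu) / sigma
        total -= z + (b + 1.0) * softplus(-z)
    return total


def data_driven_bounds(data):
    """Hypothetical search space for (mu, sigma, b) built from sample summaries."""
    s = statistics.stdev(data)
    return [(min(data) - s, max(data) + s), (0.05 * s, 1.5 * s), (0.2, 10.0)]


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    idx = [rng.randrange(len(pop)) for _ in range(k)]
    return pop[max(idx, key=scores.__getitem__)]


def ga_maximize(fitness, bounds, rng, pop_size=40, generations=150,
                crossover_rate=0.9, mutation_rate=0.2, seed_point=None):
    """Real-coded GA with elitism; returns (best_vector, best_fitness)."""
    def clip(v):
        return [min(max(x, lo), hi) for x, (lo, hi) in zip(v, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    if seed_point is not None:
        pop[0] = clip(list(seed_point))  # optional starting guess in the population
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        best = max(range(pop_size), key=scores.__getitem__)
        new_pop = [pop[best][:]]  # elitism: the best fitness never decreases
        while len(new_pop) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            if rng.random() < crossover_rate:  # arithmetic crossover
                a = rng.random()
                child = [a * x + (1.0 - a) * y for x, y in zip(p1, p2)]
            else:
                child = p1[:]
            for j, (lo, hi) in enumerate(bounds):  # Gaussian mutation
                if rng.random() < mutation_rate:
                    child[j] += rng.gauss(0.0, 0.1 * (hi - lo))
            new_pop.append(clip(child))
        pop = new_pop
        scores = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def auc_plugin(theta_x, theta_y, n_grid=2000):
    """AUC = P(X < Y) = integral over (0,1) of F_X(F_Y^{-1}(u)) du (midpoint rule)."""
    total = 0.0
    for k in range(n_grid):
        u = (k + 0.5) / n_grid
        total += gl_cdf(gl_quantile(u, *theta_y), *theta_x)
    return total / n_grid


def fit_gl(data, rng):
    """GA-based ML fit of (mu, sigma, b) over the hypothetical data-driven bounds."""
    bounds = data_driven_bounds(data)
    start = [statistics.median(data), 0.5 * statistics.stdev(data), 1.0]
    theta, _ = ga_maximize(lambda th: gl_loglik(data, *th), bounds, rng,
                           seed_point=start)
    return theta


if __name__ == "__main__":
    rng = random.Random(2024)
    healthy = [gl_quantile(rng.uniform(1e-6, 1 - 1e-6), 0.0, 1.0, 1.0) for _ in range(60)]
    diseased = [gl_quantile(rng.uniform(1e-6, 1 - 1e-6), 1.0, 1.0, 2.0) for _ in range(60)]
    print("estimated AUC:", auc_plugin(fit_gl(healthy, rng), fit_gl(diseased, rng)))
```

The plug-in AUC uses the substitution u = F_Y(y), which turns the integral into one over (0, 1) that a simple midpoint rule handles. For equal location and scale it reproduces b_Y / (b_X + b_Y), a convenient correctness check.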