Area under the ROC curve estimation based on ranked set sampling via genetic algorithm


Creative Commons License

GÜRER Ö., Kılıç A., GÜVEN G., ŞENOĞLU B.

Journal of King Saud University - Science, vol. 38, no. 4, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 38, Issue: 4
  • Publication Date: 2026
  • DOI: 10.25259/jksus_1138_2025
  • Journal Name: Journal of King Saud University - Science
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, zbMATH
  • Keywords: Genetic algorithm, Maximum likelihood, Monte Carlo simulation, Ranked set sampling, Receiver operating characteristic curve
  • Open Archive Collection: AVESİS Open Access Collection
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

This study addresses the problem of estimating the area under the receiver operating characteristic (ROC) curve (AUC), a widely used accuracy index in medical diagnosis, under non-normality. Instead of applying a transformation such as Box-Cox, the original data are used, and test scores are assumed to follow the generalized logistic (GL) distribution, which can accommodate positively skewed, negatively skewed, and symmetric data. Sampling units are selected by ranked set sampling (RSS) rather than conventional simple random sampling (SRS), because RSS is known to improve the efficiency of an estimator. In the estimation phase, genetic algorithm (GA) based maximum likelihood (ML) is used, since the likelihood equations involve nonlinear functions of the distribution parameters. Unlike the classical GA, which uses a fixed search space, the GA here uses a data-driven search space as a more efficient alternative. The performance of the proposed AUC estimators is assessed in terms of bias, efficiency, and robustness via an extensive Monte Carlo simulation study. The performance of the RSS-based AUC estimators is also evaluated under imperfect ranking. Finally, the proposed methodology is applied to a diabetes data set to demonstrate its practical implementation.
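
As a rough illustration of the sampling step described in the abstract, the following Python sketch draws ranked set samples of test scores and computes an AUC estimate from them. It is not the authors' procedure. It assumes perfect ranking, assumes the type I form of the GL distribution with CDF F(x) = (1 + exp(-(x - mu)/sigma))^(-b), and substitutes the simple nonparametric Mann-Whitney estimator of P(X < Y) for the GA-based ML estimator. All function names and parameter values are hypothetical.

```python
import math
import random


def gl_draw(rng, mu=0.0, sigma=1.0, b=1.0):
    """One draw from an ASSUMED type I generalized logistic distribution,
    generated by inverting F(x) = (1 + exp(-(x - mu) / sigma)) ** (-b)."""
    # Clamp u away from 0 and 1 so the power and logarithm stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    z = -math.log(u ** (-1.0 / b) - 1.0)
    return mu + sigma * z


def ranked_set_sample(draw, set_size, cycles, rng):
    """Ranked set sample of size set_size * cycles, ASSUMING perfect ranking.

    For each rank r, a fresh set of `set_size` units is drawn and ranked,
    and only the unit with rank r is kept (measured)."""
    sample = []
    for _ in range(cycles):
        for rank in range(set_size):
            ranked = sorted(draw(rng) for _ in range(set_size))
            sample.append(ranked[rank])
    return sample


def empirical_auc(healthy, diseased):
    """Mann-Whitney estimate of P(X < Y); ties count one half.
    This is a stand-in, not the GA-based ML estimator of the paper."""
    wins = 0.0
    for x in healthy:
        for y in diseased:
            if x < y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(healthy) * len(diseased))


rng = random.Random(2026)  # fixed seed so the sketch is reproducible
# Hypothetical parameter values: diseased scores are shifted upward.
healthy = ranked_set_sample(lambda r: gl_draw(r, mu=0.0, b=2.0), 3, 10, rng)
diseased = ranked_set_sample(lambda r: gl_draw(r, mu=1.5, b=2.0), 3, 10, rng)
auc_hat = empirical_auc(healthy, diseased)
```

Each call with `set_size=3` and `cycles=10` measures 30 units but draws 90, which is the trade-off RSS makes: ranking is assumed to be cheap relative to measurement. Modelling imperfect ranking would require ranking on a noisy proxy instead of the score itself, which this sketch does not attempt.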