Machine Learning to Predict Survival in Small Cell Lung Cancer: A Pilot Study

Creative Commons License


ESTRO 2021, Madrid, Spain, 31 August 2021

  • Publication Type: Conference Paper / Summary Text
  • City: Madrid
  • Country: Spain
  • Eskisehir Osmangazi University Affiliated: Yes


Purpose or Objective Small-cell lung cancer (SCLC) represents about 15% of all lung cancers and is marked by an exceptionally high proliferative rate, a strong predilection for early metastasis, and poor prognosis. Although multiple treatment modalities are applied, the median overall survival (OS) for limited-stage SCLC is 16 to 20 months (1). A standard treatment based on the TNM staging system may not be suitable for every patient. Identifying patients at high risk of recurrence and disease-related mortality is also valuable in guiding treatment. In this complex and heterogeneous disease group, it is therefore important to evaluate prognosis in a personalized manner and plan treatment accordingly. The aim of this study was to predict OS in limited-stage SCLC using machine learning (ML).

Materials and Methods The study included 86 cases diagnosed with limited-stage SCLC from 2007 to 2018. The following 25 variables were evaluated for the prediction of OS: age, gender, Karnofsky Performance Score (KPS), body mass index (BMI), smoking history, presence of chronic obstructive pulmonary disease (COPD), tumor localization, tumor size, lymph node site, lymph node involvement (single level/multilevel), T stage, N stage, TNM stage, presence of concurrent chemotherapy (CT), concurrent CT scheme, number of CT cycles before radiotherapy (RT), gross tumor volume (GTV), planning target volume (PTV), total RT dose, RT fraction dose, prognostic nutritional index (PNI), pretreatment serum albumin and hemoglobin values, neutrophil-lymphocyte ratio (NLR), and advanced lung cancer inflammation index (ALI). The ML algorithms used for prediction were logistic regression, multilayer perceptron (MLP) classifier, eXtreme gradient boosting (XGB) classifier, support vector classifier (SVC), random forest classifier (RFC), and Gaussian Naive Bayes (GNB). The data were split into training and test sets at a ratio of 80% to 20%.

Results Patient and tumor characteristics are given in Table 1. Median RT dose was 54 (45-64) Gy. Median fraction dose was 1.8 (2-3) Gy. Concurrent CT was applied to 68 cases, and the most commonly used CT scheme was cisplatin + etoposide. Of the 25 variables, 13 that affected OS were selected using the permutation feature importance method. In order of importance, these were: gender, PTV, pretreatment serum albumin and hemoglobin values, NLR, BMI, KPS, RT fraction dose, number of CT cycles before RT, T stage, tumor localization, presence of concurrent CT, and presence of COPD. At a median follow-up of 23 months, 54 patients had died of cancer. Median OS was 21 (5-125) months. The algorithm with the highest accuracy was SVC (accuracy: 0.88, confidence interval: 0.74-1, ROC AUC: 0.83, sensitivity: 92%, specificity: 75%). The ROC curve is given in Figure 1.
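
The workflow described above (80%-20% split, a support vector classifier, permutation feature importance, and test-set metrics) can be sketched roughly as follows with scikit-learn. This is only an illustrative sketch, not the study's code: the data are synthetic, and the binary outcome, feature scaling, RBF kernel, stratified split, random seeds, and the choice to compute importances on the training set are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort: 86 cases, 25 predictors, binary outcome.
# These are NOT the study data; they only mimic its dimensions.
X, y = make_classification(
    n_samples=86, n_features=25, n_informative=8, random_state=0
)

# 80%-20% training-test split (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Support vector classifier on standardized features (assumed preprocessing).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0))
model.fit(X_train, y_train)

# Permutation feature importance: shuffle one column at a time and record
# the mean drop in accuracy; larger drops indicate more important features.
result = permutation_importance(
    model, X_train, y_train, scoring="accuracy", n_repeats=20, random_state=0
)
ranking = np.argsort(result.importances_mean)[::-1]
top13 = ranking[:13]  # indices of the 13 highest-ranked features

# Test-set metrics of the kind reported in the abstract.
y_pred = model.predict(X_test)
y_score = model.decision_function(X_test)  # continuous scores for ROC AUC
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_score),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
}
print(metrics)
print("Top 13 feature indices:", top13.tolist())
```

With 86 cases, a 20% test set holds fewer than 20 patients, so each test-set metric rests on a small number of predictions and carries a wide confidence interval.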

Conclusion Given the high treatment costs, potentially serious toxicity, harm of early progression, and low survival when treatment is ineffective, machine learning-based predictive systems are promising.