The Impact of Features and Preprocessing on Automatic Text Summarization


Bal S., ŞORA GÜNAL E.

ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, vol.25, no.2, pp.117-132, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 25, Issue: 2
  • Publication Date: 2022
  • Journal Name: ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Pages: pp.117-132
  • Keywords: Computational linguistics, automatic text summarization, feature extraction, feature selection, machine learning, preprocessing
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

Automatic text summarization produces a shortened, informative version of a given text without manual intervention, relying on specific features, preprocessing methods, and decision mechanisms. This paper aims to analyze in detail the impact of common features and preprocessing techniques on the performance of automatic text summarization, particularly for Turkish. As a further contribution, a new distinctive feature based on latent semantic analysis (LSA) is proposed. Two datasets comprising a total of 120 documents and 1,466 sentences were used for the analysis, and two different success metrics were used to assess summarization performance. Comprehensive experiments revealed the optimal feature subset and the preprocessing methods most useful for improving summarization performance. Moreover, the proposed feature was verified to further improve performance.
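The abstract does not define the proposed LSA-based feature, so the following is only a rough illustration of how an LSA signal can rank sentences in extractive summarization, not the authors' feature. It is a minimal sketch assuming a generic heuristic: build a term-by-sentence frequency matrix, take its singular value decomposition, and score each sentence by the length of its vector in the top-k latent space. The tokenizer (lowercasing and splitting on word characters), the parameter `k`, and the helper names `term_sentence_matrix`, `lsa_sentence_scores`, and `summarize` are hypothetical choices for this example.

```python
import re
from collections import Counter

import numpy as np


def term_sentence_matrix(sentences):
    """Build a raw term-frequency matrix A with shape (terms, sentences)."""
    # Assumed minimal preprocessing: lowercase, keep runs of word characters.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            A[index[term], j] = count
    return A


def lsa_sentence_scores(sentences, k=2):
    """Hypothetical LSA feature: length of each sentence in the top-k latent space.

    With A = U S V^T, sentence j scores sqrt(sum_i (s_i * Vt[i, j])^2)
    over the k largest singular values s_i.
    """
    A = term_sentence_matrix(sentences)
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    weighted = s[:k, None] * vt[:k, :]  # shape (k, number of sentences)
    return np.sqrt((weighted ** 2).sum(axis=0))


def summarize(sentences, n=1, k=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    scores = lsa_sentence_scores(sentences, k)
    top = sorted(np.argsort(scores)[::-1][:n])
    return [sentences[i] for i in top]


if __name__ == "__main__":
    doc = [
        "Text summarization shortens a document.",
        "Summarization selects important sentences from a document.",
        "The weather was pleasant today.",
    ]
    print(summarize(doc, n=2))
```

A note on the design choice: if `k` covers every singular value, each score reduces to the plain Euclidean norm of that sentence's term-frequency column, so it is the truncation to a small `k` that lets the dominant latent topics drive the ranking. In a feature-based system like the one the abstract describes, such a score would be one input among several rather than the whole decision mechanism.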