The Impact of Features and Preprocessing on Automatic Text Summarization

Bal, Salih; ŞORA GÜNAL, EFNAN

ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, vol. 25, no. 2, pp. 117-132, 2022 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 25 Issue: 2
Publication Date: 2022
Journal Name: ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY
Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 117-132
Keywords: Computational linguistics, automatic text summarization, feature extraction, feature selection, machine learning, preprocessing
Eskişehir Osmangazi University Affiliation: Yes

Abstract

Automatic text summarization produces a shorter, informative version of a given text without manual intervention, relying on specific features, preprocessing methods, and decision mechanisms. This paper thoroughly analyzes the impact of common features and preprocessing techniques on automatic text summarization performance, particularly for Turkish. As an additional contribution, a new distinctive feature based on latent semantic analysis (LSA) is proposed. Two datasets comprising a total of 120 documents and 1,466 sentences were used in the analysis, and two different success metrics were used to assess summarization performance. Comprehensive experiments revealed the optimal feature subset and the preprocessing methods most useful for improving summarization performance. The experiments also verified that the proposed feature further improves performance.
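The abstract does not specify how the proposed LSA-based feature is computed. As a point of reference only, the sketch below shows a common generic LSA sentence-scoring scheme: build a term-by-sentence frequency matrix, take its singular value decomposition, and score each sentence by its weighted length in the space of the top-k latent topics. This is an assumption-laden illustration, not the authors' feature. The function names `tokenize`, `lsa_sentence_scores`, and `summarize` and the parameter `k` are hypothetical, and the preprocessing is deliberately minimal (lowercasing and letter-only tokens, with no Turkish-specific stemming, stop-word removal, or dotted/dotless "i" case handling).

```python
import re
from collections import Counter

import numpy as np


def tokenize(sentence: str) -> list[str]:
    # Hypothetical minimal preprocessing: lowercase, keep runs of Unicode letters.
    return re.findall(r"[^\W\d_]+", sentence.lower())


def lsa_sentence_scores(sentences: list[str], k: int = 2) -> list[float]:
    """Score each sentence by its weighted length in the top-k LSA topics."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({t for toks in token_lists for t in toks})
    if not vocab:
        return [0.0] * len(sentences)
    index = {term: i for i, term in enumerate(vocab)}

    # Term-by-sentence matrix of raw term frequencies (rows: terms, columns: sentences).
    a = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(token_lists):
        for term, count in Counter(toks).items():
            a[index[term], j] = count

    # Thin SVD: A = U diag(s) Vt; row t of Vt holds sentence loadings on topic t.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    k = min(k, len(s))

    # Weight each topic's loadings by its singular value, then take the
    # Euclidean length per sentence (column); squaring removes SVD sign ambiguity.
    weighted = s[:k, None] * vt[:k, :]
    return np.sqrt((weighted ** 2).sum(axis=0)).tolist()


def summarize(sentences: list[str], n: int = 1, k: int = 2) -> list[str]:
    """Return the n highest-scoring sentences in their original order."""
    scores = lsa_sentence_scores(sentences, k)
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:n]
    return [sentences[j] for j in sorted(top)]
```

For example, `summarize(["cats chase mice", "cats chase mice", "stocks rose today"], n=1, k=1)` keeps a "cats chase mice" sentence, because the dominant latent topic is the one shared by the two repeated sentences. In a feature-based summarizer of the kind the abstract describes, a score like this would plausibly be one feature combined with others rather than the sole decision criterion.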