The Impact of Features and Preprocessing on Automatic Text Summarization


Bal S., ŞORA GÜNAL E.

ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, vol.25, no.2, pp.117-132, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 25 Issue: 2
  • Publication Date: 2022
  • Journal Name: ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.117-132
  • Keywords: Computational linguistics, automatic text summarization, feature extraction, feature selection, machine learning, preprocessing
  • Eskisehir Osmangazi University Affiliated: Yes

Abstract

Automatic text summarization produces a shortened, informative version of a given text without manual intervention, relying on specific features, preprocessing methods, and decision mechanisms. This paper thoroughly analyzes the impact of common features and preprocessing techniques on the performance of automatic text summarization, particularly for the Turkish language. A new distinctive feature based on latent semantic analysis is also proposed. Two datasets comprising a total of 120 documents and 1,466 sentences were used for the analysis, and two different success metrics were used to assess summarization performance. Comprehensive experiments revealed the optimal feature subset and the most useful preprocessing methods for improving summarization performance. Moreover, the proposed feature was verified to further improve performance.
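
The abstract does not define the proposed feature, so the following is only a generic sketch of how a latent semantic analysis (LSA) score can serve as a sentence-level feature in extractive summarization. It is not the authors' method. The function name `lsa_sentence_scores`, the whitespace tokenizer, the raw term counts, and the sample sentences are all assumptions made for illustration; a study on Turkish text would likely use language-specific preprocessing such as stemming and stop-word removal.

```python
import numpy as np

def lsa_sentence_scores(sentences, k=2):
    """Generic LSA salience score per sentence (illustrative, not the paper's feature)."""
    # Naive preprocessing (assumption): lowercase and split on whitespace.
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence count matrix A (rows: terms, columns: sentences).
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0

    # Thin SVD: A = U @ diag(s) @ Vt; column j of Vt describes sentence j
    # in terms of latent topics, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))

    # Salience of sentence j: norm of its top-k topic coordinates,
    # each weighted by the corresponding singular value.
    return np.sqrt(((s[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))

# Hypothetical sample sentences, not taken from the paper's datasets.
sentences = [
    "text summarization selects important sentences",
    "preprocessing affects summarization features",
    "the weather was pleasant yesterday",
]
scores = lsa_sentence_scores(sentences)
ranking = np.argsort(-scores)  # sentence indices, highest score first
```

In a feature-based summarizer, a score like this would typically be one column among several (sentence position, length, term frequency, and so on) passed to a classifier or ranking rule. The number of retained topics `k` is a tunable assumption here.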