CLASSIFICATION OF SPAM E-MAILS


ERGÜL B., ALTIN YAVUZ A.

6. INTERNATIONAL MEDITERRANEAN CONGRESS, Rome, Italy, 13 - 15 August 2024, vol.1, pp.660-663

  • Publication Type: Conference Paper / Full Text
  • Volume: 1
  • City of Publication: Rome
  • Country of Publication: Italy
  • Pages: pp.660-663
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

Email is one of the fastest and most professional ways to send messages from one place to another around the world. However, growing email usage has also increased the number of messages arriving in inboxes. As message volume rises, significant problems arise, such as recipients being flooded with messages, identity theft, loss of vital information, and network damage. Recipients cannot avoid these unwanted messages, which take various forms, notably advertisements and other harmful content. Such emails are called spam.
Classification is the process of dividing data into specific classes or categories. It aims to create a systematic arrangement by grouping objects with similar characteristics. Classification is widely used in machine learning and is supported by many techniques. These techniques are predictive computations that assign data to predefined categories: they learn from training data and then classify new incoming data. They include Support Vector Machines (SVM), Naive Bayes, Decision Trees, Random Forest, and K-Nearest Neighbors (KNN).
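As an illustration of how a classifier learns from training data and then labels new messages, the following is a minimal sketch of one of the listed techniques, Naive Bayes, written in plain Python. It is not the study's implementation (the study used WEKA); the function names `train_naive_bayes` and `predict`, the whitespace tokenization, the add-one smoothing, and the toy messages are all assumptions made for this example.

```python
import math
from collections import Counter

def train_naive_bayes(messages, labels):
    """Collect per-class word counts and class frequencies from labelled messages."""
    word_counts = {}          # label -> Counter of word frequencies
    class_counts = Counter()  # label -> number of training messages
    vocabulary = set()
    for text, label in zip(messages, labels):
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        class_counts[label] += 1
        vocabulary.update(words)
    return word_counts, class_counts, vocabulary

def predict(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one (Laplace) smoothing, an assumption of this sketch,
            # keeps unseen words from zeroing out the probability.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, invented for illustration only.
train_messages = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached",
]
train_labels = ["spam", "spam", "not spam", "not spam"]

model = train_naive_bayes(train_messages, train_labels)
print(predict("claim free money", model))
print(predict("monday project meeting", model))
```

On this toy data the first message is assigned to the spam class and the second to the not-spam class, because their words occur more often in the corresponding training messages.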
WEKA is a widely used open-source software package for data mining and machine learning. Classification is one of its most important functions, allowing users to assign data to specific categories.
This study aims to classify a dataset of emails labelled as spam or not spam using the open-source software WEKA. Comparison criteria commonly used in the literature were applied to evaluate several classification techniques on this dataset and to determine which technique performs better for it.
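The abstract does not name the comparison criteria. As a hedged sketch, the snippet below assumes four measures that are commonly reported for spam classifiers (accuracy, precision, recall, and F1) and computes them from actual and predicted labels. The function name `evaluate` and the label lists are hypothetical and do not come from the study.

```python
def evaluate(actual, predicted, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # spam caught
    fp = sum(a != positive and p == positive for a, p in pairs)  # legitimate mail flagged
    fn = sum(a == positive and p != positive for a, p in pairs)  # spam missed
    tn = sum(a != positive and p != positive for a, p in pairs)  # legitimate mail kept
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels, invented for illustration only.
actual = ["spam", "spam", "spam", "not spam",
          "not spam", "not spam", "not spam", "not spam"]
predicted = ["spam", "spam", "not spam", "not spam",
             "not spam", "not spam", "spam", "not spam"]

metrics = evaluate(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the predictions of each technique would give one row of scores per technique, which is one simple way to compare them on a single dataset.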