NON-SPEECH ENVIRONMENTAL SOUND CLASSIFICATION USING SVMS WITH A NEW SET OF FEATURES


Uzkent B., Barkana B. D., ÇEVİKALP H.

INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, vol.8, pp.3511-3524, 2012 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 8
  • Publication Date: 2012
  • Journal Name: INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.3511-3524
  • Keywords: Environmental sound classification, Feature extraction, Mel-frequency cepstral coefficients (MFCCs), Support vector machines, Radial basis function (RBF) neural network, SUPPORT VECTOR MACHINES, RECOGNITION
  • Eskişehir Osmangazi University Affiliated: Yes

Abstract

Mel-frequency cepstral coefficients (MFCCs) are regarded as a feature extraction method for stationary or pseudo-stationary signals. They work very well for classifying speech and music signals. MFCCs have also been used to classify non-speech sounds in audio surveillance systems, even though they do not fully reflect the time-varying characteristics of non-stationary non-speech signals. We introduce a new 2D feature set, obtained with a feature extraction method based on the pitch range (PR) of non-speech sounds and the autocorrelation function. We compare the classification accuracy of the proposed features with that of MFCCs using Support Vector Machine (SVM) and Radial Basis Function (RBF) neural network classifiers. Seven non-speech environmental sounds were studied: gunshot, glass breaking, scream, dog barking, rain, engine, and restaurant noise. The new feature set provides high accuracy rates when used on its own for classification. Combining it with MFCCs significantly improves the accuracy rates of the classifiers, by 4% to 35% depending on the classifier used, suggesting that the two feature sets are complementary. The SVM classifier with the Gaussian kernel provided the highest accuracy rates among the classifiers used in this study.
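
To give a rough sense of what an autocorrelation-based pitch-range feature could look like, the Python sketch below estimates a pitch per frame from the strongest autocorrelation peak and summarises a clip by its lowest and highest frame pitch. The abstract does not define the 2D feature set, so the (min, max) pair, the frame length, the 50 to 500 Hz search band, and every function name here are assumptions made for illustration, not the authors' method.

```python
def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def frame_pitch(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Pitch estimate in Hz for one frame, or None for a silent frame.

    The search band [f_min, f_max] is an assumed setting, not taken
    from the paper.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    r = autocorrelation(frame, lag_max)
    if r[0] <= 0.0:
        return None  # zero energy: no pitch to report
    # The lag with the largest autocorrelation is taken as the period.
    best_lag = max(range(lag_min, lag_max + 1), key=lambda lag: r[lag])
    return sample_rate / best_lag


def pitch_range(signal, sample_rate, frame_len=400, hop=400):
    """Hypothetical 2D feature: (lowest, highest) frame pitch in Hz."""
    pitches = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        p = frame_pitch(signal[start:start + frame_len], sample_rate)
        if p is not None:
            pitches.append(p)
    if not pitches:
        return None
    return min(pitches), max(pitches)
```

A pair like this could be appended to an MFCC vector before training a classifier, which is one plausible reading of how the two feature sets are combined; the paper itself should be consulted for the actual definition.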