Machine Learning-Based Classification of Soil Parent Materials Using Elemental Concentration and Vis-NIR Data


İnci Y., Bilgili A. V., Gündoğan R., Gözükara G., Karadağ K., Tenekeci M. E.

Sensors, vol. 24, no. 16, 2024 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 24, Issue: 16
  • Publication Date: 2024
  • DOI: 10.3390/s24165126
  • Journal Name: Sensors
  • Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), Biotechnology Research Abstracts, CAB Abstracts, Communication Abstracts, Compendex, INSPEC, MEDLINE, Metadex, Veterinary Science Database, Directory of Open Access Journals, Civil Engineering Abstracts
  • Keywords: classification, ICP-OES, soil science, Vis-NIR, XRF
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

In soil science, assigning soil samples to their respective origins is a crucial investigative tool. In recent years, the growing use of proximal sensing and advances in machine-learning techniques have brought new approaches to soil science. This study investigates soil classification based on four parent materials. For this purpose, a total of 59 soil samples were collected from 12 profiles and from the vicinity of each profile at a depth of 0–30 cm. Surface soil samples were analyzed for elemental concentrations using X-ray fluorescence (XRF) and inductively coupled plasma–optical emission spectrometry (ICP-OES), and soil spectra were recorded using a visible near-infrared (Vis-NIR) spectrometer. Samples collected from the soil profiles (12 samples) and from the surface (47 samples) were used to classify parent materials with machine learning-based algorithms such as Support Vector Machine (SVM), Ensemble Subspace k-Nearest Neighbor (ESKNN), and Ensemble Bagged Trees (EBTs). To validate the classification techniques, the dataset was subjected to five-fold cross-validation and to independent sample-set splitting (80% calibration and 20% validation). Accuracy, F score, and G mean were used to evaluate prediction performance. Depending on the dataset and algorithm, classification success rates varied between 70% and 100%. Overall, ESKNN (99%) produced better results than the other classification methods. Additionally, Relief algorithms were employed to identify key variables for each dataset (ICP-OES: CaO, Fe2O3, Al2O3, MgO, and MnO; XRF: SiO2, CaO, Fe2O3, Al2O3, and MnO; Vis-NIR: 567, 571, 572, 573, and 574 nm). Reclassification using these reduced variable sets yielded lower accuracies for the Vis-NIR data, with ESKNN still giving the best results.
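
The evaluation scheme described in the abstract (three classifiers, five-fold cross-validation, and accuracy, F score, and G mean as metrics) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract names no software, so scikit-learn is assumed here, the data are synthetic stand-ins generated with `make_classification`, and all hyperparameters are arbitrary. ESKNN is approximated as a random-subspace ensemble of k-nearest-neighbor classifiers, EBT as bagged decision trees, and G mean as the geometric mean of per-class recall, which is one common multiclass definition and may differ from the one used in the paper. The 80%/20% independent split is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed multiclass G-mean definition)."""
    recalls = recall_score(y_true, y_pred, average=None)
    return float(np.prod(recalls) ** (1.0 / len(recalls)))


# Synthetic stand-in for the real measurements: 59 samples, 4 classes
# (parent materials), 10 made-up features. No real soil data are used.
X, y = make_classification(
    n_samples=59, n_features=10, n_informative=6, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

models = {
    # Support Vector Machine on standardized features.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    # Random-subspace kNN: every member sees all samples (bootstrap=False)
    # but only a random half of the features (max_features=0.5).
    "ESKNN": make_pipeline(
        StandardScaler(),
        BaggingClassifier(
            KNeighborsClassifier(n_neighbors=3),
            n_estimators=30, max_features=0.5, bootstrap=False, random_state=0,
        ),
    ),
    # Bagged trees: every member is trained on a bootstrap resample.
    "EBT": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=30, random_state=0
    ),
}

# Five-fold stratified cross-validation; each sample is predicted exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    y_pred = cross_val_predict(model, X, y, cv=cv)
    results[name] = {
        "accuracy": accuracy_score(y, y_pred),
        "f_score": f1_score(y, y_pred, average="macro"),
        "g_mean": g_mean(y, y_pred),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

Because the features are synthetic, the printed scores say nothing about the accuracies reported in the study; the sketch only shows how the three metrics can be computed from cross-validated predictions.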