TRAT: Tracking by attention using spatio-temporal features



Saribas H., Çevikalp H., Köpüklü O., Uzun B.

Neurocomputing, vol. 492, pp. 150-161, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 492
  • Publication Date: 2022
  • DOI: 10.1016/j.neucom.2022.04.043
  • Journal Name: Neurocomputing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Applied Science & Technology Source, Biotechnology Research Abstracts, Compendex, Computer & Applied Sciences, EMBASE, INSPEC, zbMATH
  • Page Numbers: pp. 150-161
  • Keywords: Tracking, Two-stream network, 3D CNN, Feature aggregation module, Channel attention, Temporal features, CORRELATION FILTERS
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

© 2022 Elsevier B.V. Robust object tracking requires knowledge of the tracked objects' appearance, motion, and their evolution over time. Although motion provides distinctive and complementary information, especially for fast-moving objects, most recent tracking architectures focus primarily on appearance information. In this paper, we propose a two-stream deep neural network tracker that uses both spatial and temporal features. Our architecture is built on top of the ATOM tracker and contains two backbones: (i) a 2D-CNN network to capture appearance features and (ii) a 3D-CNN network to capture motion features. The features returned by the two networks are then fused with an attention-based Feature Aggregation Module (FAM). Since the whole architecture is unified, it can be trained end-to-end. The experimental results show that the proposed tracker TRAT (TRacking by ATtention) achieves state-of-the-art performance on most benchmarks and significantly outperforms the baseline ATOM tracker. The source code and pretrained models can be found at https://github.com/Hasan4825/TRAT.
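To make the two-stream fusion idea concrete, the sketch below shows one way an appearance (2D-CNN) feature map and a motion (3D-CNN) feature map could be aggregated with channel attention. This is a minimal PyTorch illustration, not the authors' implementation: the squeeze-and-excitation-style attention, the temporal averaging of the 3D features, and all layer sizes and tensor shapes are assumptions for demonstration; consult the linked repository for the actual FAM design.

```python
# Minimal sketch (not the TRAT source code): fusing 2D appearance features and
# 3D motion features with a channel-attention aggregation module.
# Shapes, layer sizes, and the SE-style attention are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed FAM variant)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> per-channel weights in (0, 1)
        w = self.fc(x.mean(dim=(2, 3)))           # global average pool + MLP
        return x * w.unsqueeze(-1).unsqueeze(-1)  # reweight channels


class TwoStreamFusion(nn.Module):
    """Aggregate appearance (2D-CNN) and motion (3D-CNN) feature maps."""
    def __init__(self, c2d: int, c3d: int, out_channels: int):
        super().__init__()
        self.attn = ChannelAttention(c2d + c3d)
        self.project = nn.Conv2d(c2d + c3d, out_channels, kernel_size=1)

    def forward(self, feat2d: torch.Tensor, feat3d: torch.Tensor) -> torch.Tensor:
        # feat2d: (B, C2, H, W) from the appearance backbone
        # feat3d: (B, C3, T, H, W) from the motion backbone; collapse time
        feat3d = feat3d.mean(dim=2)
        fused = torch.cat([feat2d, feat3d], dim=1)
        return self.project(self.attn(fused))


if __name__ == "__main__":
    fusion = TwoStreamFusion(c2d=256, c3d=256, out_channels=256)
    appearance = torch.randn(1, 256, 18, 18)      # 2D-CNN output
    motion = torch.randn(1, 256, 4, 18, 18)       # 3D-CNN output over 4 frames
    print(fusion(appearance, motion).shape)       # torch.Size([1, 256, 18, 18])
```

Because both streams and the attention module are ordinary differentiable layers, a fused architecture of this form can be trained end-to-end, which matches the unified training described in the abstract.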