Deep learning (DL) techniques have been gaining ground in intelligent fault diagnosis of equipment and processes. However, applying DL methods to such tasks poses technical challenges. DL methods are used to extract features from raw data automatically, which introduces its own complications in the data preprocessing and/or feature engineering phases. Moreover, diagnostic performance is hindered when DL methods rely on a single type of sensor data. To address these issues, we propose a multi-sensor data fusion method based on a deep residual network. The method operates on time-frequency images obtained by the short-time Fourier transform (STFT) to diagnose machine faults. The experimental results demonstrate that the proposed model, which combines different types of measured signals, diagnoses machine bearing conditions with higher accuracy than a model that uses a single type of measured signal.
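As a rough illustration of the preprocessing idea, the sketch below turns each sensor signal into a normalized time-frequency image with a short-time Fourier transform and stacks the images along a channel axis. It is a minimal sketch under stated assumptions, not the implementation evaluated here: the signals are synthetic, the sampling rate, frame length, and hop size are arbitrary, the function names are hypothetical, and channel stacking is only one possible way to fuse sensors. The residual network that would consume the fused array is not shown.

```python
import numpy as np


def stft_magnitude(signal, frame_len=256, hop=128):
    """Return the STFT magnitude of a 1-D signal, shape (freq_bins, frames)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Real FFT of every frame; transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


def to_image(spec, eps=1e-10):
    """Log-scale a spectrogram and min-max normalize it to [0, 1]."""
    log_spec = np.log(spec + eps)
    span = log_spec.max() - log_spec.min()
    if span == 0:
        return np.zeros_like(log_spec)
    return (log_spec - log_spec.min()) / span


def fuse_sensors(signals, frame_len=256, hop=128):
    """Stack one time-frequency image per sensor along a leading channel axis.

    Assumption: all signals have the same length and sampling rate.
    """
    images = [to_image(stft_magnitude(s, frame_len, hop)) for s in signals]
    return np.stack(images, axis=0)


# Synthetic stand-ins for two sensor types (illustrative values only).
fs = 12_000                      # assumed sampling rate in Hz
t = np.arange(fs) / fs           # one second of samples
rng = np.random.default_rng(0)
vibration = np.sin(2 * np.pi * 1500 * t) + 0.1 * rng.standard_normal(t.size)
current = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size)

fused = fuse_sensors([vibration, current])
print(fused.shape)  # (2, 129, 92): sensors, frequency bins, time frames
```

The resulting array has the same layout as a multi-channel image, so it could in principle be passed to any image classifier that accepts a configurable number of input channels.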