Condition monitoring is a part of predictive maintenance that aims to detect and prevent unexpected equipment failures by tracking machine conditions. Early detection of equipment failures in industrial systems can greatly reduce scrap and financial losses. Advances in sensor data acquisition technology make it possible to digitally record and store many types of sensor data, and data-driven computational models can extract information about a machine's state from the acquired data. Owing to their strong generalization capabilities, deep learning models have come to play a significant role as data-driven fault models in equipment condition monitoring. One challenge in fault detection is that data from a single sensor may be insufficient to detect equipment anomalies reliably. Furthermore, depending on the fault type, representing the data in different signal domains can reveal more prominent features, although which domain is most informative is not always obvious. To address these issues, this paper proposes a multi-modal sensor fusion-based deep learning model that detects equipment faults by fusing information not only from different sensors but also from different signal domains. The model's fault detection capability is demonstrated on electric motors, one of the most commonly encountered equipment types in industry. The model uses data from two sensor types, vibration and current, in both the time domain and the frequency domain. The raw data from the vibration and current sensors are transformed into time-frequency images using the short-time Fourier transform (STFT). The time-frequency images and the raw time series are then fed into the designed deep learning model to detect failures. The results show that fusing multi-modal sensor data with the proposed model can be advantageous for equipment fault detection.
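The STFT preprocessing step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the helper name `stft_image`, the Hann window, the frame length of 256 samples, the hop of 128 samples, the log-magnitude scaling, and the min-max normalization are all hypothetical choices, and the two sensor signals are synthetic stand-ins (a 120 Hz vibration tone and a 50 Hz current tone, each with added noise, sampled at an assumed 10 kHz).

```python
import numpy as np


def stft_image(signal, frame_len=256, hop=128):
    """Return a log-magnitude STFT spectrogram scaled to [0, 1].

    Rows are frequency bins (0 .. Nyquist), columns are time frames.
    frame_len, hop and the Hann window are illustrative assumptions,
    not parameters taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.ndim != 1 or signal.size < frame_len:
        raise ValueError("signal must be 1-D and at least frame_len samples long")
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    # Index matrix of overlapping frames: shape (n_frames, frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # One-sided FFT magnitude of each windowed frame: (n_frames, frame_len // 2 + 1)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    log_mag = 20.0 * np.log10(spectrum + 1e-12)
    # Min-max scale so the spectrogram can be treated as a grayscale image
    image = (log_mag - log_mag.min()) / (log_mag.max() - log_mag.min() + 1e-12)
    return image.T  # (freq_bins, time_frames)


# Synthetic stand-ins for the two sensor modalities (hypothetical data)
fs = 10_000                      # assumed sampling rate in Hz
t = np.arange(fs) / fs           # 1 s of samples
rng = np.random.default_rng(0)
vibration = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(t.size)
current = np.sin(2 * np.pi * 50 * t) + 0.05 * rng.standard_normal(t.size)

# Time-frequency images stacked as channels: (2, 129, 77)
images = np.stack([stft_image(vibration), stft_image(current)])
# Raw time series kept as a separate input: (2, 10000)
raw = np.stack([vibration, current])
```

Keeping the spectrogram stack and the raw series as two separate arrays mirrors the abstract's idea of supplying both representations to one model; how the proposed network actually fuses them is not shown here.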