Double Deep Q-Network-Based Solution for the Dynamic Electric Vehicle Routing Problem


Taş M. B. H., ÖZKAN K., SARIÇİÇEK İ., YAZICI A.

Applied Sciences (Switzerland), vol. 16, no. 1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 16 Issue: 1
  • Publication Date: 2026
  • DOI: 10.3390/app16010278
  • Journal Name: Applied Sciences (Switzerland)
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Keywords: deep reinforcement learning, Double Deep Q-Network, dynamic requests, electric vehicle routing problem, energy-aware routing
  • Affiliated with Eskişehir Osmangazi University: Yes

Abstract

The Dynamic Electric Vehicle Routing Problem (D-EVRP) presents a framework in which electric vehicles must meet demand under limited energy capacity. When dynamic demand flows and charging requirements are considered together, traditional methods cannot adapt sufficiently for real-time decision-making. Therefore, a learning-based approach was chosen so that decision-making responds quickly to changing conditions. The solution uses a model with a Double Deep Q-Network (DDQN) architecture and a discrete valuation structure. Prioritized Experience Replay (PER) was implemented to increase model stability, allowing infrequent but informative experiences to contribute more to the learning process. The state representation is constructed from the vehicle’s location, battery level, load status, and current customer demands. Scalability is ensured by dividing customer locations into clusters with the K-means method, with each cluster handled by an independent representative. The approach was tested on real-world road data obtained from the Meşelik Campus of Eskişehir Osmangazi University. Experiments conducted under different demand levels and data sizes show that the PER-assisted DDQN structure produces more stable and shorter route lengths in dynamic scenarios, whereas random selection, the greedy method, and the genetic algorithm suffer significant performance losses as dynamism increases.
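The two core mechanisms the abstract names, the Double DQN target and prioritized sampling, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a tabular stand-in for the online and target Q-networks, a scalar learning-rate update in place of gradient descent, and invented sizes and transition values; the paper's actual state (location, battery, load, demands) and network architecture are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 6, 4  # hypothetical sizes for illustration

# Tabular stand-ins for the online and target Q-networks.
q_online = rng.normal(size=(N_STATES, N_ACTIONS))
q_target = q_online.copy()

def ddqn_target(reward, next_state, gamma=0.99, done=False):
    """Double DQN target: the online net selects the greedy action,
    the target net evaluates it, decoupling selection from evaluation."""
    if done:
        return reward
    best_a = int(np.argmax(q_online[next_state]))      # selection (online)
    return reward + gamma * q_target[next_state, best_a]  # evaluation (target)

def per_probabilities(td_errors, alpha=0.6, eps=1e-3):
    """PER: sampling probability proportional to |TD error|**alpha,
    so rarely seen but high-error transitions are replayed more often."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

# One illustrative update on a fabricated transition (s, a, r, s').
s, a, r, s_next = 0, 1, -1.0, 2   # negative reward ~ distance travelled
y = ddqn_target(r, s_next)
td = y - q_online[s, a]
q_online[s, a] += 0.1 * td        # learning-rate step in place of SGD

# Transitions with larger TD error get a larger replay probability.
probs = per_probabilities(np.array([td, 0.5, -2.0]))
```

In a full agent the target table would be a periodically synced copy of the online network, and PER would additionally apply importance-sampling weights to correct the bias the non-uniform sampling introduces.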