Tags:Deep Reinforcement Learning, Graph Clustering Model, Multiple Vehicles and Pickup and Delivery Problem
Abstract:
With the rapid growth of the on-demand logistics industry, large-scale pickup and delivery with late penalties problem has become widespread in various time-critical scenarios. This problem has proven to be an NP-hard problem. Hence, the computation time and resources required to solve it increase exponentially with the growth of size. As a result, it is burdensome for the exact algorithm and heuristic method to generate a high-quality solution instantly. Machine learning seems to be a possible option due to the advantage of offline training. However, it is difficult to solve large-scale problems due to the lengthy training time, heavy computational cost, and training instability. Thus, this paper proposes the two-stage learning-based method composed of the clustering stage and the routing stage to tackle this problem. The clustering stage builds upon the attention mechanism by introducing a graph embedding layer on the input, which captures relationships among customers and classifies them into different vehicles, while the routing stage adopts a well-trained model to generate a route for each vehicle. Experiments show that the model, trained on small-scale problems, generalizes well to larger-scale problems, and achieves superior performance compared with Google OR-Tools, with an extremely short computing time.
Two-Stage Learning-Based Method for Large-Scale Pickup and Delivery Services with Soft Time Windows