MARTES: Multi-Agent Reinforcement Learning Training Environment for Scheduling

Title:MARTES: Multi-Agent Reinforcement Learning Training Environment for Scheduling

Authors:Mario Andreu Villar, Karen Yadira Lliguin León, Jordi Arjona Aroca, Arántzazu López-Larrainzar Salazar, Gerardo Minella and José Manuel Bernabeu Aubán

Conference:SAC_2025

Tags:actor and critic networks, agent reinforcement learning, AI, artificial intelligence, deep reinforcement learning, dispatching rules, earliest deadline first, first come first served, hybrid flow shop, hybrid flow shop scheduling problem, hybrid flowshop, machine learning, MARL, marl environment, MAS, ML, multi agent reinforcement learning, multi-agent planning, multi-agent systems, multi-criterion optimization and decision-making, planning under uncertainty, production scheduling, reinforcement learning, reinforcement learning training environment, reward function, RL, scheduling, scheduling problem, shortest job first, smart factories and supply chain management

Abstract:

Digitization fostered by the Industry 4.0 paradigm and smart factories brings more connectivity and abundant data, but also a more dynamic industrial environment that makes scheduling an even harder problem. Large factories with complex configurations such as Hybrid Flow-Shops (HFS) cannot rely on centralized, reactive, and non-adaptive heuristics, nor on metaheuristics that produce high-quality schedules but are time-expensive. We propose MARTES, a Multi-Agent Reinforcement Learning (MARL) Training Environment for Scheduling. In this work, MARTES trains models for use in HFS scenarios. The resulting models choose among different dispatching rules to select which job to process next. The results show that MARTES models yield high-quality schedules, outperforming traditional dispatching rules such as First Come First Served, Earliest Deadline First, and Shortest Job First by up to 26.4%, increasing the number of deadlines met by more than 30%, and improving tardiness by up to 50.5% in time-constrained scenarios. MARTES models can also compete with heuristics such as NEH and with metaheuristics such as genetic (GA) or iterative greedy (IG) algorithms, differing by less than 1% in makespan for large instances. Time-wise, NEH can be up to two orders of magnitude slower than MARTES model training. GA or IG execution times can be similar to MARTES model training times but, unlike MARTES models, these algorithms must be executed again whenever the factory pipeline changes.
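
To make the dispatching rules named in the abstract concrete, the following minimal Python sketch shows how First Come First Served, Earliest Deadline First, and Shortest Job First each pick the next job from a queue, and how that choice affects total tardiness on a single machine. It is an illustration only, not the MARTES implementation: the `Job` fields, the `RULES` table, `select_next`, `run_single_machine`, and the three sample jobs are all hypothetical, all jobs are assumed to be waiting at time 0, and the rule name passed in by hand stands in for the action a trained agent would choose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Job:
    # Hypothetical job description; the field names are assumptions.
    name: str
    arrival: int    # position in time at which the job joined the queue
    deadline: int   # due date
    duration: int   # processing time on this machine


# Each rule maps a job to a priority key; the smallest key is served first.
RULES: Dict[str, Callable[[Job], int]] = {
    "FCFS": lambda job: job.arrival,   # First Come First Served
    "EDF": lambda job: job.deadline,   # Earliest Deadline First
    "SJF": lambda job: job.duration,   # Shortest Job First
}


def select_next(queue: List[Job], rule: str) -> Job:
    """Return the job the given dispatching rule would process next."""
    return min(queue, key=RULES[rule])


def run_single_machine(jobs: List[Job], rule: str) -> Tuple[List[str], int]:
    """Process every job on one machine using a fixed rule.

    Assumes all jobs are already waiting at time 0. Returns the processing
    order and the total tardiness (sum of lateness past each deadline).
    """
    queue = list(jobs)
    clock = 0
    order: List[str] = []
    total_tardiness = 0
    while queue:
        job = select_next(queue, rule)
        queue.remove(job)
        clock += job.duration
        order.append(job.name)
        total_tardiness += max(0, clock - job.deadline)
    return order, total_tardiness


# Made-up sample data, chosen only to show that the rules disagree.
JOBS = [
    Job("A", arrival=0, deadline=20, duration=6),
    Job("B", arrival=1, deadline=5, duration=4),
    Job("C", arrival=2, deadline=9, duration=2),
]

if __name__ == "__main__":
    for rule in RULES:
        order, tardiness = run_single_machine(JOBS, rule)
        print(rule, order, "total tardiness:", tardiness)
```

On this toy queue the three rules produce different orders and different total tardiness while the makespan stays the same, which is the kind of trade-off that motivates letting a learned policy pick the rule at each decision point rather than fixing one in advance.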