Tags: CNN, MCTS, reinforcement learning, Sokoban
Abstract:
This paper focuses on the application of reinforcement learning methods to solving the game of Sokoban. While the game is relatively easy for humans, it poses a significant challenge for computer algorithms because certain moves are irreversible and lead to unrecoverable dead-end states. Predicting which actions lead to such states is often difficult for a learning agent, a common problem in tasks that require planning. To address this issue, we propose using a Monte Carlo tree search (MCTS) algorithm together with a heuristic convolutional neural network (CNN) trained to distinguish between undesirable, neutral, and desirable game states. We experimented with several variations of the heuristic for solving the game and compared them against each other. We employed MCTS in two setups: one with a CNN trained on data obtained during the solving process and one without such training. We also varied the number of rollouts per move in MCTS and compared the results. The overall aim is to improve the performance of learning agents in tasks where planning is needed to avoid unwanted states.
Solving the Sokoban Game with a Heuristic for Avoiding Dead-End States
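As a rough illustration of the approach summarized in the abstract, the sketch below shows how a dead-end-aware heuristic can be plugged into a plain MCTS loop. The SokobanEnv-style interface (clone/step/legal_actions/is_solved), the label set, and the reward values are assumptions made for illustration only; the paper's actual CNN heuristic and training procedure are not shown here.

```python
# Minimal sketch: MCTS guided by a heuristic that labels states as
# "dead" (dead-end), "neutral", or "good". The environment interface and the
# scores below are illustrative assumptions, not the paper's implementation.

import math
import random

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state                      # environment snapshot at this node
        self.parent = parent
        self.action = action                    # action taken from parent to reach this node
        self.children = []
        self.untried = list(state.legal_actions())
        self.visits = 0
        self.value = 0.0

    def uct_child(self, c=1.4):
        # Standard UCT selection among fully expanded children.
        return max(self.children,
                   key=lambda ch: ch.value / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def heuristic_score(label):
    # Map the classifier's label to a rollout value; dead-end states are
    # strongly penalized so the search learns to avoid irreversible mistakes.
    return {"dead": -1.0, "neutral": 0.0, "good": 0.5}[label]

def rollout(state, heuristic, depth=30):
    # Random playout, cut off early and scored by the heuristic classifier
    # (in the paper's setting, a CNN over the board representation).
    for _ in range(depth):
        if state.is_solved():
            return 1.0
        label = heuristic(state)
        if label == "dead":
            return heuristic_score(label)
        state = state.clone()
        state.step(random.choice(state.legal_actions()))
    return heuristic_score(heuristic(state))

def mcts(root_state, heuristic, n_rollouts=200):
    root = Node(root_state.clone())
    for _ in range(n_rollouts):
        node = root
        # Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = node.uct_child()
        # Expansion: try one previously untried action.
        if node.untried:
            action = node.untried.pop()
            child_state = node.state.clone()
            child_state.step(action)
            node = Node(child_state, parent=node, action=action)
            node.parent.children.append(node)
        # Simulation: heuristic-scored random rollout.
        reward = rollout(node.state, heuristic)
        # Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Act with the most visited root action.
    return max(root.children, key=lambda ch: ch.visits).action
```

Varying n_rollouts corresponds to the rollout budget per move compared in the experiments; swapping the heuristic between a fixed rule and a CNN updated with data gathered during solving corresponds to the two setups described in the abstract.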