TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers

Overview

Heuristic search algorithms, e.g. A*, are the commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles into account and, thus, the search led by such heuristics performs poorly in the obstacle-rich environments. To this end, we suggest learning the instance-dependent heuristic proxies that are supposed to notably increase the efficiency of the search. The first heuristic proxy we suggest to learn is the correction factor, i.e. the ratio between the instance independent cost-to-go estimate and the perfect one (computed offline at the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which was known before, when learning the correction factor the knowledge of the instance-independent heuristic is utilized. The second heuristic proxy is the path probability, which indicates how likely the grid cell is lying on the shortest path. This heuristic can be utilized in the Focal Search framework as the secondary heuristic, allowing us to preserve the guarantees on the bounded sub-optimality of the solution. We learn both suggested heuristics in a supervised fashion with the state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of the A* up to a factor of 4x while producing the solutions, which costs exceed the costs of the optimal solutions by less than 0.3% on average; ii) outperform the competitors, which include the conventional techniques from the heuristic search, i.e. weighted A*, as well as the state-of-the-art learnable planners.

Method

In this work we are interested in two variants of the pathfinding problem. The first variant asks to find a valid path on a grid, without specifying any constraints on the cost of the path, VP-PROBLEM. The second variant assumes that a suboptimality bound, w ≥ 1, is specified and the task is to find a path whose cost does not exceed the cost of the optimal path by more than a factor of w, BSP-PROBLEM. The solvers that we suggest for both problems share their structure. Each of them is composed of the two building blocks. First, a deep neural network is used to process the input grid and to predict the values of the heuristic function that will be used later. Second, a heuristic search algorithm is invoked that utilizes the heuristic data from the neural network. The neural network used for VP-PROBLEM and BSPPROBLEM has the same architecture; however, in each case, the output heuristic is different (as the neural network was trained using different supervision). The heuristic search algorithm is also different. For solving VP-PROBLEM, we utilize WA*, while for BSP-PROBLEM – Focal Search (FS).

Heuristics learned

The first type of the heuristic is the correction factor (cf) , which is defined as the ratio of the value of the available instance-independent heuristicto the value of the perfect heuristic: cf(n) = h(n)/h*(n). We suggest plugging the predicted cf-values to the WA* algorithm as shown in Alg. 1 (black + blue code fragments). I.e., the f-value of each node is computed as f(n) = g(n) + h(n)/cf(n). This can be though of as running WA* that uses individual weights for different search nodes instead of a single constant weight. As there is no theoretical bound on the error of predicting cf-values by the neural network, the resultant search algorithm provides no guarantees on cost of the solution.

The second suggested heuristic is originally intended to serve as the secondary heuristic for the Focal Search algorithm, h_FOCAL, which is tailored to solve the BSP-PROBLEM. Intuitively, we want from h_FOCAL to discriminate between the nodes that are likely to yield rapid progress toward the goal and the nodes that are not. To this end, we suggest assigning (and learning to predict) a value to each grid cell that tells us how likely it is that this cell is lying on the shortest path between start and goal.

Neural network architecture

The neural network for learning cf-values and pp-values has the same architecture, however the input is slightly different. For pp-values the input contains the grid, represented as a binary image and the start-goal matrix, which has the same size as the grid and contains the values of 1 only for start and goal, while all other pixels are zeroes. For cf-values this matrix contains only one non-zero element – the goal one.

The following figure depicts the suggested neural network architecture: a) Design of the whole model. CNN-encoder is used to produce local features which are further fed into the transformer blocks to catch the long-range dependencies between the features. The resulting representation is passed through the CNN-decoder to produce output values. b) Architecture of the ResNet block. c) Architecture of the Transformer block.

Examples

The following figure shows some examples of how looks the predicted matrices, obtained from the neural network, how the suggested approaches works, as well as the work of other baseline approaches. Black cells are the obstacles, red cells correspond to the found trajectories, green cells correspond to the nodes that were expanded by the algorithm during the search process.