Heuristic search algorithms, e.g. A*, are commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games, etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take obstacles into account, and, thus, a search guided by such heuristics performs poorly in obstacle-rich environments. To this end, we suggest learning instance-dependent heuristic proxies that are intended to notably increase the efficiency of the search. The first heuristic proxy we suggest learning is the correction factor, i.e. the ratio between the instance-independent cost-to-go estimate and the perfect one (computed offline at the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which has been done before, learning the correction factor exploits the knowledge of the instance-independent heuristic. The second heuristic proxy is the path probability, which indicates how likely a grid cell is to lie on the shortest path. This heuristic can be used in the Focal Search framework as the secondary heuristic, which allows us to preserve the guarantees on the bounded suboptimality of the solution. We learn both suggested heuristics in a supervised fashion with state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of A* by up to a factor of 4 while producing solutions whose costs exceed the costs of the optimal solutions by less than 0.3% on average; ii) outperform the competitors, which include conventional heuristic search techniques, namely weighted A*, as well as state-of-the-art learnable planners.
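To make the two proxies concrete, the sketch below computes training targets for them on a toy grid. It is a minimal illustration under our own assumptions, not the authors' data pipeline: the grid is 4-connected with unit move costs, cells equal to 1 are obstacles, the instance-independent heuristic is Manhattan distance, the perfect cost-to-go h* comes from a breadth-first search from the goal, cf is set to 1.0 wherever the ratio is undefined, and a cell's path-probability target is 1 if it lies on at least one shortest start-goal path. All function names are hypothetical.

```python
from collections import deque

INF = float("inf")
# Assumption: 4-connected grid, unit move cost, 1 = obstacle (illustrative only).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def bfs_distances(grid, source):
    """Exact shortest-path distances from source to every cell (INF if unreachable)."""
    rows, cols = len(grid), len(grid[0])
    dist = [[INF] * cols for _ in range(rows)]
    dist[source[0]][source[1]] = 0
    queue = deque([source])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and dist[nr][nc] == INF:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def correction_factor(grid, goal):
    """cf(n) = h(n) / h*(n) with Manhattan h; 1.0 where the ratio is undefined."""
    h_star = bfs_distances(grid, goal)
    cf = [[1.0] * len(grid[0]) for _ in grid]
    for r, row in enumerate(h_star):
        for c, d in enumerate(row):
            if 0 < d < INF:  # skip the goal itself, obstacles and unreachable cells
                cf[r][c] = (abs(r - goal[0]) + abs(c - goal[1])) / d
    return cf


def path_probability_target(grid, start, goal):
    """Hypothetical labelling: 1.0 for cells on some shortest start-goal path, else 0.0."""
    g_star = bfs_distances(grid, start)
    h_star = bfs_distances(grid, goal)
    best = g_star[goal[0]][goal[1]]
    return [[1.0 if best < INF and gs + hs == best else 0.0
             for gs, hs in zip(g_row, h_row)]
            for g_row, h_row in zip(g_star, h_star)]


grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
cf = correction_factor(grid, goal=(2, 0))
pp = path_probability_target(grid, start=(0, 0), goal=(2, 0))
print(cf[0][0])  # Manhattan estimate 2 vs. true cost 8 -> 0.25
```

Under these definitions, dividing the instance-independent estimate by the correction factor, h(n) / cf(n), recovers h*(n) on every reachable cell, so an accurately predicted cf turns the obstacle-blind estimate into a near-perfect one.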

In this work, we are interested in two variants of the pathfinding problem. The first variant, VP-PROBLEM, asks to find a valid path on a grid without specifying any constraints on the cost of the path. The second variant, BSP-PROBLEM, assumes that a suboptimality bound w ≥ 1 is specified, and the task is to find a path whose cost does not exceed the cost of the optimal path by more than a factor of w.
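The two problem statements can be written as simple acceptance checks. The sketch assumes a 4-connected grid with unit move costs, where 1 marks an obstacle and a path is a list of (row, col) cells; the function names are hypothetical, and the optimal cost is supplied by the caller.

```python
def path_cost(grid, path):
    """Cost of a path on an assumed 4-connected unit-cost grid (1 = obstacle), or None if invalid."""
    rows, cols = len(grid), len(grid[0])
    if not path:
        return None
    for r, c in path:
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == 1:
            return None  # leaves the grid or enters an obstacle
    for (r1, c1), (r2, c2) in zip(path, path[1:]):
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            return None  # consecutive cells are not adjacent
    return len(path) - 1  # unit cost per move


def solves_vp(grid, path, start, goal):
    """VP-PROBLEM: any valid start-goal path is acceptable, whatever its cost."""
    return (bool(path) and path[0] == start and path[-1] == goal
            and path_cost(grid, path) is not None)


def solves_bsp(grid, path, start, goal, w, optimal_cost):
    """BSP-PROBLEM: a valid path whose cost is at most w times the optimal cost."""
    if w < 1:
        raise ValueError("the suboptimality bound w must be >= 1")
    return solves_vp(grid, path, start, goal) and path_cost(grid, path) <= w * optimal_cost


grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
detour = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (2, 2), (2, 1), (2, 0)]
through_wall = [(0, 0), (1, 0), (2, 0)]
```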
The solvers that we suggest for both problems share the same structure. Each of them is composed of two building blocks. First, a deep neural network processes the input grid and predicts the values of the heuristic function that will be used later. Second, a heuristic search algorithm is invoked that utilizes the heuristic data from the neural network. The neural network used for VP-PROBLEM and BSP-PROBLEM has the same architecture; however, in each case, the output heuristic is different (as the neural network was trained using different supervision). The heuristic search algorithm is also different: for solving VP-PROBLEM, we use WA*, while for BSP-PROBLEM, we use Focal Search (FS).
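A compact sketch of the two search back-ends follows, under illustrative assumptions that are ours rather than the source's: a 4-connected grid with unit costs, 1 = obstacle, and Manhattan distance as the instance-independent heuristic h. `cf` and `pp` stand for the per-cell matrices the network would predict; here they are plain nested lists. WA* orders nodes by g(n) + w·h(n)/cf(n). Focal Search keeps OPEN ordered by f = g + h, builds FOCAL from the nodes with f ≤ w·f_min, and expands the FOCAL node with the highest path probability; because Manhattan distance is admissible on this grid and nodes are re-opened when a cheaper path is found, a goal selected from FOCAL costs at most w times the optimum. This is not the authors' implementation.

```python
import heapq
import itertools

# Assumption: 4-connected grid, unit move cost, 1 = obstacle (illustrative only).
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def neighbors(grid, cell):
    r, c = cell
    for dr, dc in MOVES:
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
            yield (nr, nc)


def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def reconstruct(parent, cell):
    path = [cell]
    while parent[cell] is not None:
        cell = parent[cell]
        path.append(cell)
    return path[::-1]


def weighted_astar(grid, start, goal, cf, w=1.0):
    """VP-PROBLEM: WA* ordered by g(n) + w * h(n) / cf(n)."""
    def h(n):
        return manhattan(n, goal) / cf[n[0]][n[1]]  # assumes predicted cf > 0

    tie = itertools.count()  # insertion-order tie-breaker, cells are never compared
    g, parent, closed = {start: 0}, {start: None}, set()
    open_heap = [(w * h(start), next(tie), start)]
    while open_heap:
        _, _, cur = heapq.heappop(open_heap)
        if cur in closed:
            continue  # stale heap entry
        if cur == goal:
            return reconstruct(parent, cur), len(closed)
        closed.add(cur)
        for nb in neighbors(grid, cur):
            if nb not in closed and g[cur] + 1 < g.get(nb, float("inf")):
                g[nb], parent[nb] = g[cur] + 1, cur
                heapq.heappush(open_heap, (g[nb] + w * h(nb), next(tie), nb))
    return None, len(closed)


def focal_search(grid, start, goal, pp, w=1.5):
    """BSP-PROBLEM: returned cost <= w * optimal, since Manhattan h is admissible here."""
    g, parent, open_set, expansions = {start: 0}, {start: None}, {start}, 0
    while open_set:
        f = {n: g[n] + manhattan(n, goal) for n in open_set}
        f_min = min(f.values())  # lower bound on the optimal cost
        focal = [n for n in open_set if f[n] <= w * f_min]
        # Secondary heuristic: highest predicted path probability, ties by lower f.
        cur = max(focal, key=lambda n: (pp[n[0]][n[1]], -f[n]))
        if cur == goal:
            return reconstruct(parent, cur), expansions
        open_set.remove(cur)
        expansions += 1
        for nb in neighbors(grid, cur):
            if g[cur] + 1 < g.get(nb, float("inf")):
                g[nb], parent[nb] = g[cur] + 1, cur
                open_set.add(nb)  # (re)opens nb, even if it was expanded before
    return None, expansions


grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
ones = [[1.0] * 4 for _ in range(3)]   # cf = 1 everywhere reduces WA* (w=1) to plain A*
zeros = [[0.0] * 4 for _ in range(3)]  # uninformative path probabilities
path, expanded = weighted_astar(grid, (0, 0), (2, 0), ones)
bounded_path, _ = focal_search(grid, (0, 0), (2, 0), zeros, w=1.5)
```

FOCAL is rebuilt here by a linear scan over OPEN for readability; a practical implementation would normally maintain it as a second priority queue.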

The neural network for learning correction-factor values (cf-values) and path-probability values (pp-values) has the same architecture; however, the input is slightly different. For pp-values, the input contains the grid, represented as a binary image, and the start-goal matrix, which has the same size as the grid and contains the value 1 only at the start and goal cells, while all other pixels are zeros. For cf-values, this matrix contains only one non-zero element, the one at the goal.
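The two input variants can be assembled as a stack of two channels. A minimal NumPy sketch, assuming channel 0 holds the binary grid (1 = obstacle), channel 1 holds the start-goal matrix, and cells are given as (row, col) tuples; `make_input` is a hypothetical helper, and the exact tensor layout used by the authors is not specified here.

```python
import numpy as np


def make_input(grid, goal, start=None):
    """Stack the binary grid and the start-goal matrix into a (2, H, W) float array.

    With start given, channel 1 has ones at start and goal (pp-value input);
    with start=None it has a single one at the goal (cf-value input).
    """
    grid = np.asarray(grid, dtype=np.float32)
    start_goal = np.zeros_like(grid)
    start_goal[goal] = 1.0  # a (row, col) tuple indexes a single cell
    if start is not None:
        start_goal[start] = 1.0
    return np.stack([grid, start_goal])


grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
pp_input = make_input(grid, goal=(2, 0), start=(0, 0))
cf_input = make_input(grid, goal=(2, 0))
```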

The following figure depicts the suggested neural network architecture: a) design of the whole model: a CNN encoder produces local features, which are then fed into the transformer blocks to capture long-range dependencies between the features, and the resulting representation is passed through a CNN decoder to produce the output values; b) architecture of the ResNet block; c) architecture of the Transformer block.
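The role of the transformer blocks, letting every spatial position of the encoder's feature map attend to every other one, can be illustrated with single-head self-attention in NumPy. This is a deliberately reduced sketch and not the block from the figure: the projection matrices are random, and the layer normalization, feed-forward sublayer, multiple heads and positional encodings that a full Transformer block typically contains are omitted, as are the CNN encoder and decoder. The sizes `C`, `D`, `H`, `W` are arbitrary.

```python
import numpy as np


def self_attention_block(features, wq, wk, wv, wo):
    """Single-head self-attention with a residual connection over a (C, H, W) feature map.

    wq, wk, wv have shape (C, D); wo has shape (D, C).
    Returns the updated (C, H, W) map and the (H*W, H*W) attention matrix.
    """
    c, h, w = features.shape
    tokens = features.reshape(c, h * w).T            # (H*W, C): one token per spatial cell
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv  # (H*W, D) each
    scores = q @ k.T / np.sqrt(q.shape[-1])          # every cell attends to every other cell
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # each row sums to 1
    out = tokens + (attn @ v) @ wo                   # residual connection
    return out.T.reshape(c, h, w), attn


rng = np.random.default_rng(0)
C, D, H, W = 8, 4, 6, 6
feature_map = rng.standard_normal((C, H, W))  # stand-in for the CNN-encoder output
wq, wk, wv = (rng.standard_normal((C, D)) for _ in range(3))
wo = rng.standard_normal((D, C))
updated, attn = self_attention_block(feature_map, wq, wk, wv, wo)
```

Because the attention matrix is (H·W) × (H·W), a cell in one corner of the grid can directly influence the features of a cell in the opposite corner within a single block, which is the long-range interaction that the convolutional layers alone capture only through many stacked layers.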

The following figure shows examples of the matrices predicted by the neural network, how the suggested approaches work, and how the baseline approaches work. Black cells are obstacles, red cells correspond to the found trajectories, and green cells correspond to the nodes that were expanded by the algorithm during the search.

TBD