---
title: "Graph Neural Solver for Power Systems"
source_url: "https://hal.science/hal-02175989"
source: "HAL IJCNN PDF"
---

# Graph Neural Solver for Power Systems

Graph Neural Solver for Power Systems
             Balthazar Donon, Benjamin Donnot, Isabelle Guyon, Antoine Marot


      To cite this version:
     Balthazar Donon, Benjamin Donnot, Isabelle Guyon, Antoine Marot. Graph Neural Solver for Power Sys-
     tems. IJCNN 2019 - International Joint Conference on Neural Networks, Jul 2019, Budapest, Hungary. pp.1-8,
     ⟨10.1109/IJCNN.2019.8851855⟩. ⟨hal-02175989⟩


                                        HAL Id: hal-02175989
                              https://hal.science/hal-02175989v1
                                            Submitted on 6 Jul 2019


    HAL is a multi-disciplinary open access archive             L’archive ouverte pluridisciplinaire HAL, est des-
for the deposit and dissemination of scientific re-        tinée au dépôt et à la diffusion de documents scien-
search documents, whether they are published or not.       tifiques de niveau recherche, publiés ou non, émanant
The documents may come from teaching and research          des établissements d’enseignement et de recherche
institutions in France or abroad, or from public or pri-   français ou étrangers, des laboratoires publics ou
vate research centers.                                     privés.


                                              HAL Authorization
               Graph Neural Solver for Power Systems
                            Balthazar Donon                                            Benjamin Donnot
                       R&D department, RTE                                         R&D department, RTE
                & UPSud/INRIA Université Paris-Saclay                      & UPSud/INRIA Université Paris-Saclay
                             Paris, France                                              Paris, France
                   balthazar.donon@rte-france.com                              benjamin.donnot@rte-france.com

                             Isabelle Guyon                                              Antoine Marot
              TAU group of Lab. de Res. en Informatique                                 R&D department
                 UPSud/INRIA Universié Paris-Saclay                                           RTE
                            Paris, France                                                 Paris, France
                            iguyon@lri.fr                                         antoine.marot@rte-france.com


   Abstract—We propose a neural network architecture that            These load flow solvers use Newton-Raphson optimization
emulates the behavior of a physics solver that solves electric-      methods [18], [19] to iteratively satisfy Kirchhoff’s laws
ity differential equations to compute electricity flow in power      (conservation of energy) by reducing progressively the mis-
grids (so-called “load flow”). Load flow computation is a well
studied and understood problem, but current methods (based on        match between ingoing and outgoing power in every electrical
Newton-Raphson) are slow. With increasing usage expectations         node. Although load flow solvers are more accurate and
of the current infrastructure, it is important to find methods       better understood than neural networks, they are comparatively
to accelerate computations. One avenue we are pursuing in this       slower, leaving room for use of the latter for fast screening,
paper is to use proxies based on “graph neural networks”. In         in conjunction with load flow solvers [7], [8]. In particular,
contrast with previous neural network approaches, which could
only handle fixed grid topologies, our novel graph-based method,     speeding up computation would allow TSOs to perform more
trained on data from power grids of a given size, generalizes to     comprehensive security analyses, and thus increase the quality
larger or smaller ones. We experimentally demonstrate viability      of services or make a tighter use of existing infrastructure and
of the method on randomly connected artificial grids of size 30      reduce risks. This would lend itself to a probabilistic approach
nodes. We achieve better accuracy than the DC-approximation (a       of security analysis emphasizing rare events (see e.g. [10]).
standard benchmark linearizing physical equations) on random
power grids whose size range from 10 nodes to 110 nodes, the            While pioneer work in the area has demonstrated feasibility
scale of real-world power grids. Our neural network learns to        of the use of neural networks to estimate power flow, all
solve the load flow problem without overfitting to a specific        methods developed prior to our work exposed in this paper
instance of the problem.                                             are geared towards a given grid topology. They are dedicated
   Index Terms—Graph Neural Solver, Neural Solver, Graph             to one instance (or a small set of grid instances) and thus do
Neural Net, Power Systems
                                                                     not actually learn how to perform a general load flow on every
                                                                     grid topology.
             I. BACKGROUND & M OTIVATIONS                               Our work is in line with Donnot et al. in [9] who proposed a
                                                                     method capable of generalizing to a set of power grid topolo-
   TSOs (Transmission System Operators) such as RTE                  gies, which remain close to a reference topology. However this
(Réseau de Transport d’électricité) need to ensure the security   method is limited to small perturbations and cannot generalize
and resilience of power grids. By transporting electricity across    to completely different grids.
states, countries, or continents, they are vital components of          The main issue of former approaches is that they do not ex-
modern societies, playing the central economical and societal        ploit the graph structure of the data, and ignore the knowledge
role to supply power reliably to industries, services, and con-      of the underlying physics. As explained in [1], one should use
sumers. In particular they should avoid “blackouts”. Currently,      “relational inductive bias” to guide the learning process. Our
TSOs perform security analyses by using “load flow” solvers          proposed architecture aims to achieve “combinatorial general-
based on the physical equations of the system [13]. Such             ization” by using elementary learning blocks that have been
solvers compute the flows of electricity through each line of        laid out based on our knowledge and understanding of the load
a power grid using physical laws depending on:                       flow problem. We apply a novel class of algorithms combining
  • the power grid topology, i.e. the way the electrical nodes       deep learning and knowledge about graph structure: Graph
    are interconnected;                                              Neural Network. This class of artificial neural networks was
  • the amount and location of power being produced or               first introduced by F. Scarselli in [26] and further developed in
    consumed (so-called “injections”);                               [20] and [12]. The algorithms operate on network structures by
  • the physical properties of the power lines.                      iteratively propagating the influence of vertices through edges.
The architecture can be seen as a generalization of convolu-              Our neural network architecture was developed while keep-
tional neural networks to graph structures, by unfolding a finite      ing in mind a few constraints. First of all, we want an architec-
number of iterations. Theoretical properties have been further         ture that learns a strategy to solve any power flow, and does
developed in [15], [28].                                               not specialize to any specific instance of the problem. This
   Prior to our work, such methods have been successfully              concerns the number of electrical nodes, transmission lines,
applied to various problems that deal with graph structures,           productions or consumptions on the power grid. Therefore,
as well as problems that do not explicitly exhibit graph-like          the neural network needs to embed information about line
structures : classification of graphs [3], [5], [11], classification   interconnections, and make use of modular generic learning
of nodes [14], [17], and relational reasoning [25]. Recent             blocks. The amount of learning blocks, as well as their internal
work such as [2], [16], [21] unveil the emergence of hybrid            dimensions have to be independent from any power grid
approaches that rely on deep learning and structure knowledge.         specific characteristics.
   Recently, the use of fast neural solvers based on AI for               Secondly, while power grids are often represented as a graph
physics problems has begun to develop, as it could provide             of nodes (so-called “buses”) interconnected by power lines,
much faster tools for simulation and design for complex                the mathematical treatment is simplified by noting that the
problems. Computational Fluid Dynamics computations have               topological invariant is the set of lines not the set of nodes
been successfully accelerated by Tompson et al. in [27] by             (which can vary when line interconnections are changed).
replacing a process that always estimates the same function            Hence, we work on the dual graph structure of interconnected
but at different locations by a neural network. Ling et al.            power lines (See Figure 1).
applied Deep Learning to a Reynolds averaged turbulence                   Thirdly, we want to emulate the behavior of an AC power
modelling problem in [22]. It has also been applied to the             flow solver taking into account physical line characteristics,
Schrödinger equation with success by Mills at al. in [23].            although we restrict ourselves to power grids where line
Exploiting the graph structures of the physics problems, and           impedances are all equal. Since there are losses in the power
drawing inspiration from these efforts to model physics using          lines, the inflow via one side does not equal to the outflow via
deep learning seems to be a good direction towards fast neural         the other side. Therefore we have to distinguish between line
approximations that learn to solve any instance of a given             extremities and origins (denoted in Figure 1 by resp. “ex” and
problem, while not being dedicated to only one instance of it.         “or”). This choice can also be justified by the fact the the flow
   The main contribution of this paper is therefore to devise          through a power line is oriented. This forces us to consider
a neural network architecture allowing to predict accurately           multiple adjacency matrices, that are introduced below.
power flows, without specializing to a given grid topology. By
design, our proposed architecture achieves zero-shot learning
[24] on novel grid topologies, as confirmed experimentally:
After being trained on random grid topologies of constant size,
state-of-the-art prediction accuracy is attained on both smaller
and larger power grids (of any type). It is completely in line
with the approach developed in [1] as it combines the relational
structure between the power line and agnostic neural network
blocks that are intricately laid out.
                                                                       Fig. 1. Construction of the power lines graph - This schematics shows
   This paper is organized as follows. First we introduce              how one goes from the classical graph of electrical nodes to the equivalent
notations and concepts about the load flow problem, and                graph of transmission lines. In the first representation, the nodes are connected
develop our proposed Graph Neural Solver architecture. We              through electrical lines, while in the second, power lines are seen as vertices
                                                                       of a graph, and electrical nodes as edges. Our neural network architecture
then present three different experiments that gradually outline        considers the second representation. One should also notice the fact that each
the ability of our proposed architecture to generalize to power        power line has an origin (called or) and an extremity (called ex). For the sake
grid sizes that have never been encountered during training.           of consistency and understandability, this toy example will be used throughout
                                                                       the whole paper.
Finally, we talk about the limits of the current architecture and
give directions for future investigations towards an even more
robust neural solver for power systems.                                A. Notations
                                                                         This architecture takes as inputs 3 different types of vari-
                II. P ROPOSED M ETHODOLOGY                             ables:
  In this section we state the problem and introduce our                 • Injections X: The input vector X concatenates informa-
approach and notations. As illustrated with the toy example                 tion about the electrical power that is being produced and
of Figure 1, the problem is, given injections (productions                  consumed everywhere on the grid.
and consumptions) inj1 , inj2 , inj3 , to compute the flows                 Specifically, each production p ∈ P is defined by an
of electricity in all lines l1 , l2 , l3 , l4 . In what follows, for        active power infeed Pp (in MegaWatts) and a voltage
simplicity, all lines will be assumed to share the same physical            infeed Vp (in Volts). Therefore each production p ∈ P is
characteristics.                                                            defined by a 2-dimensional information.
    Similarly, a consumption c ∈ C is defined by an active            2D matrices: the input is a nin × din matrix and the output
    power consumption Pc (in MegaWatts) and a reactive                is a n × dout matrix. In what follows we will use compact
    power consumption Qc (in MegaVolt-Amps reactive).                 notations: F will denote a vectorial function applying the
    Each consumption c ∈ C is thus also defined by a 2-               same function F to a number of inputs (e.g. power flows or
    dimensional information.                                          injections) and A a function that is a right side multiplication
    We denote by nin the total number of injections: nin =            with the corresponding adjacency matrix A.
    |P| + |C|. Each injection has an information in din = 2              The overall workflow of the approach, which is described
    dimensions. Therefore, X ∈ Rnin ×din is a vector that             in Figures 5 and 6 is described in more details below. It con-
    concatenates all these injection characteristics.                 sists of 3 steps: Embedding, propagation, and decoding. The
  • Lines adjacency matrices Aor and Aex : Since we model             embedding step transforms input space in an abstract vector
    transmission lines as bipolar objects, we need to make a          space, taking into account the grid topology. The propagation
    distinction between the extremity and the origin of each          space implements a relaxation procedure to compute the flows
    transmission line. Each connection between two power              in a finite number of iterations unfolded in time, and the
    lines can thus be of four different types. Let ori and exi        decoding step transforms back results into our output space.
    be respectively the origin and the extremity of line i.                 a) Embedding: This step aims at embedding the initial
           Ai,j               if ori is connected to orj              information contained in each injection (din -dimensional) into
            or = 1                                              (1)
                                                                      a d-dimensional space. It applies the same neural network E :
                = −1          if ori is connected to exj        (2)   Rdin → Rd to each injection. It then proceeds to send this
                =0            otherwise                         (3)   information from the injections to the transmission lines they
           Ai,j
            ex = 1
                                    i
                              if ex is connected to or     j
                                                                (4)   are connected to. Mathematically, this consists in a right-side
                                    i                      j          matrix multiplication with the adjacency matrix Ainj .
                = −1          if ex is connected to ex          (5)
                =0            otherwise                         (6)                        H (0) = Ainj ◦ E(X)                        (13)
    Note that the choice of the polarity of the lines is
                                                                      The dimension d is a hyperparameter of our architecture. One
    arbitrary. Changing it only changes the sign of current
                                                                      should notice that E affects each of the injections similarly,
    flowing.
                                                                      while Ainj affects each of the d dimensions similarly. Since
  • Injections adjacency matrix Ainj : This matrix encodes
                                                                      we have two different types of injections (productions and
    the way injections (productions and consumptions) are
                                                                      consumptions), we use two different types of embedding func-
    connected to lines, and through which pole (origin or
                                                                      tions: Ep for productions and Ec for consumptions. This step
    extremity). Let inj i be the injection i (regardless of it
                                                                      is important for two reasons. First we want the information
    being a production or consumption).
                                                                      of both productions and consumptions to be compatible (they
         Ai,j
          inj = 1          if ori is connected to inj j         (7)   originally have different meanings and units), so they need
               = −1             i
                           if ex is connected to inj   j
                                                                (8)   to be embedded into a consistent latent space. Secondly, we
                                                                      experimentally observed that using a large embedding space
               =0          otherwise                            (9)   (d ≈ 100) provides a faster learning.
   The output we want to predict is the flows through the lines            b) Propagation: In this step, we iteratively update the
(both in Amps A and in MegaWatts M W ) at the origin and              latent state of each of the n power lines by performing latent
the extremity of every line, which we denote by vector Y ∈            leaps that depend on the value of their direct neighbors.
Rn×dout , where for each of the n lines, we try to predict flow       Because of the bipolar nature of power lines, we consider sep-
information in dout = 4 dimensions.                                   arately the influence of neighbors connected to their origins,
   The system we are interested in emulating is therefore:            and neighbors connected to their extremities. This is the reason
                                                                      why in Figures 6 and 5 we consider two separate entries into
                 Y = S(X, Aor , Aex , Ainj )                   (10)   functions L(k) . At each iteration k, the same neural network
B. Graph Neural Solver architecture                                   L(k) : Rd × Rd → Rd is used to update the embedding of
                                                                      each of the power lines. The aim of the right-side matrix
  Our neural network architecture can be written:                     multiplications by Aor and Aex is to sum the information
                 N N : Rnin ×din → Rn×dout                     (11)   of the direct neighbors connected to respectively the origin
                                                                      and the extremity of each power line. Moreover, this sum
                               X 7→ Ŷ                         (12)
                                                                      is weighted by ±1 depending on whether the neighbors are
where nin = |P| + |C| is the number of injections, n is               connected by their own origin or extremity.
the number of power lines in the Power Grid, din is the
dimensionality of the input information of each injection, and                 H (k+1) = H (k) + L(k) (Aor H (k) , Aex H (k) )        (14)
dout is the dimensionality of the output for each power line.                                    (k)                       (k)
                                                                                        ≡ (I + L       ◦ (Aor , Aex ))(H         )    (15)
In the toy example of Figure 1, we have nin = 3, n = 4,
din = 2 and dout = 4. The Graph Neural Solver operates on             for k ∈ {0, . . . , K − 1}, where I is the identity function.
                                                                                Fig. 4. Decoding step - The same decoding function D is applied to the
                                                                                embedding of each power line. There is no exchange between power lines.
                                                                                H (K) is a n × d (i.e. 4 × 5 here) latent matrix that is decoded into a n × dout
                                                                                (i.e. 4 × 4 here) output matrix.


                                                                                  The proposed architecture can be summed up by the fol-
                                                                                lowing function composition:
                                                                                     Ŷ =D ◦ (I + L(K−1) ◦ (Aor , Aor )) ◦ . . .                          (17)
                                                                                                          (0)
                                                                                           · · · ◦ (I + L         ◦ (Aor , Aor )) ◦ Ainj ◦ E(X)           (18)
Fig. 2. Embedding step - The input data is a nin × din (i.e. 3 × 2 here)
matrix (a). We first embed the din -dimensional information of each of the      C. Regarding the design of the architecture
nin injection into a d-dimensional space. This results in a nin × d matrix
(b). We then proceed to assign the information of each of the nin injections,
                                                                                  Our neural network architecture consists in K + 3 learning
to the n power lines they are respectively connected to (see Figure 6). We      blocks that are intricately laid out:
then end up with a nin × din (i.e. 4 × 5 here) matrix (c). This is based on
the toy example from Figure 1.                                                           Ep : Rdin → Rd                                                   (19)
                                                                                                  din         d
                                                                                         Ec : R         →R                                                (20)
                                                                                         (k)      d       d         d
                                                                                       L       :R ×R →R ,                k ∈ {0, . . . , K − 1}           (21)
                                                                                           D : Rd → Rdout                                                 (22)
                                                                                   One should also observe that there is no information ex-
                                                                                change between power lines except through a matrix multi-
                                                                                plication with the adjacency matrices. This point is key to
                                                                                the ability of our neural net to be compatible with various
                                                                                power grid shapes. There is no exchange between lines when
                                                                                the learning blocks are applied, only combinations between
Fig. 3. Propagation step - The embedding of each line is iteratively updated
                                                                                the d or din internal components of each of the n power line
depending on the embeddings of its direct neighbors. There is no direct         or nin injection. The learning blocks internal dimensions are
propagation between power lines 1 and 4.                                        independent from the power Grid it is working on. Thus it can
                                                                                perform inference and learn on a power grid of any size and
                                                                                shape.
    c) Decoding: This step consists in a simple decoding                           The computational complexity of inference for this archi-
from the embedding space to the output space. It applies the                    tecture is in O(Knd [lL d + 2n]), where K is the number
same function D : Rd → Rdout to each of the n power lines.                      of propagation steps, n the number of power lines, d the
There is no exchange of information between power lines.                        dimension of the lines latent embeddings and lL the depth of
                                                                                each block Lk . This complexity estimation does not exploit the
                             Ŷ = D(H (K) )                            (16)     sparsity of the adjacency matrices, and could thus be reduced.
Fig. 5. Architecture of the neural network - This schematic presents the way the different operations are laid out. The input X is taken as a regular input
to a neural network, while the adjacency matrices directly affect the architecture. (1) Round operations consist in right-side matrix multiplications, there is
no exchange between the d dimensions during these steps. Those operations are not learned, and are based on the adjacency matrices that are inputed. (2)
Rectangle operations consist in applying the same neural network for each of the n power lines or nin injections, there is no exchange between them. Those
are the neural networks that are actually learned during training.


Fig. 6. Full architecture of the neural network - This schematic offers a more precise overview of the neural network architecture, and unveils the way
the adjacency matrices actually impact the connections within the neural nets, based on the toy example from Figure 1. At the top of the Figure are displayed
the dimensions of the latent embeddings throughout the architecture. At the bottom are presented the different formulas of some of the main steps of the
architecture. For the sake of readability, we chose not to show the sign (±1) of the links created by the adjacency matrices. However, these signs can be
deduced from the bottom equations. The double brackets in these equations signify that we are dealing with 2D matrices.

                          III. E XPERIMENTS                                      consists in a percentage of error on the 10% of largest flows in
   In this section we first compare our architecture to a fully                  Amps in absolute value (per power line). This reflects the idea
connected neural net when the grid topology is the same during                   that when one tries to predict the flows through transmission
both training and testing. We then assess the generalization                     lines, it is most important to be accurate on extreme values
ability of our neural network to both larger and smaller power                   that can actually cause damage.
grids than those observed during training. Finally, we compare                      We used RTE’s proprietary load flow solver to compute
the computational requirements of our architecture to that of                    flows (given the power grid topology, and the set of injections)
a regular physics solver.                                                        to obtain the ground truth of predictions. Our neural network’s
   In our experiments, we optimized the `2 -loss (sum of square                  goal is to emulate the behavior of this solver.
errors) with regards to the normalized flows through both the                       As baseline, we compare every result to a standard reference
origin and the extremity of each line.                                           method in power systems: the DC-approximation, which is a
   Current intensities that induce Joule’s effect which can cause                linearization of the physical equations. One of our goals is to
potential damage to lines and other equipment are flows in                       beat the DC-approximation in terms of efficiency.
Amps. Hence, we adopted the MAPE90A metric for power                                Each model was trained 20 times with random initialization
system security analysis applications as introduced in [6]. It                   of weights and mini batches. This allowed us to compute the
median, and the 20th and 80th percentiles for each of the
observed metrics.
Experiment A : Constant Power Grid Topology
   The first experiment we conducted is a sanity check, at fixed
grid topology, similarly to what has been done previously in
the literature. We show good performance of our new method
but at the expense of additional computational expenses com-
pared to a fully connected network. However, this is not the
case for which our method was designed.
   Specifically, in this experiment, the power grid topology
(i.e. Aor , Aex and Ainj ) is the same in every datapoint in
both Train and Test sets. Thus, only the injections vary and
are randomly sampled. See [6] for more informations about
the injection sampling. We focus on a fictitious power grids          Fig. 7. Experiment A : Learning curves - Both the Fully Connected and
consisting in 30 nodes, 42 power lines, 20 consumptions and 6         our Graphical Neural Solver manage to outperform the DC approximation
productions (which are the same characteristics as the standard       before the 1000th learning iteration. The Fully Connected seems to quickly
                                                                      reach an asymptote, while our proposed architecture still keeps on learning
30 nodes case [29]). We then compare the performance of our           at the end of the 200,000 iterations. The plain line is the median of the 20
architecture with a Fully Connected neural network. A cross-          runs for each model, and the shaded areas is delimited by the 20th and 80th
validation has been performed on both architectures.                  percentiles of the 20 runs.
   For both architectures, each of the 20 models has been
trained for a number of 200,000 iterations (with mini batches
of size 100). Training sets consist of 100,000 samples, and           Experiment B : Random Grid Topologies of Constant Sizes
Test sets of 1,000 samples. We used fully connected neural               In this experiment, both injections and power grid topolo-
networks for the learning blocks of our proposed architecture         gies are randomly sampled. We randomly sample the power
(i.e. E, L(k) , D), with leaky ReLu activations. We also used         grid topologies (using the sampling method described below)
leaky Relu in the FC baseline.                                        in the set of connected power grids that have 30 nodes, 42
   Figure 7 shows that our GNS architecture (our proposed             power lines, 20 consumptions and 6 productions. Each point
method) learns faster in number of iterations. However, due           in both Train and Test set is a randomly sampled power grid,
to the larger number of parameters in our GNS, it takes ≈ 4           with a set of randomly sampled injections (same method). The
times longer for each iteration. This is due to the fact that our     distribution are the same in Train and Test sets.
proposed architecture is a lot larger and more complex than a            Random power grid generation: Given m the number of
Fully Connected (FC) network. Although the figure presents            electrical nodes, n the number of power lines, |P| the number
our proposed architecture as faster, one may prefer a simple          of productions and |C| the number of consumptions, we use
Fully Connected network for this task.                                the following process to generate random power grids:
   Table I presents the Test MAPE90A obtained for both
models and the DC-approximation. Both baselines easily out-              • Generate a random spanning tree of m nodes.
perform the DC-approximation, and there is a slight advantage            • Uniformly create random edge in the graph until there
in favor our GNS architecture. For each architecture, the                  are n edges.
median is displayed, as well as the 20th and 80th percentiles            • Attribute the |P| productions and |C| consumptions to

of the 20 independently learned models.                                    randomly picked nodes on the graph. Each electrical node
   The Fully Connected (FC) network is able to learn in this               can be neutral, production, consumption or both. It is
setup partly because the Power Grid topology is constant: it               however impossible for a node to have multiple injections
specializes in one specific instance of the load flow problem.             of the same type.
In the next experiment, where we randomly vary the power                 We chose not to present the results of the Fully Connected
grid topology in both Train and Test sets, it will be infeasible      network because it proved to be unable to learn on such a
to learn to generalize to other grid architectures for a FC net.      dataset. The experimental setup has been specifically designed
                                                                      so as to help our proposed architecture to generalize well.
                           TABLE I                                       Table II shows that our architecture is able to generalize to
E XPERIMENT A - I DENTICAL TOPOLOGY IN BOTH T RAIN AND T EST
M EDIAN OF THE METRICS ON TEST SET FOR 20 TRAINED MODELS (20 TH
                                                                      power grids that it has potentially never encountered in the
           AND 80 TH PERCENTILES BETWEEN BRACKETS )                   training set. While the Fully Connected baseline from the pre-
                                                                      vious experiment is completely unable to extract information
                        FC            proposed GNS       DC-approx.   from a dataset in which the Power Grid topology constantly
           Loss       0.0494             0.0332            0.68
                   [0.0483; 0.0507]   [0.0273; 0.0636]
                                                                      changes, our proposed architecture still manages to beat the
     MAPE90A           0.534              0.472             3.21      DC approximation on graphs that it has never encountered
    (% of error)    [0.508; 0.578]     [0.3880.501]
                                                                      in the Train set. For our Graph Neural Solver, the median is
displayed, as well as the 20th and 80th percentiles of the 20
learned models.                                                                                    6
                                                                                                       Ground Truth (solver)
                                                                                                   5   DC-approximation
                           TABLE II                                                                    Power Grid Size never observed in Train set
    E XPERIMENT B - R ANDOM T OPOLOGIES OF CONSTANT SIZE                                               Same Grid Size as in Train set


                                                                            MAPE90A (% of error)
M EDIAN OF THE METRICS ON TEST SET FOR 20 TRAINED MODELS (20 TH                                    4
           AND 80 TH PERCENTILES BETWEEN BRACKETS )
                                                                                                   3
                           proposed GNS       DC-approx.
                   Loss       0.0715           0.0678                                              2
                           [0.0623; 0.0882]
             MAPE90A           0.729             3.17                                              1
            (% of error)    [0.705; 0.873]
                                                                                                   0

                                                                                                        20         40           60          80         100
                                                                                                             Number of Nodes in the tested Power Grids
Experiment C : Random Grid Topologies of Various Sizes
   This experiment has the exact same training protocol as             Fig. 8. Experiment C : Generalization to both larger and smaller Power
the previous one, but the testing occurs on several datasets in        Grids - This graph shows the MAPE90A metric on various Test sets. We test
which the power grid sizes can be different. Predicting in such        our trained models on sizes that range from 10 to 110 nodes. While having
                                                                       observed only Power Grids with 30 nodes during Training, our proposed
a setting is impossible for a Fully Connected network, because         Graph Neural Solver manages to consistently beat the DC-approximation for
of a matrix dimension mismatch, which prevents the size of the         Power Grids whose sizes range from 10 to 110 nodes. 20 models were trained
input to change. Figure 8 shows the different Test MAPE90A             of the same architecture but with different random initializations. The dots are
                                                                       the median of the MAPE90A on each Test set, and the error bars are defined
for datasets that consist in random power grids of sizes in            by the 20th and 80th percentiles of these 20 runs. The red error bar sums
{10, 15, . . . , 105, 110}. In every case, the architecture has been   up the MAPE90A results on a Test set made of random Power Grid with 30
trained only on random 30 nodes Power Grids. This means                nodes (i.e. same size as during Training), while the blue error bars stand for
                                                                       the various Test sets made of Power Grid larger of smaller than 30 nodes.
that while having observed solely Power Grids with 30 nodes
during Training, our architecture is able to achieve a better
MAPE90A than the DC-approximation, on Power Grids whose                the adjacency matrices. This limitation is due to the lack of
sizes range from 10 nodes to 110 nodes. This experimentally            an implementation of sparse matrices of rank strictly above
proves the ability of our neural net to generalize to both larger      2 in TensorFlow. It strongly hinders the computational speed
and smaller power grids. It seems to be able to learn how to           of our implementation, and should be an important axis of
solve the load flow problem in general, while not specializing         improvement in terms of speed.
in a specific instance of the problem.
   A quite unexpected result that can be observed in this plot,                                        IV. D ISCUSSION & C ONCLUSIONS
is that the trained models are consistently more accurate on              Our proposed architecture relies on iteratively updating the
power grids that have 10 nodes than on power grids that                embeddings of power lines according to the values of the direct
have 30 nodes, even though the models were solely trained              neighbors. It relies on ideas from the blooming field of Graph
on power grids with 30 nodes.                                          Neural Networks, and aims at emulating the behavior of Load
   While most error bars are quite consistent in Figure 8,             Flow (LF) solvers. We experimentally showed its ability to
there seems to be a larger error on power grids of 100 nodes           generalize to both larger and smaller power grids than that
(that still remains below the DC-approximation). We have not           used for training, opening the door for strongly generalizable
identified what causes it to be higher than expected, but we           Neural Solver for other Physics applications.
think that some networks sampled in this test set were not                The main limitation of our current model is that we made the
representative enough of actual power systems, thus causing            strong choice to work with power grids where every line has
our neural network to perform a poorer accuracy than on the            the same physical characteristics. We are conscious that incor-
95 nodes and 105 nodes test sets.                                      porating the line characteristics in our neural network model
                                                                       would be critical to using it in real-world applications. We
Computational Time                                                     already have several insights on how this could be performed:
   Being able to decrease the computation time of complex              one could for instance have the adjacency matrix coefficients
problems such as load flows, computational fluid dynamics,             be functions of the line characteristics.
etc. can be a major catalyser in being able to iterate over               Another aspect that would require some attention is the
many different device design / situation, thus increasing our          computational speed of our artificial neural network archi-
abilities in terms of installation design and security analysis.       tecture. It currently treats the adjacency matrices as dense,
   In its current implementation and on power grids whose size         while they are actually very sparse. We envision that an
range from 10 nodes to 110 nodes, our Graph Neural Solver              implementation of our architecture that would exploit this
is approximately twice as fast as the proprietary Load-Flow            sparsity could provide much faster computations. Even without
solver used at RTE (which is already thoroughly optimized).            such optimization, our neural network is approximately twice
Our current implementation does not exploit the sparsity of            as fast as the Load Flow solver that we want to emulate.
   Our current implementation uses different Leap functions                         [5] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural
L(0) , . . . , L(K−1) at each propagation step. However, it would                       networks on graphs with fast localized spectral filtering. In D. D.
                                                                                        Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors,
make sense to instead use the same propagation update at                                Advances in Neural Information Processing Systems 29, pages 3844–
each step. We will be further investigating some alterations                            3852. Curran Associates, Inc., 2016.
to the current formulation of the architecture, drawing some                        [6] B. Donnot and et al. Introducing machine learning for power system
                                                                                        operation support. In IREP Symposium, Espinho, Portugal, Aug. 2017.
inspiration from the recent and inspiring Neural Ordinary                           [7] B. Donnot, I. Guyon, A. Marot, M. Schoenauer, and P. Panciatici.
Differential Equations [4]. Moreover, we will investigate the                           Optimization of computational budget for power system risk assessment.
idea of training in an unsupervised our proposed architecture                           working paper or preprint, May 2018.
                                                                                    [8] B. Donnot, I. Guyon, M. Schoenauer, A. Marot, and P. Panciatici. An-
by directly minimizing the violation of the physical laws.                              ticipating contingengies in power grids using fast neural net screening.
   The sum operation in each propagation step ensures a                                 In IEEE WCCI 2018, Rio de Janeiro, Brazil, July 2018.
consistency between the latent representations. The meaning of                      [9] B. Donnot, I. Guyon, M. Schoenauer, A. Marot, and P. Panciatici. Fast
                                                                                        Power system security analysis with Guided Dropout. In European
the d-dimensional information contained in each line, at each                           Symposium on Artificial Neural Networks, Bruges, Belgium, Apr. 2018.
propagation update is unchanging. We could plug the learned                        [10] L. Duchesne, E. Karangelos, and L. Wehenkel. Using machine learning
decoder D in each power line and at each propagation step to                            to enable probabilistic reliability assessment in operation planning, 06
                                                                                        2018.
visualize the evolution of the flows.                                              [11] D. K. Duvenaud, D. Maclaurin, J. Aguilera-Iparraguirre, R. Gómez-
   We could also investigate an adaptative number of propa-                             Bombarelli, T. Hirzel, A. Aspuru-Guzik, and R. P. Adams. Convolu-
gation steps. The RNN domain already deals with this type of                            tional networks on graphs for learning molecular fingerprints. CoRR,
                                                                                        abs/1509.09292, 2015.
problems and should provide us with ideas on how to build                          [12] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl.
Graph Neural Solver with an adaptable number of propagation                             Neural message passing for quantum chemistry. CoRR, abs/1704.01212,
iterations. Another aspect is that it is possible for a Load                            2017.
                                                                                   [13] J. D. D. Glover and M. S. Sarma. Power System Analysis and Design.
Flow computation to not converge, not because of numerical                              Brooks/Cole Publishing Co., Pacific Grove, CA, USA, 3rd edition, 2001.
instability, but because the power grid is actually in danger of                   [14] W. L. Hamilton, R. Ying, and J. Leskovec. Inductive representation
blackout. Our proposed architecture could be modified so as to                          learning on large graphs. CoRR, abs/1706.02216, 2017.
                                                                                   [15] R. Herzig, M. Raboh, G. Chechik, J. Berant, and A. Globerson. Mapping
include predictions on whether the computation will converge                            images to scene graphs with permutation-invariant structured prediction.
or not.                                                                                 02 2018.
   Currently, we only deal with stead-state power flows but                        [16] T. N. Kipf, E. Fetaya, K. Wang, M. Welling, and R. S. Zemel.
                                                                                        Neural relational inference for interacting systems. In Proceedings of
we could try to predict the dynamic aspects of power grids,                             the 35th International Conference on Machine Learning, ICML 2018,
or try to tackle some other finite-element problems that can                            Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 2693–
have temporal components. The ability to develop extremely                              2702, 2018.
                                                                                   [17] T. N. Kipf and M. Welling. Semi-supervised classification with graph
fast AI-based proxies for Fluid Dynamics solvers could help                             convolutional networks. CoRR, abs/1609.02907, 2016.
researchers and engineers perform much faster investigation                        [18] P. Kundur, N. J. Balu, and M. G. Lauby. Power system stability and
when it comes to designing novel buildings, aircrafts or wind                           control, volume 7. McGraw-hill New York, 1994.
                                                                                   [19] P. S. Kundur. Power system stability. In Power System Stability and
turbines. Those applications usually require heavy and slow                             Control, Third Edition, pages 1–12. CRC Press, 2012.
computations, which restrict the amount of designs that can                        [20] Y. Li, D. Tarlow, M. Brockschmidt, and R. S. Zemel. Gated graph
be tested.                                                                              sequence neural networks. CoRR, abs/1511.05493, 2015.
                                                                                   [21] Y. Li, O. Vinyals, C. Dyer, R. Pascanu, and P. Battaglia. Learning deep
                                                                                        generative models of graphs. CoRR, abs/1803.03324, 2018.
                         ACKNOWLEDGMENT                                            [22] J. Ling, A. Kurzawski, and J. Templeton. Reynolds averaged turbu-
                                                                                        lence modelling using deep neural networks with embedded invariance.
   We would like to thank Vincent Barbesant, Rémy Clément,                            Journal of Fluid Mechanics, 807:155–166, 11 2016.
Laure Crochepierre, Guillaume Genthial and Wojciech Sitarz                         [23] K. Mills, M. Spanner, and I. Tamblyn. Deep learning and the schrödinger
for their attentive review and remarks.                                                 equation. Physical Review A, 96, 02 2017.
                                                                                   [24] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions
                                                                                        on Knowledge and Data Engineering, 22(10):1345–1359, Oct 2010.
                              R EFERENCES                                          [25] A. Santoro, D. Raposo, D. G. T. Barrett, M. Malinowski, R. Pascanu,
 [1] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez,                     P. Battaglia, and T. P. Lillicrap. A simple neural network module for
     V. F. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro,                relational reasoning. CoRR, abs/1706.01427, 2017.
     R. Faulkner, aglar Gülehre, F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl,   [26] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini.
     A. Vaswani, K. R. Allen, C. Nash, V. Langston, C. Dyer, N. Heess,                  The graph neural network model. Trans. Neur. Netw., 20(1):61–80, Jan.
     D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu.            2009.
     Relational inductive biases, deep learning, and graph networks. CoRR,         [27] J. Tompson, K. Schlachter, P. Sprechmann, and K. Perlin. Acceler-
     abs/1806.01261, 2018.                                                              ating eulerian fluid simulation with convolutional networks. CoRR,
 [2] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst.               abs/1607.03597, 2016.
     Geometric deep learning: Going beyond euclidean data. IEEE Signal             [28] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Póczos, R. Salakhutdinov, and
     Processing Magazine, 34(4):18–42, July 2017.                                       A. J. Smola. Deep sets. CoRR, abs/1703.06114, 2017.
 [3] J. Bruna, W. Zaremba, A. Szlam, and Y. Lecun. Spectral networks and           [29] R. D. Zimmerman, C. E. Murillo-Sanchez, and R. J. Thomas. Matpower:
     locally connected networks on graphs. In International Conference on               Steady-state operations, planning, and analysis tools for power systems
     Learning Representations (ICLR2014), CBLS, April 2014, 2014.                       research and education. IEEE Transactions on Power Systems, 26(1):12–
 [4] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural                19, Feb 2011.
     ordinary differential equations. In S. Bengio, H. Wallach, H. Larochelle,
     K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in
     Neural Information Processing Systems 31, pages 6572–6583. Curran
     Associates, Inc., 2018.