How to compute the gradient of a loss with respect to edges

Hi DGLer,

I wonder whether DGL supports getting the gradient of edges. We can set .requires_grad to True to get the gradient of edge features or node features, but it is not obvious how to compute the gradient with respect to the edges themselves (i.e., the adjacency matrix).


There are two interpretations of “computing the gradient of adjacency matrix of a graph”.

  1. The graph is a weighted graph, and you wish to compute the gradients w.r.t. the weights on the edges (i.e. w.r.t. only the non-zero entries of the adjacency matrix). In this case, you can store the weights as an edge feature, enable gradients on it, and run the forward and backward passes as usual.
  2. You wish to compute the gradients w.r.t. the adjacency matrix itself, including the zero entries. To me, this means that the adjacency matrix is effectively dense. In this case, you can create a weighted complete graph and learn its edge weights as edge features. That said, for simpler computations I would probably prefer direct tensor operations in PyTorch over using DGL.
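As a rough illustration of the second interpretation with plain PyTorch tensors, the sketch below treats a small dense adjacency matrix as a differentiable tensor. The toy graph, the sizes, the single unnormalized GCN-style layer, and the MSE loss are all assumptions made up for this example, not something taken from a particular model.

```python
import torch

torch.manual_seed(0)

num_nodes, in_dim, out_dim = 4, 3, 2

# Dense adjacency as a differentiable tensor (toy path graph, assumed).
# adj[i, j] is the weight of edge j -> i; zeros are absent edges.
adj = torch.tensor(
    [[0., 1., 0., 0.],
     [1., 0., 1., 0.],
     [0., 1., 0., 1.],
     [0., 0., 1., 0.]],
    requires_grad=True,
)

features = torch.randn(num_nodes, in_dim)
targets = torch.randn(num_nodes, out_dim)
linear = torch.nn.Linear(in_dim, out_dim)

# One unnormalized GCN-style layer: weighted neighbor sum, then a linear map.
hidden = linear(adj @ features)
loss = torch.nn.functional.mse_loss(hidden, targets)
loss.backward()

# adj.grad has one entry per (i, j) pair, including pairs with no edge.
print(adj.grad)
```

The entries of adj.grad at positions where adj is zero indicate how the loss would change if a small weight were placed on an edge that does not currently exist.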

Please feel free to follow up.



For the first case, what should we do if the graph is unweighted? In this case, we only have the (source, dest) pairs as the edges and do not have any edge features, as with the original GCN model on the Cora dataset.

In this case, you can convert it into the second case by creating a complete graph, with the edge weight as 1 if the edge exists in your original graph, and 0 otherwise.

And when aggregating, instead of summing them up:

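import dgl.function as fn

complete_graph.ndata['h'] = node_features
# update_all is a method of the graph object, not a function of the dgl module
complete_graph.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_neigh'))
node_aggregations = complete_graph.ndata['h_neigh']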

You can instead perform a weighted sum:

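complete_graph.ndata['h'] = node_features
# edge_weights is a placeholder name: a tensor of shape (num_edges, 1) with requires_grad=True
complete_graph.edata['weight'] = edge_weights
complete_graph.update_all(fn.u_mul_e('h', 'weight', 'm'), fn.sum('m', 'h_neigh'))
node_aggregations = complete_graph.ndata['h_neigh']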

This makes the edge weights part of the computation graph, so calling backward() on the loss will populate their gradients.

Note that the above is not the only way of using edge weights in aggregation. You may want to modify the aggregation accordingly if you are running models other than GCN (e.g. GAT).

I see the point. Thanks for your kind reply.