Explainability using saliency and integrated gradients (Captum)

Would it be very difficult to make the Captum library work with DGL models? Does anyone have experience with this? Many thanks in advance! I found a PyTorch Geometric example, but I get some errors when trying to make it work in DGL:

https://colab.research.google.com/drive/1fLJbFPz0yMCQg81DdCP5I8jXw9LoggKO?usp=sharing


The example looks interesting, and explainability is something we are trying to build for DGL. Could you provide the detailed error you encountered?
cc @mufeili


See the example below. Credit to @BarclayII.

import dgl
import dgl.nn as dglnn
import torch
import torch.nn.functional as F
from captum.attr import IntegratedGradients
from functools import partial

w = dglnn.EdgeWeightNorm()                 # normalizes scalar edge weights (GCN-style)
m = dglnn.GraphConv(20, 30, norm='none')   # norm='none' so only the normalized weights are used

gs = []
for _ in range(5):
    g = dgl.graph((torch.randint(0, 20, (100,)), torch.randint(0, 20, (100,))), num_nodes=20)
    g = dgl.add_reverse_edges(g)
    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)
    gs.append(g)
g = dgl.batch(gs)

edge_weight = (torch.randn(g.num_edges()) ** 2).requires_grad_()    # weighted graph
x = torch.randn(g.num_nodes(), 20).requires_grad_()                 # node feature inputs

def forward(x, edge_weight, g):
    # normalize the raw edge weights, then apply the weighted graph convolution
    norm = w(g, edge_weight)
    return F.relu(m(g, x, edge_weight=norm))

# fix the edge weights and the graph; attribute with respect to the node features x
ig = IntegratedGradients(partial(forward, edge_weight=edge_weight, g=g))
# internal_batch_size equal to the number of nodes keeps each interpolated copy of the node features intact
ig.attribute(x, target=0, internal_batch_size=g.num_nodes(), n_steps=50)

@mufeili @BarclayII @VoVAllen many thanks for your help, you are amazing! I have been going through the example slowly and I have a couple of questions:

  • Is the example above intended for node prediction? I was puzzled by the output at first, but I realised that the dimensions of the returned mask look like number of nodes x number of features, so I thought this might be the case.
  • I would like to make this work for an already trained graph prediction model (I am actually playing with your graph classification tutorial before moving to my own data). I assume that in this case the forward function we define in the example should be changed, but I am unsure how. Do you have any advice or suggestions? Should we pass the pretrained model? But what if it doesn't use edge weights at all, such as below?
import dgl
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)

    def forward(self, g, in_feat):
        h = self.conv1(g, in_feat)
        h = F.relu(h)
        h = self.conv2(g, h)
        g.ndata['h'] = h
        return dgl.mean_nodes(g, 'h')

Thanks in advance :slight_smile:

The code above is indeed for node prediction, but it should also work for graph classification. Just replace the forward function with your graph classification module and you should be fine.

The edge weights are just an example of how to attribute node feature inputs together with edge features, so you can just remove them if you don't have edge features.

Thanks @BarclayII! I did that actually and it worked!

ig = IntegratedGradients(partial(model, G))
mask = ig.attribute(G.ndata['attr'].float(), target=0, internal_batch_size=1, n_steps=50)

where model is the result of: model = GCN(dataset.dim_nfeats, 16, dataset.gclasses) after training. Would that be correct?

mask has shape Number of Nodes x Number of Node Features, so are we getting the importance of each node feature in the classification here? Would it be possible to also get, for example, the edge importances? How could we do that?

Thanks in advance!

I think so.

Should be.

Essentially you will create a separate IntegratedGradients/Saliency instance, but with another partial function fixing the node features and the graph. Say your forward function is

forward(node_features, edge_features, g)

Then to get node feature importance you use

IntegratedGradients(partial(forward, edge_features=..., g=...))

and to get edge feature importance you use

IntegratedGradients(partial(forward, node_features=..., g=...))
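
To make that concrete, a minimal sketch could look like this (forward is the function above, and g, node_feats and edge_feats are placeholders for your own graph and feature tensors; plain wrapper functions are used instead of partial here, which sidesteps the positional/keyword mixing issue that comes up further down the thread):

from captum.attr import IntegratedGradients

def forward_nodes(node_features):
    # edge features and graph are held fixed
    return forward(node_features, edge_feats, g)

def forward_edges(edge_features):
    # node features and graph are held fixed
    return forward(node_feats, edge_features, g)

# node feature importance: one attribution score per node feature entry
ig_nodes = IntegratedGradients(forward_nodes)
node_attr = ig_nodes.attribute(node_feats, target=0,
                               internal_batch_size=g.num_nodes(), n_steps=50)

# edge feature importance: one attribution score per edge feature entry
ig_edges = IntegratedGradients(forward_edges)
edge_attr = ig_edges.attribute(edge_feats, target=0,
                               internal_batch_size=g.num_edges(), n_steps=50)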

Thanks again @BarclayII! I have an extra question regarding the additional partial function to get edge importances:

From what you mention, I assume that with the model I have at the moment, which doesn't use edge features at all (the GCN class above), we can't get edge importances. Is that correct? If we could, would you mind explaining how, please? I am unsure how to write a forward function that feeds in edge weights if they aren't used.

I guess I need to create a new model, similar to GCN, that uses edge weights in the convolutions as well, and then use that as the forward function. Or did I get this totally wrong? Thanks!

If by “edge importance” you mean the importance of an edge’s connection itself (i.e. attribution to the whole adjacency matrix), then I guess the best you can do is to use a complete graph, assign masks to it, and run the GCN treating the mask as an edge feature. Essentially, in the example above, you replace the edge_weight variable with a 0-1 array where the value is 1 if an edge exists and 0 otherwise. Then you run IntegratedGradients or Saliency with a partial function fixing the node features and the graph.
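
A rough sketch of that idea might look like the following (g is the original graph, x its node features, and model is an edge-weight-aware classifier with signature model(graph, node_feats, edge_weight); all of these names are placeholders):

import torch
import dgl
from captum.attr import IntegratedGradients

n = g.num_nodes()

# complete graph over the same nodes (self-loops included for simplicity)
src = torch.arange(n).repeat_interleave(n)
dst = torch.arange(n).repeat(n)
full_g = dgl.graph((src, dst), num_nodes=n)

# 0-1 mask over all possible edges: 1 where the edge exists in the original graph
edge_mask = g.has_edges_between(src, dst).float().requires_grad_()

def forward_mask(mask):
    # note: EdgeWeightNorm with norm='both' may reject exact zeros,
    # so the model may need norm='none'/'right' or a small epsilon added
    return model(full_g, x, mask)

ig = IntegratedGradients(forward_mask)
# one attribution score per possible edge, i.e. per entry of the adjacency matrix
attr = ig.attribute(edge_mask, target=0,
                    internal_batch_size=full_g.num_edges(), n_steps=50)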

Thanks again @BarclayII!

I have started a small Jupyter notebook to collect all of this in one place :slight_smile:

I have a couple of questions:

  • When defining the GCN_edgefeats class with the GNN model, I have added EdgeWeightNorm. But since I am using two graph convolutions, is it correct to reuse the same normalized weights in both layers, or should I apply EdgeWeightNorm twice? (I am not used to working with edge features, apologies!)
class GCN_edgefeats(nn.Module):
    def __init__(self, in_feats, h_feats, num_classes):
        super(GCN_edgefeats, self).__init__()
        self.conv1 = GraphConv(in_feats, h_feats)
        self.conv2 = GraphConv(h_feats, num_classes)
        self.w = EdgeWeightNorm()
        self.out_act = nn.Sigmoid()

    def forward(self, g, in_feat, edge_weight):
        norm = self.w(g, edge_weight)
        h = self.conv1(g, in_feat, edge_weight=norm)
        h = F.relu(h)
        h = self.conv2(g, h, edge_weight=norm)
        g.ndata['h'] = h
        return self.out_act(dgl.mean_nodes(g, 'h'))
  • When trying to add the edge weight from the adjacency matrix, I get an error when training the network:
/usr/local/lib/python3.7/dist-packages/dgl/nn/pytorch/conv/graphconv.py in forward(self, graph, edge_weight)
    104             graph = block_to_graph(graph)
    105         if len(edge_weight.shape) > 1:
--> 106             raise DGLError('Currently the normalization is only defined '
    107                            'on scalar edge weight. Please customize the '
    108                            'normalization for your high-dimensional weights.')

DGLError: Currently the normalization is only defined on scalar edge weight. Please customize the normalization for your high-dimensional weights.

I have commented out that line and added some random weights to make it work, but do you have any hints on how to solve it?

  • When trying to get the edge importances as you advised, I get the error below. I think it is because the edge weights are being interpreted as node features. Do you know how to fix this?
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1049         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1050                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051             return forward_call(*input, **kwargs)
   1052         # Do not call functions when jit is used
   1053         full_backward_hooks, non_full_backward_hooks = [], []

TypeError: forward() got multiple values for argument 'in_feat'

Many thanks in advance for all your help! :smiley:

This usually means that your edge_weight variable is not a vector (e.g. it is a 2D matrix). Could you make sure that edge_weight is a one-dimensional vector with the same number of elements as the number of edges?
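
For example, if the weights currently live in an N x N adjacency-style matrix adj (a placeholder name), one way to pull out exactly one scalar per existing edge is:

src, dst = g.edges()
edge_weight = adj[src, dst]    # shape (g.num_edges(),): one scalar weight per edge
assert edge_weight.dim() == 1 and edge_weight.shape[0] == g.num_edges()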

functools.partial can throw subtle errors when mixing positional arguments with keyword arguments (see functools — Higher-order functions and operations on callable objects — Python 3.9.6 documentation for details). For instance, the following may not work:

def f(a, b, c):
    return a + b + c
partial(f, 3, 4)(5)        # ok - equivalent to f(3, 4, 5)
partial(f, a=3, b=4)(5)    # error - the positional 5 also binds to a, so a gets two values
partial(f, b=4, c=5)(3)    # ok - equivalent to f(3, 4, 5)
partial(f, a=3, b=4)(c=5)  # ok - equivalent to f(3, 4, 5)

Unfortunately, Captum seems to assume that the forward function takes positional arguments where the first few arguments are the ones you want to attribute. You will need to work around this by wrapping the model yourself.

def tmp(edge_weight):
    return model(batched_graph, batched_graph.ndata['h_n'].float(), edge_weight)

ig = IntegratedGradients(tmp)
# make sure the internal batch size equals the number of nodes for node feature
# attribution, or the number of edges for edge feature attribution
mask = ig.attribute(edge_weight, target=0,
    internal_batch_size=batched_graph.num_edges(), n_steps=50)

Thanks again @BarclayII!! It is working with the dummy data. I have put the code, with visualisation, in the notebook mentioned above for reference in case someone wants to do a similar thing in the future :smiley: (I will do some trials on the Mutagenicity dataset to check whether the explanations agree more or less with the known ground truth).

I have an extra question: I have a trained model of the form


class Classifier_gen(nn.Module):
    def __init__(self, in_dim, hidden_dim_graph, hidden_dim1, n_classes, dropout, num_layers, pooling):
        super(Classifier_gen, self).__init__()
        if num_layers == 1:
            self.conv1 = GraphConv(in_dim, hidden_dim1)
        if num_layers == 2:
            self.conv1 = GraphConv(in_dim, hidden_dim_graph)
            self.conv2 = GraphConv(hidden_dim_graph, hidden_dim1)
        if pooling == 'att':
            pooling_gate_nn = nn.Linear(hidden_dim1, 1)
            self.pooling = GlobalAttentionPooling_mod(pooling_gate_nn)
        self.classify = nn.Sequential(nn.Linear(hidden_dim1, hidden_dim1), nn.Dropout(dropout))
        self.classify2 = nn.Sequential(nn.Linear(hidden_dim1, n_classes), nn.Dropout(dropout))
        self.out_act = nn.Sigmoid()

    def forward(self, g, num_layers, pooling):
        # node features are read directly from the graph
        h = g.ndata['h_n'].float()
        h = F.relu(self.conv1(g, h))
        if num_layers == 2:
            h = F.relu(self.conv2(g, h))

        g.ndata['h'] = h

        if pooling == "max":
            hg = dgl.max_nodes(g, 'h')
        elif pooling == "mean":
            hg = dgl.mean_nodes(g, 'h')
        elif pooling == "sum":
            hg = dgl.sum_nodes(g, 'h')
        elif pooling == 'att':
            # calculate the graph representation with attention pooling over the node representations
            [hg, g2] = self.pooling(g, h)

        g2 = hg
        a2 = self.classify(hg)
        a3 = self.classify2(a2)
        return self.out_act(a3), g2, hg

And I am trying to get the node feature relevances as we were doing before (my variable ‘model’ is the trained classifier).

When I do

ig = IntegratedGradients(partial(model, G, num_layers=2, pooling='att'))
mask = ig.attribute(G, target=0, internal_batch_size=1, n_steps=50)

I get the error: AssertionError: `inputs` must have type torch.Tensor but <class 'dgl.heterograph.DGLHeteroGraph'> found

I have tried to do some wrapping as

def tmp(G):
    return model(G, G.ndata['h_n'].float(), num_layers=2, pooling='att')

ig = IntegratedGradients(tmp)
mask = ig.attribute(G, target=0, internal_batch_size=1, n_steps=50)

But I get the same error :frowning:

Do you have any idea whether it is possible to make it work? Since the forward function of the classifier takes a lot of inputs, already defines the node features from the graph, and returns several things as well, I am worried it might not work. Should I redefine the classifier then? Thanks!

The wrapping function must take in feature tensors as input. So your wrapper should look something like this instead:

def tmp(features):
    # the graph G is captured from the enclosing scope; only the feature tensor varies
    with G.local_scope():
        G.ndata['h_n'] = features
        return model(G, num_layers=2, pooling='att')
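
With that wrapper, the attribution call then stays on the feature tensor rather than on the graph. A possible sketch of the full call (note that because the classifier above returns a tuple, Captum most likely needs the wrapper to return only the prediction tensor):

def tmp(features):
    with G.local_scope():
        G.ndata['h_n'] = features
        out, g2, hg = model(G, num_layers=2, pooling='att')
        return out    # return a single tensor of predictions for Captum

ig = IntegratedGradients(tmp)
mask = ig.attribute(G.ndata['h_n'].float(), target=0,
                    internal_batch_size=G.num_nodes(), n_steps=50)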

Thanks again @BarclayII for all your help! I will update the public notebook accordingly :slight_smile: and I might come back with extra questions if I have them.
