Node-level classification in multiple graphs

Hi there,

I’ve seen examples in the tutorials where all of the nodes are classified (link) and examples where the graph as a whole is classified (link). However, I’m interested in performing node-level regression across different graphs, where each graph has only one node of interest.

To put it in terms of the karate club example, let’s say that I have data for clubs whose instructors are labeled as good or bad, and I want to classify the instructors of new clubs (not the club itself and not every member, just the instructor).

Is there any example/doc available that might guide me?

Thanks in advance

Check out our example of GAT on PPI.


I’ll do that! Thank you

I have been taking a look at the example, but the size of the labels tensor is N×121 (where N is the number of nodes in the batched graph). Given that the batch contains two sub-graphs, I could only provide two labels (a 2×121 tensor), one for each sub-graph.

Maybe I didn’t explain myself correctly. In the problem I am dealing with, I am only interested in one node of each graph, and I only have labels for those nodes (we can safely assume it’s always node 0 in each graph, if that helps).

I understand that this is doable, and even less demanding computationally, but I can’t figure out how to implement the loss function. I would have to replace BCEWithLogitsLoss and f1_score with functions that only take one node into account (or a few nodes, in the case of batches).

Any help would be appreciated!

There can be two approaches:

  1. Get back the list of sub-graphs with their node features updated and compute the loss on the corresponding nodes. You can recover a list of DGLGraphs with g_list = dgl.unbatch(bg), where bg is a batched graph.
  2. Simply index into the node features of the batched graph and compute the loss on them. See the example below; a sketch of computing the loss this way is given after this list.
    import dgl
    import torch

    # Two small graphs with a scalar feature 'h' per node
    g1 = dgl.DGLGraph()
    g1.add_nodes(2)
    g1.ndata['h'] = torch.tensor([[1.], [2.]])

    g2 = dgl.DGLGraph()
    g2.add_nodes(3)
    g2.ndata['h'] = torch.tensor([[3.], [4.], [5.]])

    # Batch them into a single graph: g1's nodes become 0-1,
    # g2's nodes become 2-4
    bg = dgl.batch([g1, g2])

    # Get the node feature of the first node in the first graph
    # and the second node in the second graph
    print(bg.ndata['h'][[0, 3]])  # tensor([[1.], [4.]])
    
    Note that nodes in the batched graph are relabeled with consecutive integers starting from 0, following the order of the sub-graphs in the list and the order of nodes within each sub-graph.
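
As a concrete illustration, here is a minimal sketch of computing the loss only on node 0 of each sub-graph, continuing from the example above. The labels and the choice of BCEWithLogitsLoss are placeholders for your own setup:

    import dgl
    import torch
    import torch.nn as nn

    # Node 0 of g1 sits at position 0 in the batched graph; node 0 of g2
    # sits at position 2 (right after g1's two nodes).
    target_idx = torch.tensor([0, 2])

    # Approach 2: index directly into the batched node features
    logits = bg.ndata['h'][target_idx]           # shape (2, 1)

    # Approach 1 yields the same values via unbatching
    g_list = dgl.unbatch(bg)
    assert torch.equal(logits, torch.stack([g.ndata['h'][0] for g in g_list]))

    # Compute the loss on the nodes of interest only
    labels = torch.tensor([[1.], [0.]])          # dummy label per sub-graph
    loss = nn.BCEWithLogitsLoss()(logits, labels)
    print(loss)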

Hope this helps.


Hello,

Thank you very much for your response. I’ve done as you suggested, and I think the loss function works properly now.

However, I’m getting NaNs in the predictions and I don’t know how to fix this. From the GAT example:

# No NaNs are found in 'inputs' or 'h' at this point
h = inputs
for l in range(self.num_layers):
    h = self.gat_layers[l](h).flatten(1)
    # At this point 'h' contains NaNs (more as we go in deeper layers)

I’m working on a new dataset, but the dataset class that I’ve coded is quite similar to the PPIDataset example.

I’ve uploaded all the code with a tiny version of the dataset I’m generating to https://github.com/ljmanso/socnav . I’ve tried to keep it as simple as possible so that it’s easy to understand.

Any help would be appreciated.

The NaN values appear starting from ret = self.g.ndata['ft'] / self.g.ndata['z'].
==> This suggests that some values in self.g.ndata['z'] are 0.
==> self.g.ndata['z'] is obtained by summing the unnormalized attention scores over each node’s incoming edges. If some entries are zero, the corresponding nodes most likely have no incoming edges at all, which I’ve verified.
==> To circumvent this issue, common practices are keeping only the largest connected component of the graph or adding self loops. Since your graphs are pretty small, I think it’s better to add self loops here. This can be done by adding self.add_edges(self.nodes(), self.nodes()) as the last line of the SocNavGraph initialization.
==> Verified that NaN values disappear :sunglasses:
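
A minimal sketch of the diagnosis and the fix, using the same DGLGraph API as above (the toy graph is made up for illustration):

    import dgl

    # A toy graph in which nodes 0 and 2 have no incoming edges
    g = dgl.DGLGraph()
    g.add_nodes(3)
    g.add_edge(0, 1)

    # These nodes get a zero attention normalizer, which produces NaNs
    print((g.in_degrees() == 0).nonzero())  # nodes 0 and 2

    # Self loops guarantee that every node has at least one in-edge
    g.add_edges(g.nodes(), g.nodes())
    print(g.in_degrees())                   # every degree is now >= 1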


That worked! Thanks!