Is my understanding of the GCN implementation right?

From https://docs.dgl.ai/tutorials/basics/1_first.html

import torch
import torch.nn as nn

def gcn_message(edges):
    # each edge carries its source node's feature 'h' as the message
    return {'msg' : edges.src['h']}

def gcn_reduce(nodes):
    # each node sums up the messages collected in its mailbox
    return {'h' : torch.sum(nodes.mailbox['msg'], dim=1)}


class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super(GCNLayer, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, inputs):
        # set the input node features
        g.ndata['h'] = inputs
        # compute messages on all edges
        g.send(g.edges(), gcn_message)
        # aggregate messages on all nodes
        g.recv(g.nodes(), gcn_reduce)
        # pop the aggregated features and apply the linear transformation
        h = g.ndata.pop('h')
        return self.linear(h)
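
For reference, this is roughly how the layer can be run end to end on a toy graph (a hypothetical sketch, assuming the older DGL release the tutorial targets, where g.send / g.recv are still available):

import torch
import dgl

g = dgl.DGLGraph()
g.add_nodes(4)
g.add_edges([0, 1, 2, 3], [1, 2, 3, 0])   # a 4-node directed cycle

inputs = torch.eye(4)          # one-hot node features, as in the tutorial
layer = GCNLayer(4, 2)
h = layer(g, inputs)
print(h.shape)                 # torch.Size([4, 2])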

My question is about gcn_message and gcn_reduce.

If we pass all the edges into gcn_message, does it mean that each edge will send a message from its source node?

If we pass all the nodes into gcn_reduce, does it mean that each node will receive the messages from all of its incoming edges?

If I edit gcn_reduce to

def gcn_reduce(nodes):
    recv_msg = nodes.mailbox['msg'] # [1,9,34]
    recv_msg2 = torch.sum(recv_msg, dim=1) # [1,34]
    return {'h' : recv_msg2}

I set a breakpoint and stopped at the first call of gcn_reduce. What does the shape [1,9,34] mean in the code above?

Thank you.

DGL does degree bucketing to accelerate the computation: the computation for nodes with the same in-degree is batched together. For example, if both node 1 and node 2 have 3 incoming edges, their incoming messages are batched together, so the shape of nodes.mailbox['msg'] would be [2 (bucket size: two nodes have the same in-degree), 3 (each node's in-degree is 3), feat_size].
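
Here is a minimal sketch that reproduces this on a hypothetical toy graph (assuming the older DGL release the tutorial targets, where g.send / g.recv are available):

import torch
import dgl

def gcn_message(edges):
    return {'msg' : edges.src['h']}

def debug_reduce(nodes):
    # called once per degree bucket; the mailbox shape is
    # [nodes in this bucket, in-degree of this bucket, feat_size]
    print(nodes.mailbox['msg'].shape)
    return {'h' : torch.sum(nodes.mailbox['msg'], dim=1)}

# nodes 3 and 4 each have 3 incoming edges; nodes 0, 1 and 2 each have 1
g = dgl.DGLGraph()
g.add_nodes(5)
g.add_edges([0, 1, 2, 0, 1, 2, 3, 4, 4], [3, 3, 3, 4, 4, 4, 0, 1, 2])
g.ndata['h'] = torch.randn(5, 4)          # feature size 4

g.send(g.edges(), gcn_message)
g.recv(g.nodes(), debug_reduce)
# prints, in some order:
#   torch.Size([3, 1, 4])  -- degree-1 bucket holding nodes 0, 1 and 2
#   torch.Size([2, 3, 4])  -- degree-3 bucket holding nodes 3 and 4

In your case, [1,9,34] would be a bucket holding 1 node whose in-degree is 9, with a feature size of 34.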

Thank you!

If we pass all the edges into gcn_message, does it mean that each edge will send a message from its source node?

If we pass all the nodes into gcn_reduce, does it mean that each node will receive the messages from all of its incoming edges?

Am I right?

I am sorry.

“both node 1 and node 2 have 3 incoming edges”

Could you please upload a figure to illustrate this sentence?

Thank you!

It just means the node has 3 edges pointing to it. Since a DGLGraph is a directed graph, I say 3 in-edges (incoming edges).
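
For example, instead of a figure (a small hypothetical sketch, assuming the older DGL API):

import dgl

# three directed edges, all pointing into node 3
g = dgl.DGLGraph()
g.add_nodes(4)
g.add_edges([0, 1, 2], [3, 3, 3])

print(g.in_degrees())    # tensor([0, 0, 0, 3]) -- node 3 has 3 in-edges
print(g.out_degrees())   # tensor([1, 1, 1, 0])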

Thank you.

I understand.

Each node has 3 incoming edges, and there are 2 such nodes.

I had better improve my English.

Hey all,

I have been following this example as well, and I have a question about it.

labeled_nodes = torch.tensor([0, 33]) # only the instructor and the president nodes are labeled
labels = torch.tensor([0, 1]) # their labels are different

Based on my understanding, those lines assume that only the labels of nodes 0 and 33 are known (I think because this example demonstrates a semi-supervised technique). My question is: what if the labels of all the nodes are known? Can I still modify this code to make it a fully supervised technique?

Thanks in advance.

The GCN example (and the original GCN paper) works under a semi-supervised setting, meaning that you have the entire graph beforehand but labels on only some of the nodes, and you need to predict the labels of the rest of the nodes.
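
Concretely, the training loss in this setting only looks at the labeled nodes (a minimal sketch with a stand-in for the model output):

import torch
import torch.nn.functional as F

labeled_nodes = torch.tensor([0, 33])    # only these two nodes are labeled
labels = torch.tensor([0, 1])

logits = torch.randn(34, 2)              # stand-in for the GCN output on all 34 nodes
logp = F.log_softmax(logits, dim=1)
loss = F.nll_loss(logp[labeled_nodes], labels)   # loss uses only nodes 0 and 33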

Yes, you can use all the labels if you have them. However, you may want to split them into train/validation/test sets to measure generalization.
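
For example (a hypothetical sketch with stand-in labels and a stand-in for the model output):

import torch
import torch.nn.functional as F

num_nodes = 34
all_labels = torch.randint(0, 2, (num_nodes,))   # stand-in for your real labels

perm = torch.randperm(num_nodes)                 # random train/val/test split
train_ids, val_ids, test_ids = perm[:24], perm[24:29], perm[29:]

logits = torch.randn(num_nodes, 2)               # stand-in for the GCN output
logp = F.log_softmax(logits, dim=1)
train_loss = F.nll_loss(logp[train_ids], all_labels[train_ids])
val_acc = (logp[val_ids].argmax(dim=1) == all_labels[val_ids]).float().mean()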

Thanks @VoVAllen,

Let me give it a try.

And to follow up: once I have trained a model, saved it, and loaded it again, how can I run inference on the nodes of a new graph? I tried to find a demo of this, but most of the demos perform train-test-eval in one go.
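
To make the question concrete, here is roughly what I have in mind (a hypothetical sketch reusing GCNLayer from above; the file name and the new graph are made up, and the new graph must use the same input feature size):

import torch
import dgl

trained = GCNLayer(4, 2)                    # stand-in for a model you trained
torch.save(trained.state_dict(), 'gcn_layer.pt')

restored = GCNLayer(4, 2)                   # must match the saved architecture
restored.load_state_dict(torch.load('gcn_layer.pt'))
restored.eval()

# a new graph whose node features also have size 4
new_g = dgl.DGLGraph()
new_g.add_nodes(3)
new_g.add_edges([0, 1, 2], [1, 2, 0])
with torch.no_grad():
    preds = restored(new_g, torch.eye(3, 4)).argmax(dim=1)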

Thanks again