How does DGL divide a graph into NodeBatches of a particular size?

Hello!

Thank you for the great product.
When I print inside reduce function:

import torch

def gcn_reduce(nodes):
    # The argument is a batch of nodes.
    # This computes the new 'h' features by summing the received 'msg'
    # in each node's mailbox.
    print(nodes)
    print(nodes.batch_size())
    print(nodes.nodes())
    return {'h': torch.sum(nodes.mailbox['msg'], dim=1)}

I have the following output:

<dgl.udf.NodeBatch object at 0x7fd348f4f450>
1
tensor([1])
<dgl.udf.NodeBatch object at 0x7fd343b16c90>
1
tensor([2])
<dgl.udf.NodeBatch object at 0x7fd348f4f650>
1
tensor([0])
<dgl.udf.NodeBatch object at 0x7fd352143390>
3
tensor([23,  8, 13])
<dgl.udf.NodeBatch object at 0x7fd349038410>
11
tensor([26,  9, 12, 14, 15, 16, 17, 18, 20, 21, 22])
<dgl.udf.NodeBatch object at 0x7fd349038a10>
6
tensor([28, 25, 24, 10,  4, 19])
<dgl.udf.NodeBatch object at 0x7fd3521435d0>
6
tensor([30, 29, 27,  7,  6,  5])
<dgl.udf.NodeBatch object at 0x7fd349038450>
1
tensor([11])
<dgl.udf.NodeBatch object at 0x7fd349038890>
1
tensor([32])
<dgl.udf.NodeBatch object at 0x7fd348c581d0>
2
tensor([31,  3])
<dgl.udf.NodeBatch object at 0x7fd349038750>
1
tensor([33])

How is the size of a NodeBatch calculated?

According to the Internal Message Passing Implementation by @zihao

Note that since not all nodes have the same in-degree, parallelizing the reduce functions of all nodes is not trivial. Currently DGL uses a degree-bucketing policy, which means auto-batching the reduce functions of nodes with the same in-degree. This is a good solution for uniform-degree graphs (e.g. subgraphs produced by some GCN sampling algorithms) or trees (e.g. TreeLSTM), where the degree variance is small.

Therefore, a batch is formed from nodes with the same in-degree.
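The bucketing step can be sketched in plain Python, without DGL: compute each node's in-degree and group node IDs by that degree, so each bucket corresponds to one NodeBatch and the bucket's length is its batch_size. The edge list below is made up for illustration.

```python
from collections import defaultdict

# Toy directed edge list (src, dst), invented for this sketch.
edges = [(0, 1), (2, 1), (3, 1), (0, 5), (2, 5), (4, 5), (1, 2), (3, 4)]

# Compute each destination node's in-degree.
in_degree = defaultdict(int)
for _, dst in edges:
    in_degree[dst] += 1

# Bucket nodes by in-degree: each bucket would become one NodeBatch,
# so batch_size equals the number of nodes sharing that in-degree.
buckets = defaultdict(list)
for node, deg in in_degree.items():
    buckets[deg].append(node)

for deg, nodes in sorted(buckets.items()):
    print(f"in-degree {deg}: batch of {len(nodes)} node(s): {nodes}")
```

Here nodes 1 and 5 both have in-degree 3, so they land in the same bucket of size 2, matching the statement above.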

For example, suppose nodes 1 and 5 each have 3 incoming edges. DGL would then group them into a single NodeBatch with nodes.batch_size() equal to 2.

Is this statement correct?

Do you have an article that shows that such an approach is efficient?

Andrei

Yes, currently a batch is formed from nodes with the same in-degree; that's what we call degree bucketing.

We never said such an approach is efficient; it's a fallback solution. If your operation cannot be expressed as a combination of builtin functions, write user-defined message and reduce functions instead.

In most cases we recommend users use the builtin functions, since we have optimized these operations.
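For a sense of what the builtin message/reduce pair computes, here is a dependency-free sketch of the sum aggregation that a copy-source message plus sum reduce performs (the graph and feature values are made up; this is the computation, not DGL's optimized implementation):

```python
# Each node's new 'h' is the sum of its in-neighbors' 'h' values --
# the same result as sending the source feature as the message and
# reducing with a sum. Toy edge list (src, dst) and features:
edges = [(0, 2), (1, 2), (0, 3), (2, 3)]
h = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}

new_h = {node: 0.0 for node in h}  # nodes with no in-edges stay at 0 here
for src, dst in edges:
    new_h[dst] += h[src]  # the "message" is the source feature; reduce is a sum

print(new_h)
```

DGL's builtins fuse this into optimized sparse kernels rather than looping over edges, which is why they avoid the per-bucket overhead of degree bucketing.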


Thanks, @zihao!

Andrei