Block.num_nodes() may be incorrect?

import dgl
import torch as th

g = dgl.graph(([0, 1, 2, 4, 5, 6], [3, 3, 3, 0, 1, 2]))

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(1)

dataloader = dgl.dataloading.NodeDataLoader(g, th.arange(g.num_nodes()), sampler, batch_size=g.num_nodes())

for input_nodes, output_nodes, blocks in dataloader:
    print(blocks[0]) # Block(num_src_nodes=7, num_dst_nodes=7, num_edges=6)
    print(blocks[0].edges()) # (tensor([4, 5, 6, 0, 1, 2]), tensor([0, 1, 2, 3, 3, 3]))
    print(blocks[0].num_nodes()) # 14

It turns out that .num_nodes() on a block just returns num_src_nodes + num_dst_nodes, so a node that appears on both sides is counted twice, rather than returning the number of distinct nodes.

In my understanding, DGLBlock treats source and destination nodes differently.

>>> blocks[0].ntypes
['_N', '_N']
>>> blocks[0].srctypes
['_N']
>>> blocks[0].dsttypes
['_N']

Because the dst nodes always appear first among the src nodes, you can just use num_src_nodes() to get the number of nodes used in the computation of the current minibatch.

Returning num_src_nodes + num_dst_nodes from blocks[0].num_nodes() is expected behavior: an MFG (block) is conceptually a bipartite graph, so the src nodes and dst nodes are treated as different nodes.
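To make the arithmetic concrete, here is a minimal plain-Python sketch of the two ways of counting. It does not call DGL; block_node_counts is a hypothetical helper written only for illustration, and it assumes (as described above) that the dst node IDs are the first entries of the src node IDs.

```python
def block_node_counts(src_ids, dst_ids):
    """Model the two node counts of a block (plain Python, not DGL's code)."""
    # Assumption taken from the reply above: dst nodes come first in src nodes.
    if list(src_ids[:len(dst_ids)]) != list(dst_ids):
        raise ValueError("dst_ids must be a prefix of src_ids")
    # Bipartite view: both sides are counted separately (what num_nodes() reports).
    bipartite_total = len(src_ids) + len(dst_ids)
    # Distinct nodes in the minibatch: the src side already contains every dst node.
    distinct_nodes = len(src_ids)
    return bipartite_total, distinct_nodes


# The block from the question: all 7 nodes are both inputs and outputs.
ids = list(range(7))
total, distinct = block_node_counts(ids, ids)
print(total, distinct)  # 14 7
```

With the block from the question this gives 14 for the bipartite count and 7 for the distinct count, matching the printed num_nodes() and num_src_nodes values.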
