How does DGL create the mailbox?

After reading a lot of code that uses DGL, I find it indeed very simple to use, but one thing I still haven't figured out is that when writing reduce_func, the length of the mailbox differs from vertex to vertex.

import torch

def reduce_func(nodes):
    # nodes.mailbox['m'] has shape (num_nodes, num_messages, feat_dim)
    return {'h': torch.sum(nodes.mailbox['m'], dim=1)}

Take torch.sum above, for example. If the number of adjacent vertices differs from vertex to vertex, the messages cannot be stacked into one big tensor. How does DGL handle this problem? By padding, or by sampling a minibatch of neighbor vertices? Can we specify the behavior ourselves?
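For instance (a minimal sketch with a hypothetical 3-vertex graph), stacking the per-vertex message lists directly fails:

import torch

# hypothetical messages: vertices with 2, 3, and 1 in-neighbors, feature dim 4
messages = [torch.randn(2, 4), torch.randn(3, 4), torch.randn(1, 4)]
# torch.stack(messages) raises a RuntimeError because the leading
# dimensions (2, 3, 1) do not match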

Hope for your answer, thank you!

The default behavior is degree bucketing (i.e., grouping nodes with the same in-degree); the DGL scheduler generates these mailboxes. See https://github.com/dmlc/dgl/blob/master/python/dgl/runtime/degree_bucketing.py and https://github.com/dmlc/dgl/blob/605b51857ea2380da76816cdf56f64e69df30dba/src/scheduler/scheduler.cc
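Conceptually it works roughly like the sketch below (illustrative only, not the actual scheduler code; degree_bucketing and the messages dict are made-up names):

import torch
from collections import defaultdict

def degree_bucketing(messages, reduce_func):
    # messages: dict mapping node id -> list of incoming message tensors
    # (zero-degree nodes are ignored for brevity)
    buckets = defaultdict(list)
    for v, msgs in messages.items():
        buckets[len(msgs)].append(v)  # group nodes by in-degree
    results = {}
    for degree, nodes in buckets.items():
        # within a bucket every node has the same degree, so its messages
        # stack into one (num_nodes_in_bucket, degree, feat_dim) mailbox
        mailbox = torch.stack([torch.stack(messages[v]) for v in nodes])
        reduced = reduce_func(mailbox)  # e.g. mailbox.sum(dim=1)
        for i, v in enumerate(nodes):
            results[v] = reduced[i]
    return results

The reduce function is then called once per bucket instead of once per node, which is why it always sees a dense (nodes, degree, feature) mailbox tensor.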

This solution is not optimal, especially when the graph has large degree variance.
We have also tried strategies such as degree padding (https://github.com/dmlc/dgl/pull/973), which works better with complex reducers such as LSTM.
Arranging these variable-length messages as a RaggedTensor would be the best solution, but it depends on how well our backends (PyTorch/MXNet/TensorFlow) support ragged tensors.

We recommend users use the built-in functions (https://docs.dgl.ai/api/python/function.html), which fuse the message and reduce functions and replace them with efficient sparse kernels.
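For example (a minimal sketch; assumes a recent DGL release where dgl.graph and fn.copy_u are available):

import torch
import dgl
import dgl.function as fn

g = dgl.graph(([0, 1, 2], [1, 2, 0]))  # a small 3-node cycle
g.ndata['h'] = torch.randn(3, 4)
# fused message + reduce: copy source features and sum them at the
# destination, using a sparse kernel instead of degree bucketing
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_sum'))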


Hello, thanks for your answer.
What is the meaning of

Arranging these variable-length messages as a RaggedTensor

Can you give me an example or a paper?

TensorFlow has full ragged tensor support: https://www.tensorflow.org/guide/ragged_tensor.
PyTorch has a minimal implementation called PackedSequence: https://pytorch.org/docs/stable/nn.html?highlight=packed#torch.nn.utils.rnn.PackedSequence; as far as I know, it can only be used with RNN/LSTM.
MXNet's ragged tensor support is in progress.
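For instance, a minimal TensorFlow sketch (the numbers are made up):

import tensorflow as tf

# per-node mailboxes with 2, 4, and 1 messages, no padding required
rt = tf.ragged.constant([[1., 2.], [3., 4., 5., 6.], [7.]])
per_node_sum = tf.reduce_sum(rt, axis=1)  # [3., 18., 7.]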
