Mask mechanism for GATConv with BatchedDGLGraph type data

I have run into a question:
Suppose I have a batch of data of type torch.Tensor representing the batched nodes of a batch of graphs (shape: batch_size * max_node_num * emb). The data also comes with a mask tensor, because different graphs in a batch have different numbers of nodes. I am currently building my model with a graph attention network (GAT). How can I integrate the mask tensor (batch_size * max_node_num) together with the data tensor (batch_size * max_node_num * emb) into GATConv so that the model can compute over a whole batch of data?
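For concreteness, here is a small sketch of the data layout I mean (all sizes are made up just for illustration):

```python
import torch

batch_size, max_node_num, emb = 4, 10, 32

# Padded node features: one row of max_node_num node embeddings per graph.
data = torch.randn(batch_size, max_node_num, emb)

# Number of real nodes in each graph, and the corresponding boolean mask
# (True where a node actually exists, False for padding).
num_nodes = torch.tensor([10, 7, 5, 9])
mask = torch.arange(max_node_num)[None, :] < num_nodes[:, None]  # (batch_size, max_node_num)
```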

Hoping for an answer.
Thank you!

Any particular reason you want to pad the number of nodes in each graph to max_node_num? For normal tensor computation you need to perform zero padding, but this is not required for DGL.

Hi,
The question comes from my research (NLP). I need to improve a semantic parsing model with a Graph Attention Network over batches of sentences (with different sequence lengths), each associated with a different graph. Although I could use a ‘for’ loop to traverse the graphs and concatenate the results, that would be time-consuming when dealing with a large number of batches. I think this functionality could be handled efficiently inside the package (with some acceleration, like the CUDA C++ components in Facebook AI’s ‘fairseq’). That is why I am proposing this requirement.

How about multiplying g.ndata['ft'] by the mask before L113? Maybe you can elaborate more on the computation in your model.
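Roughly something like this (a minimal sketch for a single toy graph; the field names 'ft' and 'mask' just follow this thread, and the graph-construction calls are from a recent DGL release, so they may differ from your version):

```python
import dgl
import torch
from dgl.nn.pytorch import GATConv

emb = 16

# A 4-node ring graph so every node has an in-degree of at least 1.
g = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))
g.ndata['ft'] = torch.randn(g.num_nodes(), emb)
g.ndata['mask'] = torch.tensor([[1.], [1.], [1.], [0.]])  # pretend the last node is padding

conv = GATConv(in_feats=emb, out_feats=8, num_heads=2)
h = conv(g, g.ndata['ft'] * g.ndata['mask'])  # padded node's features are zeroed out
print(h.shape)  # (4, 2, 8): nodes, heads, out_feats
```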

Yes, but that only works for a single graph, not a batch of graphs. What I want to handle is a batch of graphs with different numbers of nodes.

Sorry, why does it only work for a single graph? I suppose you need to initialize the masks for all graphs anyway? Could you please elaborate more on the whole computation process?

As for the parameters of the forward method (https://github.com/dmlc/dgl/blob/master/python/dgl/nn/pytorch/conv/gatconv.py#L113):

graph : DGLGraph
    The graph.
feat : torch.Tensor
    The input feature of shape :math:`(N, D_{in})` where :math:`D_{in}`
    is the size of the input feature and :math:`N` is the number of nodes.

The type of graph is DGLGraph, not BatchedDGLGraph, so the function appears to operate on only one graph at a time, not on a batch of graphs.

Yes, I do need to initialize the masks for a batch of graphs, but I cannot carry out the procedure (a minibatch GATConv operation) without a ‘for’ loop to traverse the batch of graphs.

There might be room to improve our documentation, but note that BatchedDGLGraph in fact inherits from DGLGraph, and the nn modules are supposed to work with it. Let’s say that for each graph in a list we have g.ndata['mask'], a boolean mask indicating which nodes exist. With dgl.batch(g_list), the masks are automatically concatenated, and the result can be used to mask out non-existent nodes during the computation, as in the sketch below.
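For example, something along these lines (a toy sketch with two small graphs; the API names follow a recent DGL release, where BatchedDGLGraph has since been folded into DGLGraph, so the exact calls may differ in older versions):

```python
import dgl
import torch
from dgl.nn.pytorch import GATConv

emb = 16
g_list = []
for num_nodes in (3, 5):
    # Simple ring graphs so every node has an in-degree of at least 1.
    src = list(range(num_nodes))
    dst = [(i + 1) % num_nodes for i in src]
    g = dgl.graph((src, dst))
    g.ndata['ft'] = torch.randn(num_nodes, emb)
    g.ndata['mask'] = torch.ones(num_nodes, 1)  # 1.0 for every existing node
    g_list.append(g)

# dgl.batch concatenates ndata along the node dimension,
# so the batched graph carries the masks of all graphs.
bg = dgl.batch(g_list)

conv = GATConv(in_feats=emb, out_feats=8, num_heads=2)
h = conv(bg, bg.ndata['ft'] * bg.ndata['mask'])  # one forward pass for the whole batch
print(h.shape)  # (8, 2, 8): total nodes across the batch, heads, out_feats
```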