edge_embed shape in GatedGraphConv

In the dgl.nn.pytorch GatedGraphConv class, the model includes an embedding layer of shape (number_of_edge_types × output_feature_size**2).

In the given bAbI task example, this output feature size is taken as the task ID.
For example:

>>> import dgl
>>> from dgl.nn.pytorch import GatedGraphConv
>>> k = GatedGraphConv(in_feats=5, out_feats=10, n_steps=1, n_etypes=4)
>>> k
GatedGraphConv(
  (edge_embed): Embedding(4, 100)
  (gru): GRUCell(10, 10)
)

Why does the embedding need this shape?
I understand that the model learns an embedding for each edge type and that its message update goes through the gated GRU cell.

But why is the embedding vector length the square of output_feature_size?
Shouldn't the shape be number_of_edge_types × output_feature_size?

In GGNN we learn a projection matrix for each type of edge.
In your example the number of edge types is 4 and the output feature size is 10. We flatten each edge type's (10, 10) projection matrix into a vector of length 100, so the embedding shape is (4, 100).
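A minimal sketch of that idea, in plain NumPy rather than DGL's actual code (the toy graph, the `src`/`dst`/`etypes` arrays and the helper function names are hypothetical): each row of the (4, 100) table is reshaped back into a 10 × 10 matrix and applied to the source-node feature of every edge of that type, and the resulting messages are summed per destination node. In GGNN that summed message is then fed to the GRU cell as its input, with the current node state as the hidden state; that step is omitted here.

```python
import numpy as np

# Sizes matching the example above: 4 edge types, out_feats = 10.
n_etypes, out_feats = 4, 10
rng = np.random.default_rng(0)

# Stand-in for nn.Embedding(n_etypes, out_feats ** 2): one flattened
# (out_feats x out_feats) projection matrix per edge type.
edge_embed = rng.standard_normal((n_etypes, out_feats * out_feats))

# Hypothetical toy graph: 5 nodes, 6 directed edges, each with a type id.
num_nodes = 5
src = np.array([0, 1, 2, 3, 4, 0])
dst = np.array([1, 2, 3, 4, 0, 2])
etypes = np.array([0, 1, 2, 3, 0, 1])
h = rng.standard_normal((num_nodes, out_feats))  # current node states


def messages_from_embedding(h, src, etypes):
    # Look up each edge's flattened matrix and reshape to (E, out, out).
    W = edge_embed[etypes].reshape(-1, out_feats, out_feats)
    # Batched matrix-vector product: m_e = W_{type(e)} @ h_{src(e)}.
    return np.einsum("eij,ej->ei", W, h[src])


def aggregate(messages, dst, num_nodes):
    # Sum incoming messages for each destination node.
    a = np.zeros((num_nodes, messages.shape[1]))
    np.add.at(a, dst, messages)
    return a


msgs = messages_from_embedding(h, src, etypes)
a = aggregate(msgs, dst, num_nodes)
print(edge_embed.shape)  # (4, 100)
print(msgs.shape)        # (6, 10)
```

The length-100 rows are only a storage layout; the reshape to (10, 10) is what turns each row into the per-edge-type projection matrix.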

In DGL 0.4.1 we no longer use nn.Embedding; instead we explicitly use an nn.Linear for each edge type, which is clearer: https://github.com/dmlc/dgl/blob/9a0511c8e91a7f633c9c3292fccbcbad5281d1f5/python/dgl/nn/mxnet/conv/gatedgraphconv.py
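To illustrate why the two formulations are interchangeable, here is another NumPy sketch (again not DGL's code; the graph is hypothetical, and the bias term that nn.Linear would normally add is omitted). A loop over per-type weight matrices and a lookup into the flattened table produce identical messages. The difference is in cost: the lookup builds an (E, out_feats, out_feats) array, i.e. one full matrix per edge, so its memory grows with E × out_feats², whereas the loop keeps only n_etypes matrices but issues one product per edge type per step.

```python
import numpy as np

n_etypes, out_feats = 4, 10
rng = np.random.default_rng(0)

# One (out_feats x out_feats) weight per edge type, analogous to a list of
# per-type linear layers (bias omitted for simplicity).
linears = [rng.standard_normal((out_feats, out_feats)) for _ in range(n_etypes)]

# Hypothetical toy graph: 5 nodes, 6 directed edges, each with a type id.
num_nodes = 5
src = np.array([0, 1, 2, 3, 4, 0])
dst = np.array([1, 2, 3, 4, 0, 2])
etypes = np.array([0, 1, 2, 3, 0, 1])
h = rng.standard_normal((num_nodes, out_feats))


def messages_per_type(h, src, etypes):
    msgs = np.zeros((len(src), out_feats))
    for t, W in enumerate(linears):
        mask = etypes == t  # select the edges of type t
        # Apply this type's projection (x @ W.T == W @ x) to their sources.
        msgs[mask] = h[src[mask]] @ W.T
    return msgs


# The embedding-style table is just these matrices flattened and stacked.
edge_embed = np.stack([W.reshape(-1) for W in linears])        # (4, 100)
W_per_edge = edge_embed[etypes].reshape(-1, out_feats, out_feats)  # (E, 10, 10)
msgs_embed = np.einsum("eij,ej->ei", W_per_edge, h[src])

msgs_loop = messages_per_type(h, src, etypes)
print(np.allclose(msgs_loop, msgs_embed))  # True
```

On a small graph with many edge types the per-type loop does more, smaller operations, while the lookup does one batched product; which is faster in practice will depend on the graph sizes and the framework.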

Hope this helps.

Thanks for the reply. Got it now.

Yes, I read the bug report about it taking too much memory on huge graphs, after which you guys updated it :slight_smile:

Though in my current domain, where I have small to medium graphs (at most 300 nodes and 15-20 edge types), I'm still using the projection-matrix module, as the new approach takes too long to train.

I see. May I ask how much faster the earlier version of GatedGraphConv is than the updated version in your domain?