How do I combine GATConv and NNConv with built-in functions?

In the source code, GATConv is implemented using fn.u_mul_e, where e is the attention weight. NNConv also uses fn.u_mul_e, where e is the parameter W. I found that there is a fundamental conflict between them, since you can only pass one fn.u_mul_e to g.update_all (right?).

Possible solution:
g.send(message) + g.update_all(fn.u_mul_e(), fn.sum())

What I prefer:
g.update_all([fn.u_mul_e(), fn.u_mul_e()], fn.sum()) or
g.update_all(fn.u_mul_e(), fn.sum_with_weight())

What do you think?

I don’t think I completely get it. Maybe you can write down the equations in terms of the variables available at the time of calling update_all. But we already support something like:

import dgl
import dgl.function as fn
import torch

g = dgl.DGLGraph([[0, 1], [1, 0]])
g.ndata['h'] = torch.tensor([[1.], [2.]])
g.edata['e1'] = torch.tensor([[1.], [1.]])
g.edata['e2'] = torch.tensor([[2.], [2.]])

g.update_all([fn.u_mul_e('h', 'e1', 'm1'), fn.u_mul_e('h', 'e2', 'm2')], [fn.sum('m1', 'h1'), fn.sum('m2', 'h2')])

{'h':  tensor([[1.], [2.]]),
 'h1': tensor([[2.], [1.]]),
 'h2': tensor([[4.], [2.]])}

Thanks for your reply. I think I just ran into a very particular situation, but it should be normal that you cannot always use built-in functions to implement every model. The formula of the model I intended to build is
z_{ij}^{\left( l+1 \right)}=\text{edge\_nn}\left( e\_feat_{ij} \right) h_j^{\left( l \right)} \\ h_i^{\left( l+1 \right)}=\sum_j{\left( a^T z_{ij}^{(l+1)} \right) z_{ij}^{\left( l+1 \right)}},
where a is an attention vector.

g.edata['w'] = edge_nn(e_feats).view(-1, in_feats, out_feats)

def reducer(nodes):
    # nodes.mailbox['z'] holds the per-edge messages produced by u_mul_e below
    attn = nodes.mailbox['z'].matmul(a)                       # a^T z for each in-edge
    h = (attn.unsqueeze(-1) * nodes.mailbox['z']).sum(dim=1)  # weighted sum over in-edges
    return {'h': h}

g.update_all(fn.u_mul_e('h', 'w', 'z'), reducer)
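To sanity-check the reduce step, here is a minimal pure-Python sketch (no DGL or torch; the attention vector and the mailbox contents are made-up toy numbers) of h_i = Σ_j (aᵀ z_ij) z_ij for a single node with two incoming messages:

```python
# Hypothetical toy values, for illustration only.
a = [1.0, 0.0]                      # assumed attention vector
mailbox = [[1.0, 2.0], [3.0, 4.0]]  # z_ij for two in-edges of node i

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# h_i = sum_j (a^T z_ij) * z_ij
h = [0.0, 0.0]
for z in mailbox:
    attn = dot(a, z)                # scalar score a^T z_ij
    for k in range(len(h)):
        h[k] += attn * z[k]

print(h)  # [10.0, 14.0]
```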

If we say a model is ‘proper’ when it can be implemented using only built-in functions, I had a weird thought that combining two ‘proper’ models should also give a ‘proper’ model, which is simply not true.

How about the proposal below:

g.edata['w'] = edge_nn(e_feats).view(-1, in_feats, out_feats)
g.apply_edges(fn.u_mul_e('h', 'w', 'z'))
g.edata['e'] = g.edata['a'] * g.edata['z'] # Attention stored under 'a'
g.update_all(fn.copy_e('e', 'm'), fn.sum('m', 'h'))
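For concreteness, a tiny pure-Python sketch (no DGL; the graph, weights, and attention values below are all hypothetical) of what the three steps above compute on a two-edge graph with scalar features:

```python
# Hypothetical toy graph and values, for illustration only.
edges = [(0, 1), (1, 0)]  # (src, dst) pairs
h = [1.0, 2.0]            # node features
w = [3.0, 4.0]            # per-edge weights, as if produced by edge_nn
a = [0.5, 0.25]           # per-edge attention scores stored under 'a'

z = [w[e] * h[src] for e, (src, _) in enumerate(edges)]  # apply_edges(u_mul_e)
msg = [a[e] * z[e] for e in range(len(edges))]           # edata['e'] = a * z

out = [0.0, 0.0]
for e, (_, dst) in enumerate(edges):
    out[dst] += msg[e]    # update_all(copy_e, sum)

print(out)  # [2.0, 1.5]
```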

Thanks for your proposal. I have thought about it, but I am not sure whether it still counts as fused message passing when you use g.apply_edges(fn.u_mul_e('h', 'w', 'z'))?

Since we need to explicitly materialize the edge data, which is then used as messages, you don’t benefit much in terms of memory. I’m not sure about the speed; probably not very bad. Maybe you can run some benchmarks.

Yeah, I understand. But I am still wondering whether a way out is to add a reduce function fn.sum_with_weight('h', 'w'), or just fn.sum('h', 'w')? If this works, then there is no conflict between NNConv and GAT, and maybe we could have fused message passing based on it?

I assume you are suggesting something like

g.update_all([fn.copy_e('a', 'm_a'), fn.copy_e('z', 'm_z')], fn.sum('m_a', 'm_z', 'h'))

instead of
g.edata['e'] = g.edata['a'] * g.edata['z'] # Attention stored under 'a'
g.update_all(fn.copy_e('e', 'm'), fn.sum('m', 'h'))

But I think they are equivalent and would probably not bring much gain.

I think you missed the fn.u_mul_e('h', 'w', 'z'), and I assume it would greatly improve performance if we could plug it into g.update_all() rather than apply_edges, which means z goes to the mailbox instead of edata. For example, if I want to implement something like:

g.update_all([fn.u_mul_e('h', 'w', 'z'), fn.u_mul_e('h', 'a', 'g')], fn.sum('z', 'g', 'h'))
# 'a' is a vector and 'g' represents the gate
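The intended semantics can be spelled out in a pure-Python sketch (no DGL; the graph and all values are hypothetical, with scalar features for simplicity): each edge would carry both z = w·h_j and a gate g = a·h_j, and the proposed reduce would combine them as h_i = Σ_j g_ij · z_ij without ever storing z in edata:

```python
# Hypothetical toy graph and values, for illustration only.
edges = [(0, 1), (1, 0)]  # (src, dst) pairs
h = [1.0, 2.0]            # node features
w = [3.0, 4.0]            # per-edge 'w'
a = [0.5, 0.25]           # per-edge 'a'

out = [0.0, 0.0]
for e, (src, dst) in enumerate(edges):
    z = w[e] * h[src]     # fn.u_mul_e('h', 'w', 'z')
    g = a[e] * h[src]     # fn.u_mul_e('h', 'a', 'g')
    out[dst] += g * z     # the proposed fn.sum('z', 'g', 'h')

print(out)  # [4.0, 1.5]
```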

The benefits are neat coding and fused message passing.

cc’ing @minjie to see if he can bring more insights.
