Edge softmax on DGLHeteroGraph

Hi, I have a DGLHeteroGraph with 3 different types of edges. Each edge has a feature “attn_before_softmax”. For each node, I want to do softmax on all its incoming edges (3 types together). I tried the function group_apply_edges, but it seems that it can only be applied to one type of edge at a time (i.e. it cannot do softmax on 3 types of edge together). Is there an alternative way to do that? Any suggestions or tips would be appreciated.

Hi, you can first use our to_homo API to convert the heterograph to a homo graph, then apply the edge_softmax API to compute the softmax values (normalized per destination node), and finally convert the homo graph back to a heterograph with our to_hetero API.

Does to_homo store nodes and edges in the same order as ntypes and etypes? The documentation does not say anything about the order. I ask because to_homo does not copy node/edge features automatically, so we need to know the order of nodes and edges to copy the features correctly.

And how can we copy the features from the homo graph back to the heterograph? The documentation of to_hetero says "The returned node and edge types may not necessarily be in the same order as ntypes and etypes". Suppose we want to copy the node features, does that mean for each node type, we need to use filter_nodes to get the node IDs, and then use the node IDs to retrieve the node features?

Or maybe we don’t need to do any conversion at all. We could use a function like torch_geometric.utils.softmax, although we would still need to gather the attn_before_softmax values from all edges into a single tensor and then distribute the results back.
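To make the grouping idea concrete, here is a minimal plain-Python sketch of a softmax grouped by destination node, which is what a scatter-style softmax computes. The function name grouped_softmax and the toy scores are made up for illustration; this is not the torch_geometric implementation.

```python
import math
from collections import defaultdict

def grouped_softmax(scores, dst):
    """Softmax over `scores`, normalized separately for each destination ID in `dst`."""
    # Per-destination maximum, subtracted for numerical stability.
    group_max = defaultdict(lambda: float("-inf"))
    for s, v in zip(scores, dst):
        group_max[v] = max(group_max[v], s)
    exps = [math.exp(s - group_max[v]) for s, v in zip(scores, dst)]
    # Per-destination normalizer.
    group_sum = defaultdict(float)
    for e, v in zip(exps, dst):
        group_sum[v] += e
    return [e / group_sum[v] for e, v in zip(exps, dst)]

# Hypothetical scores of two edge types, concatenated, together with
# the destination node ID of every edge.
scores = [1.0, 2.0] + [3.0, 0.5]
dst = [0, 1] + [0, 1]
attn = grouped_softmax(scores, dst)
```

Because the edges of all types are concatenated before normalization, the weights of every destination node sum to 1 across edge types rather than within each type.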

Yes, in the current implementation. The type information is stored in the dgl.NTYPE and dgl.ETYPE data fields. The following demo code shows how to copy the features across:

import dgl
import torch as th

hg = ... # some heterograph
g = dgl.to_homo(hg)
# build the feature tensor first, then attach it to the graph
h = th.zeros((g.number_of_nodes(), feat_size))  # feat_size: width of feature 'h'
for ntid, nty in enumerate(hg.ntypes):
    # homo-graph node IDs whose type is nty
    nid = (g.ndata[dgl.NTYPE] == ntid).nonzero().view(-1)
    # find the node id of the original heterograph
    orig_nid = g.ndata[dgl.NID][nid]
    # copy features
    h[nid] = hg.nodes[nty].data['h'][orig_nid]
g.ndata['h'] = h

Note that the code does not rely on whether g stores features in the same order as ntypes and etypes.

To copy features back after to_hetero, you can leverage the node/edge ID mapping that is generated and stored in ndata/edata:

g = dgl.graph(([0, 1, 2], [3, 4, 5]))  # a bipartite graph stored as homograph
g.ndata[dgl.NTYPE] = th.tensor([0, 0, 0, 1, 1, 1])
g.edata[dgl.ETYPE] = th.tensor([0, 0, 0])
g.ndata['feat'] = ...
hg = dgl.to_hetero(g, ['user', 'item'], ['buy'])
# copy node features
hg.nodes['user'].data['feat'] = g.ndata['feat'][hg.nodes['user'].data[dgl.NID]]
hg.nodes['item'].data['feat'] = g.ndata['feat'][hg.nodes['item'].data[dgl.NID]]
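The mapping logic can be checked without DGL. The sketch below mimics it with plain lists: ntype and orig_id stand in for the dgl.NTYPE and dgl.NID fields of a converted homo graph, and the node ordering is deliberately shuffled (an assumption made for illustration, not necessarily what to_homo produces) to show that the copy does not depend on it.

```python
# Per-type features of a toy heterograph (hypothetical values).
hetero_feat = {"user": [10, 11, 12], "item": [20, 21]}
ntypes = ["user", "item"]

# Stand-ins for g.ndata[dgl.NTYPE] and g.ndata[dgl.NID] of the homo graph.
ntype = [1, 0, 0, 1, 0]
orig_id = [1, 0, 2, 0, 1]

# Hetero -> homo: look each node up by (type, original ID).
homo_feat = [hetero_feat[ntypes[t]][i] for t, i in zip(ntype, orig_id)]

# Homo -> hetero: write each value back to its (type, original ID) slot.
restored = {name: [None] * len(vals) for name, vals in hetero_feat.items()}
for value, t, i in zip(homo_feat, ntype, orig_id):
    restored[ntypes[t]][i] = value
```

The round trip recovers the original per-type features regardless of how the nodes are ordered in the homo graph.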

Thank you for your reply with code examples! It is now clear to me how to use to_homo and to_hetero.

But before your reply, I found a solution using the PyTorch Scatter library. It does not require converting the graph and solves the problem perfectly. The idea is to use torch_scatter.composite.scatter_softmax to normalize the attention scores. The index argument of scatter_softmax can be obtained using dgl.DGLHeteroGraph.all_edges.

import torch as th
from torch_scatter.composite import scatter_softmax

hg = ... # the hetero graph
# assume all edge types in hg.etypes share the same dsttype,
# otherwise do this separately for each dsttype.
src = []
index = []
for etype in hg.etypes:
    attn = hg[etype].edata['attn_before_softmax']
    uid, vid = hg.all_edges(form='uv', order='eid', etype=etype)
    src.append(attn)   # scores, in edge-ID order
    index.append(vid)  # destination node of each edge
src = th.cat(src, dim=0)
index = th.cat(index, dim=0).to(src.device)
a = scatter_softmax(src, index, dim=0)

Thanks for the code reference. I’m thinking about further improving DGL’s usability based on your example. One thing we could do is add the scatter_softmax operator to DGL, if you find installing torch_scatter a bit of a headache. Another direction is to automatically copy features when a heterograph is converted to a homo graph via to_homo, something like the following:

hg = ... # the hetero graph
# assume the hg.etypes have the same dsttype,
g = dgl.to_homo(hg)
# it automatically copies and concats the 'attn_before_softmax' features for all edges.
attn_before = g.edata['attn_before_softmax']
a = edge_softmax(g, attn_before)  # edge_softmax (e.g. dgl.nn.pytorch.softmax.edge_softmax) takes the graph as well

Would you like this new behavior?

My use case is to implement a model similar to Heterogeneous Graph Transformer. Currently, I use scatter_softmax to normalize the attention scores and scatter_add to aggregate the values. So I definitely support adding scatter_softmax into DGL because I don’t have to install an additional library.
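For readers unfamiliar with the second operator: once the scores are normalized per destination node, the aggregation is a weighted sum of source values grouped by destination. A plain-Python sketch of that step, with made-up names and numbers (this is not the torch_scatter API):

```python
def scatter_add(values, index, num_groups):
    """Sum `values` into `num_groups` buckets selected by `index`."""
    out = [0.0] * num_groups
    for v, i in zip(values, index):
        out[i] += v
    return out

# Already-normalized attention per edge (sums to 1 for each destination).
attn = [0.25, 0.75, 1.0]
# Hypothetical scalar value carried by the source node of each edge.
src_value = [4.0, 8.0, 5.0]
# Destination node ID of each edge.
dst = [0, 0, 1]

messages = [a * x for a, x in zip(attn, src_value)]
h = scatter_add(messages, dst, num_groups=2)
```

Here node 0 receives 0.25 * 4.0 + 0.75 * 8.0 and node 1 receives its single message unchanged.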

But I probably won’t use the to_homo approach even if it copies features automatically, mainly because of efficiency concerns. The only steps specific to the scatter_softmax approach are concatenating the attention scores and the vids, and I don’t think these are more costly than converting the graph. Besides, storing features in the homo graph may consume more memory. For example, suppose hg has two kinds of nodes, user nodes and item nodes, and only the item nodes have the feature 'x'. If to_homo automatically copies all features, the user nodes in the homo graph also get the feature 'x', which consumes more memory.
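A back-of-the-envelope sketch of that memory concern, assuming dense float32 storage and assuming the homo graph would hold one 'x' row for every node. The graph sizes are hypothetical and the helper feature_bytes is made up for this estimate.

```python
BYTES_PER_FLOAT32 = 4

def feature_bytes(num_nodes, feat_dim):
    """Bytes needed for a dense float32 feature matrix of shape (num_nodes, feat_dim)."""
    return num_nodes * feat_dim * BYTES_PER_FLOAT32

num_users, num_items, feat_dim = 1_000_000, 10_000, 64  # hypothetical sizes

# Heterograph: only item nodes carry 'x'.
hetero_bytes = feature_bytes(num_items, feat_dim)
# Homo graph with one dense 'x' row for every node, users included.
homo_bytes = feature_bytes(num_users + num_items, feat_dim)
ratio = homo_bytes // hetero_bytes
```

With these numbers the feature grows from about 2.56 MB to about 258.56 MB, and the overhead scales with the share of nodes that never had the feature.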

I think the best solution is to introduce functions like edge_softmax and update_all that support a heterograph with more than one edge type, so that we could do something like:

dgl.hetero_edge_softmax(hg, 'e', 'a')
hg.hetero_update_all(fn.u_mul_e('v', 'a', 'm'), fn.sum('m', 'rst'))