DGLHeteroGraph: How to add new edge types, especially reverse edges?

Suppose we have a heterograph G with the following directed relations:

('author', 'affiliated_with', 'institution'), ('author', 'writes', 'paper'), ('paper', 'cites', 'paper'), ('paper', 'has_topic', 'field_of_study')

I want to add the reverse relation ('paper', 'written_by', 'author'). However, I'm limited to the following procedure:

rel = G.edge_type_subgraph(["writes"])
G.add_edges(*rel.reverse().all_edges(), etype=('paper', 'written_by', 'author'))

This fails with the following error, so a new edge type cannot be added to the metagraph this way:

DGLError: Edge type "('paper', 'written_by', 'author')" does not exist.

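This doesn't solve the problem either, because add_reverse_edges only adds reverse edges for relations whose source and destination node types are the same (here only 'cites'), and with ignore_bipartite=True the bipartite relations such as 'writes' are simply skipped: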

dgl.transform.add_reverse_edges(G, ignore_bipartite=True)

Is there a way to add a new canonical edge type to a DGL HeteroGraph without instantiating a new one?

Currently, adding an edge type to an existing heterogeneous graph is not supported, so you need to create a new graph from scratch.

Can we bypass this limitation by adding to the G.canonical_etypes list?

No, you can't. Edge types are tied to the graph's internal storage, so this would require a lot more hacking than just modifying canonical_etypes.


Hi again @mufeili, I'm still running into problems adding reverse metapaths to a DGLHeteroGraph. Here is my code for creating a new graph from scratch with the reverse metapaths added, but I still hit bugs when calling dgl.sampling.sample_neighbors() or running message passing on the new graph.

    import dgl

    def add_reverse_hetero(g):
        relations = {}
        num_nodes_dict = {ntype: g.num_nodes(ntype) for ntype in g.ntypes}
        for metapath in g.canonical_etypes:
            # Original edges
            src, dst = g.all_edges(etype=metapath)
            relations[metapath] = (src, dst)

            # Tuples are immutable, so build the reverse metapath directly,
            # e.g. ("paper", "writes_by", "author")
            reverse_metapath = (metapath[2], metapath[1] + "_by", metapath[0])
            assert reverse_metapath not in relations
            relations[reverse_metapath] = (dst, src)           # Reverse edges

        new_g = dgl.heterograph(relations, num_nodes_dict=num_nodes_dict)

        # copy_ndata:
        for ntype in g.ntypes:
            for k, v in g.nodes[ntype].data.items():
                new_g.nodes[ntype].data[k] = v.detach().clone()

        return new_g

    new_g = add_reverse_hetero(g)

When I run dgl.sampling.sample_neighbors() on new_g and it traverses edges of a reverse metapath type, the Python kernel simply crashes without any error message.

I suspected an issue with how (dst, src) was copied, but the problem persists even when I use src.detach().clone(). I've also tried this:

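from dgl import backend as F
from dgl import utils

# Re-extract the edge frames of every relation (storing edge IDs) and attach them to new_g
eids = []
for metapath in new_g.canonical_etypes:
    eid = F.copy_to(F.arange(0, new_g.number_of_edges(metapath)), new_g.device)
    eids.append(eid)
edge_frames = utils.extract_edge_subframes(new_g, eids, store_ids=True)
utils.set_new_frames(new_g, edge_frames=edge_frames)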

What is the proper way to instantiate a new DGLHeteroGraph without running into these issues? I'd expect the dgl.heterograph() constructor to initialize a valid new graph object from an edge_index_dict.

I ran your code and it seems to work fine for me:

import dgl
import torch

g = dgl.heterograph({
    ('A', 'AB', 'B'): ([0, 1, 2], [2, 1, 3]),
    ('A', 'AA', 'A'): ([2, 2, 3, 4], [1, 2, 1, 0])})
def add_reverse_hetero(g):
    relations = {}
    num_nodes_dict = {ntype: g.num_nodes(ntype) for ntype in g.ntypes}
    for metapath in g.canonical_etypes:
        # Original edges
        src, dst = g.all_edges(etype=metapath[1])
        relations[metapath] = (src, dst)

        reverse_metapath = (metapath[2], metapath[1] + '_by', metapath[0])
        relations[reverse_metapath] = (dst, src)           # Reverse edges

    new_g = dgl.heterograph(relations, num_nodes_dict=num_nodes_dict)

    # copy_ndata:
    for ntype in g.ntypes:
        for k, v in g.nodes[ntype].data.items():
            new_g.nodes[ntype].data[k] = v.detach().clone()

    return new_g
new_g = add_reverse_hetero(g)

dgl.sampling.sample_neighbors(
    new_g,
    {'A': [1, 2], 'B': [2]},
    {'AA': 1, 'AB': 1, 'AA_by': 2, 'AB_by': 2})

What is your DGL/PyTorch/CUDA version?

Hi @BarclayII, sorry, I forgot a few important details needed to reproduce the bug.

First, I'm running sample_neighbors with edge-wise importance sampling, assigning a computed "prob" to each edge of each metapath. The crash seems to happen only when I assign "prob" to the edges of the reverse metapaths based on their source node IDs. My code is:

from collections import defaultdict
from typing import Dict

import dgl
import torch
from dgl.dataloading import BlockSampler
from dgl.sampling import sample_neighbors

class ImportanceSampler(BlockSampler):
    def __init__(self, fanouts, metapaths, degree_counts: dict, edge_dir="in", return_eids=False):
        super().__init__(len(fanouts), return_eids)

        self.fanouts = fanouts
        self.edge_dir = edge_dir
        # degree_counts is a Dict[(nid, metapath, ntype), int]
        self.degree_counts = defaultdict(lambda: 1.0, degree_counts)  # defaults to 1.0 if node not found
        self.metapaths = metapaths

    def sample_frontier(self, block_id, g: dgl.DGLGraph, seed_nodes: Dict[str, torch.Tensor]):
        fanouts = self.fanouts[block_id]

        if self.edge_dir == "in":
            sg = dgl.in_subgraph(g, seed_nodes)
        elif self.edge_dir == "out":
            sg = dgl.out_subgraph(g, seed_nodes)
        else:
            raise ValueError(f"edge_dir must be 'in' or 'out', got {self.edge_dir!r}")

        # Assign edge prob's for each metapath
        for metapath in sg.canonical_etypes:
            ntype = metapath[0]
            src, dst = sg.edges(etype=metapath)

            # This crashes if metapath is a reverse metapath:
            # edge_weight = src.apply_(
            #     lambda nid: self.degree_counts[(nid, metapath, ntype)]
            # ).to(torch.float)

            # This works
            edge_weight = src.detach().clone().apply_(
                lambda nid: self.degree_counts[(nid, metapath, ntype)]  # assign edge prob by src node degree
            ).to(torch.float)

            sg.edges[metapath].data["prob"] = torch.sqrt(1.0 / edge_weight)

        frontier = sample_neighbors(g=sg,
                                    nodes=seed_nodes,
                                    prob="prob",
                                    fanout=fanouts,
                                    edge_dir=self.edge_dir)

        print("n_nodes", sum([k.size(0) for k in seed_nodes.values()]),
              "fanouts", {k[1]: v for k, v in fanouts.items()} if isinstance(fanouts, dict) else fanouts,
              "edges", frontier.num_edges(),
              "pruned", sg.num_edges() - frontier.num_edges())

        return frontier

Thanks for your help!

DGL: 0.6.0.post1 (dgl-cu110)
CUDA: cu110
PyTorch: 1.8.0, 1.7.1

apply_ modifies a tensor in place, and the src and dst tensors returned by sg.edges() refer directly to the graph's own storage. So src.apply_ actually overwrites the graph structure, which causes the crash. You need to either clone it first:

src.clone().apply_(...)

or just create a brand new edge_weight tensor.
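Both options leave the graph's own tensors untouched. A small sketch assuming only PyTorch shows the clone-then-apply_ route and a vectorized alternative that builds a lookup table indexed by node ID, so no Python callback runs per edge. Here degree_counts is a simplified, hypothetical {node_id: count} mapping for a single metapath rather than the (nid, metapath, ntype)-keyed dict in the sampler above, and edge_weights_by_src is an illustrative helper name. Note that apply_ only works on CPU tensors.

```python
import torch

def edge_weights_by_src(src, degree_counts, num_nodes, default=1.0):
    """Hypothetical helper: look up a weight for every edge from its source node ID.

    `degree_counts` is a simplified {node_id: count} dict for one metapath.
    The result is a freshly allocated float tensor; `src` is only read.
    """
    table = torch.full((num_nodes,), default, dtype=torch.float32)
    for nid, count in degree_counts.items():
        table[nid] = float(count)
    # Advanced indexing returns a new tensor, so the graph's edge tensors stay intact
    return table.to(src.device)[src]

src = torch.tensor([0, 2, 2, 1])  # stands in for the tensor from sg.edges(etype=...)
scaled = src.clone().apply_(lambda nid: nid * 10)  # in-place, but only on the clone
weights = edge_weights_by_src(src, {2: 4, 1: 9}, num_nodes=3)
prob = torch.sqrt(1.0 / weights)
```

The lookup-table version also runs on GPU tensors, which apply_ cannot.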


Thanks @BarclayII . I think my question is answered.

These are loaded questions, but do you know of any way to:

  1. Merge multiple DGLBlocks (MFGs) into one subgraph? I read somewhere that dgl.merge will be implemented soon.
  2. Add new edges (of a new etype) to a block in an ad hoc manner?

These features are important to me because I’m working on extracting higher-order relationships into new metapaths, but there’s currently no way to accomplish that in DGL.
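One way to materialize such a higher-order relation is to join two relations on their shared middle node. The sketch below is plain Python and assumes each relation is available as a (src_list, dst_list) pair, for example from g.edges(etype=...) followed by .tolist(). The function compose_relations is hypothetical; since an edge type cannot be added in place, the resulting pairs would then go into a new relation when rebuilding the graph with dgl.heterograph.

```python
from collections import defaultdict

def compose_relations(first, second):
    """Hypothetical helper: compose two relations given as (src_list, dst_list).

    Emits an edge u -> w whenever u -> v is in `first` and v -> w is in
    `second`. Each (u, w) pair is kept once, sorted for a stable order.
    """
    # Index the second relation by its source node
    succ = defaultdict(list)
    for v, w in zip(*second):
        succ[v].append(w)
    pairs = set()
    for u, v in zip(*first):
        for w in succ.get(v, ()):
            pairs.add((u, w))
    ordered = sorted(pairs)
    return [u for u, _ in ordered], [w for _, w in ordered]

# author -writes-> paper, then paper -cites-> paper
writes = ([0, 0, 1], [0, 1, 1])
cites = ([0, 1], [1, 2])
author_cites_paper = compose_relations(writes, cites)
```

Deduplicating with a set drops path multiplicity; if the number of paths matters (e.g. as an edge weight), a counter would be kept instead.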

If you could point me to some backend API guides, I’d gladly try to implement them :slightly_smiling_face:

There is a function called dgl.block_to_graph that converts an MFG to an ordinary (bipartite) graph. You can then modify that graph however you like.