Weird behavior for HeteroGraph EdgeLoader

I am using the edge loader to load a toy heterograph. Sometimes I see a Block like this

Block(num_src_nodes={'image': 3, 'tag': 2},
      num_dst_nodes={'image': 3, 'tag': 2},
      num_edges={('image', 'hasTag', 'tag'): 0, ('image', 'similarTo', 'image'): 2, ('image', 'similarToReverse', 'image'): 3, ('tag', 'hasImage', 'image'): 1},
      metagraph=[('image', 'tag', 'hasTag'), ('image', 'image', 'similarTo'), ('image', 'image', 'similarToReverse'), ('tag', 'image', 'hasImage')])

As you see, num_dst_nodes['tag'] = 2, but there is no edge pointing to a tag according to num_edges. I have attached the relevant code.

Relevant code

Graph -

graph_data = {
        ('image', 'similarTo', 'image'): (th.tensor([0, 1, 2]), th.tensor([1, 0, 0])),
        ('image', 'similarToReverse', 'image'): (th.tensor([1, 0, 0]), th.tensor([0, 1, 2])),
        ('image', 'hasTag', 'tag'): (th.tensor([0, 1, 2, 2]), th.tensor([0, 1, 0, 1])),
        ('tag', 'hasImage', 'image'): (th.tensor([0, 1, 0, 1]), th.tensor([0, 1, 2, 2]))
}

Dataloader -

    train_seeds = {
        'similarTo': th.arange(3),
        'similarToReverse': th.arange(3),
        'hasTag': th.arange(4),
        'hasImage': th.arange(4)
    }
    sampler = dgl.dataloading.MultiLayerNeighborSampler([1, 1])
    train_dataloader = dgl.dataloading.EdgeDataLoader(
        g, train_seeds, sampler, exclude='reverse_types',
        reverse_etypes={
            'similarTo': 'similarToReverse',
            'similarToReverse': 'similarTo',
            'hasTag': 'hasImage',
            'hasImage': 'hasTag'
        })

Can you help me figure out the error? I’m happy to provide a minimal script to demonstrate this if needed.

This is intended because the destination nodes are always included in the source nodes, regardless of whether any edge connecting to those destination nodes exists.

The reason is that some GNN models require combining the node’s own representation and the neighbor aggregation separately (e.g. GraphSAGE).
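To make that concrete, here is a rough, library-free sketch of a GraphSAGE-style update. It is not DGL's implementation: `sage_update` and its scalar "weights" are hypothetical stand-ins for feature vectors and learned linear layers. It only illustrates why a destination node with no incoming edges still needs to be present.

```python
def sage_update(h_self, neighbor_feats, w_self=1.0, w_neigh=1.0):
    """Combine a node's own feature with the mean of its neighbors' features.

    Scalars stand in for feature vectors and weight matrices (an assumption
    made to keep this sketch free of dependencies).
    """
    if neighbor_feats:
        h_neigh = sum(neighbor_feats) / len(neighbor_feats)
    else:
        # No incoming edge: the neighbor term contributes nothing,
        # but the node still gets an output from its own feature.
        h_neigh = 0.0
    return w_self * h_self + w_neigh * h_neigh


# A node with two neighbors, and a node with none.
with_neighbors = sage_update(2.0, [1.0, 3.0])
isolated = sage_update(2.0, [])
```

The isolated node's output comes entirely from its self term, which is why it cannot be dropped from the destination set.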


Thanks for the quick reply. But I’m not completely sure I get it. Could you clarify your first sentence a bit more, perhaps with a small example?

Let me explain my use case here -
I’m trying to run heterogeneous GraphSAGE and I run into this error (KeyError: 'h_neigh') at the line marked below -

for ntype in graph.dsttypes:
    if graph.num_dst_nodes(ntype) > 0:
        graph.dstnodes[ntype].data['h'] = graph.dstnodes[ntype].data['h_neigh'] + self.fc_self[ntype](dst_inputs[ntype]) # Error

(If I understand correctly, since there is no image->tag edge in this block, 'h_neigh' is never set for the tag nodes during message passing, which results in the error. Please correct me if I’m wrong.)
Could you explain what I’m expected to do in this case?
Thanks a lot.

Say you have a graph with two tag nodes that have no images connecting to them. If your model is GraphSAGE, it needs to combine (1) the tag nodes’ own features from the previous layer and (2) the aggregated features of the neighboring images. So for simplicity we include the two tag nodes even if no edges point to them.

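A possible solution is to first assign zero 'h_neigh' features to each destination node type, before message passing, so that the key exists even for types that receive no messages.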
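As a rough sketch of that pattern, the snippet below uses plain Python dicts and lists in place of DGL's node data and tensors. `combine_with_zero_default` and its arguments are hypothetical names for illustration only. In your model the equivalent would presumably be writing a zero tensor of the right shape into graph.dstnodes[ntype].data['h_neigh'] for every destination type before running message passing.

```python
def combine_with_zero_default(num_dst_nodes, aggregated, self_term, dim):
    """Add the neighbor aggregation to the self term for every dst node type.

    num_dst_nodes: {ntype: number of destination nodes}
    aggregated:    {ntype: rows}, present only for types that received messages
    self_term:     {ntype: rows}, the transformed self features
    """
    # Step 1: give every destination node type a zero 'h_neigh' up front.
    h_neigh = {
        ntype: [[0.0] * dim for _ in range(count)]
        for ntype, count in num_dst_nodes.items()
        if count > 0
    }
    # Step 2: types that did receive messages overwrite the zeros.
    h_neigh.update(aggregated)
    # Step 3: the lookup now succeeds for every type, so no KeyError.
    return {
        ntype: [
            [n + s for n, s in zip(neigh_row, self_row)]
            for neigh_row, self_row in zip(rows, self_term[ntype])
        ]
        for ntype, rows in h_neigh.items()
    }


# Mirrors the Block in the question: 'tag' has 2 dst nodes but no incoming edges.
num_dst_nodes = {"image": 3, "tag": 2}
aggregated = {"image": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]}
self_term = {
    "image": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    "tag": [[4.0, 4.0], [5.0, 5.0]],
}
h = combine_with_zero_default(num_dst_nodes, aggregated, self_term, dim=2)
```

With the zeros in place, the tag nodes simply end up with their self term as output, while the image nodes get the sum of both terms.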