Trouble Training Link Prediction on Heterograph with EdgeDataLoader

Yes. As input to the inference function, I still pass in x, which is:

source_feats = g.nodes['source'].data['source_embedding'].to(torch.device('cuda'))
user_feats = g.nodes['user'].data['user_embedding'].to(torch.device('cuda'))
node_features_for_inference = {'source': source_feats, 'user': user_feats}

@hockeybro12 How did you make your graph bidirectional? I think I am facing a similar issue.

@hockeybro12 Why are you using the NodeDataLoader here if you want to do link prediction instead of the EdgeDataLoader? I thought that the edges were sampled, and not the nodes.

@hockeybro12 @BarclayII
One thing I am also unsure about is that only the first element of the blocks is selected in the code below. Since blocks is a list of DGLGraphs, why only select the first one and not iterate through all?

I use the NodeDataLoader to compute embeddings for the graph after training it with link prediction. If you are doing link prediction, you should use EdgeDataLoader.

Originally I had sources that follow users. I then added another edge, so users have sources that they follow. So I still have two types of nodes - sources and users. But now I have two types of edges - follow and follow source - depending on which direction.

So, in this case we are iterating through the layers and then passing the blocks. Normally, in a forward function, we would do:

def forward(self, blocks, inputs):
        x = self.conv1(blocks[0], inputs)
        x = self.conv2(blocks[1], x)

However, in the inference function we are passing it in ourselves:

for l, layer in enumerate([self.conv1, self.conv2]):
...
       for input_nodes, output_nodes, blocks in tqdm(dataloader):
                block = blocks[0].to(torch.device('cuda'))
                ...
                h = layer(block, h)

I think this is why we index it like so, but I’m not 100% sure.
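A toy sketch of the layer-wise pattern may make the indexing clearer. It assumes the inference dataloader uses a one-hop sampler, so each batch yields a list with exactly one block; the excerpt above does not show the sampler, so treat that as an assumption. The graph, the features, and the `mean_layer` stand-in are hypothetical, pure-Python substitutes for DGL objects.

```python
from typing import Dict, List

# Hypothetical toy graph: in-neighbours of each node, plus scalar input features.
in_neighbors: Dict[int, List[int]] = {0: [1, 2], 1: [2], 2: [0], 3: [0, 1]}
features: Dict[int, float] = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}


def mean_layer(block: Dict[int, List[int]], h: Dict[int, float]) -> Dict[int, float]:
    """Stand-in for a conv layer: average the in-neighbours' values."""
    return {dst: sum(h[src] for src in srcs) / len(srcs) for dst, srcs in block.items()}


def layerwise_inference(num_layers: int, batch_size: int) -> Dict[int, float]:
    h = dict(features)
    nodes = sorted(in_neighbors)
    for _ in range(num_layers):  # outer loop: one layer at a time
        next_h: Dict[int, float] = {}
        for start in range(0, len(nodes), batch_size):  # inner loop: batches of output nodes
            batch = nodes[start:start + batch_size]
            # A one-hop "sampler" returns a list holding a single block.
            blocks = [{dst: in_neighbors[dst] for dst in batch}]
            next_h.update(mean_layer(blocks[0], h))
        h = next_h  # this layer's output becomes the next layer's input
    return h


def full_graph_forward(num_layers: int) -> Dict[int, float]:
    """Reference: apply every layer to the whole graph at once."""
    h = dict(features)
    for _ in range(num_layers):
        h = mean_layer(in_neighbors, h)
    return h


result = layerwise_inference(num_layers=2, batch_size=2)
print(result)
```

Under that assumption, `blocks[0]` is the only block in the list, and the loop over layers replaces the `blocks[0]`, `blocks[1]` indexing used in `forward`.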

@hockeybro12 Thanks for your response. I’ve got a follow-up question here.

Did you add the edges manually by swapping the order of source and target node or is there any clever built-in way to do that?

I had to add it manually when creating the graph.
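For anyone wanting to do the same, here is a minimal sketch of adding the reverse relation by swapping source and destination IDs before the graph is built. The helper name `add_reverse_edges`, the edge type names, and the toy IDs are hypothetical; plain lists are used so the snippet runs without DGL.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]         # (source node type, edge type, destination node type)
EdgeList = Tuple[List[int], List[int]]  # (source node IDs, destination node IDs)


def add_reverse_edges(
    data_dict: Dict[EdgeType, EdgeList],
    reverse_names: Dict[str, str],
) -> Dict[EdgeType, EdgeList]:
    """Return a copy of data_dict with one reversed relation per listed edge type."""
    result = dict(data_dict)
    for (utype, etype, vtype), (src, dst) in data_dict.items():
        if etype not in reverse_names:
            continue
        # Swap the endpoint node types and the ID lists to get the opposite direction.
        result[(vtype, reverse_names[etype], utype)] = (list(dst), list(src))
    return result


# Hypothetical toy data: sources 0 and 2 are followed by users 1 and 3.
forward = {("source", "has_follower", "user"): ([0, 0, 2], [1, 3, 3])}
bidirectional = add_reverse_edges(forward, {"has_follower": "follows"})
print(bidirectional)
```

The resulting dictionary has the same layout as the one passed to `dgl.heterograph` elsewhere in this thread, once the ID lists are converted to tensors.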

Correct.

It can, according to the statement above. However, those other nodes may not impact that node’s representation.

I built the ScorePredictor similar to yours and it looks as follows:

from typing import Dict

import dgl
import torch
import torch.nn as nn

class ScorePredictor(nn.Module):
    def forward(
        self,
        edge_subgraph: dgl.DGLHeteroGraph,
        x: Dict[str, torch.Tensor],
        eval_edge_type: str,
    ) -> torch.Tensor:
        """Perform score prediction only on the evaluation edge type.

        :param edge_subgraph: subgraph to be evaluated
        :param x: dictionary mapping node type  to features
        :param eval_edge_type: edge type to be evaluated
        :return: dictionary mapping edge type to the scores for the subgraph
        """
        with edge_subgraph.local_scope():
            edge_subgraph.ndata['x'] = x
         
            # only test on evaluation edge type
            edge_subgraph.apply_edges(
                dgl.function.u_dot_v('x', 'x', 'score'), etype=eval_edge_type)
            return edge_subgraph.edata['score']

Yet I am receiving an error saying that the data at 'x' cannot be accessed:

  File "/.../ScorePredictor.py", line 40, in forward
    dgl.function.u_dot_v('x', 'x', 'score'), etype=eval_edge_type)
  File "/.../lib/python3.7/site-packages/dgl/heterograph.py", line 4064, in apply_edges
    edata = core.invoke_gsddmm(g, func)
  File "/...lib/python3.7/site-packages/dgl/core.py", line 195, in invoke_gsddmm
    x = alldata[func.lhs][func.lhs_field]
  File "/.../lib/python3.7/site-packages/dgl/view.py", line 66, in __getitem__
    return self._graph._get_n_repr(self._ntid, self._nodes)[key]
  File "/...k/lib/python3.7/site-packages/dgl/frame.py", line 373, in __getitem__
    return self._columns[name].data
KeyError: 'x'

Do you have any idea how that can happen, even though I am setting .ndata['x'] to x here? x is a dictionary mapping node type to feature tensors. @BarclayII @hockeybro12

Did you try printing x at that function? Does it have the features you expect?

x here is a dictionary mapping node type to tensor:

>>> x
{'protein': tensor([[ 1.5659e+00,  7.6492e-01, -1.3036e-01,  1.0010e+00,  7.0579e-01,
          1.3227e-02,  1.3354e-01,  1.3097e-01,  2.2603e-02,  1.6489e-01,
          1.1034e+00,  4.6917e-01],
       ...,
       grad_fn=<AddBackward0>)}

which is also saved in the ndata of the edge_subgraph:

>>> edge_subgraph.ndata
{'disease': {'feature': tensor([], size=(0, 500)), '_ID': tensor([], dtype=torch.int64)}, 'drug': {'feature': tensor([], size=(0, 7467)), '_ID': tensor([], dtype=torch.int64)}, 'protein': {'feature': tensor([[ 0.0000,  0.0000,  0.0000,  ...,  0.0000,  0.0000,  0.0000], ..., ]), 
'_ID': tensor([327,  35, 264,  27, 211, 114, 355, 367, 108, 303,  25, 238, 188, 194,
        272, 351, 157,  63, 343, 245, 331, 167, 120]), 
'x': tensor([[ 1.5659e+00,  7.6492e-01, -1.3036e-01,  1.0010e+00,  7.0579e-01, 1.3227e-02,  1.3354e-01,  1.3097e-01,  2.2603e-02,  1.6489e-01, 1.1034e+00,  4.6917e-01], ...,
        , grad_fn=<AddBackward0>)}}

It has the shape:

>>> x['protein'].shape
 torch.Size([23, 12])

since the edge_subgraph has the corresponding number of nodes:

>>> edge_subgraph
Graph(num_nodes={'disease': 0, 'drug': 0, 'protein': 23},
         ...,
     )

I am not sure if this is correct, since it is a dictionary. But when I try to pass just the tensor stored in x, I get an error saying that x has to be a dictionary. I could not find the source code of dgl.function.u_dot_v(), so I am not sure whether it can handle a dictionary as input.

@hockeybro12 @BarclayII When I try to replicate the same code, replacing GraphConv with SAGEConv and norm='right' with aggregator_type='mean', I get a size mismatch error. Any idea why it is not working for GraphSAGE?

The following code reproduces the error:

import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn

data_dict  = {('source', 'has_follower', 'user'): (torch.tensor([0, 0]), torch.tensor([0, 0])), ('user', 'follows', 'source'): (torch.tensor([0, 0]), torch.tensor([0, 0]))}
dgl_graph  = dgl.heterograph(data_dict)

eid_dict   =  {etype: dgl_graph.edges(etype=etype, form='eid') for etype in dgl_graph.canonical_etypes}

# Assumed values: the original post does not show how sampler and k were defined.
sampler    = dgl.dataloading.MultiLayerFullNeighborSampler(2)
k          = 5
dataloader = dgl.dataloading.EdgeDataLoader(dgl_graph, eid_dict, sampler, negative_sampler = dgl.dataloading.negative_sampler.Uniform(k))

dgl_graph.nodes['source'].data['source_embedding'] = torch.zeros(1, 70)
dgl_graph.nodes['user'].data['user_embedding']     = torch.zeros(1, 80)

class TestRGCN(nn.Module):
    def __init__(self, in_feats, hid_feats, out_feats, canonical_etypes):
        super(TestRGCN, self).__init__()

        self.conv1 = dglnn.HeteroGraphConv({
                etype : dglnn.SAGEConv(in_feats[utype], hid_feats, 'mean')
                for utype, etype, vtype in canonical_etypes
                })
        self.conv2 = dglnn.HeteroGraphConv({
                etype : dglnn.SAGEConv(hid_feats, out_feats, 'mean')
                for _, etype, _ in canonical_etypes
                })

    def forward(self, blocks, inputs):
        x = self.conv1(blocks[0], inputs)
        x = self.conv2(blocks[1], x)

        return x

class HeteroScorePredictor(nn.Module):
    def forward(self, edge_subgraph, x):
        with edge_subgraph.local_scope():
            edge_subgraph.ndata['h'] = x
            for etype in edge_subgraph.canonical_etypes:
                edge_subgraph.apply_edges(dgl.function.u_dot_v('h', 'h', 'score'), etype=etype)
                # edge_subgraph.apply_edges(self.apply_edges, etype=etype)
            return edge_subgraph.edata['score']

class TestModel(nn.Module):
    # here we have a model that first computes the representation and then predicts the scores for the edges
    def __init__(self, in_features, hidden_features, out_features, canonical_etypes):
        super().__init__()
        self.sage = TestRGCN(in_features, hidden_features, out_features, canonical_etypes)
        self.pred = HeteroScorePredictor()
    def forward(self, g, neg_g, blocks, x):
        x = self.sage(blocks, x)
        pos_score = self.pred(g, x)
        neg_score = self.pred(neg_g, x)
        return pos_score, neg_score 

def compute_loss(pos_score, neg_score, canonical_etypes):
    # Margin loss
    all_losses = []
    for given_type in canonical_etypes:
        n_edges = pos_score[given_type].shape[0]
        if n_edges == 0:
            continue
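        # Hinge loss: each positive score should exceed its negatives by a margin of 1
        pos = pos_score[given_type].view(n_edges, 1)
        neg = neg_score[given_type].view(n_edges, -1)
        all_losses.append((1 - pos + neg).clamp(min=0).mean())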
    return torch.stack(all_losses, dim=0).mean()

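model = TestModel(in_features={'source':70, 'user':80}, hidden_features=512, out_features=256, canonical_etypes=dgl_graph.canonical_etypes)
# Assumed optimizer: the original post does not show which one was used.
optimizer = torch.optim.Adam(model.parameters())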
for epoch in range(5):
        for input_nodes, positive_graph, negative_graph, blocks in dataloader:
            model.train()
            print(epoch)

            node_features = {'source': blocks[0].srcdata['source_embedding']['source'], 'user': blocks[0].srcdata['user_embedding']['user']}
            pos_score, neg_score = model(positive_graph, negative_graph, blocks, node_features)
            loss = compute_loss(pos_score, neg_score, dgl_graph.canonical_etypes)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

#RuntimeError: size mismatch, m1: [1 x 80], m2: [70 x 512] at /opt/conda/conda-bld/pytorch_1595629401553/work/aten/src/TH/generic/THTensorMath.cpp:41

It seems that your source and user node features have different dimensionalities (70 and 80). So for the first layer you will need to specify the source and destination node feature sizes separately, like this:

        self.conv1 = dglnn.HeteroGraphConv({
                etype : dglnn.SAGEConv((in_feats[utype], in_feats[vtype]), hid_feats, 'mean')
                for utype, etype, vtype in canonical_etypes
                })

In the second layer every node type has the same input dimension, hid_feats, so conv2 shouldn’t have any problem.
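As a quick check, the sizes that each first-layer module would receive can be listed without building the model. The snippet only reuses the `in_feats` values and edge types from the reproduction code above; the name `first_layer_sizes` is made up for illustration.

```python
from typing import Dict, List, Tuple

in_feats: Dict[str, int] = {"source": 70, "user": 80}
canonical_etypes: List[Tuple[str, str, str]] = [
    ("source", "has_follower", "user"),
    ("user", "follows", "source"),
]

# (source feature size, destination feature size) for each edge type's module.
first_layer_sizes = {
    etype: (in_feats[utype], in_feats[vtype])
    for utype, etype, vtype in canonical_etypes
}
print(first_layer_sizes)
# {'has_follower': (70, 80), 'follows': (80, 70)}
```

With a single integer, the module for has_follower would be built for 70-dimensional inputs on both sides, while its destination nodes (users) carry 80-dimensional features, which is consistent with the [1 x 80] versus [70 x 512] shapes in the error message.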

Would it be possible to exclude HeteroGraphConv from layer 2?
Since the output dimensions are going to be the same for every entity, is there a way to feed the layer 1 outputs directly to a SAGEConv in layer 2 (without HeteroGraphConv)?

In the docs, I see that SAGEConv can be applied to homogeneous graphs and unidirectional bipartite graphs. Is there any workaround for heterogeneous graphs?

If your graph has multiple edge types then you will have to use HeteroGraphConv.

If you truly want to exclude HeteroGraphConv, you will need to make your graph have only one edge type. That being said, you might also want to keep the original edge type information as edge features in order to use different weight matrices for each SAGEConv. The RGAT implementation for OGB-LSC shows an example of operating on a homogenized heterogeneous graph.

Hello,
The whole discussion helped me solve a lot of things. However, I need help with one more issue. I am getting an error in the main epoch loop, in the model's forward pass used for the loss calculation. Specifically:

pos_score = self.pred(g, x)
neg_score = self.pred(neg_g, x)

I am using a custom data loader, with MultiLayerNeighborSampler for pos_graph and negative_sampler.Uniform for neg_graph. However, I am getting an error: the pred function tries to assign x (which is generated for the whole network) to a subgraph (positive or negative), hence the mismatch.

I was able to fix it by creating a custom x for pos_graph. But for neg_graph, on checking the documentation, I find that the generated heterograph does not have node features, which pred requires.
Am I doing something wrong? Or, since the samplers create subgraphs, do the features need to be handled explicitly? Any example would be highly appreciated.
Thanks

Could you open a new discussion thread and include a minimal code snippet to reproduce the issue? This thread is quite old and very long.

Thanks @mufeili, I figured it out. I am not sure how to close this. I will keep in mind to start a new thread if any future issue comes up.

Sounds great.