Doubt in positive pair graph returned by the DataLoader

I created the heterogeneous graph below and used the DataLoader to get mini-batches, but the resulting positive pair graph contains edges that are not in the original graph. Can you help me identify what is causing this?

Code to reproduce the issue:

import dgl
import torch

data_dict = {
    ('user', 'watches', 'movie'): (torch.tensor([0, 0, 1, 2]), torch.tensor([0, 1, 0, 1])),
    ('director', 'directs', 'movie'): (torch.tensor([0, 1]), torch.tensor([1, 0]))
}

graph = dgl.heterograph(data_dict)

train_eids = {
    ('director', 'directs', 'movie'): torch.tensor([0, 1]),
    ('user', 'watches', 'movie'): torch.tensor([0, 1, 2, 3])
}

sampler = dgl.dataloading.as_edge_prediction_sampler(
    dgl.dataloading.NeighborSampler([5, 5])
)
dataloader = dgl.dataloading.DataLoader(
    graph, 
    train_eids,
    sampler,
    batch_size=2,
)

input_nodes, pos_pair_graph, blocks = next(iter(dataloader))
for etype in pos_pair_graph.etypes:
    print(etype, pos_pair_graph.edges(etype=etype))

# directs (tensor([0, 1]), tensor([0, 1])) 
# - director_0 is never connected to movie_0 in the original graph; similarly,
#   director_1 is never connected to movie_1 (it should be the other way around)
# watches (tensor([], dtype=torch.int64), tensor([], dtype=torch.int64))

Hi @Sriharsha, the sampled graph uses its own (relabeled) node IDs to represent edges, so [0, 0] and [1, 1] do not refer to those node IDs in the original graph. To map them back to the original IDs, try this code:

b = blocks[0]
for etype in pos_pair_graph.canonical_etypes:
    u_type, _, v_type = etype
    u, v = pos_pair_graph.edges(etype=etype)
    # Look up the original node IDs stored under dgl.NID
    original_u = b.srcdata[dgl.NID][u_type][u]
    original_v = b.dstdata[dgl.NID][v_type][v]
    print(etype, original_u, original_v)

Thanks for the reply. I didn’t pay attention to the node ID relabeling :slight_smile:
