Hi,
I am trying to build a recommender system using a GNN with DGL. I have a graph with 3 types of nodes (user, item and sport) and 6 types of edges.
I train my model without any problems using `EdgeDataLoader`. However, when I want to do inference (i.e. compute embeddings for all nodes in my test set), I run into a size mismatch. For inference I use `NodeDataLoader`. It seems that the number of 'user' output nodes is not the same as the number of 'user' dst nodes in the last block produced by the dataloader.
```python
valid_users, _ = g.find_edges(valid_eids, etype=etype)
valid_items = np.arange(g.number_of_nodes('item'))
valid_nids = {'user': valid_users.numpy(), 'item': valid_items}

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader_test = dgl.dataloading.NodeDataLoader(
    g, valid_nids, sampler,
    batch_size=32, shuffle=True, drop_last=False, num_workers=0)

for input_nodes, output_nodes, blocks in dataloader_test:
    print(blocks[1].num_dst_nodes('user'))
    print(output_nodes['user'].shape[0])
```
Output:
```
19
20
```
Thus, when I try to run message reduction (e.g. `fn.mean`), I get the following error:
```
DGLError: Expected data to have 20 rows, got 19.
```
Why is there a mismatch between the number of dst nodes in the last block and the number of output nodes? Any pointers or resources would be appreciated.
Thanks a lot in advance!