How do I create my own Dataset like the built-in one for RGCN? What is train_mask for?

eric2213 · December 8, 2022, 5:26pm

I’m wondering what these masks are for.
Do they represent true and false facts (triplets and negative triplets)?
Or are they created randomly, with code like the line below?
g.edata['train_mask'] = torch.zeros(1000, dtype=torch.bool).bernoulli(0.6)

I picked a few triplets from FB15K-237 as examples. Here is how I create a heterogeneous graph; is this the proper way?

data_dict = {
    ('entity', '/travel/travel_destination/climate./travel/travel_destination_monthly_climate/month', 'entity'): (torch.tensor([0]), torch.tensor([1])),
    ('entity', '/music/performance_role/regular_performances./music/group_membership/group', 'entity'): (torch.tensor([2]), torch.tensor([3])),
    ('entity', '/location/location/contains', 'entity'): (torch.tensor([4, 4]), torch.tensor([5, 6]))
}

g = dgl.heterograph(data_dict)
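
For more than a handful of triplets, the dictionary above could be built programmatically rather than by hand. The following is only a minimal sketch in plain Python: the entity names (e_a, e_b, ...) and the entity_id helper are hypothetical, and the final conversion to tensors is shown as a comment that mirrors the torch.tensor usage in this post.

```python
from collections import defaultdict

# Hypothetical raw triplets (head, relation, tail); entity names are made up.
triplets = [
    ("e_a", "/location/location/contains", "e_b"),
    ("e_a", "/location/location/contains", "e_c"),
    ("e_d", "/music/performance_role/regular_performances./music/group_membership/group", "e_e"),
]

entity_ids = {}  # entity name -> consecutive integer ID


def entity_id(name):
    """Return the integer ID for an entity, assigning a new one on first sight."""
    return entity_ids.setdefault(name, len(entity_ids))


# Group source/destination IDs by canonical edge type (src type, relation, dst type).
edges = defaultdict(lambda: ([], []))
for head, rel, tail in triplets:
    src, dst = edges[("entity", rel, "entity")]
    src.append(entity_id(head))
    dst.append(entity_id(tail))

data_dict = {etype: (src, dst) for etype, (src, dst) in edges.items()}

# Assumed next step, following the snippet in this post:
# data_dict = {k: (torch.tensor(s), torch.tensor(d)) for k, (s, d) in data_dict.items()}
# g = dgl.heterograph(data_dict)
```

Keeping the entity_ids mapping around is useful afterwards, since it is the only link between the integer node IDs in the graph and the original entity names.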

With this heterograph, how do I create the masks and split the data into train/valid/test sets like the built-in dataset?

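I'm not sure whether I can ask in Chinese; my English is poor…
I don't understand the purpose of the masks. In other tutorials I have seen them used to split the data into training, validation, and test sets, but there the tensor is generated randomly:
g.edata['train_mask'] = torch.zeros(1000, dtype=torch.bool).bernoulli(0.6)

Taking the original FB15K237 data as an example:

How should I create the masks so that my dataset looks like the built-in FB15k237Dataset and can be passed directly to link.py in the RGCN example?

I'm new to programming, so my questions may be very basic; please bear with me. Thanks in advance for any replies!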
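
On the mask question: a mask of this kind is a boolean vector with one entry per edge that marks which split the edge belongs to; it does not label facts as true or false. As far as I know, negative triplets are sampled separately during training rather than stored in the masks. A minimal sketch of a random, non-overlapping split in plain Python follows. The edge count, the 80/10/10 ratio, and the seed are assumptions for illustration, and FB15k-237 itself ships with predefined train/valid/test files, so for that dataset the masks would normally follow those files instead of a random split.

```python
import random

num_edges = 10            # hypothetical number of edges in the graph
rng = random.Random(0)    # fixed seed so the split is reproducible

# Shuffle edge IDs once, then cut the shuffled list into three disjoint parts.
order = list(range(num_edges))
rng.shuffle(order)

n_train = int(0.8 * num_edges)
n_val = int(0.1 * num_edges)
splits = {
    "train_mask": order[:n_train],
    "val_mask": order[n_train:n_train + n_val],
    "test_mask": order[n_train + n_val:],
}

# Turn each list of edge IDs into a boolean mask of length num_edges.
masks = {}
for name, edge_ids in splits.items():
    mask = [False] * num_edges
    for eid in edge_ids:
        mask[eid] = True
    masks[name] = mask

# Assumed next step with DGL (single edge type):
# g.edata["train_mask"] = torch.tensor(masks["train_mask"])
```

Unlike calling bernoulli(0.6) independently for each mask, cutting a single shuffled list guarantees that every edge lands in exactly one of the three splits.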

czkkkkkk · December 21, 2022, 3:21am

Duplicate of GitHub issue #5001 in dmlc/dgl, "How to create own Dataset like builtin? What is train_mask for?".

system · January 20, 2023, 3:22am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.