How to create my own dataset like the built-in one for RGCN? What is train_mask for?

I’m wondering what these masks are for.
Do they represent true and false facts (triplets and negative triplets)?
Or are they created randomly, as in the code below?
g.edata['train_mask'] = torch.zeros(1000, dtype=torch.bool).bernoulli(0.6)

I picked some triplets from FB15K-237 as examples. Here is how I create a heterogeneous graph; is this a proper way?

import dgl
import torch

# Each key is a (source type, relation, destination type) triple;
# each value is a pair of (source node IDs, destination node IDs).
data_dict = {
    ('entity', '/travel/travel_destination/climate./travel/travel_destination_monthly_climate/month', 'entity'): (torch.tensor([0]), torch.tensor([1])),
    ('entity', '/music/performance_role/regular_performances./music/group_membership/group', 'entity'): (torch.tensor([2]), torch.tensor([3])),
    ('entity', '/location/location/contains', 'entity'): (torch.tensor([4, 4]), torch.tensor([5, 6]))
}

g = dgl.heterograph(data_dict)

With this heterograph, how do I create the masks and split the data into train/valid/test sets like the built-in dataset?
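
For reference, here is a minimal sketch of one way the masks could be derived from fixed split files instead of being drawn randomly. It assumes the train/valid/test triplets have already been read into Python lists; the toy triplets and the names `train_triplets`, `valid_triplets`, `test_triplets`, `entity_id`, and `relation_id` are hypothetical, and this is not necessarily how the built-in FB15k237Dataset is constructed internally. The idea is that all edges are stored in one list, and each boolean mask marks which of those edges belong to a given split.

```python
# Hypothetical toy splits: each triplet is (head, relation, tail).
train_triplets = [("e0", "r0", "e1"), ("e2", "r1", "e3"), ("e4", "r2", "e5")]
valid_triplets = [("e4", "r2", "e6")]
test_triplets = [("e0", "r2", "e3")]

# Concatenate the splits so that every edge gets one global position.
all_triplets = train_triplets + valid_triplets + test_triplets

# Assign consecutive integer IDs to entities and relations in order of first appearance.
entity_id = {}
relation_id = {}
for h, r, t in all_triplets:
    entity_id.setdefault(h, len(entity_id))
    entity_id.setdefault(t, len(entity_id))
    relation_id.setdefault(r, len(relation_id))

# Edge lists in the same order as all_triplets.
src = [entity_id[h] for h, _, _ in all_triplets]
dst = [entity_id[t] for _, _, t in all_triplets]
etype = [relation_id[r] for _, r, _ in all_triplets]

# One boolean per edge: True if the edge belongs to that split.
n_train, n_valid, n_test = len(train_triplets), len(valid_triplets), len(test_triplets)
train_mask = [True] * n_train + [False] * (n_valid + n_test)
val_mask = [False] * n_train + [True] * n_valid + [False] * n_test
test_mask = [False] * (n_train + n_valid) + [True] * n_test
```

Assuming DGL and PyTorch are installed, these lists could then be turned into a graph with `dgl.graph((torch.tensor(src), torch.tensor(dst)))` and attached as edge data, for example `g.edata['etype'] = torch.tensor(etype)` and `g.edata['train_mask'] = torch.tensor(train_mask)`. Whether link.py needs further fields (such as reverse edges or node types) is something to check against the built-in dataset's documentation.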

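I'm not sure whether I can ask in Chinese; my English is very poor…
I don't understand the purpose of the masks. In other tutorials I have seen them used to split the data into training, validation, and test sets, but there the tensor is generated randomly:
g.edata['train_mask'] = torch.zeros(1000, dtype=torch.bool).bernoulli(0.6)

Taking the original FB15K237 data as an example, how should I create the masks so that my dataset matches the built-in FB15k237Dataset and can be passed directly to link.py in the RGCN example?

I'm new to programming, so my questions may be very basic; please bear with me. Thank you in advance for any reply!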

Duplicate of the GitHub issue "How to create own Dataset like builtin? What is train_mask for?" (Issue #5001, dmlc/dgl).
