Load data from text file

Hello,

For link prediction, I know that we can load the FB15k, wn18, or FB15k-237 knowledge graphs using the load_data function in dgl/contrib/data.

But what if I have a toy example knowledge graph (in the same format as FB15k) that I want to load into DGL? Do I need to modify the code in dgl/contrib/data/__init__.py, or is there another data loading function I am not aware of?

Thank you.

@lingfan, could you please check whether the data loader can support that?

Also, we have seen many requests about loading custom data. Our current plan is to define our own data format and provide an efficient data loader for it. In the future, you can expect to first convert your custom data to our format (we will provide a detailed tutorial and documentation on how to do so) and then use a data loader to load it into memory.


Hi @bagindokemas,

For the link prediction datasets, the preprocessing code is actually quite clean. If your own dataset is organized the same way as the RGCN authors' datasets, I don't think you need to change the preprocessing code at all. (You do, however, need to add your dataset name to dgl/contrib/data/__init__.py.)
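For reference, here is a minimal sketch of what such a dataset might look like on disk. It assumes the layout that dgl/contrib/data/knowledge_graph.py reads for the RGCN datasets: tab-separated entities.dict and relations.dict files with one "id<TAB>name" pair per line, plus train.txt, valid.txt and test.txt with one "head<TAB>relation<TAB>tail" triple per line. Please verify the file names against your installed version; the helper write_toy_kg and the toy triples are hypothetical.

```python
import os
import tempfile


def write_toy_kg(root, train, valid, test):
    """Hypothetical helper: write (head, relation, tail) name triples into the
    assumed RGCN layout (entities.dict, relations.dict, train/valid/test.txt)."""
    os.makedirs(root, exist_ok=True)
    # Assign contiguous integer ids to entities and relations in order of first appearance.
    entities, relations = {}, {}
    for h, r, t in train + valid + test:
        entities.setdefault(h, len(entities))
        entities.setdefault(t, len(entities))
        relations.setdefault(r, len(relations))
    # Dictionary files: one "<id>\t<name>" line per entry.
    for fname, mapping in (("entities.dict", entities), ("relations.dict", relations)):
        with open(os.path.join(root, fname), "w") as f:
            for name, idx in mapping.items():
                f.write(f"{idx}\t{name}\n")
    # Split files: one "<head>\t<relation>\t<tail>" line per triple, using names.
    for fname, triples in (("train.txt", train), ("valid.txt", valid), ("test.txt", test)):
        with open(os.path.join(root, fname), "w") as f:
            for h, r, t in triples:
                f.write(f"{h}\t{r}\t{t}\n")
    return entities, relations


if __name__ == "__main__":
    root = os.path.join(tempfile.mkdtemp(), "ToyKG")
    ents, rels = write_toy_kg(
        root,
        train=[("alice", "knows", "bob"), ("bob", "works_at", "acme")],
        valid=[("alice", "works_at", "acme")],
        test=[("bob", "knows", "alice")],
    )
    print(sorted(os.listdir(root)))
    print(ents, rels)
```

Storing names (not ids) in the split files keeps them human-readable; the dictionary files are then the single place where names are mapped to integer ids.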

You can check out the short load function here to get a better idea of how the dataset is loaded.
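As a rough illustration of the idea, here is a standalone sketch (not DGL's code) that reads a directory in the assumed RGCN layout and maps each name triple to integer (head, relation, tail) ids. The function name load_local_kg and the returned fields (train, valid, test, num_nodes, num_rels) are hypothetical, chosen to resemble what the RGCN link-prediction code works with; check them against knowledge_graph.py before relying on them.

```python
import os
import tempfile
from types import SimpleNamespace


def _read_dict(path):
    # Each non-empty line is "<integer id>\t<name>"; returns {name: id}.
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            idx, name = line.split("\t")
            mapping[name] = int(idx)
    return mapping


def _read_triples(path, ent, rel):
    # Each non-empty line is "<head>\t<relation>\t<tail>"; returns id triples.
    triples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            h, r, t = line.split("\t")
            triples.append((ent[h], rel[r], ent[t]))
    return triples


def load_local_kg(root):
    """Hypothetical loader for a local KG directory in the assumed RGCN layout."""
    ent = _read_dict(os.path.join(root, "entities.dict"))
    rel = _read_dict(os.path.join(root, "relations.dict"))
    splits = {s: _read_triples(os.path.join(root, s + ".txt"), ent, rel)
              for s in ("train", "valid", "test")}
    return SimpleNamespace(num_nodes=len(ent), num_rels=len(rel), **splits)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    files = {
        "entities.dict": "0\talice\n1\tbob\n2\tacme\n",
        "relations.dict": "0\tknows\n1\tworks_at\n",
        "train.txt": "alice\tknows\tbob\nbob\tworks_at\tacme\n",
        "valid.txt": "alice\tworks_at\tacme\n",
        "test.txt": "bob\tknows\talice\n",
    }
    for name, text in files.items():
        with open(os.path.join(root, name), "w") as f:
            f.write(text)
    data = load_local_kg(root)
    print(data.num_nodes, data.num_rels)  # 3 2
    print(data.train)  # [(0, 0, 1), (1, 1, 2)]
```

Because it reads straight from a local path, a loader like this never touches the download step.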

Hi @lingfan,

I have added my dataset name to dgl/contrib/data/__init__.py. But since my dataset is not listed at https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/, I get this error message when I try to load my data:
Namespace(dataset='ToyKG')
Downloading /…/.dgl/ToyKG.tar.gz from https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/ToyKG.tgz
download failed, retrying, 4 attempts left
Downloading /…/.dgl/ToyKG.tar.gz from https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/ToyKG.tgz
download failed, retrying, 3 attempts left
Downloading /…/.dgl/ToyKG.tar.gz from https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/ToyKG.tgz
download failed, retrying, 2 attempts left
Downloading /…/.dgl/ToyKG.tar.gz from https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/ToyKG.tgz
download failed, retrying, 1 attempt left
Downloading /…/.dgl/ToyKG.tar.gz from https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/ToyKG.tgz
Traceback (most recent call last):
  File "KGIntoMatrix.py", line 22, in <module>
    build_graph(args.dataset)
  File "KGIntoMatrix.py", line 6, in build_graph
    data = load_data(stringData)
  File "/…/.local/lib/python3.6/site-packages/dgl/contrib/data/__init__.py", line 8, in load_data
    return knwlgrh.load_link(dataset)
  File "/…/.local/lib/python3.6/site-packages/dgl/contrib/data/knowledge_graph.py", line 203, in load_link
    data = RGCNLinkDataset(dataset)
  File "/…/.local/lib/python3.6/site-packages/dgl/contrib/data/knowledge_graph.py", line 175, in __init__
    download(_downlaod_prefix + '{}.tgz'.format(self.name), tgz_path)
  File "/…/.local/lib/python3.6/site-packages/dgl/data/utils.py", line 97, in download
    raise e
  File "/…/.local/lib/python3.6/site-packages/dgl/data/utils.py", line 83, in download
    raise RuntimeError("Failed downloading url %s"%url)
RuntimeError: Failed downloading url https://s3.us-east-2.amazonaws.com/dgl.ai/dataset/ToyKG.tgz

Is there any way to make utils.py load the dataset from a specific file path on my server instead of downloading it?

Best Regards.

Hi @bagindokemas,

Currently, there is no argument for configuring the file path, but you can always modify the code locally yourself. Check out the __init__ function here: https://github.com/dmlc/dgl/blob/master/python/dgl/contrib/data/knowledge_graph.py#L84
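One local workaround, sketched under assumptions: the log shows the loader writing the archive to /…/.dgl/ToyKG.tar.gz before extracting it, so you could pack your dataset files into an archive with that name in the DGL download directory. Whether an existing archive actually prevents the download attempt, and whether the files are expected at the archive root, depends on download() in dgl/data/utils.py and the extraction code in knowledge_graph.py of your installed version; check both before relying on this. The helper pack_kg_archive is hypothetical.

```python
import os
import tarfile
import tempfile


def pack_kg_archive(src_dir, cache_dir, name):
    """Hypothetical helper: pack every file in src_dir into
    <cache_dir>/<name>.tar.gz. Files are stored at the archive root,
    which is an assumption about what the DGL loader expects."""
    os.makedirs(cache_dir, exist_ok=True)
    archive = os.path.join(cache_dir, name + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for fname in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, fname), arcname=fname)
    return archive


if __name__ == "__main__":
    src = tempfile.mkdtemp()
    for fname in ("entities.dict", "relations.dict", "train.txt", "valid.txt", "test.txt"):
        open(os.path.join(src, fname), "w").close()
    # In practice cache_dir would be the DGL download directory (the /…/.dgl
    # path in the log); a temporary directory is used here instead.
    path = pack_kg_archive(src, tempfile.mkdtemp(), "ToyKG")
    with tarfile.open(path) as tar:
        print(sorted(tar.getnames()))
```

If the installed download() turns out to re-download existing files, editing the __init__ function linked above to skip the download call is the more direct fix.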