Pretrained Embedding on Heterogeneous Graph

Hi, I have multiple heterogeneous graphs (knowledge graphs), and I want to create a dataloader that generates batches from them.
However, I also have a set of pretrained embeddings for each node and edge. I know that a feature such as `g.nodes['drug'].data['hv']` can be used to set hidden values, but I am not sure whether this approach also suits a dataset of multiple graphs; I am afraid that creating these hidden values before sending the graphs to the GPU will lead to huge memory usage.
So I want to ask: what is the usual or suggested solution for this situation? Thank you.

Also, if I want to return other information such as labels or orders, is it still recommended to use `dgl.batch`?
Or should I use `EdgeDataLoader` instead?
Sorry if these questions are basic; I am new to this library.

My embedding dimension is 256.

  1. For pre-trained embeddings of multiple heterogeneous graphs, if you can load all of them into memory at once, that is still the more efficient option. Otherwise, you can load them separately depending on the data points in the batch.
  2. If the labels or orders can be stored in `ndata` or `edata`, `dgl.batch` will handle them during batching. Otherwise, you need to batch them manually.

Just to make sure: are you saying that it is more efficient to load the embeddings when creating the dataloader class than to use an embedding layer?

If you want to use pre-trained embeddings, you need to load them anyway, right? The only question is whether you load them all at once or load them in batches.
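If the full table does not fit in RAM, one common way to "load in batches" is to memory-map the embedding file so only the rows a batch indexes are paged in. A sketch with NumPy's `mmap_mode` (the file path and table size here are made up; in practice you would point at your saved pretrained file):

```python
import os
import tempfile
import numpy as np
import torch

# Stand-in for a pretrained table saved offline with np.save.
path = os.path.join(tempfile.mkdtemp(), 'drug_emb.npy')
np.save(path, np.random.randn(10_000, 256).astype(np.float32))

# Memory-map the file: the OS pages in only the rows that get indexed.
table = np.load(path, mmap_mode='r')

def fetch(node_ids):
    """Copy just the requested rows into a regular tensor."""
    rows = np.asarray(table[node_ids])  # fancy indexing copies the slice
    return torch.from_numpy(rows)

batch_emb = fetch([3, 17, 42])
print(batch_emb.shape)  # torch.Size([3, 256])
```

A `fetch`-style lookup like this can be called from a dataset's `__getitem__`, so per-batch memory stays proportional to the batch size rather than the full table.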
