Pretrained Embedding on Heterogeneous Graph

Hi, I have multiple heterogeneous graphs (knowledge graphs), and I want to create a dataloader that generates batches from them.
However, I also have a set of pretrained embeddings for every node and edge. I know that node features (hidden values) can be set with e.g. g.nodes['drug'].data['hv'], but I am not sure whether this is also suitable for a dataset of multiple graphs. I am afraid that attaching these embeddings to every graph before sending them to the GPU will lead to huge memory usage.
So what is the normal or suggested solution for this situation? Thank you.

Also, if I want to return other information such as labels or orders, is it still recommended to use dgl.batch?
Or should I use EdgeDataLoader instead?
Sorry if my questions are too basic; I am new to this library.

My embedding dimension is 256.

  1. For the pre-trained embeddings of multiple heterogeneous graphs: if you can load them into memory all at once, doing so is still more efficient. Otherwise, you can load them separately, depending on which datapoints are in the batch.
  2. If labels or orders can be stored in ndata or edata, dgl.batch will handle them during batching. Otherwise, you need to batch them manually.

Just to make sure: you are saying that it is more efficient to load the embeddings when creating the dataloader class than to use an embedding layer, right?

If you want to use pre-trained embeddings, you need to load them anyway, right? The only question is whether you can load them all at once or need to load them in batches.
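A sketch of the two options, assuming the pretrained vectors are saved as a NumPy .npy file (a hypothetical file filled with random values here) and a hypothetical helper lookup_batch. A frozen nn.Embedding built with from_pretrained still needs the whole table loaded first; it is a container for the table, not an alternative to loading it. Memory-mapping the file instead reads only the rows a batch asks for.

```python
import os
import tempfile

import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained embedding file: 10,000 nodes x 256 dims.
NUM_NODES, EMB_DIM = 10_000, 256
path = os.path.join(tempfile.mkdtemp(), "drug_emb.npy")
np.save(path, np.random.rand(NUM_NODES, EMB_DIM).astype(np.float32))

# Option 1: load everything at once. The frozen embedding layer simply wraps
# the table, which has already been read from disk in full.
full = torch.from_numpy(np.load(path))
emb_layer = nn.Embedding.from_pretrained(full, freeze=True)

# Option 2: memory-map the file and read only the rows a batch needs.
mmap = np.load(path, mmap_mode="r")


def lookup_batch(node_ids):
    """Copy just the requested rows from the memory-mapped file into a tensor."""
    return torch.from_numpy(np.ascontiguousarray(mmap[node_ids]))


batch_ids = np.array([5, 42, 9_999], dtype=np.int64)
a = emb_layer(torch.from_numpy(batch_ids))
b = lookup_batch(batch_ids)
print(torch.allclose(a, b))

# Size of the full float32 table: 10,000 * 256 * 4 bytes.
print(full.numel() * full.element_size() / 2**20, "MiB")
```

At the 256 dimensions mentioned in the question, each node costs 1 KiB in float32, so multiplying by the node count gives a quick estimate of whether the all-at-once option fits in memory.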