RGCN: exporting entity embeddings for a large graph

Hi, I am using the R-GCN link prediction example from https://github.com/dmlc/dgl/tree/master/examples/pytorch/rgcn on a fairly large custom dataset.
I want to get the embeddings for entities and relations in npy or any other format, so that I can conveniently reuse them later.
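For a graph this large, one option is to write the entity matrix into a memory-mapped `.npy` file batch by batch with `np.lib.format.open_memmap`, so the whole matrix never has to sit in RAM. This is a minimal sketch, assuming NumPy and a hypothetical `compute_batch` callback that returns the embeddings for a batch of node IDs:

```python
import numpy as np

def export_embeddings_npy(path, num_nodes, dim, compute_batch, batch_size=256):
    """Write a (num_nodes, dim) float32 matrix to a .npy file one batch at a time.

    compute_batch(node_ids) is a hypothetical callback (e.g. wrapping your
    model) that returns a (len(node_ids), dim) array for those node IDs.
    """
    # open_memmap creates a valid .npy file backed by a memory map on disk,
    # so the full matrix is never materialized in RAM at once.
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(num_nodes, dim)
    )
    for start in range(0, num_nodes, batch_size):
        stop = min(start + batch_size, num_nodes)
        out[start:stop] = compute_batch(np.arange(start, stop))
    out.flush()  # push the written pages to disk
    del out      # release the memory map

# Toy stand-in for a model: the embedding of node i is [i, i, i, i].
toy = lambda ids: np.repeat(ids[:, None], 4, axis=1).astype(np.float32)
export_embeddings_npy("entities.npy", num_nodes=1000, dim=4, compute_batch=toy)

emb = np.load("entities.npy", mmap_mode="r")  # lazy, read-only access
print(emb.shape)  # (1000, 4)
```

The result is an ordinary `.npy` file, and loading it with `mmap_mode="r"` reads rows lazily, which keeps later reuse cheap too.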
To export the relation embeddings, I can simply use:
torch.save(model.w_relation, "relations.pth")
But if I build the graph from the whole training data and try to compute the embeddings of all entities, I get this error:

Traceback (most recent call last):
  File "export.py", line 172, in <module>
    main(args)
  File "export.py", line 125, in main
    embed = model(test_graph, test_node_id, test_rel, test_norm)
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/link_predict.py", line 67, in forward
    return self.rgcn.forward(g, h, r, norm)
  File "/home/user/model.py", line 47, in forward
    h = layer(g, h, r, norm)
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 185, in forward
    g.update_all(self.message_func, fn.sum(msg='msg', out='h'))
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/dgl/graph.py", line 3238, in update_all
    Runtime.run(prog)
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/dgl/runtime/runtime.py", line 11, in run
    exe.run()
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/dgl/runtime/ir/executor.py", line 204, in run
    udf_ret = fn_data(src_data, edge_data, dst_data)
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/dgl/runtime/scheduler.py", line 972, in _mfunc_wrapper
    return mfunc(ebatch)
  File "/opt/conda/envs/dgl/lib/python3.8/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 145, in bdd_message_func
    weight = self.weight.index_select(0, edges.data['type']).view(
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 116673854000 bytes. Error code 12 (Cannot allocate memory)

Any suggestions for how I can export the embeddings for entities?

Can you compute and export them in mini-batches? As the error shows, you simply do not have enough memory: the full-graph forward pass tries to allocate 116,673,854,000 bytes (about 117 GB) at once.
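To see why the allocation is so large: the failing line in `bdd_message_func` uses `index_select` to gather a copy of the block-diagonal weight for every edge at once. Assuming the weight layout of the `bdd` regularizer in the DGL version from the traceback (each relation stores `num_bases` diagonal blocks of size `(in_feat // num_bases) x (out_feat // num_bases)` as one flat row), a rough estimate is below. The sizes in the example call are hypothetical.

```python
def bdd_edge_weight_bytes(num_edges, in_feat, out_feat, num_bases, bytes_per_float=4):
    """Rough size of the per-edge weight copy in a 'bdd' R-GCN message function.

    Assumes each relation's weight is num_bases diagonal blocks of shape
    (in_feat // num_bases, out_feat // num_bases), stored as one flat row,
    and that one such row is gathered per edge (float32 by default).
    """
    submat_in = in_feat // num_bases
    submat_out = out_feat // num_bases
    floats_per_edge = num_bases * submat_in * submat_out
    return num_edges * floats_per_edge * bytes_per_float

# Hypothetical sizes: 1M edges, 500-dim hidden layer, 100 blocks.
print(bdd_edge_weight_bytes(1_000_000, 500, 500, 100))  # 10000000000 (10 GB)
```

Because the cost is linear in the number of edges, processing only a fraction of the edges per step shrinks this allocation by the same fraction.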

Could you share an example or some pointers on how to do this? My training data is a set of triplets.
So, suppose my batch size is 256: I would pick 256 nodes, take all edges that have these nodes as the lhs (head), build a graph from those edges, and export the embeddings of the 256 nodes. Is this what you mean?
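One detail to check in that plan: in message passing, a node's new embedding is aggregated over the edges that point into it (src to dst), and with K R-GCN layers it depends on its whole K-hop in-neighborhood, not only on the edges where it is the lhs. A minimal NumPy sketch of collecting that neighborhood for a batch of seed nodes follows; the function name is hypothetical, and the edge normalizers should still come from the full graph.

```python
import numpy as np

def khop_in_edges(src, dst, seeds, num_hops):
    """Collect the edges needed to compute num_hops-layer embeddings of `seeds`.

    src, dst: integer arrays; edge i goes from src[i] to dst[i].
    Returns (edge_ids, node_ids): every in-edge of every node within
    num_hops - 1 hops upstream of the seeds, and all nodes those edges touch.
    """
    nodes = np.unique(seeds)
    frontier = nodes
    keep = np.zeros(len(src), dtype=bool)
    for _ in range(num_hops):
        hit = np.isin(dst, frontier)               # edges sending messages into the frontier
        keep |= hit
        upstream = np.unique(src[hit])
        frontier = np.setdiff1d(upstream, nodes)   # only expand newly reached nodes
        nodes = np.union1d(nodes, upstream)
    return np.nonzero(keep)[0], nodes

# Chain 0 -> 1 -> 2 -> 3 -> 4, seed node 4, a 2-layer model.
src = np.array([0, 1, 2, 3])
dst = np.array([1, 2, 3, 4])
eids, nodes = khop_in_edges(src, dst, seeds=[4], num_hops=2)
print(eids, nodes)  # [2 3] [2 3 4]
```

In this toy chain, node 4 is the lhs of no edge at all, so a head-only selection would give it no messages. The neighborhood can also grow quickly with K on dense graphs, which is why exporting all nodes layer by layer is often cheaper than building one K-hop subgraph per batch.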

You can follow this code to generate the embeddings with a mini-batch based method:
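As one hedged illustration of a mini-batch method (not necessarily what the linked code does), here is a NumPy sketch of layer-wise inference: the layer-1 output of every node is computed batch by batch from the full layer-0 features, then layer 2 from layer 1, and so on. Each batch only touches the in-edges of its own nodes, and no K-hop subgraph is built. The layer form (per-relation weights without basis or block decomposition, a self-loop weight, ReLU between layers, `norm` = 1 / in-degree) and all names are assumptions, not DGL's `RelGraphConv` API.

```python
import numpy as np

def rgcn_layer_rows(h, src, dst, etype, norm, W, W_self, dst_ids, relu=True):
    """One R-GCN layer, evaluated only for the destination nodes in dst_ids.

    h: (N, d_in) previous-layer features of ALL nodes.
    src, dst, etype, norm: per-edge arrays of the full graph
        (norm is e.g. 1 / in-degree of dst, computed on the full graph).
    W: (num_rels, d_in, d_out) per-relation weights; W_self: (d_in, d_out).
    """
    pos = np.full(h.shape[0], -1)           # global node id -> row in this batch
    pos[dst_ids] = np.arange(len(dst_ids))
    out = h[dst_ids] @ W_self               # self-loop term
    mask = pos[dst] >= 0                    # in-edges of the batch nodes only
    s, r, d, n = src[mask], etype[mask], dst[mask], norm[mask]
    for rel in np.unique(r):
        m = r == rel
        # Messages of relation rel: W_rel h_u, scaled by the edge normalizer.
        msg = (h[s[m]] @ W[rel]) * n[m][:, None]
        np.add.at(out, pos[d[m]], msg)      # sum messages into their destinations
    return np.maximum(out, 0.0) if relu else out

def layerwise_inference(x, src, dst, etype, norm, layers, batch_size=256):
    """Finish layer l for every node (batch by batch) before starting layer l+1."""
    h = x
    for i, (W, W_self) in enumerate(layers):
        last = i == len(layers) - 1
        nxt = np.empty((h.shape[0], W_self.shape[1]))
        for start in range(0, h.shape[0], batch_size):
            ids = np.arange(start, min(start + batch_size, h.shape[0]))
            nxt[ids] = rgcn_layer_rows(h, src, dst, etype, norm, W, W_self, ids,
                                       relu=not last)
        h = nxt
    return h

# Random toy graph and weights (all hypothetical) to check the batching.
rng = np.random.default_rng(0)
N, R, E = 50, 3, 200
src, dst = rng.integers(0, N, E), rng.integers(0, N, E)
etype = rng.integers(0, R, E)
norm = 1.0 / np.bincount(dst, minlength=N)[dst]
x = rng.normal(size=(N, 8))
layers = [(rng.normal(size=(R, 8, 6)), rng.normal(size=(8, 6))),
          (rng.normal(size=(R, 6, 4)), rng.normal(size=(6, 4)))]
full = layerwise_inference(x, src, dst, etype, norm, layers, batch_size=N)
batched = layerwise_inference(x, src, dst, etype, norm, layers, batch_size=7)
print(np.allclose(full, batched))  # True
```

Grouping edges by relation keeps each matmul on an (edges x d_in) slice and never materializes a per-edge weight copy. Peak memory is two full N x d matrices plus one batch of edges. For a real graph you would sort the edges by destination once instead of scanning the full edge arrays for every batch.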

For relation embeddings, you can dump them directly from your RGCN model.
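Once both matrices are exported, they can be reused without the model. The relation parameter converts with `model.w_relation.detach().cpu().numpy()` and can then be written with `np.save`. Assuming the decoder is DistMult (as far as I know, this is what `calc_score` in the DGL link prediction example computes: the sum of e_h * w_r * e_t), here is a minimal NumPy scoring sketch. The toy arrays are hypothetical stand-ins for the loaded files.

```python
import numpy as np

def distmult_scores(entity_emb, relation_emb, triplets):
    """Score (head, relation, tail) ID triplets with DistMult: sum(e_h * w_r * e_t)."""
    triplets = np.asarray(triplets)
    h = entity_emb[triplets[:, 0]]    # head entity vectors
    r = relation_emb[triplets[:, 1]]  # diagonal relation vectors
    t = entity_emb[triplets[:, 2]]    # tail entity vectors
    return np.sum(h * r * t, axis=1)

# Toy embeddings standing in for arrays loaded with np.load(...).
ent = np.array([[1.0, 0.0], [0.5, 2.0], [1.0, 1.0]])
rel = np.array([[2.0, 1.0]])
print(distmult_scores(ent, rel, [[0, 0, 2], [1, 0, 2]]))  # [2. 3.]
```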
