R-GCN runtime error during evaluation

Hi,

I get the following error when running R-GCN on FB15k-237. It happens during evaluation. I reduced the eval batch size, but the run still fails with this runtime error after 2000 epochs.

File "/home/saatviga_sudhahar/workspace/dgl/examples/pytorch/rgcn/utils.py", line 210, in calc_mrr
ranks_s = perturb_and_get_rank(embedding, w, o, r, s, test_size, eval_bz)
File "/home/saatviga_sudhahar/workspace/dgl/examples/pytorch/rgcn/utils.py", line 193, in perturb_and_get_rank
out_prod = torch.bmm(emb_ar, emb_c) # size D x E x V
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 3722496000 bytes. Error code 12 (Cannot allocate memory)

Has anyone faced the same issue?
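The size of the failed allocation can be reconstructed from the comment on the failing line: `out_prod` has shape D x E x V, so as float32 it takes 4·D·E·V bytes. Taking V = 14,541 (the number of entities in FB15k-237) and assuming a hidden size of D = 500 and an eval batch of E = 128 (both assumptions, not stated in the post), the product matches the 3,722,496,000 bytes in the error exactly. The sketch below just does that arithmetic and compares it with the E x V score matrix that evaluation actually needs.

```python
# Rough memory estimate for the D x E x V float32 tensor built by torch.bmm
# in perturb_and_get_rank. D and E below are assumptions, not values from the post.
BYTES_PER_FLOAT32 = 4


def bmm_out_bytes(hidden_dim: int, eval_batch: int, num_entities: int) -> int:
    """Bytes for a hidden_dim x eval_batch x num_entities float32 tensor."""
    return BYTES_PER_FLOAT32 * hidden_dim * eval_batch * num_entities


def score_bytes(eval_batch: int, num_entities: int) -> int:
    """Bytes for the E x V float32 score matrix alone."""
    return BYTES_PER_FLOAT32 * eval_batch * num_entities


V = 14541  # number of entities in FB15k-237
D = 500    # assumed hidden size
E = 128    # assumed eval batch size

print(bmm_out_bytes(D, E, V))  # 3722496000, the size in the error message
print(score_bytes(E, V))       # 7444992
```

Shrinking E only reduces the allocation linearly; the extra factor of D (500 here) comes from materializing the full outer product before summing over the hidden dimension.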

Did you train your model on CPU or GPU?

Hi, I have the same question.
How can I run evaluation in mini-batches?
Or can I use the GPU for evaluation?

I presume the model was trained with mini-batches on GPU. For evaluation, you can adapt the training code to evaluate in mini-batches as well. If you are concerned about the randomness introduced by sampling, you can run the evaluation several times and report the mean and standard deviation of the results.
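A minimal sketch of batched evaluation, assuming the DistMult-style scoring that the traceback's bmm implies (as in the DGL example's `utils.py`, where `emb_ar` is the subject embedding times the relation weight). A bmm of a D x E x 1 and a D x 1 x V tensor followed by a sum over D equals `(embedding[s] * w[r]) @ embedding.T`, so the E x V scores can be computed with one matmul per batch without building the D x E x V tensor. `raw_mrr_batched` and `repeated_eval` are hypothetical helpers, not DGL functions. The sketch computes raw (unfiltered) MRR and assumes the entity embeddings are already computed and fit in memory; producing them with neighbor sampling is a separate step not shown here. `repeated_eval` covers the mean/std suggestion above.

```python
import statistics

import torch


def raw_mrr_batched(embedding, w, s, r, o, batch_size=64):
    """Raw (unfiltered) MRR for ranking o among all entities given (s, r).

    Assumes DistMult scoring: score(s, r, o) = sum_d embedding[s, d] * w[r, d] * embedding[o, d].
    embedding: V x D entity embeddings; w: R x D relation weights;
    s, r, o: 1-D LongTensors holding the test triplets.
    """
    reciprocal_ranks = []
    with torch.no_grad():
        for start in range(0, s.numel(), batch_size):
            bs = s[start:start + batch_size]
            br = r[start:start + batch_size]
            bo = o[start:start + batch_size]
            # E x D queries, then E x V scores from a single matmul
            # (same values as bmm + sum over D, without the D x E x V tensor).
            query = embedding[bs] * w[br]
            scores = query @ embedding.t()
            target = scores.gather(1, bo.unsqueeze(1))  # E x 1 true-object scores
            # Rank = 1 + number of entities scoring strictly higher than the target.
            ranks = 1 + (scores > target).sum(dim=1)
            reciprocal_ranks.append(1.0 / ranks.float())
    return torch.cat(reciprocal_ranks).mean().item()


def repeated_eval(eval_fn, n_runs=5):
    """Call a (possibly sampling-based) evaluation n_runs times; return (mean, std).

    statistics.stdev needs at least two results, so n_runs must be >= 2.
    """
    results = [eval_fn() for _ in range(n_runs)]
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    torch.manual_seed(0)
    num_entities, num_rels, dim, n_test = 100, 5, 16, 300
    emb = torch.randn(num_entities, dim)
    w = torch.randn(num_rels, dim)
    s = torch.randint(0, num_entities, (n_test,))
    r = torch.randint(0, num_rels, (n_test,))
    o = torch.randint(0, num_entities, (n_test,))
    print("object MRR:", raw_mrr_batched(emb, w, s, r, o))
    # DistMult is symmetric in head and tail, so swapping s and o ranks subjects.
    print("subject MRR:", raw_mrr_batched(emb, w, o, r, s))
    # Stand-in for sampling noise: fresh random embeddings on every run.
    print("mean, std:", repeated_eval(
        lambda: raw_mrr_batched(torch.randn(num_entities, dim), w, s, r, o)))
```

The sigmoid applied in the original code is omitted because it is monotonic and does not change the ranks. Peak memory per batch is now E x V scores rather than D x E x V, and the same loop runs on GPU if the tensors are moved there first.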

So, what does that mean? :grinning:
"For evaluation, you can adapt the training code for mini-batch evaluation."
I couldn't find any code for mini-batch evaluation in the R-GCN link prediction example that I could adapt.

We are currently writing example code for R-GCN link prediction on heterographs using mini-batch training and evaluation. We will open a PR this week.

Thank you so much :)
You really helped me a lot :blush:

Hi, I also run out of memory during evaluation. Have you released the example code for mini-batch evaluation yet? Thanks!

Here is example code based on neighbor sampling: https://github.com/classicsong/dgl/blob/rgcn-homo-to-homo/examples/pytorch/rgcn/link_predict_hetero_mb.py

It has not been merged yet; I will finalize it and merge it into DGL.