R-GCN runtime error during evaluation

Saatviga · December 2, 2019, 11:16am

Hi,

I get the following error when running R-GCN on FB15k-237. It happens during evaluation. I reduced the eval batch size, but after 2000 epochs it still fails with this runtime error.

  File "/home/saatviga_sudhahar/workspace/dgl/examples/pytorch/rgcn/utils.py", line 210, in calc_mrr
    ranks_s = perturb_and_get_rank(embedding, w, o, r, s, test_size, eval_bz)
  File "/home/saatviga_sudhahar/workspace/dgl/examples/pytorch/rgcn/utils.py", line 193, in perturb_and_get_rank
    out_prod = torch.bmm(emb_ar, emb_c) # size D x E x V
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 3722496000 bytes. Error code 12 (Cannot allocate memory)

Has anyone faced the same issue?
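
For a sense of scale, the failed allocation can be reproduced with simple arithmetic. The comment in the traceback gives the shape of the `torch.bmm` output as D x E x V. The sketch below assumes float32 values and V = 14541 (the number of entities in FB15k-237). The hidden size D = 500 and eval batch size E = 128 are guesses chosen only because they reproduce the byte count in the error; the post does not state them.

```python
# Rough size of the D x E x V tensor built by torch.bmm in perturb_and_get_rank.
# hidden_dim and eval_batch are ASSUMED values that happen to match the error
# message; num_entities is the FB15k-237 entity count.
BYTES_PER_FLOAT32 = 4
num_entities = 14541  # V
hidden_dim = 500      # D (assumption)
eval_batch = 128      # E (assumption)

# Full D x E x V intermediate, as in the traceback.
bmm_bytes = hidden_dim * eval_batch * num_entities * BYTES_PER_FLOAT32

# For comparison: an E x V score matrix alone, without the D axis.
score_bytes = eval_batch * num_entities * BYTES_PER_FLOAT32

print(bmm_bytes)    # 3722496000
print(score_bytes)  # 7444992
```

Under these assumptions the D axis accounts for nearly all of the memory, so shrinking the eval batch size only helps linearly.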

mufeili · December 3, 2019, 3:04am

Did you train your model on CPU or GPU?

hackerchenzhuo · March 20, 2020, 9:55am

Hi, I have the same question.
How can I run evaluation in mini-batches?
Or can I use the GPU for evaluation?

mufeili · March 20, 2020, 5:45pm

I presume the model is trained with mini-batch training on GPU. For evaluation, you can adapt the training code for mini-batch evaluation. If you are concerned about randomness in sampling, you can run the evaluation multiple times and report the mean and standard deviation of the results.
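
As one illustration of evaluating in batches, here is a minimal sketch that ranks candidates chunk by chunk. It is not the code from the DGL example or from any DGL PR. It assumes a DistMult-style score (the elementwise product of the head and relation embeddings, dotted with every entity embedding), and the function name `ranks_in_chunks` and its parameters are hypothetical. Each chunk produces only a chunk x V score matrix through a single matrix multiply, so the D x E x V intermediate is never built. Ties are resolved by counting only strictly higher scores, which may differ from a sort-based ranking.

```python
import torch


def ranks_in_chunks(embedding, w, a, r, b, chunk_size=64):
    """Rank each true entity b[i] among all entities for the query (a[i], r[i]).

    embedding: V x D entity embeddings; w: R x D relation embeddings.
    a, r, b: 1-D int64 tensors of query entity, relation, and target entity ids.
    """
    ranks = []
    with torch.no_grad():  # evaluation only, no autograd buffers
        for start in range(0, a.shape[0], chunk_size):
            end = start + chunk_size
            query = embedding[a[start:end]] * w[r[start:end]]      # chunk x D
            scores = query @ embedding.t()                         # chunk x V
            target = scores.gather(1, b[start:end].unsqueeze(1))   # chunk x 1
            # Rank = 1 + number of entities scoring strictly higher than the target.
            ranks.append((scores > target).sum(dim=1) + 1)
    return torch.cat(ranks)


# Tiny hand-checkable example: 4 entities with one-hot embeddings, 1 relation.
emb = torch.eye(4)
rel = torch.ones(1, 4)
heads = torch.tensor([0, 1])
rels = torch.tensor([0, 0])
tails = torch.tensor([0, 2])

ranks = ranks_in_chunks(emb, rel, heads, rels, tails, chunk_size=1)
mrr = (1.0 / ranks.float()).mean().item()
print(ranks.tolist())  # [1, 2]
```

If the tensors passed in live on a GPU, the same loop runs there, and `chunk_size` can be lowered until each chunk x V score matrix fits in memory.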

hackerchenzhuo · March 20, 2020, 6:06pm

So, what does this mean?
"For evaluation, you can adapt the training code for mini-batch evaluation."
I couldn't find any related code for adapting the training code to mini-batch evaluation in the R-GCN prediction code.

classicsong · March 23, 2020, 7:25am

We are currently writing example code for R-GCN link prediction on heterographs using mini-batch training and evaluation. We will release a PR this week.

hackerchenzhuo · March 23, 2020, 8:57am

Thank you so much :)
You helped me a lot.

JingyiChen1996 · May 5, 2020, 7:56pm

Hi, I also run out of memory during evaluation. Have you released the example code for mini-batch evaluation yet? Thanks!

classicsong · May 11, 2020, 7:02am

Here is an example based on neighbor sampling: https://github.com/classicsong/dgl/blob/rgcn-homo-to-homo/examples/pytorch/rgcn/link_predict_hetero_mb.py

It is not merged yet; I will finalize it and merge it into DGL.