When I test my model on a large graph with 3 million nodes and 9 million edges, a single forward pass takes 92 seconds. My server has 32 E5-2620 2.10 GHz CPU cores, but the testing program only uses one of them, unlike training. I wonder whether DGLGraph functions like the message function, reduce function, or apply_edges can be run in parallel via multithreading or multiprocessing. In addition, I think NodeFlow and sampling are not suitable for testing. Thanks a lot!
Both PyTorch and DGL's built-in operators will utilize all CPU cores without any extra multi-processing/multi-threading. DGL's built-in operators use OpenMP for parallelization, and PyTorch uses similar tools such as OpenMP or MKL. Therefore, adding your own multithreading or multiprocessing on top may actually slow the program down due to thread oversubscription and contention.
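As a side note on the multithreading idea from the question: under CPython's GIL, Python-level threads do not speed up CPU-bound Python code such as a custom message or reduce UDF, which is another reason to rely on the libraries' own OpenMP/MKL parallelism. A minimal stdlib-only sketch (the `cpu_bound` function is a hypothetical stand-in for a Python UDF, not DGL code):

```python
import threading
import time

def cpu_bound(n):
    # A pure-Python CPU-bound loop, standing in for a Python UDF
    # such as a custom message_function.
    s = 0
    for i in range(n):
        s += i * i
    return s

N = 2_000_000

# Run the workload twice serially.
t0 = time.perf_counter()
cpu_bound(N)
cpu_bound(N)
serial = time.perf_counter() - t0

# Run the same two workloads on two Python threads.
t0 = time.perf_counter()
threads = [threading.Thread(target=cpu_bound, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - t0

# Under the GIL the threaded version is typically NOT ~2x faster;
# it takes about as long as the serial version.
print(f"serial={serial:.2f}s threaded={threaded:.2f}s")
```

This is why the parallelism has to happen below the Python layer (OpenMP in DGL's built-in kernels, MKL/OpenMP in PyTorch), not in Python threads wrapping the UDFs.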
Do you mean that only one core is used during testing?
Thanks, I understand. So OpenMP can only use one CPU core? It cannot use multiple cores?
It will automatically use all CPU cores. However, OpenMP is not supported out of the box on macOS, so if you are using a Mac, only one core is used. What do you mean by multiple CPUs?
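If only one core appears busy on Linux, it is also worth checking whether a thread-count environment variable is capping the pool. A minimal stdlib-only sketch, assuming the standard `OMP_NUM_THREADS`/`MKL_NUM_THREADS` knobs (these are read when the library loads, so they must be set before importing torch/dgl):

```python
import os

cores = os.cpu_count() or 1

# OpenMP and MKL read these variables at library load time, so set
# (or unset) them BEFORE importing torch / dgl in your script.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS"):
    value = os.environ.get(var)
    if value is not None and int(value) < cores:
        # A leftover cap like OMP_NUM_THREADS=1 makes the forward
        # pass single-threaded even on a 32-core machine.
        print(f"{var}={value} is capping the thread pool")
        del os.environ[var]

# Or set the cap explicitly to the full core count:
os.environ["OMP_NUM_THREADS"] = str(cores)

print("cores available:", cores)
print("OMP_NUM_THREADS:", os.environ["OMP_NUM_THREADS"])
```

The same cap can also come from a shell profile or a job scheduler, which matches the "mistake in settings" outcome below.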
I checked it again and found that it indeed uses multiple CPU cores; I had made a mistake in my settings. Thank you very much!