Can UVA be shared in multi-GPU training?

Hi, I have tried the multi-GPU GraphSAGE example in DGL with use_uva=True. However, I noticed that as the number of processes and GPUs increases, the CPU memory usage also increases. I believe this is caused by UVA pinning memory per process. So I wonder: is it possible to make UVA shareable across processes in multi-GPU training?
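For reference, this is roughly how UVA is enabled in each process's DataLoader (a simplified sketch; the fanouts, batch size, and the naive index split below are placeholders rather than the exact values from the example):

```python
import dgl
import torch

def build_dataloader(rank, world_size, graph, train_nids):
    # Per-process setup, simplified from the multi-GPU GraphSAGE example.
    device = torch.device(f"cuda:{rank}")
    sampler = dgl.dataloading.NeighborSampler([10, 10, 10])       # placeholder fanouts
    dataloader = dgl.dataloading.DataLoader(
        graph,                         # graph stays in (pinned) host memory
        train_nids[rank::world_size],  # naive split; the real example uses DDP utilities
        sampler,
        device=device,                 # sampled blocks are produced on this GPU
        use_uva=True,                  # sample over UVA instead of copying the graph to GPU
        batch_size=1024,               # placeholder batch size
        shuffle=True,
        drop_last=False,
        num_workers=0,                 # use_uva requires num_workers=0
    )
    return dataloader
```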

Here is the remaining CPU memory I observed for different numbers of GPUs:
[Screenshot: table of remaining CPU memory, 2022-05-18]

The graph itself should be shared across multiple GPUs. However, it’s very possible that each process’s DataLoader (or training function) generates some internal structures that are not shared, so I can’t say for sure whether the increase comes from the DataLoader itself or from somewhere else. A couple of things you could try to figure out where the increase happens: (1) remove all training code but keep the DataLoader loop, and (2) vary the graph size and see whether the increase scales with it.
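For (1), a minimal sketch could look like the following; the psutil-based RSS readout is just one convenient way to measure per-process CPU memory, not something the example itself includes:

```python
import os
import psutil   # one convenient way to read a process's resident set size (RSS)

def measure_dataloader_memory(rank, dataloader):
    """Iterate the DataLoader with no model/optimizer and report CPU memory growth,
    so any increase can be attributed to sampling/UVA rather than training."""
    proc = psutil.Process(os.getpid())
    before = proc.memory_info().rss
    for input_nodes, output_nodes, blocks in dataloader:
        pass                           # sampling only; no forward/backward pass
    after = proc.memory_info().rss
    print(f"[rank {rank}] RSS grew by {(after - before) / 1024**2:.1f} MiB "
          f"during a DataLoader-only epoch")
```

Running this with 1, 2, and 4 processes (and again with a smaller graph) should show whether the extra CPU memory tracks the number of DataLoaders, the graph size, or the training code.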
