In dgl/node_classification.py at master · dmlc/dgl · GitHub
I have a question about this code: if we use
train_idx = dataset.train_idx.to(device)
val_idx = dataset.val_idx.to(device)
before training (device is a GPU), does that mean the GPU can accommodate the entire dataset? If so, I think it would be unnecessary to use minibatching later:
for it, (input_nodes, output_nodes, blocks) in enumerate(train_dataloader):
because our GPU would be large enough.
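To make my confusion concrete, here are my rough back-of-envelope numbers (all sizes below are made-up assumptions, not from any real dataset): moving `train_idx` to the GPU only moves the node *indices*, which are tiny compared with the full node feature matrix.

```python
# Hypothetical sizes, just for scale -- not any real dataset.
num_train = 100_000          # assumed number of training nodes
num_nodes = 100_000_000      # assumed total nodes in the graph
feat_dim = 128               # assumed feature dimension

idx_bytes = num_train * 8                 # int64 index tensor
feat_bytes = num_nodes * feat_dim * 4     # float32 feature matrix

print(f"train_idx: {idx_bytes / 2**20:.1f} MiB")   # under 1 MiB
print(f"features:  {feat_bytes / 2**30:.1f} GiB")  # tens of GiB
```

So `train_idx.to(device)` succeeding says nothing about whether the features fit, which is where my confusion about minibatching comes from.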
I ran into this question because I read the passage at
https://docs.dgl.ai/guide_cn/minibatch-node.html#guide-cn-minibatch-node-classification-sampler
and found this line of code
blocks = [b.to(torch.device('cuda')) for b in blocks]
in the loop body of the minibatch, so I wonder which approach is correct,
because I want to find an example that follows this process:
- take a mini-batch of training nodes and sample a subgraph from these nodes
- gather the node/edge features used by the subgraph from CPU memory and copy them to the GPU
- run the GNN model forward and backward and update the weights
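The three steps above can be sketched without any library, which is how I currently understand them (the adjacency dict, `sample_block`, and `gpu_copy` below are all hypothetical stand-ins for what DGL's graph, neighbor sampler, and `.to(device)` calls would do, not real DGL API):

```python
import random

# Toy graph as an adjacency dict, kept in "CPU memory".
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1], 4: [3]}
features = {n: [float(n)] for n in graph}   # per-node features on CPU
train_nodes = list(graph)

def sample_block(seeds, fanout):
    """Step 1: sample up to `fanout` neighbors for each seed node."""
    src = set(seeds)
    for s in seeds:
        src.update(random.sample(graph[s], min(fanout, len(graph[s]))))
    return sorted(src)

def gpu_copy(nodes):
    """Step 2 stand-in: gather features for the block and 'copy' to GPU."""
    return {n: features[n] for n in nodes}

random.seed(0)
batch_size = 2
for i in range(0, len(train_nodes), batch_size):
    seeds = train_nodes[i:i + batch_size]       # mini-batch of seed nodes
    block_nodes = sample_block(seeds, fanout=1) # step 1: sample subgraph
    batch_feats = gpu_copy(block_nodes)         # step 2: features to device
    # step 3 would be: model forward, backward, optimizer step
```

The point of the sketch is that only the sampled block's features ever need to move to the GPU per iteration, not the whole feature matrix.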
Also, if anyone can give me an example of a dataset so large that an A100 cannot accommodate it, thanks a lot.