I think the sampling step happens inside the dataloader, as the code below shows. How can I measure the sampling time separately from the rest of the dataloader? Looking forward to your reply.
sampler = dgl.dataloading.NeighborSampler([4, 4], prefetch_node_feats=['feat'], prefetch_labels=['label'])
train_dataloader = dgl.dataloading.DataLoader(
    # The following arguments are specific to NodeDataLoader.
    graph,       # The graph is in CPU
    train_nids,  # The node IDs to iterate over in minibatches
    sampler,     # The neighbor sampler
    device=device,  # Put the sampled MFGs on CPU or GPU
    use_ddp=True,   # Make it work with distributed data parallel
    # The following arguments are inherited from PyTorch DataLoader.
    batch_size=args.batch_size,  # Per-device batch size.
                                 # The effective batch size is this number times the number of GPUs.
    shuffle=True,      # Whether to shuffle the nodes for every epoch
    drop_last=False,   # Whether to drop the last incomplete batch
    num_workers=args.num_worker  # Number of sampler processes
)
for step, (input_nodes, output_nodes, blocks) in enumerate(train_dataloader):
    ...
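
One possible way to approximate this, as a rough sketch rather than an official DGL recipe: wrap the dataloader in a small iterator that times how long each `next()` call blocks. The `TimedIterator` class and the `slow_batches` stand-in below are hypothetical names made up for illustration and are not part of DGL. Note that with `num_workers > 0` or prefetching, sampling overlaps with training, so the measured wait is only the part that is not hidden. Setting `num_workers=0` (and calling `torch.cuda.synchronize()` before reading the timer when `device` is a GPU) should bring the number closer to the actual sampling cost.

```python
import time


class TimedIterator:
    """Hypothetical helper: wrap any iterable and record how long each next() blocks."""

    def __init__(self, iterable):
        self._iterable = iterable
        self.fetch_times = []  # seconds spent waiting for each batch

    def __iter__(self):
        it = iter(self._iterable)  # iterator creation itself is not timed here
        while True:
            start = time.perf_counter()
            try:
                batch = next(it)
            except StopIteration:
                return
            self.fetch_times.append(time.perf_counter() - start)
            yield batch

    @property
    def total_fetch_time(self):
        return sum(self.fetch_times)


def slow_batches(n, delay):
    """Stand-in for a dataloader: each batch takes `delay` seconds to produce."""
    for i in range(n):
        time.sleep(delay)  # pretend this is the sampling work
        yield i


timed = TimedIterator(slow_batches(3, 0.01))
for batch in timed:
    pass  # the training step would go here and is not counted
print(f"waited {timed.total_fetch_time:.4f}s over {len(timed.fetch_times)} batches")
```

In the loop above, this would correspond to iterating over `TimedIterator(train_dataloader)` instead of `train_dataloader` and reading `total_fetch_time` at the end of the epoch.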