How to measure the CPU neighborhood sampling time?

I think the sampling step happens inside the dataloader, as the code below shows. How can I measure the sampling time decoupled from the dataloader? Looking forward to your reply :slight_smile:

sampler = dgl.dataloading.NeighborSampler(
    [4, 4],                            # Fan-out per layer
    prefetch_node_feats=['feat'],
    prefetch_labels=['label'],
)
train_dataloader = dgl.dataloading.DataLoader(
    # The following arguments are specific to the DGL DataLoader.
    graph,              # The graph is in CPU
    train_nids,         # The node IDs to iterate over in minibatches
    sampler,            # The neighbor sampler
    device=device,      # Put the sampled MFGs on CPU or GPU
    use_ddp=True,       # Make it work with distributed data parallel
    # The following arguments are inherited from PyTorch DataLoader.
    batch_size=args.batch_size,    # Per-device batch size.
                        # The effective batch size is this number times the number of GPUs.
    shuffle=True,       # Whether to shuffle the nodes for every epoch
    drop_last=False,    # Whether to drop the last incomplete batch
    num_workers=args.num_worker       # Number of sampler processes
)
for step, (input_nodes, output_nodes, blocks) in enumerate(train_dataloader):
    ...

By “decoupled”, do you mean you don’t want to include the multiprocessing overhead and the overlap with computation? If so, you could simply benchmark the sampler.sample method directly.
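
For example, a minimal sketch along these lines should give you the pure sampling cost. It assumes a recent DGL (0.8+) where NeighborSampler exposes a sample(g, seed_nodes) method returning (input_nodes, output_nodes, blocks); graph, train_nids, sampler, and args.batch_size are the objects from your snippet:

import time

sampling_time = 0.0
for start in range(0, len(train_nids), args.batch_size):
    # One minibatch of seed nodes, same batching as the DataLoader would do
    seed_nodes = train_nids[start:start + args.batch_size]
    t0 = time.perf_counter()
    input_nodes, output_nodes, blocks = sampler.sample(graph, seed_nodes)
    sampling_time += time.perf_counter() - t0
print(f"Total CPU sampling time: {sampling_time:.3f} s")

Since this runs single-process on the CPU graph, the number you get is the neighborhood-sampling cost alone, without the worker processes, feature prefetching, or device transfer that the DataLoader adds on top.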
