Does the EdgeDataLoader input argument "batch size" refer to the "number of nodes" in each batch?

Dear all, I found that in dgl.dataloading.EdgeDataLoader, the input argument “batch size” is actually the number of nodes in each batch. Is that true? It is called an edge data loader, so I thought it samples edges. Why does the batch size refer to the number of nodes?

Where did you find that “the input arg “batch size” actually is number of nodes in each batch”?

Thanks Mufei! I use the node neighbor sampler “dgl.sampling.sample_neighbors” in the EdgeDataLoader, so I assumed the batch size refers to the node batch size. I am doing edge classification: can I use this built-in node sampler, or do I need to write an edge sampler and pass it as the input argument to the EdgeDataLoader?

I don’t think the sampler is where batch_size comes into effect. batch_size takes effect in the data loader itself, and there it counts edges. Samplers are initialized before EdgeDataLoader instances, so the batch size is not decided by them.

If you are using EdgeDataLoader, batch_size refers to the number of edges in the minibatch. You can still use the built-in node samplers; they will start from the incident nodes of the edges sampled in the minibatch.
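To make the distinction concrete, here is a minimal plain-Python sketch, not DGL's implementation: the function `edge_minibatches` and the toy edge list are hypothetical. It shows that each minibatch contains `batch_size` edges (the last one may be smaller), while the number of seed nodes handed to a node sampler is whatever the unique endpoints of those edges happen to be.

```python
import random

def edge_minibatches(edges, batch_size, seed=0):
    """Hypothetical helper (not DGL API): batch_size counts EDGES, not nodes."""
    rng = random.Random(seed)
    order = list(range(len(edges)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = [edges[i] for i in order[start:start + batch_size]]
        # Seed nodes for a node sampler: the unique endpoints of the sampled edges.
        seed_nodes = sorted({n for u, v in batch for n in (u, v)})
        yield batch, seed_nodes

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
for batch, seeds in edge_minibatches(edges, batch_size=4):
    print(len(batch), "edges ->", len(seeds), "seed nodes")
```

The number of seed nodes varies from batch to batch because edges can share endpoints, which is why the node count is not a useful batch-size knob for an edge loader.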

Thanks BarclayII. Yes, you are right: in EdgeDataLoader, batch_size refers to the number of edges. I am looking at this GraphSAGE example, and its EdgeDataLoader uses MultiLayerNeighborSampler, which is a node sampler. My question is: how can a node sampler be used for sampling edges?

Training GraphSAGE for link prediction involves:

  1. Sampling a bunch of edges as well as negative examples for loss computation. EdgeDataLoader itself does this.
  2. For the edges and negative examples sampled, expand the neighbors of the incident nodes (i.e. the nodes those edges and negative examples touch) for computation by the GNN. This is the responsibility of MultiLayerNeighborSampler.
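The two steps above can be sketched in plain Python. This is an illustration under stated assumptions, not DGL's code: `sample_negatives`, `expand_neighbors`, `link_prediction_minibatch`, the toy adjacency list, `k` and `fanouts` are all hypothetical names. Step 1 plays the role of the data loader (positive edges plus uniformly corrupted negatives, similar in spirit to passing a uniform negative sampler to EdgeDataLoader), and step 2 plays the role of a multi-layer neighbor sampler seeded by every node the positive and negative pairs touch. In this sketch `fanouts[0]` is applied first, to the seed nodes.

```python
import random

def sample_negatives(batch, num_nodes, k, rng):
    # Hypothetical helper: keep each source node, pair it with k random destinations.
    return [(u, rng.randrange(num_nodes)) for u, _dst in batch for _ in range(k)]

def expand_neighbors(seeds, adj, fanouts, rng):
    """Hypothetical helper: one neighbor-expansion hop per GNN layer."""
    layers = [set(seeds)]
    frontier = set(seeds)
    for fanout in fanouts:
        nxt = set(frontier)  # keep current nodes so their own features stay available
        for n in frontier:
            nbrs = adj.get(n, [])
            # Sample at most `fanout` neighbors of each node, without replacement.
            nxt.update(rng.sample(nbrs, min(fanout, len(nbrs))))
        layers.append(nxt)
        frontier = nxt
    return layers

def link_prediction_minibatch(batch, adj, num_nodes, k, fanouts, seed=0):
    rng = random.Random(seed)
    # Step 1 (the data loader's job): positive edges plus k negatives per edge.
    negatives = sample_negatives(batch, num_nodes, k, rng)
    # Step 2 (the node sampler's job): expand around every incident node.
    seeds = {n for pair in batch + negatives for n in pair}
    return negatives, expand_neighbors(seeds, adj, fanouts, rng)

adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [5], 5: [4]}
pos = [(0, 1), (2, 3)]
neg, layers = link_prediction_minibatch(pos, adj, num_nodes=6, k=2, fanouts=[2, 2])
print("negative pairs:", neg)
print("nodes per hop:", [len(layer) for layer in layers])
```

The node sampler never needs to know it is serving a link prediction task: it only receives a set of seed nodes, which is why a node sampler can be reused unchanged inside an edge data loader.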

Thanks BarclayII! It totally makes sense.
