Does the EdgeDataLoader input argument "batch size" refer to the "number of nodes" in each batch?

hellomirro · January 19, 2021, 4:56pm

Dear all, I found that in dgl.dataloading.EdgeDataLoader, the input arg “batch size” actually seems to be the number of nodes in each batch. Is that true? It is called an edge data loader, so I thought it samples edges. Why would the batch size refer to the number of nodes?

mufeili · January 19, 2021, 5:37pm

Where did you find that “the input arg “batch size” actually is number of nodes in each batch”?

hellomirro · January 19, 2021, 6:09pm

Thanks Mufei! I use the node neighbor sampler “dgl.sampling.sample_neighbors” in the EdgeDataLoader, so I assumed the batch size refers to the node batch size. I am doing edge classification: can I use this built-in node sampler, or do I need to write an edge sampler and pass it as the sampler argument of EdgeDataLoader?

mufeili · January 20, 2021, 2:27am

I don’t think the neighbor sampler is where batch_size comes into effect. batch_size is applied by EdgeDataLoader itself when it splits the edges into minibatches, and it counts edges. Samplers are initialized before EdgeDataLoader instances.

BarclayII · January 25, 2021, 7:15am

If you are using EdgeDataLoader, batch_size refers to the number of edges in the minibatch. You can still use the builtin node samplers; they will start from the incident nodes of the edges sampled in the minibatch.
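A minimal plain-Python sketch of this behavior (not DGL's actual implementation; `edge_minibatches` and `incident_nodes` are hypothetical helpers): the loader chunks the edges by `batch_size`, and the seed nodes handed to a node-wise sampler are the endpoints of the edges in each chunk. Shuffling and negative sampling are omitted for brevity.

```python
from typing import Iterator, List, Tuple

Edge = Tuple[int, int]  # (source node, destination node)


def edge_minibatches(edges: List[Edge], batch_size: int) -> Iterator[List[Edge]]:
    """Hypothetical helper: yield consecutive chunks of at most `batch_size` edges."""
    for start in range(0, len(edges), batch_size):
        yield edges[start:start + batch_size]


def incident_nodes(batch: List[Edge]) -> List[int]:
    """Hypothetical helper: sorted, de-duplicated endpoints of one edge minibatch.
    These are the seed nodes a node-wise neighbor sampler would expand from."""
    return sorted({node for edge in batch for node in edge})


edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
batches = list(edge_minibatches(edges, batch_size=2))
seeds = [incident_nodes(batch) for batch in batches]
```

With `batch_size=2`, the first minibatch holds 2 edges but 3 seed nodes (0, 1, 2). The number of edges per batch is fixed (except possibly the last batch), while the number of nodes per batch varies with the graph.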

hellomirro · February 1, 2021, 5:41am

Thanks BarclayII. Yes, you are right. In EdgeDataLoader, batch_size refers to the number of edges. I am looking at the GraphSAGE example, whose EdgeDataLoader uses MultiLayerNeighborSampler, which is a node sampler. My question is: how can a node sampler be used to sample edges?

BarclayII · February 1, 2021, 5:52am

Training GraphSAGE for link prediction involves two steps:

1. Sampling a batch of edges, as well as negative examples, for loss computation. EdgeDataLoader itself does this.
2. For the sampled edges and negative examples, expanding the neighbors of the incident nodes (i.e. the nodes those edges and negative examples touch) for computation by the GNN. This is the responsibility of MultiLayerNeighborSampler.
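These two steps can be sketched in plain Python. This is an illustration under stated assumptions, not DGL's code: `sample_negatives` and `expand_neighbors` are hypothetical helpers, negatives are made by pairing each positive edge's source with uniformly random destination nodes, and hop `i` uses `fanouts[i]` counting outward from the seeds (DGL's own ordering of fanouts across layers may differ).

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (source node, destination node)


def sample_negatives(batch: List[Edge], num_nodes: int, k: int,
                     rng: random.Random) -> List[Edge]:
    """Step 1 (hypothetical): for each positive edge (u, v), pair u with
    k uniformly random destination nodes as negative examples."""
    return [(u, rng.randrange(num_nodes)) for u, _ in batch for _ in range(k)]


def expand_neighbors(seeds: Set[int], in_nbrs: Dict[int, List[int]],
                     fanouts: List[int], rng: random.Random) -> List[Set[int]]:
    """Step 2 (hypothetical): starting from the seed nodes, sample up to
    fanouts[i] incoming neighbors per node at hop i. Returns the node set
    after each hop, starting with the seeds themselves."""
    frontiers = [set(seeds)]
    for fanout in fanouts:
        current = frontiers[-1]
        nxt = set(current)  # keep the nodes we expand from
        for node in current:
            nbrs = in_nbrs.get(node, [])
            nxt.update(rng.sample(nbrs, min(fanout, len(nbrs))))
        frontiers.append(nxt)
    return frontiers


rng = random.Random(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (2, 4), (4, 1)]
num_nodes = 5
in_nbrs: Dict[int, List[int]] = defaultdict(list)
for u, v in edges:
    in_nbrs[v].append(u)  # incoming neighbors of v

pos = edges[:2]  # one minibatch of 2 edges (batch_size counts edges)
neg = sample_negatives(pos, num_nodes, k=2, rng=rng)
seeds = {node for edge in pos + neg for node in edge}  # incident nodes
frontiers = expand_neighbors(seeds, in_nbrs, fanouts=[2, 2], rng=rng)
```

The minibatch size in edges is fixed by the loader, while the number of nodes in each frontier depends on the graph structure and the fanouts, which is why a node-wise sampler fits naturally after the edge sampling step.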

hellomirro · February 1, 2021, 5:40pm

Thanks BarclayII! It totally makes sense.
