Hi,
I was wondering if it is possible to use a MultiLayerNeighborSampler where the number of input nodes stays fixed across minibatches? I believe this is what is done in the original GraphSAGE paper: “In this work, we uniformly sample a fixed-size set of neighbors, instead of using full neighborhood sets in Algorithm 1, in order to keep the computational footprint of each batch fixed.”
Currently, the number of input nodes varies between minibatches with MultiLayerNeighborSampler:
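For context, fixed-size uniform sampling with replacement, as described in that quote, always draws exactly `fanout` neighbors per seed, so the raw sample count per batch is constant. A minimal sketch in plain numpy (the adjacency lists and the `sample_fixed` helper here are made up for illustration, not part of DGL):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy adjacency lists (hypothetical graph): node id -> array of neighbor ids.
adj_lists = {0: np.array([1, 2]), 1: np.array([0, 2, 3]),
             2: np.array([0, 1]), 3: np.array([1])}

def sample_fixed(seeds, fanout):
    """Uniformly sample exactly `fanout` neighbors per seed, with replacement."""
    return {s: rng.choice(adj_lists[s], size=fanout, replace=True) for s in seeds}

seeds = [0, 1, 3]
sampled = sample_fixed(seeds, fanout=3)
total = sum(len(v) for v in sampled.values())
print(total)  # always len(seeds) * fanout = 9 raw samples
```

Note, though, that once the sampled neighbors are merged into a set of unique input nodes (which is what DGL's MFGs do), the unique count can still vary from batch to batch even when sampling with replacement.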
import dgl
import torch
from dgl.data import citation_graph as citegrh

data = citegrh.load_cora()
graph = data[0]
adj = graph.adj(scipy_fmt='coo')
graph = dgl.graph((adj.row, adj.col)).to('cuda')
train_mask = torch.BoolTensor(data.train_mask)
# sample 3 neighbors per node at each of the 2 layers, with replacement
sampler = dgl.dataloading.MultiLayerNeighborSampler([3, 3], replace=True)
train_nids = torch.arange(0, graph.number_of_nodes())[train_mask].to('cuda')
dataloader = dgl.dataloading.DataLoader(
    graph, train_nids, sampler,
    batch_size=32,
    shuffle=True,
    drop_last=False,
    num_workers=0)
loader_iter = iter(dataloader)
input_nodes, output_nodes, mfgs = next(loader_iter)
print(len(input_nodes))  # 258
input_nodes, output_nodes, mfgs = next(loader_iter)
print(len(input_nodes))  # 222 -- differs from the previous batch
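To pin down where the variation comes from: each seed contributes a fixed number of raw samples, but the sampler merges duplicates into unique input nodes, so the unique count fluctuates. One possible workaround (a sketch, not a DGL API; `pad_input_nodes` and `pad_id` are hypothetical names) is to pad each batch's input-node array up to the worst-case fixed size with a dummy node id:

```python
import numpy as np

def pad_input_nodes(input_nodes, batch_size, fanouts, pad_id=-1):
    """Pad a batch's input-node array to the worst-case fixed size.

    Worst case: every sampled node is unique, i.e. for fanouts [f1, f2]
    a batch has at most batch_size * (1 + f1 + f1*f2) input nodes.
    `pad_id` is a placeholder id (e.g. a dedicated dummy node in the graph).
    """
    max_size = batch_size
    layer = batch_size
    for f in fanouts:
        layer *= f
        max_size += layer
    padded = np.full(max_size, pad_id, dtype=np.int64)
    padded[:len(input_nodes)] = input_nodes
    return padded

# Example: a batch of 32 seeds with fanouts [3, 3] pads to 32 * (1 + 3 + 9) = 416.
batch = np.arange(258)  # stand-in for the 258 unique input nodes above
padded = pad_input_nodes(batch, batch_size=32, fanouts=[3, 3])
print(len(padded))  # 416
```

The downstream model would then need to mask out (or ignore) the padded entries, but every minibatch would have the same input tensor shape.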