NeighborSampler cannot sample all the neighbors

Hi,

Thanks for sharing the library. I am puzzled by the behavior of NeighborSampler on a huge graph.

If I set the batch size to 1, expand_factor to 30, and neighbor_type to ‘in’, I expect all of a node’s neighbors to be sampled when its in-degree is less than 30. However, I get the following results:
| Id | Sample size | Predecessors from predecessors() | In-degree |
|---|---|---|---|
|6250|torch.Size([5])|torch.Size([7])|7|
|6251|torch.Size([15])|torch.Size([15])|15|
|6252|torch.Size([2])|torch.Size([2])|2|
|6253|torch.Size([7])|torch.Size([7])|7|
|6254|torch.Size([1])|torch.Size([1])|1|
|6255|torch.Size([3])|torch.Size([3])|3|
|6256|torch.Size([10])|torch.Size([12])|12|
|6257|torch.Size([10])|torch.Size([14])|14|
|6258|torch.Size([12])|torch.Size([13])|13|
|6259|torch.Size([3])|torch.Size([3])|3|
|6260|torch.Size([9])|torch.Size([11])|11|

For some nodes, the number of sampled neighbors is smaller than the in-degree, even though expand_factor is set to 30 (larger than the number of neighbors).
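To make the expectation concrete, here is a minimal sketch in plain Python. It is not DGL's sampler: `sample_in_neighbors` and `find_undersampled` are hypothetical helpers that assume uniform sampling without replacement, so a node should receive `min(in_degree, expand_factor)` neighbors. The `reported` rows are copied from the table above.

```python
import random


def sample_in_neighbors(predecessors, expand_factor, rng):
    """Hypothetical reference sampler (an assumption, not DGL's code):
    keep every predecessor when there are at most expand_factor of them,
    otherwise draw expand_factor of them uniformly without replacement."""
    predecessors = list(predecessors)
    if len(predecessors) <= expand_factor:
        return predecessors
    return rng.sample(predecessors, expand_factor)


def find_undersampled(rows, expand_factor):
    """rows holds (node_id, num_sampled, in_degree) tuples.
    Return (node_id, num_sampled, expected) for every node that got
    fewer neighbors than min(in_degree, expand_factor)."""
    result = []
    for node_id, num_sampled, in_degree in rows:
        expected = min(in_degree, expand_factor)
        if num_sampled < expected:
            result.append((node_id, num_sampled, expected))
    return result


# (node id, sampled neighbors, in-degree) as reported in the table above.
reported = [
    (6250, 5, 7), (6251, 15, 15), (6252, 2, 2), (6253, 7, 7),
    (6254, 1, 1), (6255, 3, 3), (6256, 10, 12), (6257, 10, 14),
    (6258, 12, 13), (6259, 3, 3), (6260, 9, 11),
]

undersampled = find_undersampled(reported, expand_factor=30)
for node_id, got, expected in undersampled:
    print(f"node {node_id}: sampled {got}, expected {expected}")
```

Under that assumption, five of the eleven nodes in the table come back with fewer neighbors than expected.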

Could you check this or explain this behavior?

Thanks for your time and help.

This is weird. Just to be sure, can I take a look at all the arguments you use for NeighborSampler?

Thanks for your time. By the way, how do I set the random seed that controls the sampling process in DGL? I set the random seeds for PyTorch, NumPy, and Python as follows:
np.random.seed(1234)
torch.manual_seed(1234)
random.seed(1234)
However, I still cannot reproduce the same results.

Does the sampling procedure depend on something outside the Python environment? How can I guarantee reproducibility?

The sampling implementation is in C++ and currently we cannot fix it with a random seed.
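As a rough analogy only (this is not DGL code), the sketch below shows why seeding the Python-level generators has no effect on a sampler that owns its own random state. `_native_rng` and `native_sample` are hypothetical stand-ins for a native sampler; the standard-library `random.Random` class plays the role of the separate generator.

```python
import random

# Hypothetical stand-in for a native sampler with private RNG state.
# With no argument, random.Random() is seeded from OS entropy (or the clock).
_native_rng = random.Random()


def native_sample(population, k):
    """Draw k items using the sampler's own generator, not the global one."""
    return _native_rng.sample(list(population), k)


def seeded_python_draw():
    """Seeding the global generator makes *global* draws repeatable."""
    random.seed(1234)
    return random.sample(range(1000), 5)


# The global generator is reproducible after seeding...
assert seeded_python_draw() == seeded_python_draw()

# ...but random.seed() never touches _native_rng, so these draws are
# not controlled by it and will usually differ between calls and runs.
random.seed(1234)
first = native_sample(range(1000), 5)
random.seed(1234)
second = native_sample(range(1000), 5)

# Only seeding the sampler's own generator makes its draws repeatable.
_native_rng.seed(1234)
a = native_sample(range(1000), 5)
_native_rng.seed(1234)
b = native_sample(range(1000), 5)
assert a == b
```

The same reasoning applies to `np.random.seed` and `torch.manual_seed`: each call seeds only its own library's generator.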

The ability to set seed for C++ random sampler is implemented in this PR.

If you could provide your test script, we can test it locally. Thanks.