NeighborSampler cannot sample all the neighbors

xjtuwgt · July 16, 2019, 9:30pm

Hi,

Thanks for sharing the library. I am puzzled by the behavior of NeighborSampler on a huge graph.

If I set the batch size to 1, expand_factor to 30, and neighbor_type to ‘in’, I expect all of a node’s neighbors to be sampled whenever its in-degree is less than 30. However, I get the following results:
|Id|Sample size|Predecessors returned by predecessors()|In-degree|
|---|---|---|---|
|6250|torch.Size([5])|torch.Size([7])|7|
|6251|torch.Size([15])|torch.Size([15])|15|
|6252|torch.Size([2])|torch.Size([2])|2|
|6253|torch.Size([7])|torch.Size([7])|7|
|6254|torch.Size([1])|torch.Size([1])|1|
|6255|torch.Size([3])|torch.Size([3])|3|
|6256|torch.Size([10])|torch.Size([12])|12|
|6257|torch.Size([10])|torch.Size([14])|14|
|6258|torch.Size([12])|torch.Size([13])|13|
|6259|torch.Size([3])|torch.Size([3])|3|
|6260|torch.Size([9])|torch.Size([11])|11|

As the results show, for some nodes the number of sampled neighbors is smaller than the in-degree, even though expand_factor is set to 30 (larger than the number of neighbors).
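The comparison described above can be written as a small check. This is only a sketch in plain Python, not DGL code: `find_undersampled`, `in_neighbors`, and `sampled` are hypothetical names, and the neighbor IDs in the toy data are made up (only the counts for nodes 6250 and 6252 mirror the results above). It assumes a node should receive `min(in_degree, expand_factor)` distinct neighbors.

```python
def find_undersampled(in_neighbors, sampled, expand_factor):
    """Return (node, got, expected) for nodes with too few sampled neighbors.

    in_neighbors: dict mapping node ID -> list of all predecessor IDs
    sampled:      dict mapping node ID -> list of sampled predecessor IDs
    """
    short = []
    for node, preds in in_neighbors.items():
        # Assumption: the sampler should return every neighbor when the
        # in-degree does not exceed expand_factor.
        expected = min(len(preds), expand_factor)
        got = len(set(sampled.get(node, [])))  # count distinct neighbors only
        if got < expected:
            short.append((node, got, expected))
    return short


# Toy data: neighbor IDs are invented for illustration.
in_neighbors = {6250: [1, 2, 3, 4, 5, 6, 7], 6252: [8, 9]}
sampled = {6250: [1, 2, 3, 4, 5], 6252: [8, 9]}

report = find_undersampled(in_neighbors, sampled, expand_factor=30)
print(report)  # [(6250, 5, 7)]
```

In a real run, `in_neighbors` would be filled from the graph's predecessor lists and `sampled` from whatever the sampler returns for each seed node.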

Could you check this or explain why it happens?

Thanks for your time and help.

mufeili · July 17, 2019, 2:00am

This is weird. Just to be sure, can I take a look at all the arguments you use for NeighborSampler?

xjtuwgt · July 18, 2019, 9:32pm

Thanks for your time. By the way, how can I set the random seed to control the sampling process in DGL? I set the random seeds for PyTorch, NumPy, and Python as follows:
np.random.seed(1234)
torch.manual_seed(1234)
random.seed(1234)
However, I still cannot reproduce the same results.

Does the sampling procedure depend on a non-Python component? How can I guarantee reproducibility?

mufeili · July 19, 2019, 7:06am

The sampling is implemented in C++, and currently its randomness cannot be fixed with a random seed.
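To illustrate why seeding Python, NumPy, and PyTorch has no effect here, below is a rough sketch in plain Python. It is not DGL code: `IndependentSampler` is a hypothetical stand-in for a sampler that owns its random number generator, as a C++ backend would. Seeding the global `random` module does not touch that private generator, so results only become repeatable once the sampler itself accepts a seed.

```python
import random


class IndependentSampler:
    """Hypothetical sampler with its own RNG, separate from the global one."""

    def __init__(self, seed=None):
        # seed=None means the generator is seeded from OS entropy,
        # so the global random.seed(...) call has no influence on it.
        self._rng = random.Random(seed)

    def sample(self, neighbors, k):
        # Return all neighbors when there are at most k of them.
        if len(neighbors) <= k:
            return list(neighbors)
        return self._rng.sample(neighbors, k)


neighbors = list(range(100))

# Seeding the global generator does not make the sampler repeatable:
random.seed(1234)
first = IndependentSampler().sample(neighbors, 5)
random.seed(1234)
second = IndependentSampler().sample(neighbors, 5)
print(first, second)  # these two draws will almost always differ

# Passing a seed to the sampler itself does:
seeded_a = IndependentSampler(seed=1234).sample(neighbors, 5)
seeded_b = IndependentSampler(seed=1234).sample(neighbors, 5)
print(seeded_a == seeded_b)  # True
```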

BarclayII · July 26, 2019, 6:44am

The ability to set a seed for the C++ random sampler is implemented in this PR.

zhengda1936 · August 12, 2019, 7:45am

If you can provide your test script, we can run it locally. Thanks.