Understanding the DGL sampler (sampler.cc)

lwwlwwl · April 19, 2023, 6:08am

Hi,

I am trying to understand how the sampling stage works for node classification with GraphSAGE, specifically on the C++ side. Which functions and files are involved? I checked the tutorials on the website, but they mostly cover how to use the library. I want to learn how it works internally, for example how the neighbor nodes of the target nodes are selected.

I tried adding print statements to functions in sampler.cc, rebuilding from source, and running python dgl/examples/pytorch/graphsage/node_classification.py, but nothing was printed. How is sampler.cc involved in the training process?

Thanks in advance.

minjie · April 20, 2023, 1:55am

The high-level flow on the C++ side is like this:

1. The Python API calls into a C API called SampleNeighbors.
2. It selects the proper sparse format and then calls the corresponding operator, e.g., CSRRowWiseSampling.
3. All operator implementations are under src/array/(cpu|cuda). For example, the CPU CSRRowWiseSampling operator is here. It is implemented using a template function: developers only need to provide how to pick samples from each row, while the template handles how rows are parallelized. The template definition is in the header file: dgl/rowwise_pick.h at master · dmlc/dgl · GitHub
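
To make the split between "how to pick from one row" and "how to loop over rows" more concrete, here is a minimal sketch in plain Python. It is not DGL's code: the names csr_rowwise_pick and uniform_pick, their signatures, and the sequential loop are all assumptions made for illustration. The real implementation is a C++ template that runs the per-row work in parallel.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical "pick function" type: given the length of one row and the
# requested fanout, return the offsets (within that row) to keep.
PickFn = Callable[[int, int, random.Random], List[int]]


def uniform_pick(row_len: int, fanout: int, rng: random.Random) -> List[int]:
    """Pick up to `fanout` offsets uniformly without replacement.

    A negative fanout, or a row shorter than fanout, keeps the whole row.
    """
    if fanout < 0 or row_len <= fanout:
        return list(range(row_len))
    return rng.sample(range(row_len), fanout)


def csr_rowwise_pick(
    indptr: Sequence[int],
    indices: Sequence[int],
    rows: Sequence[int],
    fanout: int,
    pick_fn: PickFn,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Generic row-wise driver (illustrative stand-in for the template).

    For each requested row it slices that row out of the CSR arrays and
    delegates the choice of entries to `pick_fn`. In a C++ version this
    outer loop is the part the template could parallelize.
    """
    rng = random.Random(seed)
    picked_rows: List[int] = []
    picked_cols: List[int] = []
    for r in rows:
        start, end = indptr[r], indptr[r + 1]
        for off in pick_fn(end - start, fanout, rng):
            picked_rows.append(r)
            picked_cols.append(indices[start + off])
    return picked_rows, picked_cols


if __name__ == "__main__":
    # Toy graph in CSR form: row 0 -> {1, 2, 3}, row 1 -> {0},
    # row 2 -> {}, row 3 -> {0, 1}.
    indptr = [0, 3, 4, 4, 6]
    indices = [1, 2, 3, 0, 0, 1]
    rows, cols = csr_rowwise_pick(indptr, indices, [0, 1, 2, 3], 2, uniform_pick)
    for r, c in zip(rows, cols):
        print(f"row {r} sampled neighbor {c}")
```

Swapping uniform_pick for a different pick function (for example, a weighted one) would change the sampling strategy without touching the row loop, which is the design the template approach is aiming for.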
