Neighbor sampling and aggregation with graphbolt

nicksukie · June 30, 2024, 12:27pm

Hi there.

I’m migrating code originally written for DGL 1.x, when neighbor sampling was done with dgl.contrib.sampling.NeighborSampler.

The sampler returned a generator of NodeFlow objects. With NodeFlow, graph data could be accessed simply via layers[i].data. For example, below is my old code for message passing on the subgraph after neighbor sampling (taken from documentation that apparently no longer exists; see: What's the best practice to train node embeddings for a big graph):

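    # Legacy NodeFlow code. Assumes `import torch.nn.functional as F`
    # and `import dgl.function as fn` at module level.
    def encode(self, data_flows, training=True):
        x = self.embeddings
        # Take one sampled NodeFlow and pull node data from the parent graph.
        nf = next(iter(data_flows))
        nf.copy_from_parent()
        # Look up input embeddings for the nodes of the first layer.
        nf.layers[0].data['activation'] = x[nf.layers[0].data['feature']]
        for i, layer in enumerate(self.layers):
            h = nf.layers[i].data.pop('activation')
            h = F.dropout(h, p=self.dropout, training=training)
            nf.layers[i].data['h'] = h
            # Mean-aggregate neighbor messages from layer i into layer i + 1.
            # `layer` runs as the node update function and is expected to
            # write 'activation' on layer i + 1.
            nf.block_compute(i,
                             fn.copy_src(src='h', out='m'),
                             lambda node: {'h': node.mailbox['m'].mean(dim=1)},
                             layer)

        # Output of the last layer (the seed nodes).
        h = nf.layers[-1].data.pop('activation')

        return h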

But now, with GraphBolt, obtaining the graph data and aggregating it seems less simple. next(iter(subgraphs)).sampled_subgraphs[0].sampled_csc returns a sparse matrix in CSC format, but I still don’t know how to proceed with aggregation: copy_from_parent() no longer exists, and accessing layers fails with 'CSCFormatBase' object has no attribute 'layers'.
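To make the CSC layout concrete, here is a minimal sketch in plain Python (no DGL needed) of what a mean aggregation over such a structure amounts to. The names indptr and indices mirror the two fields of CSCFormatBase; the helper mean_aggregate_csc and all values are made up for illustration, and the sketch assumes each column corresponds to a destination node whose source neighbors are the entries of indices in that column's range.

```python
def mean_aggregate_csc(indptr, indices, src_feats):
    """Average the features of each destination node's in-neighbors.

    indptr[j]:indptr[j + 1] is the slice of `indices` holding the source
    nodes that send a message to destination node j.
    """
    dim = len(src_feats[0])
    out = []
    for dst in range(len(indptr) - 1):
        neighbors = indices[indptr[dst]:indptr[dst + 1]]
        if not neighbors:
            # No sampled neighbors: fall back to a zero vector.
            out.append([0.0] * dim)
            continue
        summed = [sum(src_feats[src][d] for src in neighbors) for d in range(dim)]
        out.append([s / len(neighbors) for s in summed])
    return out


# Hypothetical toy data: 3 source nodes, 2 destination nodes.
indptr = [0, 2, 3]        # dst 0 owns indices[0:2], dst 1 owns indices[2:3]
indices = [0, 2, 1]       # dst 0 <- {0, 2}, dst 1 <- {1}
src_feats = [[1.0, 0.0], [0.0, 4.0], [3.0, 2.0]]

result = mean_aggregate_csc(indptr, indices, src_feats)
print(result)  # [[2.0, 1.0], [0.0, 4.0]]
```

This is the same reduction the old lambda performed with node.mailbox['m'].mean(dim=1), written out by hand.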

May I request some guidance on how to handle the output from graphbolt’s neighbor sampler for message passing on the subgraph?

Thanks in advance.

dyru · July 4, 2024, 12:06am

Here is an example of how to use GraphBolt for node classification: Node Classification — DGL 2.3 documentation. The MiniBatch exposes a blocks attribute, which holds the message flow graphs (MFGs) for message passing on the subgraph. The example also shows how to use the MFGs in the model's forward pass (see the Defining Model section).
