How can I implement mini-batch training for "node vs. node" contrastive learning?

I want to implement mini-batch training for “node vs. node” contrastive learning between two views of the same graph. How easy should this be to implement?

The code I’m using now is below, and it’s slow. I hope it can be improved.

        indices = list(range(graph.num_nodes()))
        np.random.shuffle(indices)
        shuffle_indices = torch.tensor(indices).to(graph.device)

        sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers=2)
        dataloader1 = dgl.dataloading.DataLoader(g1, shuffle_indices, sampler, batch_size=batch_size, drop_last=False, shuffle=False, num_workers=4)
        dataloader2 = dgl.dataloading.DataLoader(g2, shuffle_indices, sampler, batch_size=batch_size,drop_last=False, shuffle=False, num_workers=4)
        
        for x, y in zip(dataloader1, dataloader2):
            assert torch.equal(x[1], y[1])
            blocks1, blocks2 = x[-1], y[-1]
            blocks1 = [b.to(device) for b in blocks1]
            blocks2 = [b.to(device) for b in blocks2]
            
            z1 = model.embedding(blocks1, blocks1[0].srcdata['feat'])
            z2 = model.embedding(blocks2, blocks2[0].srcdata['feat'])
            assert torch.isnan(z1).sum()==0
            assert torch.isnan(z2).sum()==0
            
            loss = model.get_loss([z1, z2])
        
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            loss_epoch += loss.item()

Well, I am dealing with the same issue too…
I think you can call the sampler’s sample_blocks method yourself and use PyTorch’s DataLoader instead of DGL’s.

Something like the code below:

class GraphLoader(object):
    def __init__(self, graph, sampler, device):
        ...
    def get_blocks(self, seed_nodes):
        seeds = {}
        for ntype, node_id in seed_nodes.items():
            seeds[ntype] = node_id.to(self.device)
        return self.sampler.sample_blocks(self.graph, seeds)

Then you can iterate over PyTorch’s DataLoader to get the node IDs and sample the blocks with GraphLoader. The resulting blocks can be used by DGL conv models directly.
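For the two-view case, a minimal sketch of that loop might look like the following. `paired_view_batches` is a hypothetical helper (not part of DGL), and it assumes the sampler exposes `sample_blocks(g, seed_nodes)` returning `(input_nodes, output_nodes, blocks)`, like the samplers in this thread. The point is that the seed nodes are drawn once per batch and reused for both views, so no second dataloader or `zip` is needed.

```python
import torch
from torch.utils.data import DataLoader


def paired_view_batches(g1, g2, sampler, num_nodes, batch_size, device="cpu"):
    """Yield (seeds, blocks1, blocks2) where both views share the same seed nodes.

    Hypothetical helper: `sampler` is assumed to provide
    sample_blocks(g, seed_nodes) -> (input_nodes, output_nodes, blocks).
    """
    # A 1-D tensor of node IDs works as a map-style dataset; the default
    # collate function stacks the sampled IDs into one 1-D tensor per batch.
    loader = DataLoader(torch.arange(num_nodes), batch_size=batch_size, shuffle=True)
    for seeds in loader:
        seeds = seeds.to(device)
        # Same seeds for both views, so row i of each output embedding
        # refers to the same node.
        _, _, blocks1 = sampler.sample_blocks(g1, seeds)
        _, _, blocks2 = sampler.sample_blocks(g2, seeds)
        yield seeds, blocks1, blocks2
```

In the training loop you would then compute `z1` and `z2` from `blocks1` and `blocks2` exactly as in the original post. Whether this is actually faster than two DGL dataloaders depends on where the sampling runs, so it is worth timing.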

BUT there seem to be some bugs when using the official samplers in dgl.dataloading (only MultiLayerFullNeighborSampler, I guess), which sometimes cause NaN errors because of a zero-sized dimension in a tensor.

This can be fixed by using a custom sampler like the one below.

class NeighborSampler:
    def __init__(self, fanouts):
        super().__init__()
        self.fanouts = fanouts

    def sample_blocks(self, g, seed_nodes):
        out_nodes = seed_nodes
        subgs = []
        for fanout in reversed(self.fanouts):
            sg = g.sample_neighbors(seed_nodes, fanout, replace=True)
            sg = dgl.to_block(sg, seed_nodes)
            seed_nodes = sg.srcdata[dgl.NID]
            subgs.insert(0, sg)
        return seed_nodes, out_nodes, subgs

Is there a particular paper that you are interested in implementing?

GRACE, from the DGL examples.
That example also only implements the full-graph version.

All the “same-scale contrast, node-level” methods in this survey fall into this category, and I can’t figure out how to implement them efficiently with DGL.
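To make “node vs. node” concrete for a mini-batch: row i of `z1` and row i of `z2` (the same node in the two views) form the positive pair, and the other nodes in the batch act as negatives. Below is a minimal sketch of a symmetric in-batch InfoNCE loss. It is a simplified stand-in, not GRACE’s exact loss (GRACE also uses intra-view negatives), and the name `node_infonce` and the default temperature are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def node_infonce(z1, z2, temperature=0.5):
    """Symmetric in-batch InfoNCE between two views (simplified sketch).

    z1, z2: (batch_size, dim) embeddings of the same seed nodes, same order.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    # (B, B) matrix of cosine similarities between view-1 and view-2 nodes.
    logits = z1 @ z2.t() / temperature
    # The positive for row i is column i (same node in the other view).
    targets = torch.arange(z1.size(0), device=z1.device)
    loss_12 = F.cross_entropy(logits, targets)
    loss_21 = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_12 + loss_21)
```

Because the negatives come only from the current batch, the batch size directly controls how many negatives each node sees.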


I think it might be a good idea to sample subgraphs and then obtain multiple views of the same subgraph for contrastive learning. This is different from performing neighbor sampling. Are there any papers that have implemented a mini-batch version of node-node contrastive learning?
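As a rough sketch of the “multiple views of the same subgraph” step, here are the two augmentations GRACE uses (random edge removal and feature masking), written on plain edge-index and feature tensors so they do not depend on any particular graph library. The function names and default probabilities are assumptions for illustration; in practice you would apply this to the edges and features of each sampled subgraph.

```python
import torch


def drop_edges(src, dst, p):
    """Randomly remove each edge with probability p."""
    keep = torch.rand(src.size(0), device=src.device) >= p
    return src[keep], dst[keep]


def mask_features(x, p):
    """Zero out each feature column with probability p (same mask for all nodes)."""
    keep = torch.rand(x.size(1), device=x.device) >= p
    return x * keep.to(x.dtype)


def two_views(src, dst, x, p_edge=0.2, p_feat=0.2):
    """Return two independently augmented views, each as (src, dst, features)."""
    views = []
    for _ in range(2):
        s, d = drop_edges(src, dst, p_edge)
        views.append((s, d, mask_features(x, p_feat)))
    return views[0], views[1]
```

Node IDs are untouched by both augmentations, so node i in one view still lines up with node i in the other when computing the contrastive loss.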

I think mini-batches built by neighbor sampling may perform better, because sampling a subgraph causes the neighbor information of some nodes to be lost.

I looked at a few classic methods, but I haven’t found a ready-made implementation so far. Such implementations are widely available in CV (SimCLR, etc.), although images carry no structural information at all. In torch, the dataset you pass to the DataLoader can take a transform, and that transform can return multiple augmented images (like this paper).

The problem can be generalized: how can DGL be used for mini-batch training on multi-view data?

I think mini-batches built by neighbor sampling may perform better, because sampling a subgraph causes the neighbor information of some nodes to be lost.

That’s true. However, from past observations, subgraph-sampling approaches can still work well on large graphs.