Large graph issues and questions

Hi everyone. I have a dataset like the one in the image below, with 9 graphs, 277K nodes and 974K edges in total:
image
I could include two other graphs as well, but they are too large, so for now I am simply leaving them out.

So far I have been running my experiments with a GATConv model based on an example I found in the DGL documentation:

import torch.nn as nn
import torch.nn.functional as F
import dgl.nn as dglnn

class GAT(nn.Module):
    def __init__(self, in_size, hid_size, out_size, heads):
        super().__init__()
        # SELF_LOOP is a flag defined elsewhere in my script; when self-loops
        # are not added, zero-in-degree nodes must be explicitly allowed.
        self.gat_layers = nn.ModuleList()
        self.gat_layers.append(dglnn.GATConv(in_size, hid_size, heads[0], activation=F.relu, allow_zero_in_degree=not SELF_LOOP))
        self.gat_layers.append(dglnn.GATConv(hid_size * heads[0], hid_size, heads[1], residual=True, activation=F.relu, allow_zero_in_degree=not SELF_LOOP))
        self.gat_layers.append(dglnn.GATConv(hid_size * heads[1], out_size, heads[2], residual=True, activation=None, allow_zero_in_degree=not SELF_LOOP))

    def forward(self, g, inputs):
        h = inputs
        for i, layer in enumerate(self.gat_layers):
            h = layer(g, h)
            if i == 2:   # last layer: average the attention heads
                h = h.mean(1)
            else:        # hidden layers: concatenate the attention heads
                h = h.flatten(1)
        return h

However, I am getting errors saying my GPU has run out of memory. For this reason I would like to try stochastic training; would you say it is a good idea?

I am currently using GraphDataLoader on my dataset, and I can’t even use a batch size of 2:

          train_dataset = DataSetFromYosys( currentDir, split, mode='train' )
          val_dataset   = DataSetFromYosys( currentDir, split, mode='valid' )
          test_dataset  = DataSetFromYosys( currentDir, split, mode='test'  )

          train_dataloader = GraphDataLoader( train_dataset, batch_size=1 )
          val_dataloader   = GraphDataLoader( val_dataset,   batch_size=1 )
          test_dataloader  = GraphDataLoader( test_dataset,  batch_size=1 )

Here DataSetFromYosys is a subclass of DGLDataset.

That being said I have some questions:

  1. Will I have to use “dgl.dataloading.DataLoader” instead of GraphDataLoader to implement stochastic training? Shouldn’t GraphDataLoader have a sampler parameter, like dgl.dataloading.DataLoader does?

  2. I got confused by the terminology. What are the batches produced by GraphDataLoader? When I print the nodes of a batch, it looks like a union of graphs from the underlying DGLDataset. Can I turn them into mini-batches? Or are a graph batch and a mini-batch completely different things?

  3. What is the difference between a batch and a sampler? I am having a hard time understanding what the sampler is; it seems very similar to a node embedding.

  4. I am also trying to switch from GATConv to SAGEConv. I believe I read somewhere that SAGE should be lighter in terms of memory; is that correct? Why, and where can I read more about this? Reducing the size of the node embeddings should also help with the memory issues, right?

I also couldn’t understand what the “train_nids” in this tutorial refer to. How can I have a single tensor of node IDs if I have multiple graphs? There is no explanation of how to define train_nids, or what it is.

Hmm, a batch size of 2 should give you at most around 130K nodes and 470K edges. Depending on your hidden state size and the number of heads, GATConv could be tight, because it needs to compute attention scores with O(E * num_heads) space complexity. One trick to reduce GPU memory consumption is to use PyTorch gradient checkpointing: torch.utils.checkpoint — PyTorch 2.0 documentation. What are your hidden state size and the memory capacity of your GPU?
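
For example, here is a rough sketch of how your forward pass could use it (I am assuming the GAT module from your post; use_reentrant=False requires a reasonably recent PyTorch):

from torch.utils.checkpoint import checkpoint

def forward(self, g, inputs):
    h = inputs
    for i, layer in enumerate(self.gat_layers):
        if i < len(self.gat_layers) - 1:
            # Recompute the hidden activations during backward instead of
            # storing them; the default argument pins the current layer so
            # the recomputation does not pick up the last layer of the loop.
            h = checkpoint(lambda feat, lyr=layer: lyr(g, feat), h,
                           use_reentrant=False)
            h = h.flatten(1)          # concatenate the attention heads
        else:
            h = layer(g, h).mean(1)   # average the heads in the output layer
    return h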

In your case, since the graphs are reasonably small, I think you could indeed use GraphDataLoader to iterate over the graphs, sample a minibatch of graphs, and compute node representations on the batched graph (assuming you are doing node classification).
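
Roughly like this (just a sketch; I am assuming your dataset returns one graph per item, with node features in g.ndata['feat'] and node labels in g.ndata['label'], so adapt the names to your data):

import torch
import torch.nn.functional as F
from dgl.dataloading import GraphDataLoader

train_dataloader = GraphDataLoader(train_dataset, batch_size=2, shuffle=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(100):
    for batched_graph in train_dataloader:
        batched_graph = batched_graph.to(device)
        feat = batched_graph.ndata['feat']
        label = batched_graph.ndata['label']
        pred = model(batched_graph, feat)    # one prediction per node
        # node classification loss; for node regression you would use
        # something like F.mse_loss instead
        loss = F.cross_entropy(pred, label)
        opt.zero_grad()
        loss.backward()
        opt.step()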


I am actually doing node regression. How exactly can I sample a minibatch? In which part of the code should I do that?

I am instantiating the GAT model from my previous message like this:

model = GAT(in_size, 256, out_size, heads=[4, 4, 6]).to(device)

I have an NVIDIA GTX 1060.

I don’t understand how to build the mini-batches. When I use GraphDataLoader with batch_size = 2 (which I can manage by removing some graphs) and I have 6 graphs in my training set, the batched object iterates over only 3 larger graphs. Is there a way to make them smaller instead of larger?

To use mini-batches, do I simply have to use a larger batch_size value in GraphDataLoader? I don’t understand why, when I set a batch_size of 100, it generates only 1 graph.

If you are doing node regression, are you training a single model on all these circuits, or do you want to fit a separate model for each individual circuit? If the latter, then I don’t think you should use GraphDataLoader. Instead, you should use dgl.dataloading.DataLoader on each graph separately.

This is also related to your question about train_nids:

In general, if you are doing node regression, then the nodes in each dataset (each circuit in your case) should have a training-validation-test split.
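
For example, something along these lines (a sketch; the split fractions and mask names are just placeholders):

import torch

def add_node_split(g, train_frac=0.8, val_frac=0.1):
    # Randomly assign every node of one circuit to train/val/test.
    n = g.num_nodes()
    perm = torch.randperm(n)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train_mask = torch.zeros(n, dtype=torch.bool)
    val_mask = torch.zeros(n, dtype=torch.bool)
    test_mask = torch.zeros(n, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True
    g.ndata['train_mask'] = train_mask
    g.ndata['val_mask'] = val_mask
    g.ndata['test_mask'] = test_mask
    return g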

Hi @BarclayII, I believe you are considering a transductive learning setup, correct? However, I am using an inductive setup, where I train with 8 circuits and test with 1, for example. This way I use only a single model: I train it on the training set and then test it on a single held-out graph.

How could I use stochastic training in this setup? Is it possible?

In this case, I think you could merge all these datasets into one single graph where the nodes in the training circuits become training nodes and the nodes in the test circuits become test nodes. You could merge the graphs using dgl.batch.

After merging, you can feed the merged graph to dgl.dataloading.DataLoader and train it stochastically.
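
Roughly like this (a sketch for a recent DGL release; I am assuming lists train_graphs / test_graphs, a boolean ndata['train_mask'] that is True on the training circuits and False on the test circuits, node data named 'feat' and 'label', and that model, opt and device are already defined):

import dgl
import torch
import torch.nn.functional as F

# Merge all circuits into one graph; node IDs are relabeled consecutively
# and node data with the same names are concatenated automatically.
merged = dgl.batch(train_graphs + test_graphs)

# These are the "train_nids" from the tutorial: the IDs of the nodes to train on.
train_nids = merged.nodes()[merged.ndata['train_mask']]

sampler = dgl.dataloading.NeighborSampler([10, 10, 10])  # one fanout per layer
dataloader = dgl.dataloading.DataLoader(
    merged, train_nids, sampler,
    batch_size=1024, shuffle=True, drop_last=False, device=device)

for input_nodes, output_nodes, blocks in dataloader:
    x = blocks[0].srcdata['feat']
    y = blocks[-1].dstdata['label']
    pred = model(blocks, x)
    loss = F.mse_loss(pred.squeeze(-1), y)   # node regression loss
    opt.zero_grad()
    loss.backward()
    opt.step()

Note that the GAT forward pass from your first post would then need to loop over the blocks (one layer per block) instead of applying every layer to the same g.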

Interesting idea! Still, since I am working with circuits, it feels quite counter-intuitive, because each circuit should be treated as an independent data point. If I implement what you suggest, I would instead be treating each node as an independent data point. Furthermore, the point of my implementation is to take an unseen circuit as input and produce a regression prediction for each of its nodes.

Is there a way to use stochastic training without having to split up the internal nodes of the graphs?
