Implementing a batch of batches

Hi everyone,

I have a large training set of several hundred independent graphs, each with around 406660 nodes. I’m trying to use dgl.batch to combine the independent graphs into one batched graph, and then use dgl.contrib.sampling.NeighborSampler to randomly sample some of the nodes and their neighbourhoods, since the individual graphs are pretty large. However, I get this error message:

Traceback (most recent call last):
  File "train_model.py", line 169, in <module>
    app.run(train)
  File "/home/cc91/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/cc91/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "train_model.py", line 123, in train
    expand_factor = expand_factor):
  File "/home/cc91/.local/lib/python3.7/site-packages/dgl/contrib/sampling/sampler.py", line 320, in __init__
    ThreadPrefetchingWrapper)
  File "/home/cc91/.local/lib/python3.7/site-packages/dgl/contrib/sampling/sampler.py", line 154, in __init__
    raise NotImplementedError("This loader only support read-only graphs.")
NotImplementedError: This loader only support read-only graphs.

However, as I understand it, batched graphs are already read-only. Does anyone have an idea of what is going on?

This is the relevant part of the code:

    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=config_dict['learning_rate'])
    criterion = nn.MSELoss()
    print("Start training...")
    model.train()
    start = time.time()

    for epoch in range(FLAGS.n_epochs):
        loss_list = []
        for batch, data in enumerate(train_dataloader):
            graph, labels = data
            print(type(graph))
            for nf in dgl.contrib.sampling.NeighborSampler(graph,
                                                           minibatch_size,
                                                           expand_factor=expand_factor):
                nf.copy_from_parent()
                nf.ndata['features'] = nf.ndata['features'].to(device)

                labels = labels.to(device)
                logits = model.forward(nf, nf.ndata['features'])
                loss = criterion(logits, labels)

The output of print(type(graph)) is <class 'dgl.batched_graph.BatchedDGLGraph'>, which seems correct.

This looks like a bug to me and I’ve reported it in issue #1148. Could you please try graph._graph.readonly(True) after graph, labels = data and see if the issue gets resolved?

Yes, thank you. I think the documentation is outdated.

Also, I’m still a bit confused about how to use NeighborSampler. How can I access the graph ndata? I need it for the forward pass (model.forward(graph, graph.ndata['features'])). Also, do I need to modify the update_all function when using NeighborSampler?

NeighborSampler returns instances of a different data structure called NodeFlow. You may find this tutorial to be helpful.

Thanks! Do you know whether there is any PyTorch example using NeighborSampler? That would be helpful.

This one might be helpful: https://github.com/dmlc/dgl/tree/master/examples/pytorch/sampling
