Hello, how do I do mini-batch training on a single graph?

About mini-batch training:
I would like to ask how to do mini-batch training. If I compute the embeddings of all nodes on the whole graph and then compute the loss in batches using only part of the nodes, is that correct? It feels a little unreasonable to me.
The pseudo-code is as follows. The main call is the model forward over the whole graph:

for epoch in range(MAX_EPOCHS):
    for mini_batch_id in batches:
        # get the embeddings of all nodes on the whole graph
        all_embedding = model(whole_graph, all_node_id, all_edge_type, all_edge_norm)

        # compute the loss of this batch on a subset of nodes only
        loss_every_batch = model.calc_loss(all_embedding, mini_batch_id)
        optimizer.zero_grad()
        loss_every_batch.backward()
        optimizer.step()

Thank you!!!


Mini-batching on graphs can be done using the method dgl.batch. The following tutorial is a good starting point for understanding how it works.
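
For reference, a minimal sketch of what dgl.batch does; the two toy graphs here are made up, and this assumes a DGL version that has the dgl.graph constructor:

import dgl
import torch

g1 = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))        # 3-node graph
g2 = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))  # 4-node graph

bg = dgl.batch([g1, g2])         # one graph holding both as disconnected components
print(bg.batch_size)             # 2
print(bg.number_of_nodes())      # 7

Since this merges several graphs into one batched graph, it mainly helps when you train on many small graphs at once.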


Thank you for your reply!
I just have one graph, not many graphs.

Okay,
Do you mean that the graph structure is fixed but the feature matrix is changing?

No, just like the code I posted: compute the embedding representations of only some nodes in each batch, as if sampling those nodes. But I see that some people still compute the embeddings of all nodes in each batch, which puzzles me, and I am not sure whether the code above is correct.

Bumping this thread :pray:

This depends on whether you are performing full-graph training or mini-batch training. In full-graph training, you update the representations of all nodes by performing message passing over the full graph simultaneously. In mini-batch training, you update the representations of nodes by performing message passing on a subgraph only, and the loss is also computed only on a subset of nodes.
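
A minimal, self-contained sketch of the difference; the toy graph, the features, and the use of SAGEConv here are my own assumptions, just to make the two regimes concrete:

import dgl
import torch
import torch.nn.functional as F
from dgl.nn import SAGEConv

g = dgl.rand_graph(100, 500)          # random toy graph: 100 nodes, 500 edges
feat = torch.randn(100, 16)
labels = torch.randint(0, 3, (100,))
conv = SAGEConv(16, 3, 'mean')

# Full-graph training: message passing over every edge, loss over all nodes.
full_loss = F.cross_entropy(conv(g, feat), labels)

# Mini-batch training: message passing only on a sampled subgraph around the
# seed nodes, loss only on those seed nodes.
seeds = torch.tensor([0, 1, 2, 3])
sub_g = dgl.sampling.sample_neighbors(g, seeds, 10)   # node IDs are preserved
mini_loss = F.cross_entropy(conv(sub_g, feat)[seeds], labels[seeds])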


In most cases we don't compute the loss over the entire node-embedding tensor in mini-batch training. Below is pseudo-code showing how to achieve this: it first samples a subgraph and then extracts the embeddings needed for the loss computation.

for epoch in range(MAX_EPOCHS):
    for batch_id in range(NUM_BATCHES):
        batch_node = all_node_id[batch_id * BATCH_SIZE : (batch_id + 1) * BATCH_SIZE]
        # sample a subgraph from the whole graph around the batch nodes
        mini_batch = sample(whole_graph, batch_node)
        # get only the embeddings needed by this batch
        batch_embedding = extract(whole_graph, all_embedding, mini_batch)

        # compute the loss on this batch only
        loss_every_batch = model.calc_loss(batch_embedding)
        optimizer.zero_grad()
        loss_every_batch.backward()
        optimizer.step()
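
For concreteness, one possible way to fill in the sample and extract placeholders is with dgl.sampling.sample_neighbors; the names whole_graph, all_node_id, model, and calc_loss are carried over from the pseudo-code above, and node_feat is an assumed input-feature tensor, so this is a sketch rather than the exact code used in the repo examples:

import dgl

for epoch in range(MAX_EPOCHS):
    for batch_id in range(NUM_BATCHES):
        batch_node = all_node_id[batch_id * BATCH_SIZE : (batch_id + 1) * BATCH_SIZE]

        # "sample": keep only a bounded number of in-edges around the batch nodes
        mini_batch = dgl.sampling.sample_neighbors(whole_graph, batch_node, 10)

        # "extract": run the model on the sampled subgraph and take the rows of
        # the batch nodes (sample_neighbors preserves the original node IDs)
        all_embedding = model(mini_batch, node_feat)
        batch_embedding = all_embedding[batch_node]

        loss_every_batch = model.calc_loss(batch_embedding)
        optimizer.zero_grad()
        loss_every_batch.backward()
        optimizer.step()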

See complete examples in our repo. They all follow the above training methodology.


The code seems to use some APIs that do not have documentation to learn from yet.
Can all of the existing nn modules be used directly in mini-batch training? They do not seem to be designed for mini-batches.

The training data in the code above is indeed taken in batches. However, in each batch the embeddings of all nodes are computed, and only a subset of the nodes is used to compute the loss for that batch.
In other words, in each batch the aggregation operation is performed on the entire graph, and only a subset of the nodes is used when computing the loss.

Isn’t that okay?
In this case, the parameters of all nodes are updated when backpropagating.

The API doc will be online soon. Sorry for the delay. Here is a hands-on tutorial we are preparing for WWW'20. It covers the concepts and usage of the new user experience for mini-batch sampling: https://github.com/dglai/WWW20-Hands-on-Tutorial/blob/master/large_graphs/large_graphs.ipynb

They can! In fact, this is one of the goals of the whole new sampler API design. You can see here that we directly use the dgl.nn.SAGEConv module on sampled graphs.
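
For illustration, a hedged sketch of that pattern; the toy graph and feature sizes are made up, and it assumes the sampling/block APIs dgl.sampling.sample_neighbors and dgl.to_block:

import dgl
import torch
from dgl.nn import SAGEConv

g = dgl.rand_graph(50, 200)
feat = torch.randn(50, 16)
conv = SAGEConv(16, 8, 'mean')          # an ordinary dgl.nn module

seeds = torch.tensor([0, 1, 2])
frontier = dgl.sampling.sample_neighbors(g, seeds, 5)   # sampled in-edges of the seeds
block = dgl.to_block(frontier, seeds)                   # bipartite block for one layer

h_src = feat[block.srcdata[dgl.NID]]    # features of sampled neighbors (plus the seeds)
h_dst = feat[block.dstdata[dgl.NID]]    # features of the seeds themselves
h = conv(block, (h_src, h_dst))         # same module, applied to the sampled graph

The same conv module could also be called on the full graph as conv(g, feat), which is the point of being able to reuse it.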


@mufeili Although the embedding representations of the entire graph are computed, only a part of the nodes are used. It seems that more parameters will be updated, which would be slower than sampling only the nodes that are needed?

That means you have some unnecessary computation, which will be slower.

But that's not wrong, right?
I have been struggling with mini-batch training; this is a compromise.

Did you check the tutorial @minjie posted?


I have checked that tutorial and have a general understanding of how mini-batching is used. There are some APIs, such as in_subgraph, that I do not understand well.
I am working on a complex recommendation-system task, and sampling is not easy to implement for it.

@mufeili


I don't fully get your question. Are you asking whether the entire embedding is being updated during backpropagation? The answer depends on how the backend framework (such as PyTorch) implements the gradient operation of an embedding lookup. If implemented correctly, only the node embeddings that are used during forward propagation are updated. I need to check whether this is the case for torch.nn.Embedding. If it unfortunately is not, you will need to implement the gradient update manually.
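
For reference, here is a quick way to check this behaviour for torch.nn.Embedding: rows that are not looked up in the forward pass receive zero gradient (whether the optimizer then leaves them untouched additionally depends on settings such as weight decay or momentum):

import torch

emb = torch.nn.Embedding(num_embeddings=5, embedding_dim=3)
idx = torch.tensor([1, 3])        # only rows 1 and 3 are looked up

emb(idx).sum().backward()
print(emb.weight.grad)            # rows 0, 2 and 4 are all zeros

# With sparse=True, emb.weight.grad is a sparse tensor touching only rows 1 and 3,
# which optimizers such as torch.optim.SparseAdam can use to update only those rows.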

Thank you for your patient explanation. The issue you mention is indeed one aspect of it.
My main question is whether, in each batch, I can perform several layers of aggregation over all nodes to obtain representations for every node (ideally I should sample according to the neighbors each node depends on, but sampling is awkward for my task), and then, when computing the loss for each batch, use only the representations of the nodes involved in that batch for the loss and backpropagation.
I am mainly unsure whether this is correct, because I have recently seen people doing it this way.
It would be ideal if the update did not touch nodes that nothing depends on; in that case, computing over the entire graph actually does not seem to be much of a problem.

That is technically doable and correct, but I want to point out that it is equivalent to performing aggregation on the full neighborhood, which is exactly what g.in_subgraph is for.
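
For example, a minimal sketch of that pattern using the function form dgl.in_subgraph; the toy graph, features, and use of SAGEConv are made up for illustration:

import dgl
import torch
from dgl.nn import SAGEConv

g = dgl.rand_graph(100, 500)
feat = torch.randn(100, 16)
conv = SAGEConv(16, 8, 'mean')

seeds = torch.tensor([0, 5, 7])
# keep all in-edges of the seed nodes: aggregation on this subgraph equals
# one layer of aggregation over their full neighborhood in the original graph
sub_g = dgl.in_subgraph(g, seeds)

h = conv(sub_g, feat)          # node IDs are preserved, so rows line up
batch_embedding = h[seeds]     # only these rows would enter the loss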
