Why does MultiLayerFullNeighborSampler consume a large amount of memory?

I’m trying to construct subgraphs using MultiLayerFullNeighborSampler with 4 layers.

However, it uses a very large amount of memory (batch size: 2048, number of workers: 8).
It needs more than 1 TB of memory (in fact, it runs out of memory after using 1 TB).

The target dataset is twitter-2010.

It seems that sampling full neighbors could be done simply by running BFS (or a similar algorithm) for each batch.

What causes such memory usage?

Could you provide the script you are running? Are you using multiprocessing?

Here’s my test code.

Instantiating NodeDataLoader takes a large amount of memory

import argparse
import torch
import dgl
import numpy as np
import sys

from TwitterDataset import TwitterDataset
from SinaweiboDataset import SinaweiboDataset
from FriendsterDataset import FriendsterDataset
from ogb.nodeproppred import DglNodePropPredDataset

def load_dataset(dataset_name):
    if dataset_name.startswith('ogbn'):
        ogbn_dataset_root = '/home/shared/gnn_dataset/ogbn/dataset'
        return DglNodePropPredDataset(name = dataset_name,
                                      root = ogbn_dataset_root)
    elif dataset_name == 'twitter-2010':
        return TwitterDataset()
    elif dataset_name == 'friendster':
        return FriendsterDataset()
    elif dataset_name == 'sinaweibo':
        return SinaweiboDataset()
    else:
        assert False, 'unknown dataset: {}'.format(dataset_name)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-g', '--graph', type = str,
                        help = 'dataset', default = 'ogbn-products',
                        dest = "dataset")
    parser.add_argument('-b', '--batch', type = int,
                        help = 'batch size', default = 2048,
                        dest = "batch")
    args = parser.parse_args()

    print("Start loading dataset")
    dataset = load_dataset(dataset_name = args.dataset)
    graph, label = dataset[0]
    print("End loading dataset")

    fanout_list = ['full']
    num_layers = 4
    for fanout in fanout_list:
        print("Processing: {}".format(fanout))
        if fanout == 'full':
            sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers)
        else:
            sampler = dgl.dataloading.MultiLayerNeighborSampler([
                int(nbr) for nbr in fanout.split(',')])
        train_nids = dataset.get_idx_split()['train']
        dataloader = dgl.dataloading.NodeDataLoader(
            graph,
            train_nids,
            sampler,
            batch_size = args.batch,
            shuffle = True,
            drop_last = True,
            device = 'cpu',
            num_workers = 8)

        src_list = []
        for step, (input_nodes, output_nodes, mfgs) in enumerate(dataloader):
            for i, mfg in enumerate(mfgs):
                g = dgl.block_to_graph(mfg)
                #srcs = np.asarray(g.srcnodes['_N_src'].data[dgl.NID].to('cpu'))
                srcs = np.asarray(g.srcnodes['_N_src'].data[dgl.NID])

            if step == 1:

Could you try adding graph.create_formats_() after graph, label = dataset[0]? Some sparse formats may be needed for sampling, and by default they are created in each process. If you create them in the main process, the forked subprocesses don’t need to recreate them separately.

Also, sorry for the inconvenience. We’re glad to help solve this issue if it still exists.

It still consumes over 1 TB even though I call graph.create_formats_().

Is the problem caused by the subprocesses making duplicated graph formats?

Could you print the memory occupied by each process?
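
One way to get a per-process number is sketched below. It is only a minimal example using the Python standard library on Linux or macOS; the helper names peak_rss_mib and report are made up for illustration and are not part of DGL. Note that resident set size also counts pages shared copy-on-write with the parent, so adding up the values from forked workers can overstate the real total.

```python
import os
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of the calling process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def report(tag):
    """Print and return one line with the PID and peak RSS of this process."""
    line = "[{}] pid={} peak RSS={:.1f} MiB".format(
        tag, os.getpid(), peak_rss_mib())
    print(line, flush=True)
    return line


# Call this in the main process and inside each worker process.
report("main")
```

Calling report() right after loading the graph and again inside the loop over the dataloader would show how much the main process grows.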

And could you check the result of graph.formats() after calling graph.create_formats_()?

graph.formats() prints {'created': ['coo', 'csr', 'csc'], 'not created': []} indicating all formats are created.

While the program is still running, all processes consume memory uniformly (the number of processes becomes larger than the number of workers after 3 to 4 minutes).

This seems to be a bug. We’ll investigate it tomorrow. Thanks for reporting.

Thank you. I appreciate it.


I’m not really sure about the core problem here. One thing is that you are using 4 layers of full neighbor sampling, which is likely to produce a very large subgraph (in some cases it might include more than 50% of the nodes in the graph).
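
To illustrate how fast the neighborhood grows with the number of hops, here is a small pure-Python sketch. It is not DGL’s implementation: the function name khop_in_neighborhood_sizes and the toy edge list are made up for illustration, and it only counts distinct nodes reached by following edges backwards from the seeds.

```python
from collections import defaultdict


def khop_in_neighborhood_sizes(edges, seeds, num_hops):
    """Count distinct nodes reached after 0..num_hops hops along in-edges."""
    # Map each destination node to the set of its source nodes.
    in_nbrs = defaultdict(set)
    for src, dst in edges:
        in_nbrs[dst].add(src)

    visited = set(seeds)
    frontier = set(seeds)
    sizes = [len(visited)]
    for _ in range(num_hops):
        next_frontier = set()
        for node in frontier:
            next_frontier.update(in_nbrs.get(node, ()))
        # Keep only the nodes that were not reached in an earlier hop.
        frontier = next_frontier - visited
        visited |= frontier
        sizes.append(len(visited))
    return sizes


# Toy graph given as (src, dst) pairs; node 0 is the only seed.
edges = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 2), (6, 2), (7, 3)]
print(khop_in_neighborhood_sizes(edges, seeds=[0], num_hops=4))
# prints [1, 3, 7, 8, 8]
```

On a graph with high-degree nodes such as twitter-2010, the same count with 2048 seeds can approach the size of the whole graph within a few hops.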

Another thing you can try is to change sampler = dgl.dataloading.MultiLayerFullNeighborSampler(num_layers) to sampler = dgl.dataloading.MultiLayerNeighborSampler([-1 for _ in range(num_layers)]). This should produce exactly the same output as the full neighbor sampler, but through a slightly different implementation that is consistent with the normal neighbor sampler.
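
A minimal sketch of that substitution follows. The helper names full_fanouts and make_full_sampler are made up for illustration; the only DGL call used is MultiLayerNeighborSampler, and dgl is imported inside the function so the fanout helper can be checked without it. It assumes a fanout of -1 means "take all neighbors", as described above.

```python
def full_fanouts(num_layers):
    """One fanout of -1 (all neighbors) per layer."""
    return [-1 for _ in range(num_layers)]


def make_full_sampler(num_layers):
    """Build a neighbor sampler that keeps every neighbor at every layer."""
    import dgl  # imported here so full_fanouts() works without DGL installed
    return dgl.dataloading.MultiLayerNeighborSampler(full_fanouts(num_layers))


print(full_fanouts(4))
# prints [-1, -1, -1, -1]
```

In the test script, sampler = make_full_sampler(num_layers) would take the place of the MultiLayerFullNeighborSampler line.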

I suspect there’s a problem in our MultiLayerFullNeighborSampler implementation: it uses in_subgraph instead of sample_neighbors under the hood. I cannot verify this yet.

I also created an issue for tracking: “MultiLayerFullNeighborSampler takes too much memory”, Issue #3476 in dmlc/dgl on GitHub.
