Neighbour sampling on CPU crashes for GraphBolt heterographs

torch version: 2.3.1+cu121
dgl version: 2.3.0+cu121

import dgl
import dgl.graphbolt as gb
import dgl.nn as dglnn
import torch
from torch import nn


u, i = torch.randint(20, size=(1000,)), torch.randint(10, size=(1000,))
g = dgl.heterograph({('u', 'w', 'i'): (u, i), ('i', 'b', 'u'): (i, u)})

gg = dgl.graphbolt.from_dglgraph(g)

# Repeatedly sample all neighbours (fanout -1) of random 'u' seed nodes on CPU.
for step in range(100):
    print(step)
    n = gg.sample_neighbors({'u': torch.randint(10, (100,))}, fanouts=torch.tensor([-1]))

print(n)

This crashes with varying memory corruption errors, for example:

0
1
corrupted size vs. prev_size in fastbins

On GPU, sampling works fine.

I have the same problem with gb.ItemSampler:

train_set = dataset.tasks[0].train_set

gb_graph = dataset.graph
datapipe = gb.ItemSampler(train_set, batch_size=256, shuffle=True)
datapipe = datapipe.sample_neighbor(gb_graph, [5, 5])
dataloader = gb.DataLoader(datapipe)
data = next(iter(dataloader))

This causes a segmentation fault.
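For anyone trying to narrow down a crash like this, one option is Python's standard `faulthandler` module, which dumps the Python-level traceback when the process receives a fatal signal such as SIGSEGV or SIGABRT. The sketch below is only a diagnostic aid, not part of the fix; the log path is a hypothetical choice, and the sampling code would go where the final comment indicates.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
LOG_PATH = os.path.join(tempfile.gettempdir(), "graphbolt_crash_traceback.log")

# Keep the file object open for the lifetime of the process:
# faulthandler writes to its file descriptor when a fatal signal arrives.
crash_log = open(LOG_PATH, "w")
faulthandler.enable(file=crash_log, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())

# ... run the crashing sampling or dataloader code here ...
```

If the process then dies inside native code, the log should show which Python call was active at the time, which helps to confirm whether the crash comes from the sampling call itself.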

Let me see if I can quickly fix it. If not, I will delegate the issue to @frozenbugs.

Fixed the issue in #7719. The following code works now:

import dgl
import dgl.graphbolt as gb
import torch

device = "cpu"

u, i = torch.randint(20, size=(1000,)), torch.randint(10, size=(1000,))
g = dgl.heterograph({('u', 'w', 'i'): (u, i), ('i', 'b', 'u'): (i, u)})

gg = dgl.graphbolt.from_dglgraph(g).to(device)

# Sample repeatedly on the chosen device; this no longer crashes on CPU.
for step in range(100):
    print(step)
    n = gg.sample_layer_neighbors({'u': torch.randint(10, (100,), device=device)}, fanouts=torch.tensor([-1]))
    print(n)

Thank you for reporting this issue @lodagon. I have filed a PR which should be available in tomorrow’s nightly build.