DeepWalk indexing error: confusing behavior

Implementing DeepWalk should be a relatively easy and smooth procedure. However, I am running into an indexing error that I cannot resolve.
Here's my code:

import torch
from torch.optim import SparseAdam
from torch.utils.data import DataLoader
from dgl.data import CoraGraphDataset
from dgl.nn import DeepWalk

# GrowthGraphDataset and config are defined elsewhere in my project.
my_dataset = GrowthGraphDataset(is_bidirectional=True, feature_list=config.feature_list)
g = my_dataset.load_built_graph("data/built_graph.bin")

# dataset = CoraGraphDataset()
# g = dataset[0]

print(g)
print(g.num_nodes())
print(g.ndata['train_mask'].shape)
print(g.ndata['test_mask'].shape)
print(g.nodes())

model = DeepWalk(
    g,
    walk_length=10
)
# model.sample draws one random walk per seed node and collates them into a batch.
dataloader = DataLoader(torch.arange(g.num_nodes()), batch_size=8,
                        shuffle=True, collate_fn=model.sample)
optimizer = SparseAdam(model.parameters(), lr=0.01)
num_epochs = 5

model.train()

for epoch in range(num_epochs):
    for batch_walk in dataloader:
        print("batch_walk: ", batch_walk)
        print("batch_walk shape: ", batch_walk.shape)
        loss = model(batch_walk)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print('Epoch:', epoch, 'Loss:', loss.item())

If I use the CoraGraphDataset, everything works fine.
However, if I use my own dataset, I get an indexing error:

Traceback (most recent call last):
  File "deepwalk.py", line 62, in <module>
    embedding_table = train_embedding(g)
  File "deepwalk.py", line 28, in train_embedding
    loss = model(bacth_walk)
  File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/dgl/nn/pytorch/network_emb.py", line 180, in forward
    batch_node_embed = self.node_embed(batch_walk).view(-1, self.emb_dim)
  File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

However, for CoraGraphDataset the printout is:

Graph(num_nodes=2708, num_edges=10556,
      ndata_schemes={'feat': Scheme(shape=(1433,), dtype=torch.float32), 'label': Scheme(shape=(), dtype=torch.int64), 'test_mask': Scheme(shape=(), dtype=torch.bool), 'val_mask': Scheme(shape=(), dtype=torch.bool), 'train_mask': Scheme(shape=(), dtype=torch.bool)}
      edata_schemes={})
2708
torch.Size([2708])
torch.Size([2708])
tensor([   0,    1,    2,  ..., 2705, 2706, 2707])

and for my dataset:

Graph(num_nodes=16479210, num_edges=21906495,
      ndata_schemes={'test_mask': Scheme(shape=(), dtype=torch.uint8), 'val_mask': Scheme(shape=(), dtype=torch.uint8), 'train_mask': Scheme(shape=(), dtype=torch.uint8), 'label': Scheme(shape=(1,), dtype=torch.int64), 'feat': Scheme(shape=(21,), dtype=torch.float64)}
      edata_schemes={})
16479210
torch.Size([16479210])
torch.Size([16479210])
tensor([       0,        1,        2,  ..., 16479207, 16479208, 16479209])

Checking the format suggests the two are the same, so why is there an indexing error?

I noticed that, during data loading, the batched walks contain node IDs equal to -1 for my dataset, which is strange since there is no node with ID -1 in my dataset, as printed above. With the Cora dataset this issue does not exist.
Here's the printout:

batch_walk:  tensor([[13877061, 12754157,       -1,       -1,       -1,       -1,       -1,
               -1,       -1,       -1],
        [   78126,    54702,    46373,    46364,       -1,       -1,       -1,
               -1,       -1,       -1],
        [ 2381469,       -1,       -1,       -1,       -1,       -1,       -1,
               -1,       -1,       -1],
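
These -1 entries appear to be exactly what nn.Embedding chokes on: DeepWalk's forward looks each walk entry up in an embedding table, and -1 is not a valid row index. A minimal standalone sketch reproducing the error message:

import torch

emb = torch.nn.Embedding(10, 4)
idx = torch.tensor([0, -1])  # -1 is not a valid row index for nn.Embedding
emb(idx)                     # raises IndexError: index out of range in self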

PS: the DataLoader is constructed as

from torch.utils.data import DataLoader
dataloader = DataLoader(torch.arange(g.num_nodes()), batch_size=128,
                        collate_fn=model.sample)

Inspecting the collate_fn, I see:
def sample(self, indices):
    """Sample random walks

    Parameters
    ----------
    indices : torch.Tensor
        Nodes from which we perform random walk

    Returns
    -------
    torch.Tensor
        Random walks in the form of node ID sequences. The Tensor
        is of shape :attr:`(len(indices), walk_length)`.
    """
    return random_walk(self.g, indices, length=self.walk_length - 1)[0]

which should not be able to produce a -1 from my node ID sequence.
Can anyone give some hints?

Are you using dgl.sampling.random_walk (DGL 1.1.2 documentation)? As mentioned in the doc page: "If a random walk stops in advance, DGL pads the trace with -1 to have the same length." This could be the reason.
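
To illustrate, here is a minimal sketch on a toy graph (made up for demonstration): node 2 has no outgoing edge, so any walk reaching it stops early and the trace is padded with -1:

import dgl
import torch

g = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))  # 0 -> 1 -> 2
traces, types = dgl.sampling.random_walk(g, [0, 2], length=4)
print(traces)
# tensor([[ 0,  1,  2, -1, -1],
#         [ 2, -1, -1, -1, -1]])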

Hi Rehett! It seems that this is the point! But how can I resolve it? I have messaged you on Slack.
My graph runs well with GraphSAGE, and I have already checked the node indices, so I'm fairly certain the issue is with the following code:

dataloader = DataLoader(torch.arange(g.num_nodes()), batch_size=128, collate_fn=model.sample)

The problem seems to be in the model.sample function:

def sample(self, indices):
    """Sample random walks
    Parameters
    ----------
    indices : torch.Tensor
        Nodes from which we perform random walk
    Returns
    -------
    torch.Tensor
        Random walks in the form of node ID sequences. The Tensor
        is of shape :attr:`(len(indices), walk_length)`.
    """
    return random_walk(self.g, indices, length=self.walk_length - 1)[0]

It uses the random_walk function you mentioned. It's likely that my graph is not fully connected, so some walks terminate early. How should I resolve the issue with the -1 values?
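
One workaround that comes to mind (just a sketch, assuming self-loops are acceptable for my graph) would be to add self-loops so that every node always has an outgoing edge and walks never stop early:

import dgl

# Assumption: self-loops do not distort this use case. With a self-loop on
# every node, random_walk always has an edge to follow, so it never pads with -1.
g = dgl.add_self_loop(g)

But I'm not sure whether that distorts the walk statistics.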

Please try dgl.sampling.pack_traces (DGL 1.1.2 documentation), which removes the -1s.

for reference: https://github.com/dmlc/dgl/blob/5a4174148536eacb77f18a20f7c3a1a8395cd343/examples/pytorch/graphsaint/sampler.py#L386-L395
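
A minimal sketch of the idea, reusing the toy graph from above: pack_traces concatenates the walks and drops the -1 padding, returning the true length of each walk:

import dgl
import torch

g = dgl.graph((torch.tensor([0, 1]), torch.tensor([1, 2])))  # node 2 has no out-edge
traces, types = dgl.sampling.random_walk(g, [0, 2], length=4)
concat_vids, concat_types, lengths, offsets = dgl.sampling.pack_traces(traces, types)
print(concat_vids)  # tensor([0, 1, 2, 2]) -- the -1 padding is removed
print(lengths)      # tensor([3, 1])       -- actual length of each walk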
