Implementing DeepWalk should be a relatively easy and smooth procedure.
However, I hit an indexing error that I cannot resolve.
Here's my code:
import torch
from dgl.data import CoraGraphDataset
from dgl.nn import DeepWalk
from torch.optim import SparseAdam
from torch.utils.data import DataLoader

# GrowthGraphDataset and config are project-specific (their imports are not shown)
my_dataset = GrowthGraphDataset(is_bidirectional=True, feature_list=config.feature_list)
g = my_dataset.load_built_graph("data/built_graph.bin")
# dataset = CoraGraphDataset()
# g = dataset[0]
print(g)
print(g.num_nodes())
print(g.ndata['train_mask'].shape)
print(g.ndata['test_mask'].shape)
print(g.nodes())
model = DeepWalk(
    g,
    walk_length=10
)
# model.sample turns a batch of seed node IDs into a batch of random walks
dataloader = DataLoader(torch.arange(g.num_nodes()), batch_size=8,
                        shuffle=True, collate_fn=model.sample)
optimizer = SparseAdam(model.parameters(), lr=0.01)
num_epochs = 5
model.train()
for epoch in range(num_epochs):
    for batch_walk in dataloader:
        print("batch_walk: ", batch_walk)
        print("batch_walk shape: ", batch_walk.shape)
        loss = model(batch_walk)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Epoch:', epoch, 'Loss:', loss.item())
If I use CoraGraphDataset, everything works.
However, if I use my own dataset, I get an indexing error:
Traceback (most recent call last):
File "deepwalk.py", line 62, in <module>
embedding_table = train_embedding(g)
File "deepwalk.py", line 28, in train_embedding
loss = model(bacth_walk)
File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/dgl/nn/pytorch/network_emb.py", line 180, in forward
batch_node_embed = self.node_embed(batch_walk).view(-1, self.emb_dim)
File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 160, in forward
return F.embedding(
File "/miniconda3/envs/graph_cpu/lib/python3.8/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
Here is the printed output for CoraGraphDataset:
Graph(num_nodes=2708, num_edges=10556,
ndata_schemes={'feat': Scheme(shape=(1433,), dtype=torch.float32), 'label': Scheme(shape=(), dtype=torch.int64), 'test_mask': Scheme(shape=(), dtype=torch.bool), 'val_mask': Scheme(shape=(), dtype=torch.bool), 'train_mask': Scheme(shape=(), dtype=torch.bool)}
edata_schemes={})
2708
torch.Size([2708])
torch.Size([2708])
tensor([ 0, 1, 2, ..., 2705, 2706, 2707])
And for my dataset:
Graph(num_nodes=16479210, num_edges=21906495,
ndata_schemes={'test_mask': Scheme(shape=(), dtype=torch.uint8), 'val_mask': Scheme(shape=(), dtype=torch.uint8), 'train_mask': Scheme(shape=(), dtype=torch.uint8), 'label': Scheme(shape=(1,), dtype=torch.int64), 'feat': Scheme(shape=(21,), dtype=torch.float64)}
edata_schemes={})
16479210
torch.Size([16479210])
torch.Size([16479210])
tensor([ 0, 1, 2, ..., 16479207, 16479208, 16479209])
As far as I can tell from this output, both graphs have the same format.
Why would there be an indexing error?
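One check that might narrow this down: the error comes from the embedding lookup, which fails for any index outside the range [0, num_embeddings). The sketch below lists the offending indices in a batch of walks. The helper name `out_of_range_indices` is hypothetical, and it assumes the embedding table has one row per node, i.e. `num_embeddings == g.num_nodes()`. One possibility to rule out (an assumption, not a confirmed diagnosis) is that the sampler pads walks that stop early at a node with no outgoing edges with -1, as documented for `dgl.sampling.random_walk`.

```python
def out_of_range_indices(batch_walk, num_embeddings):
    """Return the distinct indices in batch_walk outside [0, num_embeddings).

    Hypothetical helper for debugging: batch_walk is a 2-D batch of walks,
    either a torch tensor (converted via .tolist()) or a nested list.
    """
    rows = batch_walk.tolist() if hasattr(batch_walk, "tolist") else batch_walk
    bad = set()
    for walk in rows:
        for node in walk:
            # Negative values (e.g. -1 padding) and values >= num_embeddings
            # are both invalid for an embedding lookup.
            if node < 0 or node >= num_embeddings:
                bad.add(node)
    return sorted(bad)


# Plain lists standing in for a (batch_size, walk_length) tensor of node IDs.
walks = [[0, 1, 2], [3, -1, 5], [4, 9, 2]]
print(out_of_range_indices(walks, num_embeddings=6))  # prints [-1, 9]
```

In the training loop above, this could be called as `out_of_range_indices(batch_walk, g.num_nodes())` just before `loss = model(batch_walk)`. A non-empty result would show which values trigger the IndexError.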