The example is:
import dgl
import torch
import torch.nn as nn
from dgl.nn import GraphConv
from torch.optim import Adam
num_nodes = 5
emb_size = 5
g = dgl.rand_graph(num_nodes=num_nodes, num_edges=25)
embed = nn.Embedding(num_nodes, emb_size)  # learnable node features
model = GraphConv(emb_size, 1)
# Optimize the GNN weights and the embedding table jointly.
optimizer = Adam(list(model.parameters()) + list(embed.parameters()), lr=1e-3)
labels = torch.zeros((num_nodes, 1))
criteria = nn.BCEWithLogitsLoss()
num_epochs = 5
for _ in range(num_epochs):
    pred = model(g, embed.weight)  # embedding table serves as the input features
    loss = criteria(pred, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
The question is: is this approach necessarily better than other initialization methods? Personally, I think using nn.Embedding directly seems universally appropriate.
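For context, the key difference between the approaches being compared is whether the node features themselves receive gradients. A minimal sketch (using plain PyTorch, with a one-hot matrix as an illustrative fixed alternative) of the two options:

```python
import torch
import torch.nn as nn

num_nodes, emb_size = 5, 5

# Option 1: learnable features. embed.weight is a Parameter, so the
# optimizer updates the features themselves during training.
embed = nn.Embedding(num_nodes, emb_size)
print(embed.weight.requires_grad)  # True

# Option 2 (illustrative alternative): fixed features, e.g. one-hot
# identity rows. These stay constant; only the GNN weights are trained.
fixed = torch.eye(num_nodes)
print(fixed.requires_grad)  # False
```

Whether trainable features help depends on the setting: on a single fixed graph they add capacity, but they cannot generalize to unseen nodes, since each row of the embedding table is tied to a specific node ID.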