Is there an unsupervised GNN method that could generate edge weight?

cs001632 · April 13, 2021, 1:40pm

Could you kindly recommend an unsupervised GNN method that can generate edge weights (to rank the importance of each of a node’s neighbors to that node)?

mufeili · April 19, 2021, 5:53am

I assume this has been resolved through other communication channels.

The answer is that no existing research does exactly what you described. One possibility is to extend previous work on learning node embeddings with unsupervised learning.

Feel free to follow up.

ZZy979 · April 19, 2021, 11:13am

SuperGAT
Paper link: How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision
The authors designed a self-supervised task (link prediction) so that the attention can be trained without labels. The e_ij learned by the graph attention can be read as the importance of neighbour j to node i.

cs001632 · April 19, 2021, 11:32am

Thank you for your suggestion! Your implementation (pytorch-tutorial/train.py at f1400e926e446f348668af7a4449f78c8a50e6f5 · ZZy979/pytorch-tutorial · GitHub) is inspiring. However, we do not have labels for training, so the loss function F.cross_entropy(logits[train_idx], labels[train_idx]) has to change. Could you kindly suggest how to change it?

ZZy979 · April 19, 2021, 11:43am

If you use only the attn_loss returned by the model, the model is trained on the link prediction task alone, without node labels.

github.com

ZZy979/pytorch-tutorial/blob/master/gnn/supergat/train.py#L65


g, labels, num_classes, train_idx, val_idx, test_idx = load_data(args.dataset, args.ogb_root, args.seed, device)
features = g.ndata['feat']
model = SuperGAT(
    features.shape[1], args.num_hidden, num_classes, args.num_heads, args.attn_type,
    args.neg_sample_ratio, args.dropout, args.dropout
).to(device)
optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=args.weight_decay)
for epoch in range(args.epochs):
    model.train()
    logits, attn_loss = model(g, features)
    loss = F.cross_entropy(logits[train_idx], labels[train_idx])
    loss += args.attn_loss_weight * attn_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    train_acc = accuracy(logits[train_idx], labels[train_idx])
    val_acc = evaluate(model, g, features, labels, val_idx)
    print('Epoch {:04d} | Loss {:.4f} | Train Acc {:.4f} | Val Acc {:.4f}'.format(
        epoch, loss.item(), train_acc, val_acc
    ))

The tensor e here is the edge feature you want. Each item e_ij means the importance of neighbour j to node i for edge <i, j>.

github.com

ZZy979/pytorch-tutorial/blob/master/gnn/supergat/model.py#L54


def forward(self, g, feat):
    """
    :param g: DGLGraph, a homogeneous graph
    :param feat: tensor(N_src, d_in) input node features
    :return: tensor(N_dst, K, d_out) output node features
    """
    with g.local_scope():
        feat_src = self.fc(self.feat_drop(feat)).view(-1, self.num_heads, self.out_dim)
        feat_dst = feat_src[:g.num_dst_nodes()] if g.is_block else feat_src
        e = self.leaky_relu(self.attn(g, feat_src, feat_dst))  # (E, K, 1)
        g.edata['a'] = self.attn_drop(edge_softmax(g, e))  # (E, K, 1)
        g.srcdata['ft'] = feat_src
        # message passing
        g.update_all(fn.u_mul_e('ft', 'a', 'm'), fn.sum('m', 'ft'))
        out = g.dstdata['ft']  # (N_dst, K, d_out)
        if self.training:
            # negative sampling
            neg_g = dgl.graph(
                self.neg_sampler(g, list(range(g.num_edges()))), num_nodes=g.num_nodes(),
                device=g.device
            )
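
For readers unfamiliar with the edge_softmax call in the excerpt above: it turns the raw scores e into attention weights by applying a softmax over the incoming edges of each destination node, so the weights of one node's neighbours sum to 1. The following is a minimal single-head sketch of that normalization in plain Python, not DGL's implementation; the function name and the toy scores are made up for illustration.

```python
import math
from collections import defaultdict


def edge_softmax(dst, scores):
    """Softmax over the incoming edges of each destination node.

    dst[k] is the destination node of edge k, scores[k] its raw score.
    """
    # Group edge ids by destination node.
    incoming = defaultdict(list)
    for eid, v in enumerate(dst):
        incoming[v].append(eid)

    out = [0.0] * len(scores)
    for eids in incoming.values():
        # Subtract the max before exponentiating for numerical stability.
        m = max(scores[k] for k in eids)
        exps = {k: math.exp(scores[k] - m) for k in eids}
        total = sum(exps.values())
        for k in eids:
            out[k] = exps[k] / total
    return out


# Toy graph with edges 0->1, 0->2, 1->2 (edge ids 0, 1, 2).
dst = [1, 2, 2]
raw = [0.3, 1.0, 1.0]  # hypothetical raw scores
attn = edge_softmax(dst, raw)
print(attn)  # [1.0, 0.5, 0.5]
```

Node 1 has a single incoming edge, so that edge gets weight 1.0 regardless of its raw score. This is one reason to compare raw scores e rather than normalized weights when ranking neighbours across different nodes.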

cs001632 · April 19, 2021, 3:49pm

Thank you for your timely reply! I have adapted your code as follows for unsupervised learning.

The code runs successfully, but the loss stays around 0.6 on various datasets, such as cora. I know that unsupervised learning generally gives lower accuracy. Is there some trick that could greatly improve the performance?

ZZy979 · April 20, 2021, 2:26am

The attn_loss comes from link prediction, while the accuracy is computed for node classification. Since you’ve removed the classification loss (cross entropy), the model is trained only on the link prediction task and can’t learn anything about the labels.

As the author pointed out in the paper:

… it is hard to learn the relational importance from edges by simply optimizing graph attention for link prediction.

(The “relational importance” here means label information.)

So what do you want to use unsupervised learning for? A supervised downstream task?

cs001632 · April 20, 2021, 5:06am

Our data does not contain enough labels, so we want to train an unsupervised model and then extract edge weights to select the most essential neighbor of each node. Given that aim, what would you suggest?
How do I match the edge weights to the edges? On the cora dataset, the shape of e is (9228, 8, 1), while g.edata has no attributes. How do I know which pair of nodes makes up each edge?

ZZy979 · April 20, 2021, 6:32am

Even though you only have a few labeled nodes, you can still perform node classification, and it helps the model learn better graph attention (neighbour importance).

# suppose -1 means "no label"
train_mask = labels != -1
...
logits, attn_loss = model(g, features)
loss = F.cross_entropy(logits[train_mask], labels[train_mask])
loss += args.attn_loss_weight * attn_loss

As the authors put it, node classification guides graph attention to give higher weight to neighbours with the same label, while link prediction helps learn graph structural information.
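
To illustrate what the mask does, here is a small plain-Python sketch, assuming (as in the snippet above) that -1 marks an unlabeled node. Only labeled nodes contribute to the classification loss; unlabeled nodes still take part in message passing and in the link prediction loss. The helper name masked_cross_entropy and the toy numbers are hypothetical; in PyTorch the indexing logits[train_mask] achieves the same selection.

```python
import math


def masked_cross_entropy(logits, labels, no_label=-1):
    """Mean cross-entropy over the nodes whose label is not `no_label`."""
    losses = []
    for row, y in zip(logits, labels):
        if y == no_label:
            continue  # unlabeled node: contributes nothing to this loss
        # Stable log-sum-exp of the logits of this node.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        losses.append(log_sum_exp - row[y])
    if not losses:
        raise ValueError("no labeled nodes")
    return sum(losses) / len(losses)


# Three nodes, two classes; the middle node has no label.
logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
labels = [0, -1, 1]
loss = masked_cross_entropy(logits, labels)
```

Changing the logits of the unlabeled node leaves this loss unchanged, which is the point of the mask.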

The shape of tensor e is (E, K, 1), and it’s matched with edges by edge id: g.edata['e'] = e.
Actually, it was originally an edge feature:

github.com

ZZy979/pytorch-tutorial/blob/master/gnn/supergat/attention.py#L46


        gain = nn.init.calculate_gain('relu')
        nn.init.xavier_normal_(self.attn_l, gain=gain)
        nn.init.xavier_normal_(self.attn_r, gain=gain)
    def forward(self, g, feat_src, feat_dst):
        el = (feat_src * self.attn_l).sum(dim=-1, keepdim=True)  # (N_src, K, 1)
        er = (feat_dst * self.attn_r).sum(dim=-1, keepdim=True)  # (N_dst, K, 1)
        g.srcdata['el'] = el
        g.dstdata['er'] = er
        g.apply_edges(fn.u_add_v('el', 'er', 'e'))
        return g.edata.pop('e')
class DotProductAttention(GraphAttention):
    """Dot-product attention"""
    def forward(self, g, feat_src, feat_dst):
        g.srcdata['ft'] = feat_src
        g.dstdata['ft'] = feat_dst
        g.apply_edges(lambda edges: {
            'e': torch.sum(edges.src['ft'] * edges.dst['ft'], dim=-1, keepdim=True)

“K” is the number of attention heads. You can take the mean of the attention values over the heads, like this:

github.com

ZZy979/pytorch-tutorial/blob/master/gnn/supergat/model.py#L68


        g.update_all(fn.u_mul_e('ft', 'a', 'm'), fn.sum('m', 'ft'))
        out = g.dstdata['ft']  # (N_dst, K, d_out)
        if self.training:
            # negative sampling
            neg_g = dgl.graph(
                self.neg_sampler(g, list(range(g.num_edges()))), num_nodes=g.num_nodes(),
                device=g.device
            )
            neg_e = self.attn(neg_g, feat_src, feat_src)  # (E', K, 1)
            self.attn_x = torch.cat([e, neg_e]).squeeze(dim=-1).mean(dim=1)  # (E+E',)
            self.attn_y = torch.cat([torch.ones(e.shape[0]), torch.zeros(neg_e.shape[0])]) \
                .to(self.attn_x.device)
        if self.activation:
            out = self.activation(out)
        return out
def get_attn_loss(self):
    """Return the self-supervised attention loss (i.e. the link prediction loss)"""
    if self.training:
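
The relevant line in the excerpt is the one with .squeeze(dim=-1).mean(dim=1), which reduces scores of shape (E, K, 1) to one value per edge. A plain-Python sketch of the same reduction on nested lists, with made-up numbers and a hypothetical helper name:

```python
def mean_over_heads(e):
    """Collapse per-head scores of shape (E, K, 1) into one score per edge."""
    return [sum(head[0] for head in edge) / len(edge) for edge in e]


# Two edges, K = 2 heads, trailing dimension of size 1.
e = [[[0.2], [0.4]], [[1.0], [3.0]]]
scores = mean_over_heads(e)  # one scalar per edge, in edge-id order
```

The result keeps the edge-id order of e, so scores[k] still belongs to the edge with id k.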

In DGL, g.edges() returns the edges of g as a pair (source nodes, destination nodes), where the position in each tensor corresponds to the edge id.

>>> g = dgl.graph(([0, 0, 1], [1, 2, 2]))
>>> g.edges()
(tensor([0, 0, 1]), tensor([1, 2, 2]))
>>> g.find_edges(eid=0)
(tensor([0]), tensor([1]))

In practice, however, edge features are used in message passing, and you don’t need to know the source and destination nodes of each edge.
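
If you do need the node pairs, for example to pick the most important neighbour of each node, the edge-id alignment is all that is required. The sketch below is plain Python and assumes you already have one scalar score per edge (for instance the head-averaged attention) in edge-id order, plus the two lists from g.edges(). It also assumes messages flow from source to destination, as in the update_all call shown earlier, so each edge's score is read as the importance of the source to the destination. The function name and the scores are hypothetical.

```python
def top_neighbor(src, dst, score):
    """For each destination node, return (best source node, its score).

    src[k], dst[k] and score[k] all describe the edge with id k.
    """
    best = {}
    for u, v, s in zip(src, dst, score):
        if v not in best or s > best[v][1]:
            best[v] = (u, s)
    return best


# Same toy graph as above: edge ids 0, 1, 2 are 0->1, 0->2, 1->2.
src, dst = [0, 0, 1], [1, 2, 2]
score = [0.9, 0.2, 0.7]  # hypothetical per-edge importance, one value per edge id
best = top_neighbor(src, dst, score)
print(best)  # {1: (0, 0.9), 2: (1, 0.7)}
```

Node 0 does not appear in the result because it has no incoming edges in this toy graph.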

Hope my answers make sense to you.

cs001632 · April 20, 2021, 7:09am

Thank you for your detailed response.

In fact, our graph is composed of two kinds of nodes, and we have labels for all nodes. Our aim is to select the essential neighbor of each node. We also have two types of edges (relationships); if possible, we want to select the essential neighbor of a node within each edge type.
The output in your last suggestion is a tensor of numbers. I guess the numbers are ordered according to the input features, so I just need to match each feature name to a number. Is that the right thing to do?

ZZy979 · April 20, 2021, 9:14am

These numbers are node ids, not input features. What do you mean by “feature name”?

cs001632 · April 20, 2021, 9:18am

Node table:

featurename  type   Attr
A            type2  5
B            type1  3
C            type1  2
D            type2  5

This is my input node table. Do the node ids (0, 1, 2, 3) in the tensors returned by g.edges() correspond to A, B, C, D?

ZZy979 · April 20, 2021, 9:27am

Sorry, but I don’t understand what this table means.
“Node id” has nothing to do with “node feature”.
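
One way to connect the two, sketched under the assumption that you build the edge list yourself: DGL identifies nodes by consecutive integers starting from 0, so names such as A, B, C, D map to ids only through whatever lookup you used when constructing the graph. The names and edges below are made up for illustration.

```python
# Hypothetical node names, in the order they are assigned integer ids.
names = ["A", "B", "C", "D"]
name_to_id = {name: i for i, name in enumerate(names)}

# Hypothetical edge list written with names.
named_edges = [("A", "B"), ("A", "C"), ("B", "C")]
src = [name_to_id[u] for u, _ in named_edges]
dst = [name_to_id[v] for _, v in named_edges]
# src and dst are what would be passed to dgl.graph((src, dst)).

# Later: translate the (src, dst) ids from g.edges() back to names.
decoded = [(names[u], names[v]) for u, v in zip(src, dst)]
```

Node features such as type or Attr are then stored row by row in the same id order, so row i of the feature tensor describes names[i].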

cs001632 · April 20, 2021, 4:04pm

Thanks for your good question about node ids. I have managed to figure it out by reading the documentation and by trial and error. One more question: what is the publication source of the unsupervised attn_loss in your implementation of SuperGAT? Is it in the original paper?

ZZy979 · April 20, 2021, 11:48pm

Yes, attn_loss is from Eq. (5) in the paper.
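
The equation itself is not reproduced in this thread. As a rough illustration only, the sketch below assumes the loss is a standard binary cross-entropy on raw edge scores, which matches the labels built in attn_y in the model excerpt (ones for real edges, zeros for negatively sampled pairs). The function name and the scores are hypothetical, and details such as weighting or sampling may differ from the paper and the repository.

```python
import math


def link_prediction_loss(pos_scores, neg_scores):
    """Binary cross-entropy on raw scores: label 1 for real edges, 0 for sampled non-edges."""
    def softplus(x):
        # Numerically stable log(1 + exp(x)).
        return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

    # -log(sigmoid(s)) = softplus(-s) and -log(1 - sigmoid(s)) = softplus(s)
    losses = [softplus(-s) for s in pos_scores] + [softplus(s) for s in neg_scores]
    return sum(losses) / len(losses)


pos = [2.0, 1.5]   # scores of edges that exist in the graph
neg = [-1.0, 0.5]  # scores of negatively sampled node pairs
loss = link_prediction_loss(pos, neg)
```

Under this assumption, a model that scores every pair as 0 gets a loss of log 2, about 0.69, so a value that stays near that level would suggest the scores barely separate real edges from sampled ones.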
