Hello!

I was reading through the HGTConv implementation (the latest version), and I can't seem to find the non-linear activation. In the documentation the target-specific aggregation step is written as

$$H^{(l)}[t] = \text{A-Linear}_{\tau(t)}\left(\sigma\left(\widetilde{H}^{(l)}[t]\right)\right) + H^{(l-1)}[t]$$

where we sum the residual connection and the activation of the layer (which matches the schema in the paper, https://arxiv.org/pdf/2003.01332).

However in the code we have:

```python
h = g.dstdata["h"].view(-1, self.num_heads * self.head_size)
# target-specific aggregation (A-Linear), with no sigma applied to h first
h = self.drop(self.linear_a(h, dstntype, presorted))
# gated skip connection: sigmoid-weighted mix of new and input features
alpha = torch.sigmoid(self.skip[dstntype]).unsqueeze(-1)
if x_dst.shape != h.shape:
    h = h * alpha + (x_dst @ self.residual_w) * (1 - alpha)
else:
    h = h * alpha + x_dst * (1 - alpha)
if self.use_norm:
    h = self.norm(h)
return h
```
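
For comparison, here is a rough sketch (my reading of the paper, not actual library code) of how this tail would look if σ were applied where the equation above puts it. I'm using GELU since I believe that's what the authors' reference implementation uses:

```python
# Hypothetical sketch (not DGL's code): the same tail rewritten to follow
# the paper's  H^(l)[t] = A-Linear_tau(t)( sigma( H~^(l)[t] ) ) + H^(l-1)[t]
h = g.dstdata["h"].view(-1, self.num_heads * self.head_size)
# sigma applied *before* the target-specific A-Linear ...
h = self.drop(self.linear_a(torch.nn.functional.gelu(h), dstntype, presorted))
# ... followed by a plain residual connection (the paper's schema has no
# learned gate; assumes x_dst already has the output shape)
h = h + x_dst
```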

It looks to me like no activation is applied here at all. This can be worked around by explicitly putting an activation layer *after* the HGTConv layer, but that doesn't match the original paper.
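
For anyone hitting the same thing, this is the workaround I mean, as a minimal self-contained sketch (my own code, assuming the constructor/forward signature from the current HGTConv docs; again GELU is my choice of σ):

```python
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import HGTConv

class HGTConvActivated(nn.Module):
    """Workaround: apply an explicit non-linearity after HGTConv.

    Note this applies sigma to the *output* (i.e. after the gated
    residual), which is not quite the paper's sigma-before-A-Linear
    placement.
    """

    def __init__(self, in_size, head_size, num_heads, num_ntypes, num_etypes):
        super().__init__()
        self.conv = HGTConv(in_size, head_size, num_heads,
                            num_ntypes, num_etypes)

    def forward(self, g, x, ntype, etype, *, presorted=False):
        h = self.conv(g, x, ntype, etype, presorted=presorted)
        return F.gelu(h)  # the explicit activation missing inside HGTConv
```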

Is this behaviour intended?

Thank you!