What if there is no attribute on the graph?

Hi
As the tutorial shows, the input to GCN in DGL needs an attribute (feature) vector for each node.
So if there are no attributes, only the graph's topology, how can I use DGL to learn node embeddings?
Thanks


Hi,

You can use a constant vector such as np.ones(10) as the node feature, or use the node's degree as the initial feature.
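
A minimal sketch of both options, assuming DGL's MXNet backend (the toy graph here is just for illustration):

import dgl
import mxnet as mx

# A toy 5-node graph with no features, only topology
g = dgl.DGLGraph()
g.add_nodes(5)
g.add_edges([0, 1, 2, 3], [1, 2, 3, 4])

# Option 1: the same constant vector for every node
g.ndata['feat'] = mx.nd.ones((g.number_of_nodes(), 10))

# Option 2: each node's in-degree as a 1-dimensional feature
g.ndata['feat'] = g.in_degrees().astype('float32').reshape(-1, 1)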


Yeah,
Thanks for your reply.

@VoVAllen, I saw that in the example for heterogeneous graphs, they used a one-hot vector for the node feature. What are the pros/cons of using one-hot encoding versus degree or constant vectors? I can see how degree would be important on its own, but wouldn't that feature be approximately learned anyway through sampling/learning over edges (please forgive me if I'm not using the correct terminology, but I hope you know what I mean)? The differences between one-hot encoding and constant vectors are harder for me to know offhand.

You can also learn embeddings for the nodes. In that case, the “attribute” for each node is just a unique integer, and that integer is used to look up the embedding vector. For example:

import mxnet as mx
from mxnet import gluon
from dgl.nn.mxnet import GraphConv

class GCNEmbedding(gluon.Block):
    def __init__(self,
                 g,
                 n_nodes,
                 embed_dim,
                 n_hidden,
                 n_classes,
                 n_layers,
                 activation,
                 dropout,
                 **kwargs):
        super(GCNEmbedding, self).__init__(**kwargs)
        self.g = g
        self.embed_dim = embed_dim
        with self.name_scope():
            # Node embeddings: one learnable vector per node id
            self.embed = gluon.nn.Embedding(n_nodes, embed_dim)
            self.dropout = gluon.nn.Dropout(dropout)
            # GCN layers (GraphConv takes the activation as a keyword argument)
            self.layers = gluon.nn.Sequential()
            self.layers.add(GraphConv(embed_dim, n_hidden, activation=activation))
            for i in range(1, n_layers - 1):
                self.layers.add(GraphConv(n_hidden, n_hidden, activation=activation))
            # output layer
            self.layers.add(GraphConv(n_hidden, n_classes))

    def forward(self, inputs):
        # inputs holds integer node ids; look them up in the embedding table
        h = self.embed(inputs)
        # GraphConv's forward needs the graph, so iterate over the layers
        # rather than calling the Sequential directly
        for i, layer in enumerate(self.layers):
            if i != 0:
                h = self.dropout(h)
            h = layer(self.g, h)
        return h

And then you would set attributes on graph g like:

# Each node's "feature" is just its integer id, used to index the embedding table
inputs = mx.nd.arange(g.number_of_nodes()).astype('int64')
g.ndata['features'] = inputs
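
A hypothetical end-to-end usage might look like this (the constructor arguments are illustrative, and the model also receives g since GraphConv needs the graph at forward time):

# Hypothetical usage: node ids go in, per-node logits come out
model = GCNEmbedding(g,
                     n_nodes=g.number_of_nodes(),
                     embed_dim=16,
                     n_hidden=16,
                     n_classes=7,
                     n_layers=2,
                     activation=mx.nd.relu,
                     dropout=0.5)
model.initialize()
logits = model(g.ndata['features'])  # shape: (n_nodes, n_classes)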

I would like to ask about this as well. The original GCN paper uses one-hot vectors as the initial features on graphs that have none.
I believe a rule of thumb is to feed as much information as possible to the model as input: actual features first, then degree information or node identity, and constant vectors last. If your graph neural network normalizes by degree, as the original GCN and GraphSAGE do, then I doubt that the degree information can be directly recovered from the learned embeddings. (Please correct me if I am wrong.)
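
For what it's worth, a minimal sketch of one-hot (identity) features with the MXNet backend, assuming a DGLGraph g:

import mxnet as mx

# One-hot "identity" features: row i is the indicator vector of node i
n = g.number_of_nodes()
g.ndata['feat'] = mx.nd.eye(n)

Note that with identity features the first layer's weight matrix effectively plays the role of an embedding table (XW = W when X = I), and that the features alone cost O(n²) memory on large graphs.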


Is it a wise choice to initialize the embedding vectors with embeddings derived from another algorithm, such as node2vec?


I think this would basically “jump start” your node-level feature learning.

If node2vec learns embeddings through the skip-gram objective of predicting the neighbors and non-neighbors of each node, and then uses the model's middle layer as the embedding weights, then updating those weights over time is basically what an RGCN does, if I understand correctly (given the task of predicting nodes' neighbors).

So it seems like a good way to initialize the embedding vectors if you already have the data; however, it is something that could be learned on its own (if your task is predicting neighbors). For node-level tasks, though (e.g., node classification), I don't believe the middle layers of the RGCN model will learn the same kind of embedding, so the warm start may be particularly helpful in that case. A sketch of what that initialization could look like is below.
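
As a hypothetical sketch of that warm start, assuming pretrained is an (n_nodes, embed_dim) NumPy array from a node2vec run and model is the GCNEmbedding from earlier in this thread:

import mxnet as mx

# Hypothetical warm start: copy pretrained node2vec vectors into the
# embedding table; the GCN layers keep their random initialization
model.initialize()
model.embed.weight.set_data(mx.nd.array(pretrained))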

If anyone knows otherwise, it would be great to know and to know why I am mistaken! Thanks!
