Feature as a list or an array

Hey all,

I was wondering: what if a node’s features are not a single integer but a list or an array? For example, I have 5 features and each of them has a different length:
featA=[[0,0],[1,1],[1,1]…]
featB=[[0,0],[0,0],[1,1]…]
featC=[[0,0],[0,0],[1,1]…]
featD=[[0,0,1],[0,1,1],[1,1,1]…]
featE=[[0,0,1],[0,1,1],[1,1,1]…]

I constructed my GCN layers as follows:
# input layer
self.layers.append(GraphConv(5, 5, activation=activation))
# hidden layers
self.layers.append(GraphConv(5, 100, activation=activation))
self.layers.append(GraphConv(100, 66, activation=activation))
self.layers.append(GraphConv(66, 30, activation=activation))
self.layers.append(GraphConv(30, 3, activation=activation))
# output layer
self.layers.append(GraphConv(3, 2))
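
(To make the shapes concrete, here is a standalone sketch of the same stack of layers on a toy graph. The graph, the random input features, the plain for-loop forward pass, and the F.relu activation are made up for illustration and are not copied from my actual model.)

import dgl
import torch
import torch.nn.functional as F
from dgl.nn import GraphConv

# toy 4-node cycle so every node has at least one incoming edge
g = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))
h = torch.randn(g.number_of_nodes(), 5)   # 5 input features per node

layers = [
    GraphConv(5, 5, activation=F.relu),
    GraphConv(5, 100, activation=F.relu),
    GraphConv(100, 66, activation=F.relu),
    GraphConv(66, 30, activation=F.relu),
    GraphConv(30, 3, activation=F.relu),
    GraphConv(3, 2),
]
for layer in layers:
    h = layer(g, h)

print(h.shape)   # torch.Size([4, 2]): one 2-dimensional prediction per node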

However, I got an error saying “ValueError: Expected target size (94372, 2), got torch.Size([94372])”, where 94372 is the total number of nodes I used for training.

Has anybody run into this problem before? Thanks.

In general, node features can be a tensor of shape (V, *), where V is the number of nodes and * denotes an arbitrary number of additional dimensions. Below is an example:

import dgl
import torch

# a toy graph with 4 nodes and 3 edges
g = dgl.graph([(0, 1), (1, 2), (2, 3)])
# a 2-dimensional feature vector per node, i.e. shape (V, 2)
g.ndata['h'] = torch.randn(g.number_of_nodes(), 2)
# a 2x2 feature matrix per node, i.e. shape (V, 2, 2)
g.ndata['h'] = torch.randn(g.number_of_nodes(), 2, 2)
...

As for the error you observed: it’s because you have one label per node, so the final prediction of your model should be of shape (V, 1). That is, you need to change the output layer from GraphConv(3, 2) to GraphConv(3, 1).
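
(A quick shape check of that change on a toy graph; the graph and sizes below are made up just to show the output shape, and the pair-of-lists form of dgl.graph is used here.)

import dgl
import torch
from dgl.nn import GraphConv

g = dgl.graph(([0, 1, 2, 3], [1, 2, 3, 0]))   # toy 4-node cycle
h = torch.randn(g.number_of_nodes(), 3)       # hidden features entering the last layer, shape (V, 3)
out = GraphConv(3, 1)(g, h)                   # the suggested output layer
print(out.shape)                              # torch.Size([4, 1]): one prediction per node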

Thanks @mufelli,

  1. I thought the 2 in self.layers.append(GraphConv(3, 2)) represents the number of target labels we aim to predict (in this case, whether a node is labelled 0 or 1). So if I have 5 target labels, how do I design my output layer to identify those 5 target labels?

  2. I have formatted my nodes the same way as in the Cora dataset. For example:
    Nodes = [[ ‘id_node’, [node_feature], target_label ], …]
    Nodes = [ [‘idNodeA’, [[1], [0], [0], [1, 1, 1], [1, 1, 1, 1]], 0], … ]
    That is, each node in Nodes has 5 features with different lengths. But it fails with the message
    ValueError: expected sequence of length 1 at dim 2 (got 3) when I create a tensor with
    g_dgl.ndata['feature'] = th.tensor([Nodes[i][1] for i in range(len(Nodes))])
    So I was wondering: must all of the features have the same length? (A minimal reproduction of this error is shown right after this list.)

  3. I could flatten my features to [1, 0, 0, 1, 1, 1, 1, 1, 1, 1], but I’m not sure whether that is actually the right approach, as it essentially means that each node in Nodes now has 10 features instead of 5. Is that the right way to go?
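
To make the failure in (2) concrete, here is a minimal reproduction with a single made-up node (the values are illustrative):

import torch as th

# one node whose 5 features have lengths 1, 1, 1, 3 and 4
ragged = [[[1], [0], [0], [1, 1, 1], [1, 1, 1, 1]]]

try:
    th.tensor(ragged)
except ValueError as e:
    print(e)   # expected sequence of length 1 at dim 2 (got 3)

# flattening each node's features into one fixed-length vector does go through:
flat = [[value for feature in node for value in feature] for node in ragged]
print(th.tensor(flat).shape)   # torch.Size([1, 10])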

Thanks a lot for the guidance.

As for the shape error: I changed the output layer from GraphConv(3, 2) to GraphConv(3, 1) as suggested, but it still throws essentially the same error: ValueError: Expected target size (94372, 1), got torch.Size([94372])

Any ideas?

  1. Binary classification and multi-class classification are typically handled differently. In binary classification there are only two classes, so we can pass the model output through a sigmoid activation function to get the probability of the positive class; in this case GraphConv(3, 1) is enough. In multi-class classification, we pass the model output through a softmax activation function and need GraphConv(3, C), where C is the number of classes.
  2. Unless there is a strong reason to treat these 5 features separately, I would go with flattening the 5 node features into a single vector per node.
  3. For ValueError: Expected target size (94372, 1), got torch.Size([94372]), you need to call labels.unsqueeze(-1) to change the shape of the labels from torch.Size([94372]) to torch.Size([94372, 1]) so that it matches the shape of the predictions. (The sketch after this list puts the binary and multi-class cases together.)
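
Putting (1) and (3) together, here is a shape-only sketch with made-up sizes; random tensors stand in for the model output, and the losses shown (BCEWithLogitsLoss, which folds the sigmoid into the loss, and CrossEntropyLoss) are one common choice, not necessarily the ones you are using:

import torch
import torch.nn as nn

V = 94372                                  # number of training nodes

# binary case: a GraphConv(3, 1) output of shape (V, 1)
logits_bin = torch.randn(V, 1)             # stands in for the model output
labels = torch.randint(0, 2, (V,))         # one 0/1 label per node, shape (V,)
# match the label shape to the (V, 1) predictions, as in point (3)
loss_bin = nn.BCEWithLogitsLoss()(logits_bin, labels.float().unsqueeze(-1))

# multi-class case with C = 5 classes: a GraphConv(3, 5) output of shape (V, C);
# CrossEntropyLoss takes integer labels of shape (V,) directly, no unsqueeze needed
logits_mc = torch.randn(V, 5)
labels_mc = torch.randint(0, 5, (V,))
loss_mc = nn.CrossEntropyLoss()(logits_mc, labels_mc)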