Hello, I’d like some help with this error:
dgl._ffi.base.DGLError: Expect number of features to match number of nodes (len(u)). Got 42 and 145 instead.
I’ve already read the previous posts about similar errors, but they did not resolve my question.
I am currently working on a graph classification task. I have a list of 150 elements (ids 0 to 149), and my dataset consists of graphs whose nodes are drawn from this list. For example, graph 1 contains elements 0, 9, and 143 as nodes, and graph 2 contains nodes 1, 7, 9, and 148. Every node has a feature vector of size 1024. The same node id commonly has different feature vectors in different graphs, so node id 9 in graph 1 has a feature vector that differs from that of node id 9 in graph 2.
Now that the context is clear: the error I am facing is a mismatch between the number of nodes and the number of features. For some reason, DGL reports many more nodes than graph 1 actually has. I assume this is because the highest node id in graph 1 is 143, so DGL sizes the graph from that id (ids 0 through 143, i.e. 144 nodes), when in reality 143 is just an id (since DGL doesn’t allow string names for nodes) and the graph has only 3 nodes. As a result, I am unable to create the dataset class correctly. Is there anything obvious that I am doing wrong here? Here’s the script I am currently using:
import dgl
import urllib
import pandas as pd
import torch
from dgl.data import DGLDataset
from ast import literal_eval
edges = pd.read_csv('ade20k/val_edges_int.csv')
properties = pd.read_csv('ade20k/val_graph_int_properties.csv')
edges.head()
properties.head()
class ADE20kDataset(DGLDataset):
    def __init__(self):
        super().__init__(name='synthetic')

    def process(self):
        edges = pd.read_csv('ade20k/val_edges_int.csv', converters={'feature': literal_eval})
        properties = pd.read_csv('ade20k/val_graph_int_properties.csv')
        self.graphs = []
        self.labels = []
        # Create a graph for each graph ID from the edges table.
        # First process the properties table into two dictionaries with graph IDs as keys.
        # The label and number of nodes are values.
        label_dict = {}
        # num_nodes_dict = {}
        for _, row in properties.iterrows():
            label_dict[row['graph_id']] = row['label']
            # num_nodes_dict[row['graph_id']] = row['num_nodes']
        # For the edges, first group the table by graph IDs.
        edges_group = edges.groupby('image_id')
        # For each graph ID...
        for graph_id in edges_group.groups:
            # Find the edges as well as the number of nodes and its label.
            edges_of_id = edges_group.get_group(graph_id)
            src = edges_of_id['src'].to_numpy()
            dst = edges_of_id['dst'].to_numpy()
            feature = edges_of_id['feature'].to_numpy()
            print(graph_id, len(feature[0]))
            print("The {}th graph has {} nodes and {} edges.".format(graph_id, src, dst))
            # num_nodes = num_nodes_dict[graph_id]
            label = label_dict[graph_id]
            # Create a graph and add it to the list of graphs and labels.
            g = dgl.graph((src, dst))
            g.ndata['feat'] = feature
            self.graphs.append(g)
            self.labels.append(label)
        # Convert the label list to tensor for saving.
        self.labels = torch.LongTensor(self.labels)

    def __getitem__(self, i):
        return self.graphs[i], self.labels[i]

    def __len__(self):
        return len(self.graphs)
dataset = ADE20kDataset()
graph, label = dataset[0]
print(graph, label)
print("The length of dataset is",len(dataset))
Any suggestions as to how to solve this error?
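To make the suspected cause concrete, here is a minimal sketch in plain Python (no DGL needed) of how sizing a graph by "largest id + 1" differs from counting the nodes that actually occur, and how the global ids could be relabeled to compact per-graph ids. The helper `relabel_edges` and the three example edges are hypothetical and only for illustration; they are not taken from the ADE20k CSV files.

```python
def relabel_edges(src, dst):
    """Map arbitrary global node ids to compact local ids 0..n-1."""
    # Sorted unique global ids that actually occur in this graph.
    global_ids = sorted(set(src) | set(dst))
    # Lookup table: global id -> local id.
    local_of = {gid: i for i, gid in enumerate(global_ids)}
    local_src = [local_of[u] for u in src]
    local_dst = [local_of[v] for v in dst]
    return local_src, local_dst, global_ids


# Hypothetical edge list for "graph 1", using global ids 0, 9 and 143.
src = [0, 9, 143]
dst = [9, 143, 0]

# Node count if the graph is sized from the largest id (ids 0..143).
inferred_num_nodes = max(src + dst) + 1

# Node count after relabeling to local ids.
local_src, local_dst, global_ids = relabel_edges(src, dst)
num_nodes = len(global_ids)

print(inferred_num_nodes)    # 144
print(num_nodes)             # 3
print(local_src, local_dst)  # [0, 1, 2] [1, 2, 0]
```

With relabeled ids, a call along the lines of `dgl.graph((local_src, local_dst), num_nodes=num_nodes)` should produce a graph with exactly 3 nodes. The node feature tensor would then need one row per entry of `global_ids`, in that order, giving shape `(num_nodes, 1024)`. If the `feature` column holds one entry per edge row rather than one per node, it would also have to be regrouped per node before being assigned to `g.ndata['feat']`. The original ids could be kept alongside, for example as an extra node feature, if they are needed later.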