Hello, I am attempting to construct a heterogeneous graph consisting of node types such as user, item, and category, alongside edge types like view, buy, add to cart, and item belongs to category. The issue I am facing is that, when I introduce various types of edges, the user nodes end up being replicated multiple times. For instance, let’s consider users 1 and 2, along with products 1, 2, and 3. If user 1 purchases items 1 and 2, and views item 3, the graph erroneously adds three separate nodes for user 1 instead of just one.
Can someone please help me?
Can you post your example code and output?
I don’t see the problem you mentioned in the test code:
import dgl
import torch

graph_data = {
    ('user', 'purchases', 'item'): (torch.tensor([0, 0]), torch.tensor([0, 1])),
    ('user', 'views', 'item'): (torch.tensor([0]), torch.tensor([2]))
}
g = dgl.heterograph(graph_data)
print("ntypes:", g.ntypes)
print("etypes:", g.etypes)
print("canonical_etypes:", g.canonical_etypes)
print(g)
Which produces:
ntypes: ['item', 'user']
etypes: ['purchases', 'views']
canonical_etypes: [('user', 'purchases', 'item'), ('user', 'views', 'item')]
Graph(num_nodes={'item': 3, 'user': 1},
      num_edges={('user', 'purchases', 'item'): 2, ('user', 'views', 'item'): 1},
      metagraph=[('user', 'item', 'purchases'), ('user', 'item', 'views')])
Thanks, here's my code:
import dgl
import pandas as pd
import torch as th

df_categories_item = pd.read_csv('/content/product-categories.csv', sep = ';')
df_products = pd.read_csv('/content/products.csv', on_bad_lines = 'skip', sep = ';')
df_view = pd.read_csv('/content/train-item-views.csv')
df_purchase = pd.read_csv('/content/train-purchases.csv', sep = ';')

# One (source IDs, destination IDs) pair per canonical edge type
data_dict = {
    ('user', 'view', 'item'): (th.tensor(df_view['userId'].values.astype('int64')),
                               th.tensor(df_view['itemId'].values.astype('int64'))),
    ('user', 'purchase', 'item'): (th.tensor(df_purchase['userId'].values.astype('int64')),
                                   th.tensor(df_purchase['itemId'].values.astype('int64'))),
    ('item', 'is_from', 'category'): (th.tensor(df_categories_item['itemId'].values.astype('int64')),
                                      th.tensor(df_categories_item['categoryId'].values.astype('int64')))
}
g = dgl.heterograph(data_dict)
I also tried a small dataset, and it generates too many user nodes as well:
from dgl.data import DGLDataset

class CustomGraphDataset(DGLDataset):
    def __init__(self):
        super().__init__(name = 'hetera_graph')

    def process(self):
        node_types_df = pd.read_csv("/content/test_node_types.csv")
        nodes_df = pd.read_csv("/content/test_nodes.csv")
        edges_df = pd.read_csv("/content/test_edges.csv")

        # Split the edge list by edge type
        buys_edges = edges_df.loc[edges_df["edge_type"] == "buys"]
        viewa_edges = edges_df.loc[edges_df["edge_type"] == "view"]
        belongs_edges = edges_df.loc[edges_df["edge_type"] == "belongs"]
        addToC_edges = edges_df.loc[edges_df["edge_type"] == "add_to_cart"]

        type_id_dict = dict(zip(node_types_df['node_type'],
                                node_types_df['type_id']))
        nodes = nodes_df['node_id'].tolist()
        node_types = nodes_df['node_type'].map(type_id_dict).tolist()
        edge_weights = th.from_numpy(edges_df["weight"].to_numpy())

        # The raw 'source' and 'target' values are passed to DGL as node IDs
        data_dict = {
            ('user', 'view', 'item'): (th.tensor(viewa_edges['source'].values.astype('int64')),
                                       th.tensor(viewa_edges['target'].values.astype('int64'))),
            ('user', 'buys', 'item'): (th.tensor(buys_edges['source'].values.astype('int64')),
                                       th.tensor(buys_edges['target'].values.astype('int64'))),
            ('item', 'belongs', 'category'): (th.tensor(belongs_edges['source'].values.astype('int64')),
                                              th.tensor(belongs_edges['target'].values.astype('int64'))),
            ('user', 'add_to_cart', 'item'): (th.tensor(addToC_edges['source'].values.astype('int64')),
                                              th.tensor(addToC_edges['target'].values.astype('int64')))
        }
        self.graph = dgl.heterograph(data_dict)
        print(self.graph)

        # Attach the per-edge weights to each edge type
        for e_t in self.graph.etypes:
            self.graph.edges[e_t].data["weight"] = th.from_numpy(
                edges_df[edges_df["edge_type"] == e_t]['weight'].to_numpy())

    def __getitem__(self, idx):
        return self.graph

    def __len__(self):
        return 1

dataset = CustomGraphDataset()
# DGLDataset.__init__ already calls process(), so this call runs it a second time
dataset.process()
dataset[0]
Output:
Graph(num_nodes={'category': 6, 'item': 9, 'user': 10},
      num_edges={('item', 'belongs', 'category'): 6, ('user', 'add_to_cart', 'item'): 2, ('user', 'buys', 'item'): 4, ('user', 'view', 'item'): 9},
      metagraph=[('item', 'category', 'belongs'), ('user', 'item', 'add_to_cart'), ('user', 'item', 'buys'), ('user', 'item', 'view')])
(The dataset actually contains only 3 users.)
This is strange. Did you check your dataset (csv files)? Is it possible to post a sample of the data you are using (the small dataset)?
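One thing worth checking in the CSV files: when `num_nodes_dict` is not passed, `dgl.heterograph` sets the number of nodes of each type to the largest ID it sees for that type plus one. If the `source` and `target` columns hold IDs that are not contiguous and 0-based within each node type (for example, IDs shared across all node types), the graph would report more users than actually exist. The sketch below uses plain Python and made-up ID values (the lists and both helper functions are hypothetical, not taken from the posted data) to show how to compare the two counts and relabel the IDs per node type.

```python
def inferred_node_count(ids):
    """Node count inferred from raw IDs when no explicit count is given: max ID + 1."""
    return max(ids) + 1 if ids else 0


def build_id_map(*id_columns):
    """Map every raw ID of one node type to a contiguous 0-based ID.

    All columns that refer to the same node type must be passed together,
    so that the same raw ID gets the same new ID in every edge type.
    """
    mapping = {}
    for column in id_columns:
        for raw in column:
            mapping.setdefault(raw, len(mapping))
    return mapping


# Hypothetical 'source' columns of two user -> item edge types.
view_src = [7, 8, 9]
buys_src = [7, 9]

all_users = view_src + buys_src
print(len(set(all_users)))             # 3 distinct users in the data
print(inferred_node_count(all_users))  # 10 user nodes would be inferred

# Relabel with one shared mapping for the 'user' node type.
user_map = build_id_map(view_src, buys_src)
new_view_src = [user_map[u] for u in view_src]
new_buys_src = [user_map[u] for u in buys_src]
print(new_view_src, new_buys_src)                        # [0, 1, 2] [0, 2]
print(inferred_node_count(new_view_src + new_buys_src))  # 3
```

If the counts differ in the real data, the relabeled lists could be passed to `th.tensor(...)` in `data_dict` in place of the raw columns, with a separate mapping built for `item` and `category`. Keeping the mappings around makes it possible to translate graph node IDs back to the original IDs later.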