Using nodes versus node/edge features

the · December 2, 2021, 7:34pm

What is the advantage of representing attributes as node features versus as separate nodes, for example in a heterogeneous graph? With the datasets below, when should a sample be one node with X features, and when should it be a node with edges connecting it to location, time, type, amount, unit, temperature, etc.? Assume the continuous data is categorized, e.g. temp > 30 = "hot", > 0 && < 30 = "normal", < 0 = "cold".

file:samples.csv
id,location-name,time,type,amount,unit,temperature
0,aa, 1235, red, 2309857, mi/l, 31
1,bb, 3456, blue, 203975, c/kg, 18

file:locations.csv
id,name,lat,lon
0,aa,43.39849,93.30492
1,bb,34.09384,94.02940

minjie · December 3, 2021, 7:45am

I don’t think there is a general solution to this. The dataset will likely play a crucial role too. The best way may be to simply try both and see which one works better.

the · December 4, 2021, 2:01am

Interesting! Most of the examples use features and labels in the training phase.

How should one go about training a heterograph without features and labels? The example in Ch.5 of the documentation includes both. What if we remove that part, e.g.:

import dgl
import numpy as np
import torch

n_users = 1000
n_items = 500
n_follows = 3000
n_clicks = 5000
n_dislikes = 500
n_hetero_features = 10
n_user_classes = 5
n_max_clicks = 10

follow_src = np.random.randint(0, n_users, n_follows)
follow_dst = np.random.randint(0, n_users, n_follows)
click_src = np.random.randint(0, n_users, n_clicks)
click_dst = np.random.randint(0, n_items, n_clicks)
dislike_src = np.random.randint(0, n_users, n_dislikes)
dislike_dst = np.random.randint(0, n_items, n_dislikes)

hetero_graph = dgl.heterograph({
    ('user', 'follow', 'user'): (follow_src, follow_dst),
    ('user', 'followed-by', 'user'): (follow_dst, follow_src),
    ('user', 'click', 'item'): (click_src, click_dst),
    ('item', 'clicked-by', 'user'): (click_dst, click_src),
    ('user', 'dislike', 'item'): (dislike_src, dislike_dst),
    ('item', 'disliked-by', 'user'): (dislike_dst, dislike_src)})

# randomly generate training masks on user nodes and click edges
hetero_graph.nodes['user'].data['train_mask'] = torch.zeros(n_users, dtype=torch.bool).bernoulli(0.6)
hetero_graph.edges['click'].data['train_mask'] = torch.zeros(n_clicks, dtype=torch.bool).bernoulli(0.6)
# note: no 'feature' or 'label' data is assigned to any node or edge type

minjie · December 6, 2021, 11:04am

If your graph doesn’t have features, you can initialize a learnable node embedding for it or precompute some node features from the graph structure (e.g., node degree, positional encoding, etc.).

If you also want to train without labels, the problem becomes unsupervised. Please check out our examples here: dgl/examples/pytorch/graphsage at master · dmlc/dgl · GitHub

system · January 5, 2022, 11:05am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.