Heterogeneous graph nodes/edges features

About heterogeneous graphs!

According to the DGL documentation, there seems to be an assumption that nodes of the same type have no distinct features and share the same feature vector. Is that right? What should I do if the nodes of one type have different features and need separate feature vectors to distinguish them from each other?

Nodes of the same type can have different features; a feature with a given name only needs to have the same shape for every node of that type. For instance, here is how you would assign a feature named x to the node type user:

g.nodes['user'].data['x'] = torch.randn(5, 4)

Here the first dimension of the assigned tensor indexes the nodes, so in this case you assign each of the 5 user nodes its own 4-dimensional vector.

Which particular document are you referring to? Could you give us the link? Maybe the content there is problematic.

https://docs.dgl.ai/guide/graph-heterogeneous.html#working-with-multiple-types

I created a heterogeneous graph by following this link.
Thanks for your example! Maybe I did something wrong.

Now, I have two problems:

  1. My graph generation code is:
    data_dict = {
    ('user', 'follows', 'user'): (torch.tensor([0, 0, 0, 0]), torch.tensor([0, 0, 0, 0])),
    ('user', 'follows', 'topic'): (torch.tensor([0, 0, 0, 0]), torch.tensor([1, 2, 1, 1])),
    ('user', 'plays', 'game'): (torch.tensor([0, 0, 0, 0]), torch.tensor([3, 4, 1, 1]))
    }
    hg = dgl.heterograph(data_dict)
    After I run
    hg.nodes['user'].data['x'] = torch.randn(5, 4)
    the interpreter gives an error:
    dgl._ffi.base.DGLError: Expect number of features to match number of nodes (len(u)). Got 5 and 1 instead.

But if I run
hg.nodes['user'].data['x'] = torch.randn(1, 4)
there are no errors.
I don’t know how to fix it.

  2. I would really like an example of building a heterogeneous graph from files on disk. I read the example of building a homogeneous graph from disk files at https://github.com/dglai/WWW20-Hands-on-Tutorial/blob/master/basic_tasks/1_load_data.ipynb, but I am still confused.

Re 1: Since all the IDs that appear for the user node type are 0, DGL can only infer that there is a single user node. That is why torch.randn(5, 4) fails but torch.randn(1, 4) succeeds. To set the number of nodes manually, specify the num_nodes_dict argument when constructing the graph.
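
As a rough illustration of why only one user node is inferred, the sketch below recomputes the "max node ID + 1" rule in plain Python over the same edge lists. The helper inferred_num_nodes is hypothetical, written only for this illustration; it is not a DGL function and only mimics the behavior described above.

```python
# Illustration only: mimics the "max node ID + 1" rule in plain Python.
# inferred_num_nodes is a hypothetical helper, not part of DGL.

def inferred_num_nodes(data_dict):
    """Return {node_type: largest ID seen + 1} over all edge types."""
    counts = {}
    for (src_type, _etype, dst_type), (src_ids, dst_ids) in data_dict.items():
        for ntype, ids in ((src_type, src_ids), (dst_type, dst_ids)):
            if len(ids) > 0:
                counts[ntype] = max(counts.get(ntype, 0), max(ids) + 1)
    return counts

# Same edge lists as in the question, written as plain Python lists.
data_dict = {
    ('user', 'follows', 'user'): ([0, 0, 0, 0], [0, 0, 0, 0]),
    ('user', 'follows', 'topic'): ([0, 0, 0, 0], [1, 2, 1, 1]),
    ('user', 'plays', 'game'): ([0, 0, 0, 0], [3, 4, 1, 1]),
}

counts = inferred_num_nodes(data_dict)
print(counts)  # {'user': 1, 'topic': 3, 'game': 5}
```

With the real graph, passing something like num_nodes_dict={'user': 5, 'topic': 3, 'game': 5} to dgl.heterograph together with data_dict should make a (5, 4) feature tensor for user fit. The counts for topic and game here are just the inferred values from the sketch.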

Re 2: Building a heterogeneous graph from disk files usually just involves loading the connections for each edge type from disk. An example would be loading user-follows-user, user-follows-topic and user-plays-game from three separate CSV files and constructing the data_dict above from them.
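
A minimal sketch of that loading step, using only the Python standard library. The CSV layout (a header row "src,dst" followed by one edge per row), the sample contents, and the helper read_edges are all assumptions made for illustration; io.StringIO stands in for real files so the snippet runs as-is.

```python
import csv
import io

# Hypothetical file contents; in practice these would be three CSV files on
# disk, each with one "src,dst" pair per row (an assumed layout).
FILES = {
    ('user', 'follows', 'user'): "src,dst\n0,1\n1,2\n",
    ('user', 'follows', 'topic'): "src,dst\n0,1\n2,0\n",
    ('user', 'plays', 'game'): "src,dst\n1,3\n2,4\n",
}

def read_edges(handle):
    """Read a src,dst CSV and return two parallel lists of integer IDs."""
    src, dst = [], []
    for row in csv.DictReader(handle):
        src.append(int(row['src']))
        dst.append(int(row['dst']))
    return src, dst

# io.StringIO stands in for open(path, newline='') to keep this runnable.
edge_lists = {etype: read_edges(io.StringIO(text)) for etype, text in FILES.items()}
print(edge_lists[('user', 'plays', 'game')])  # ([1, 2], [3, 4])
```

Each (src, dst) pair of lists can then be converted to tensors, for example with torch.tensor, to form the data_dict that is passed to dgl.heterograph.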

Re 1: It works! Thanks a lot!

Re 2: I think I know how to use it :smile:

Must node IDs be sorted when constructing a heterogeneous graph? If they are not sorted, is the number of nodes in the graph determined by the maximum ID?

If three nodes in the graph are numbered 0, 1, 2 and I use hg.data['x'] = torch.randn(3, 50) to generate features, are the features assigned to the nodes in ID order?

  1. The node IDs should be consecutive integers starting from 0. If you do not explicitly specify the number of nodes at construction, it will simply be the maximum node ID + 1.
  2. hg.data['x'][i] will give the feature 'x' of the node with ID i.

I have some pre-labeled nodes. Should I relabel them? For example, should nodes 2, 5, 14 with edges 2-5, 5-14 be relabeled as nodes 0, 1, 2 with edges 0-1, 1-2?

If you only have 3 nodes rather than 14 nodes, then yes.
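
One possible way to do that relabeling, sketched in plain Python with the labels from the question. The variable names are illustrative, and sorting the labels is just one choice that makes the mapping deterministic.

```python
# Map arbitrary node labels to consecutive IDs starting from 0.
edges = [(2, 5), (5, 14)]  # original labels from the question

# Sort the distinct labels so the mapping is deterministic.
labels = sorted({node for edge in edges for node in edge})
to_id = {label: i for i, label in enumerate(labels)}

# Rewrite every edge in terms of the new consecutive IDs.
relabeled = [(to_id[u], to_id[v]) for u, v in edges]
print(to_id)      # {2: 0, 5: 1, 14: 2}
print(relabeled)  # [(0, 1), (1, 2)]
```

Keeping to_id (or its inverse) around makes it possible to translate node IDs in the graph back to the original labels later.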

Thanks. :grinning: I'm trying to label edges in a heterogeneous graph, but the following code doesn't work. What should I do?

                g.edges['A'].data['x'] = torch.tensor(w_w_ids)

What’s the error message? Can you provide a code snippet for reproducing the issue?