Bipartite graph creation

Hi everyone.

I have the following data dict:

{('person', 'link1', 'thing'): (tensor([0, 0, 1, 1]), tensor([2, 3, 2, 3])), ('person', 'link2', 'person'): (tensor([0]), tensor([1]))}

However, when I create the graph, I get:

Graph(num_nodes={'person': 2, 'thing': 4},
      num_edges={('person', 'link1', 'thing'): 4, ('person', 'link2', 'person'): 1},
      metagraph=[('person', 'thing', 'link1'), ('person', 'person', 'link2')])

I expected just 2 nodes of type “thing”, but I get 4. What is the problem here? Only nodes 2 and 3 should appear, but they seem to have been duplicated.

Thank you all.

Graph creation assumes that the node IDs are always consecutive integers starting from 0. So if you supply [2, 3, 2, 3], it assumes the node IDs are [0, 1, 2, 3] and therefore creates four nodes.

If your node IDs are not consecutive integers starting from 0, I’m afraid you have to relabel them yourself.
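For a single node type, one way to do that relabeling is NumPy's `np.unique` with `return_inverse=True`. This is only a sketch using the destination IDs from the question; the variable names are made up for illustration.

```python
import numpy as np

# Destination IDs of the 'link1' edges from the question; only 2 and 3 occur.
dst = np.array([2, 3, 2, 3])

# np.unique returns the sorted distinct IDs and, with return_inverse=True,
# the position of every original entry within that sorted array.
original_ids, relabeled = np.unique(dst, return_inverse=True)

print(original_ids)  # [2 3]
print(relabeled)     # [0 1 0 1]
```

Keeping `original_ids` around lets you translate results back later: `original_ids[relabeled]` reproduces the original array.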

Thanks for your quick answer.

My nodes are consecutive integers, but 0 and 1 are type ‘person’ and 2 and 3 type ‘thing’.

How could I relabel them?

Sorry, I meant that each node type should have its own consecutive integers starting from 0. “person” nodes should have IDs starting from 0, and “thing” nodes should also have IDs starting from 0. The same ID under different node types refers to different nodes.
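Applied to the data from the question, that could look roughly like the following. It is a plain-Python sketch: the `node_type` table and the helper names are assumptions made for illustration, not part of any library.

```python
# Global numbering from the question: 0 and 1 are persons, 2 and 3 are things.
node_type = {0: "person", 1: "person", 2: "thing", 3: "thing"}

# Give every node a local ID that counts from 0 within its own type.
local_id = {}
num_nodes = {}
for gid in sorted(node_type):
    ntype = node_type[gid]
    local_id[gid] = num_nodes.get(ntype, 0)
    num_nodes[ntype] = local_id[gid] + 1

# Edges in global IDs, keyed by (source type, edge type, destination type).
global_edges = {
    ("person", "link1", "thing"): ([0, 0, 1, 1], [2, 3, 2, 3]),
    ("person", "link2", "person"): ([0], [1]),
}

# Rewrite both endpoints of every edge into per-type local IDs.
local_edges = {
    etype: ([local_id[s] for s in src], [local_id[d] for d in dst])
    for etype, (src, dst) in global_edges.items()
}

print(num_nodes)    # {'person': 2, 'thing': 2}
print(local_edges)
```

After converting the lists to tensors, a dict shaped like `local_edges` can be passed to the graph constructor in place of the original one, and the "thing" destinations are now 0 and 1 rather than 2 and 3.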

Say that you have a pandas DataFrame with the following types and IDs:

    type  ID
0      2  21
1      1  90
2      4  15
3      2  42
4      0  34
5      4  40
6      3  85
7      2  62
8      4  74
9      0  32
10     3  80
11     4   1
12     4  56
13     0  28
14     3  24
15     0  41
16     0  91
17     0  97
18     2  54

You could obtain per-type relabeled IDs via:

# Number the rows within each type 0, 1, 2, ... in order of appearance.
df['relabeled'] = df.groupby('type').cumcount()
    type  ID  relabeled
0      2  21          0
1      1  90          0
2      4  15          0
3      2  42          1
4      0  34          0
5      4  40          1
6      3  85          0
7      2  62          2
8      4  74          2
9      0  32          1
10     3  80          1
11     4   1          3
12     4  56          4
13     0  28          2
14     3  24          2
15     0  41          3
16     0  91          4
17     0  97          5
18     2  54          3

Then you can use the type and relabeled columns to create your graph.
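As a rough illustration of that last step, the sketch below takes the first five rows of the table above plus a made-up edge list in original IDs, and translates each endpoint to its (type, relabeled) pair. The `edges` table, its `src`/`dst` columns, the `type<N>` node type names and the edge type name `link` are all hypothetical.

```python
import pandas as pd

# First five rows of the node table above.
nodes = pd.DataFrame({"type": [2, 1, 4, 2, 0], "ID": [21, 90, 15, 42, 34]})
nodes["relabeled"] = nodes.groupby("type").cumcount()

# Hypothetical edge list written in the original IDs.
edges = pd.DataFrame({"src": [21, 42, 34], "dst": [90, 15, 15]})

# Look up the type and the per-type ID of both endpoints.
lookup = nodes.set_index("ID")
for col in ("src", "dst"):
    edges[col + "_type"] = edges[col].map(lookup["type"])
    edges[col + "_local"] = edges[col].map(lookup["relabeled"])

# Group the edges by (source type, destination type) into a data dict.
data = {}
for (st, dt), grp in edges.groupby(["src_type", "dst_type"]):
    key = (f"type{int(st)}", "link", f"type{int(dt)}")
    data[key] = (grp["src_local"].tolist(), grp["dst_local"].tolist())

print(data)
```

This assumes the original IDs are unique across all types, as they are in the table above; otherwise the lookup would have to be keyed on both type and ID.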

If my nodes and relationships are already labelled by unique identifiers like below:

Person.csv

Name  ID
john  a001
mary  c0ba
jack  123a

Relation.csv

ID1   ID2   relations
a001  c0ba  friends
123a  c0ba  friends

Given that all node IDs in my data are already unique identifiers, do I still need to convert them to consecutive integers from 0 to n for each type? Is this a usability limitation?

Yes. This is primarily for efficiency, since graph neural networks eventually compute with tensors, which are by nature indexed with consecutive integers.
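For string identifiers like the ones in Person.csv, the conversion can be done in a few lines with `pandas.factorize`. This is a minimal sketch with the two tables typed in by hand rather than read from the CSV files; the `node_id` column and the `to_int` lookup are names chosen for illustration.

```python
import pandas as pd

# The two tables from the post, typed in directly.
persons = pd.DataFrame({"Name": ["john", "mary", "jack"],
                        "ID": ["a001", "c0ba", "123a"]})
relations = pd.DataFrame({"ID1": ["a001", "123a"],
                          "ID2": ["c0ba", "c0ba"],
                          "relations": ["friends", "friends"]})

# factorize assigns 0, 1, 2, ... to the distinct IDs in order of first appearance.
codes, uniques = pd.factorize(persons["ID"])
persons["node_id"] = codes

# Lookup from string identifier to integer node ID.
to_int = pd.Series(range(len(uniques)), index=uniques)
src = relations["ID1"].map(to_int).tolist()
dst = relations["ID2"].map(to_int).tolist()

print(src, dst)    # [0, 2] [1, 1]
print(uniques[2])  # 123a (integer ID back to the original identifier)
```

The `uniques` index doubles as the reverse mapping, so predictions made on integer node IDs can be reported in terms of the original identifiers.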

Could DGL do this conversion internally in its data loader? When loading the records for each node type, it could simply create an index starting from 0.

Sounds like a reasonable feature. We will consider it when we develop pipelines for tabular data.