Bipartite graph creation

ogggcar · November 8, 2021, 4:00pm

Hi everyone.

I have the following data dict:

{('person', 'link1', 'thing'): (tensor([0, 0, 1, 1]), tensor([2, 3, 2, 3])), ('person', 'link2', 'person'): (tensor([0]), tensor([1]))}

However, when I create the graph, I get:

Graph(num_nodes={'person': 2, 'thing': 4},
      num_edges={('person', 'link1', 'thing'): 4, ('person', 'link2', 'person'): 1},
      metagraph=[('person', 'thing', 'link1'), ('person', 'person', 'link2')])

I was supposed to have just 2 nodes of type “thing”, but I get 4. What is the problem here? Only nodes 2 and 3 should appear, but they seem to have been duplicated.

Thank you all.

BarclayII · November 9, 2021, 2:11am

Graph creation assumes that node IDs are always consecutive integers starting from 0. So if you supply [2, 3, 2, 3], it will assume that the node IDs are [0, 1, 2, 3], and therefore that there are four nodes.
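To see where the count of 4 comes from, here is a small pure-Python sketch of the inference rule described above. It assumes that, when no explicit node counts are given, each node type gets (largest ID seen) + 1 nodes. The function name `infer_num_nodes` is hypothetical, not part of DGL, and plain lists stand in for the tensors.

```python
def infer_num_nodes(data_dict):
    """Hypothetical helper: per-type node count = largest ID seen + 1."""
    num_nodes = {}
    for (src_type, _etype, dst_type), (src, dst) in data_dict.items():
        for ntype, ids in ((src_type, src), (dst_type, dst)):
            if ids:
                num_nodes[ntype] = max(num_nodes.get(ntype, 0), max(ids) + 1)
    return num_nodes


# The data dict from the question, with plain lists instead of tensors.
data_dict = {
    ('person', 'link1', 'thing'): ([0, 0, 1, 1], [2, 3, 2, 3]),
    ('person', 'link2', 'person'): ([0], [1]),
}
print(infer_num_nodes(data_dict))  # {'person': 2, 'thing': 4}
```

Under this rule, "thing" nodes 0 and 1 never appear in any edge, yet they still exist as isolated nodes.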

If your node IDs are not consecutive integers starting from 0, I’m afraid you have to relabel them yourself.

ogggcar · November 9, 2021, 6:08am

Thanks for your quick answer.

My nodes are consecutive integers, but 0 and 1 are of type ‘person’ and 2 and 3 are of type ‘thing’.

How could I relabel them?

BarclayII · November 9, 2021, 7:14am

Sorry, I meant that each node type should have its own consecutive integers starting from 0. For example, “person” nodes should have IDs starting from 0, and “thing” nodes should also have IDs starting from 0. The same ID under different node types refers to different nodes.
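For the data in the question, where IDs are global (0 and 1 are persons, 2 and 3 are things), one option is to build a per-type map from global ID to local ID and pass every edge list through it. This is a minimal sketch with plain Python lists. `relabel_per_type` is a hypothetical helper, and `node_types` simply restates the type assignment given in the question. Converting the lists to tensors before building the graph is left out to keep it dependency-free.

```python
def relabel_per_type(node_types, data_dict):
    """Hypothetical helper.

    node_types: dict mapping global node ID -> node type.
    Returns (new_data_dict, id_maps), where id_maps[ntype][global_id]
    is the per-type ID, starting from 0.
    """
    id_maps = {}
    # Assign consecutive IDs within each type, in ascending global-ID order.
    for gid in sorted(node_types):
        type_map = id_maps.setdefault(node_types[gid], {})
        type_map[gid] = len(type_map)

    # Rewrite both endpoints of every edge type through the per-type maps.
    new_data_dict = {}
    for (src_type, etype, dst_type), (src, dst) in data_dict.items():
        new_src = [id_maps[src_type][g] for g in src]
        new_dst = [id_maps[dst_type][g] for g in dst]
        new_data_dict[(src_type, etype, dst_type)] = (new_src, new_dst)
    return new_data_dict, id_maps


node_types = {0: 'person', 1: 'person', 2: 'thing', 3: 'thing'}
data_dict = {
    ('person', 'link1', 'thing'): ([0, 0, 1, 1], [2, 3, 2, 3]),
    ('person', 'link2', 'person'): ([0], [1]),
}
new_data_dict, id_maps = relabel_per_type(node_types, data_dict)
print(new_data_dict[('person', 'link1', 'thing')])  # ([0, 0, 1, 1], [0, 1, 0, 1])
print(id_maps['thing'])  # {2: 0, 3: 1}
```

Keeping `id_maps` around makes it possible to translate results computed on the graph back to the original IDs.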

Say that you have a pandas DataFrame with the following types and IDs:

    type  ID
0      2  21
1      1  90
2      4  15
3      2  42
4      0  34
5      4  40
6      3  85
7      2  62
8      4  74
9      0  32
10     3  80
11     4   1
12     4  56
13     0  28
14     3  24
15     0  41
16     0  91
17     0  97
18     2  54

You could obtain per-type relabeled IDs via:

import numpy as np
df['relabeled'] = df.groupby('type')['ID'].transform(lambda x: np.arange(len(x)))

    type  ID  relabeled
0      2  21          0
1      1  90          0
2      4  15          0
3      2  42          1
4      0  34          0
5      4  40          1
6      3  85          0
7      2  62          2
8      4  74          2
9      0  32          1
10     3  80          1
11     4   1          3
12     4  56          4
13     0  28          2
14     3  24          2
15     0  41          3
16     0  91          4
17     0  97          5
18     2  54          3

Then you can use the type and relabeled columns to create your graph.
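As a side note, `groupby(...).cumcount()` produces the same per-type numbering as the `transform` call above. The relabeled table also provides what graph construction needs: a lookup from an original (type, ID) pair to its new ID, and the number of nodes per type. The sketch below rebuilds the example DataFrame from this post. `lookup` and `num_nodes` are illustrative names only.

```python
import pandas as pd

# The example table from this post.
df = pd.DataFrame({
    'type': [2, 1, 4, 2, 0, 4, 3, 2, 4, 0, 3, 4, 4, 0, 3, 0, 0, 0, 2],
    'ID':   [21, 90, 15, 42, 34, 40, 85, 62, 74, 32, 80, 1, 56, 28, 24, 41, 91, 97, 54],
})

# Running index within each type, in order of appearance (0, 1, 2, ...).
df['relabeled'] = df.groupby('type').cumcount()

# (type, original ID) -> per-type ID, for translating edge lists.
lookup = {
    (t, i): r for t, i, r in zip(df['type'], df['ID'], df['relabeled'])
}

# Number of nodes of each type, e.g. to sanity-check the built graph.
num_nodes = df.groupby('type').size().to_dict()

print(lookup[(4, 1)])  # 3
print(num_nodes[0])    # 6
```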

lingvisa · November 11, 2021, 12:33am

What if my nodes and relationships are already labelled with unique identifiers, as below?

Person.csv

Name  ID
john  a001
mary  c0ba
jack  123a

Relation.csv

ID1   ID2   relations
a001  c0ba  friends
123a  c0ba  friends

Given that all node IDs in my data are already unique identifiers, do I still need to convert them to consecutive integers from 0 to n-1 for each type? Is this a usability limitation?
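The conversion itself is mechanical. Here is a minimal pandas sketch, assuming the two CSVs above have been loaded into DataFrames (built inline here instead of read with `pd.read_csv`). It enumerates the persons to get a string-ID-to-integer map, then translates both endpoint columns of the relation table through that map.

```python
import pandas as pd

# Person.csv and Relation.csv from this post, built inline for the example.
person = pd.DataFrame({'Name': ['john', 'mary', 'jack'],
                       'ID': ['a001', 'c0ba', '123a']})
relation = pd.DataFrame({'ID1': ['a001', '123a'],
                         'ID2': ['c0ba', 'c0ba'],
                         'relations': ['friends', 'friends']})

# String ID -> consecutive integer (0 .. n-1), in the row order of Person.csv.
person_index = {pid: i for i, pid in enumerate(person['ID'])}

# Translate both endpoints; an ID missing from Person.csv would become NaN.
src = relation['ID1'].map(person_index).tolist()
dst = relation['ID2'].map(person_index).tolist()

# Edge lists in the (src, dst) form used for the data dict in this thread.
data_dict = {('person', 'friends', 'person'): (src, dst)}
print(data_dict)
# {('person', 'friends', 'person'): ([0, 2], [1, 1])}
```

With more than one relation type, the same mapping would be applied per distinct value of the `relations` column, giving one edge type each.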

BarclayII · November 11, 2021, 8:55am

Yes. This is primarily for efficiency, since graph neural networks ultimately compute with tensors, which are naturally indexed by consecutive integers.
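To illustrate the efficiency argument, node features are typically stored as one array per node type, with one row per node. A node's ID is then simply its row number, and the features of a batch of nodes are gathered with a single indexing operation. In this sketch, a NumPy array stands in for a framework tensor, and the sizes are made up.

```python
import numpy as np

num_things, feat_dim = 2, 3
# One row per 'thing' node; row i holds the features of thing node i.
thing_feats = np.arange(num_things * feat_dim, dtype=np.float32).reshape(num_things, feat_dim)

# Destination IDs of the relabeled 'link1' edges are direct row lookups.
dst = np.array([0, 1, 0, 1])
gathered = thing_feats[dst]
print(gathered.shape)  # (4, 3)

# With the original IDs 2 and 3, the same layout would need 4 rows (IDs 0..3),
# two of which belong to no real node.
```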

lingvisa · November 11, 2021, 2:21pm

Could DGL handle this conversion internally in its data loader? When loading the records for each node type, it could simply assign index identifiers starting from 0.

BarclayII · November 15, 2021, 7:21am

Sounds like a reasonable feature. We will consider it when we develop pipelines for tabular data.

system · December 15, 2021, 7:21am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.