Creating a Heterogeneous Graph with Multi-Label Nodes

qqfox · August 8, 2021, 4:36pm

Hi there,

I am building a custom dataset for a heterogeneous graph to perform node classification for my thesis. The dataset contains three node types (A, B, and C), and edges exist between A and C and between B and C, as in the picture below.
I read in the documentation that each row is treated as one node. In my graph, however, a node can have multiple labels (node type C). As a result, the reported num_nodes is always higher than num_edges.
To sum up, I am not sure how to build my custom dataset properly, since the labels can be considered an important feature of the nodes (A or B).
Any suggestions are highly appreciated. Many thanks!
Have a nice day, all!

BarclayII · August 9, 2021, 2:11am

I'm not sure what you mean by “labels”. Is it something you want to predict for node type C but that is available in node types A and B?

It would help if you could describe the schema of your dataset and the task you are working on.

qqfox · August 9, 2021, 3:27am

Hi BarclayII,

Many thanks for your support.
The schema is roughly this: given a node of type A or B, the task is to predict which nodes of type C it belongs to or is connected to.
The type C nodes (the labels) are predefined. For example, I have:

Node type A is a resume whose candidate has held several jobs. These jobs belong to several occupational categories (type C / labels), so I want to link node A (the resume) to these labels. This step confuses me: I don't see how to keep the features of node A (the resume). If each row is treated as a separate node, I am not sure how to preserve this representation.
Node type B is a job post; similarly, one job post can belong to several labels (type C nodes).
Once the relations/labels are represented, I still need to work out how to load the dataset as input to a GNN model while keeping the features, which is the hard part for me right now.
Sorry for any inconvenience. This is very important to me.

BarclayII · August 9, 2021, 1:17pm

In this case, could you treat the whole problem as a multi-label node classification task? If the number of categories is small, there is usually little benefit in framing the problem as link prediction, because each category will typically have a large number of connections, which quickly smooths things out.
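
To illustrate the multi-label framing, here is a minimal sketch in plain Python. The category names and the resume-to-category assignments are made up for illustration and do not come from the actual dataset. Each resume gets one row with a 1 for every category it belongs to, so the categories become targets instead of graph nodes.

```python
# Hypothetical category list (the predefined "type C" labels).
CATEGORIES = ["engineering", "sales", "finance"]

# Hypothetical assignments: resume node ID -> categories it belongs to.
resume_categories = {
    0: ["engineering", "finance"],
    1: ["sales"],
    2: [],  # a resume with no known category yet
}

def multi_hot(assignments, categories):
    """Return one 0/1 row per node, ordered by node ID."""
    column = {name: i for i, name in enumerate(categories)}
    rows = []
    for node_id in sorted(assignments):
        row = [0] * len(categories)
        for name in assignments[node_id]:
            row[column[name]] = 1
        rows.append(row)
    return rows

resume_labels = multi_hot(resume_categories, CATEGORIES)
```

A matrix like this can be attached to the resume nodes as the prediction target, and a model is then typically trained with one sigmoid output per category and a binary cross-entropy loss.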

As for building the graph, you usually start with a collection of entity tables containing node features, and a collection of binary relation tables containing the connections between entities as well as the features of those relations. In your case, it is best to start with a resume table (with the resume's own features only), a job post table (with the job post's own features only), and maybe a binary relation table between resumes and/or job posts.
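
The table layout described above could be sketched as follows in plain Python. All column names, feature values, and the "applied to" relation are hypothetical placeholders, not part of the original dataset. The idea is to give every entity row a contiguous integer ID and to turn the relation table into parallel source/destination ID lists.

```python
# Hypothetical entity tables: one row per entity, own features only.
resume_table = [
    {"resume_id": "r1", "years_experience": 5},
    {"resume_id": "r2", "years_experience": 2},
]
job_post_table = [
    {"job_id": "j1", "salary": 50000},
    {"job_id": "j2", "salary": 70000},
    {"job_id": "j3", "salary": 65000},
]

# Hypothetical binary relation table: (resume, job post) pairs.
applications = [("r1", "j1"), ("r1", "j3"), ("r2", "j2")]

def index_rows(table, key):
    """Map each entity's raw key to its row position (its node ID)."""
    return {row[key]: i for i, row in enumerate(table)}

resume_index = index_rows(resume_table, "resume_id")
job_index = index_rows(job_post_table, "job_id")

# Edge list for the relation, expressed in integer node IDs.
src = [resume_index[r] for r, _ in applications]
dst = [job_index[j] for _, j in applications]

# Per-type feature rows, aligned with the node IDs above.
resume_features = [[row["years_experience"]] for row in resume_table]
job_features = [[row["salary"]] for row in job_post_table]
num_nodes = {"resume": len(resume_table), "job_post": len(job_post_table)}
```

Because each entity appears exactly once in its table, its features stay on a single node no matter how many relations or labels it has. Assuming the graph is built with DGL, per-relation ID lists of this shape, keyed by a (source type, edge type, destination type) triple, are the kind of input that dgl.heterograph accepts.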

qqfox · August 11, 2021, 1:31pm

Many thanks for your suggestion. You're so helpful and kind, as always. I will try this approach and post an update.
