Learning nodes and edges characteristics

Hi all,

I need some suggestions for my graph problem.
I have a graph whose nodes represent words. I have labelled every node with one of 5 categories/classes, and each node is also annotated with some features (e.g. the number of letters, digits, and symbols, the font size, etc.). All the edges in my graph carry a weight too.

My question is: how can I create a model that learns/extracts the characteristics of (or the relationships between) the categories/classes in my graph,
so that, given an unlabelled graph, the model can predict which class each node belongs to?
A few lines of code or a flowchart as an example would help me a lot too.

Thanks a lot all and sorry if this seems trivial for some of you.

best

Hi, did you check our GCN example? It essentially performs the node classification task you described here.

Thanks @mufeili for the response. I tried the example before which (I think) is similar to https://docs.dgl.ai/tutorials/basics/1_first.html#step-3-define-a-graph-convolutional-network-gcn.

But my understanding is the example only considers the structure of the graph (e.g. the number of neighbours, neighbours of neighbours), and not the features in the node itself (e.g. text size, how many symbols in the node, etc.). Correct me if I’m wrong.

So, is it possible to incorporate the node features into GCN as well?

Thanks so much @mufeili.

The example is slightly different from the tutorial: it does use the node features as the model input, in place of the inputs used in the tutorial.
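To make the role of node features concrete, here is a minimal pure-Python sketch of a single GCN propagation step, relu(A_hat X W), where A_hat is the adjacency matrix with self-loops after symmetric normalization. This is not the code from the DGL example. The function name gcn_layer, the (u, v, w) edge format, the toy graph, and the weight values are all assumptions made for illustration, and edge weights are assumed to be non-negative.

```python
import math


def gcn_layer(num_nodes, edges, features, weight):
    """One GCN step: relu(A_hat @ X @ W), written with plain lists.

    edges:    list of (u, v, w) undirected weighted edges (hypothetical format)
    features: one feature vector per node (X)
    weight:   in_dim x out_dim matrix (W), normally learned during training
    """
    # Build a dense adjacency matrix with self-loops.
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    # Symmetric normalization: A_hat[i][j] = A[i][j] / sqrt(deg[i] * deg[j]).
    deg = [sum(row) for row in adj]
    a_hat = [
        [adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]

    in_dim, out_dim = len(weight), len(weight[0])
    # Aggregate each node's own and neighbouring features: A_hat @ X.
    agg = [
        [sum(a_hat[i][k] * features[k][d] for k in range(num_nodes))
         for d in range(in_dim)]
        for i in range(num_nodes)
    ]
    # Linear transform followed by ReLU: relu(agg @ W).
    return [
        [max(0.0, sum(agg[i][d] * weight[d][o] for d in range(in_dim)))
         for o in range(out_dim)]
        for i in range(num_nodes)
    ]


# Toy graph: 3 nodes in a path 0 - 1 - 2, every edge with weight 1.0.
edges = [(0, 1, 1.0), (1, 2, 1.0)]
features = [[0.0, 1.0, 5.0], [1.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # 3 input features -> 2 outputs
hidden = gcn_layer(3, edges, features, weight)
```

Each output row mixes a node's own features with those of its neighbours, so both the graph structure and the per-node attributes influence the prediction. Stacking two such steps and ending with one output per class gives the usual node classification setup.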

Thanks @mufeili. I looked at it, but I couldn’t understand how to load my graph (a networkx.Graph()) as the input graph for training. Thanks a lot

You can do something like

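import dgl
import networkx as nx

nx_g = nx.path_graph(3)  # placeholder; use your own networkx graph here
g = dgl.DGLGraph()
g.from_networkx(nx_g)  # copies the topology; DGL 0.5+ uses g = dgl.from_networkx(nx_g) instead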

Thanks @mufeili, and sorry for being unclear. What I meant is that the GCN example is run with
python3 train.py --dataset cora --gpu 0 --self-loop

which (I think) uses the cora graph as the input graph. So if I have an external graph (other than cora), how should I plug it in? And does it automatically recognize all the features on the nodes too?

Thank you very much

I see. You will need to create a dataset class/instance yourself, because datasets come in many different formats and are used for many different tasks. Let’s use networkx as an example. You can load the graph topology as in my previous reply. For loading node features, you can refer to this thread.

Thanks @mufeili, I’m looking at this thread now and studying it.

My dataset is actually in the JSON format below, and I managed to build a graph out of it. As my aim is to predict the label of each node in an unlabelled graph, maybe I can build a hetero graph in dgl and feed it to GCN directly. Do you think that is a better approach than building a networkx graph and converting it to a hetero graph in dgl?

{
  "1": {
    "label": "lab1",
    "fea1": 0,
    "fea2": 1,
    "fea3": 5
  },
  "2": {
    "label": "lab3",
    "fea1": 1,
    "fea2": 0,
    "fea3": 2
  }
}

But then, once I have a hetero graph in dgl, it is difficult for me to apply the code to it. Could you perhaps point me to a simple example where you already have a constructed DGL graph and apply the GCN code to it?

Thanks so much.

One question is whether you really want to use a heterogeneous graph. It seems that each node has three features, fea1, fea2, and fea3, which are just scalars. In this case, why not concatenate the three features into one feature vector per node and work with a homogeneous graph?
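The concatenation suggested above can be sketched with the standard library alone. The keys fea1, fea2, fea3 and the label names come from the JSON posted earlier. The helper name build_node_arrays and the choice to sort node IDs numerically are assumptions made for illustration.

```python
import json

# Same structure as the JSON posted earlier in the thread.
raw = """
{
  "1": {"label": "lab1", "fea1": 0, "fea2": 1, "fea3": 5},
  "2": {"label": "lab3", "fea1": 1, "fea2": 0, "fea3": 2}
}
"""

FEATURE_KEYS = ["fea1", "fea2", "fea3"]


def build_node_arrays(nodes):
    """Return (node_ids, features, labels, label_to_index) from the node dict."""
    node_ids = sorted(nodes, key=int)  # fixed node order: "1", "2", ...
    # Map each distinct label string to an integer class index.
    label_names = sorted({nodes[n]["label"] for n in node_ids})
    label_to_index = {name: i for i, name in enumerate(label_names)}
    # Concatenate the scalar features into one vector per node.
    features = [[float(nodes[n][k]) for k in FEATURE_KEYS] for n in node_ids]
    labels = [label_to_index[nodes[n]["label"]] for n in node_ids]
    return node_ids, features, labels, label_to_index


node_ids, features, labels, label_to_index = build_node_arrays(json.loads(raw))
```

Row i of features and entry i of labels both describe node_ids[i], so the nodes of the homogeneous graph need to be numbered in that same order before the features are attached as node data.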

Thanks @mufeili, I did a bit of reading, and a homogeneous graph is indeed what I need. In that case, train.py in the GCN example loads its data with this line:
data = load_data(args)
So can I just feed load_data my dgl graph (assuming I manage to convert my networkx graph, with its node and edge features, into a dgl graph)?

Many thanks

That function uses pre-defined data loading functions for a few particular datasets. For a new dataset, you will need to write a dedicated dataset class for it, modeled on what load_data returns.

Thanks @mufeili,

Do you have an example or tutorial on creating a dataset class, or a page I can look at?

Thanks

This is the dataset class we use for cora, citeseer, and pubmed. Is it possible for you to adapt that code and create a class for your own dataset?
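As a rough idea of what such a class might contain, here is a minimal standard-library sketch of a container for one node classification graph. The class name MyNodeDataset, its attribute names (features, labels, edges, num_labels, train_mask, val_mask, test_mask), and the split ratios are assumptions chosen for illustration, not the DGL implementation. They should be checked against whatever train.py actually reads from the object returned by load_data.

```python
import random


class MyNodeDataset:
    """Hypothetical container for one node-classification graph.

    It holds plain Python lists; converting them to tensors and building
    the DGL graph from `edges` is left to the training script.
    """

    def __init__(self, features, labels, edges,
                 train_ratio=0.6, val_ratio=0.2, seed=0):
        assert len(features) == len(labels), "one label per node is required"
        self.features = features  # list of per-node feature vectors
        self.labels = labels      # list of integer class indices
        self.edges = edges        # list of (src, dst, weight) tuples
        self.num_labels = len(set(labels))

        # Randomly split the nodes into train / validation / test sets.
        n = len(labels)
        order = list(range(n))
        random.Random(seed).shuffle(order)
        n_train = int(train_ratio * n)
        n_val = int(val_ratio * n)
        self.train_mask = self._mask(order[:n_train], n)
        self.val_mask = self._mask(order[n_train:n_train + n_val], n)
        self.test_mask = self._mask(order[n_train + n_val:], n)

    @staticmethod
    def _mask(indices, n):
        """Boolean list with True at the given node indices."""
        chosen = set(indices)
        return [i in chosen for i in range(n)]


# Toy data: 5 nodes, 3 features each, 2 classes, 4 weighted edges.
features = [[0, 1, 5], [1, 0, 2], [2, 2, 0], [0, 0, 1], [3, 1, 1]]
labels = [0, 1, 0, 1, 0]
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 2.0), (3, 4, 1.0)]
data = MyNodeDataset(features, labels, edges)
```

The masks mark which nodes contribute to the training loss and which are held out for validation and testing. For a graph that is entirely unlabelled at prediction time, the model would be trained on a labelled graph like this one and then applied to the new graph's features and edges.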

Thanks @mufeili,

I can change my dataset to follow the format of cora.cites and cora.content. But without step-by-step instructions, I find it technically very difficult to go further and adapt the code so that I can use train.py from the GCN example with my dataset.

I appreciate your help so far, and I hope there will be documentation in the future on how to use the code with another dataset.

Thanks a lot.
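For readers who want to try the cora.content / cora.cites route mentioned above, the following is a minimal sketch of formatting nodes and edges as tab-separated lines. The layout assumed here is one node per line in the content file (node ID, feature values, class label) and one edge per line in the cites file (two node IDs). The function name to_cora_lines is hypothetical, and a loader written for the original Cora files may make further assumptions, for example binary features, that this data would not satisfy.

```python
def to_cora_lines(node_ids, features, label_names, edges):
    """Format nodes and edges as cora.content / cora.cites style lines.

    Assumed tab-separated layout:
      content: <node_id> <feature_1> ... <feature_k> <class_label>
      cites:   <node_id> <node_id>
    """
    content_lines = [
        "\t".join([str(nid)] + [str(x) for x in feats] + [label])
        for nid, feats, label in zip(node_ids, features, label_names)
    ]
    cites_lines = ["\t".join([str(u), str(v)]) for u, v in edges]
    return content_lines, cites_lines


# Toy input matching the two nodes from the JSON posted in the thread.
content_lines, cites_lines = to_cora_lines(
    node_ids=["1", "2"],
    features=[[0, 1, 5], [1, 0, 2]],
    label_names=["lab1", "lab3"],
    edges=[("1", "2")],
)
```

Note that this layout has no column for edge weights, so a weighted graph would need to keep them in a separate file or as an extra, non-standard column.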