Learning node and edge characteristics

qillbel · March 18, 2020, 12:16pm

Hi all,

I need some suggestions for my graph problem.
I have a graph whose nodes represent words. I have labelled every node with one of 5 categories/classes, and each node is also annotated with some features (e.g. the number of letters, digits, and symbols it contains, its font size, etc.). All the edges in my graph also carry a weight.

My question is: how can I build a model that learns the characteristics of (and the relationships between) the categories/classes in my graph, so that, given an unlabelled graph, the model can predict which class each node belongs to?
A few lines of code or a flowchart as an example would also help me hugely.

Thanks a lot, all, and sorry if this seems trivial to some of you.

Best

mufeili · March 18, 2020, 12:35pm

Hi, have you checked our GCN example? It performs essentially the node classification task you described.
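As a rough "flowchart in code" for the task in the question, here is a sketch in plain NumPy rather than DGL: train on a labelled graph, then reuse the learned weights on an unlabelled graph. To keep it short it swaps in a simplified, SGC-style model instead of the full GCN from the example: node features are smoothed over the graph with a fixed normalized adjacency, and only a linear softmax classifier is trained by gradient descent. The function names and the toy graphs are hypothetical, made up for illustration.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the GCN-style normalized adjacency."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def propagate(adj, features, k=2):
    """Smooth node features over k hops of the graph (no learnable weights)."""
    a_norm = normalize_adjacency(adj)
    h = features
    for _ in range(k):
        h = a_norm @ h
    return h


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)            # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train(h, labels, num_classes, lr=0.5, epochs=200):
    """Fit a linear softmax classifier on propagated features by gradient descent."""
    w = np.zeros((h.shape[1], num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    losses = []
    for _ in range(epochs):
        probs = softmax(h @ w + b)
        losses.append(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))
        grad = (probs - onehot) / len(labels)       # d(mean cross-entropy)/d(logits)
        w -= lr * (h.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b, losses


def predict(h, w, b):
    return np.argmax(h @ w + b, axis=1)


def dense_adjacency(num_nodes, edges):
    adj = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        adj[u, v] = adj[v, u] = 1.0                 # undirected, unweighted toy edges
    return adj


# 1) Labelled training graph: two triangles joined by one edge (made-up data).
train_adj = dense_adjacency(6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
train_x = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])
w, b, losses = train(propagate(train_adj, train_x), train_y, num_classes=2)

# 2) Unlabelled graph: reuse the learned weights to predict a class per node.
new_adj = dense_adjacency(4, [(0, 1), (1, 2), (2, 3)])
new_x = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
new_pred = predict(propagate(new_adj, new_x), w, b)
```

Because the learned weights act on per-node feature vectors rather than on specific node IDs, the same weights can be applied to a different graph, which is what predicting on an unlabelled graph requires. The DGL GCN example instead learns weights in every layer.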

qillbel · March 18, 2020, 12:58pm

Thanks for the response, @mufeili. I tried that example before; I think it is similar to https://docs.dgl.ai/tutorials/basics/1_first.html#step-3-define-a-graph-convolutional-network-gcn.

But my understanding is that the example only considers the structure of the graph (e.g. the number of neighbours, neighbours of neighbours) and not the features of the nodes themselves (e.g. text size, how many symbols a node contains). Correct me if I’m wrong.

So, is it possible to incorporate the node features into GCN too?

Thanks so much @mufeili.

mufeili · March 18, 2020, 5:22pm

The example is slightly different from the tutorial: in the example, we do use the node features as the model inputs, in place of the inputs used in the tutorial.
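For reference, the layer-wise propagation rule of the GCN of Kipf & Welling shows where node features come in: the input to the first layer, H^(0), is the node feature matrix X (one row per node, e.g. its letter, digit, and symbol counts and font size), and each layer mixes every node's representation with its neighbours'. Here A is the adjacency matrix (edge weights can be placed in its entries), I is the identity, W^(l) holds the learnable weights of layer l, and σ is a nonlinearity such as ReLU; these are the standard symbols, not names taken from the DGL example.

```latex
H^{(0)} = X, \qquad
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I, \quad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```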

qillbel · April 6, 2020, 7:25pm

Thanks @mufeili. I looked at it, but I couldn’t work out how to load my graph (a networkx.Graph()) as the input graph for training. Thanks a lot

mufeili · April 7, 2020, 12:35am

You can do something like

```python
import dgl

# nx_g is your networkx graph
g = dgl.DGLGraph()
g.from_networkx(nx_g)
# Newer DGL releases (0.5+) provide g = dgl.from_networkx(nx_g) instead.
```
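The graph in this thread also has edge weights, which a topology-only conversion does not carry over. One hedged way to keep them is to read them from the networkx edge data yourself: `nx_g.edges(data=True)` yields `(u, v, attr_dict)` tuples. The sketch below takes such tuples as a plain list, so it runs without networkx or DGL, and builds a node-index mapping plus a dense, symmetric weighted adjacency matrix. The helper name `weighted_adjacency` and the `"weight"` attribute key are assumptions.

```python
import numpy as np


def weighted_adjacency(nodes, edges, weight_key="weight", default=1.0):
    """Build a dense, symmetric weighted adjacency matrix.

    nodes: iterable of node IDs (e.g. nx_g.nodes()).
    edges: iterable of (u, v, attr_dict) tuples (e.g. nx_g.edges(data=True)).
    Edges without a weight attribute get `default`.
    """
    index = {node: i for i, node in enumerate(nodes)}   # node ID -> row/column
    adj = np.zeros((len(index), len(index)))
    for u, v, attrs in edges:
        w = float(attrs.get(weight_key, default))
        adj[index[u], index[v]] = w
        adj[index[v], index[u]] = w                       # undirected graph
    return index, adj


index, adj = weighted_adjacency(
    ["1", "2", "3"],
    [("1", "2", {"weight": 0.5}), ("2", "3", {"weight": 2.0}), ("1", "3", {})],
)
```

A dense matrix is only reasonable for small graphs; for larger ones, the same loop could collect source, destination, and weight arrays instead.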

qillbel · April 7, 2020, 8:10am

Thanks @mufeili, and sorry for being unclear. I meant that the GCN example is run with

```
python3 train.py --dataset cora --gpu 0 --self-loop
```

which (I think) uses the Cora graph as the input graph. So if I have an external graph (other than Cora), how should I incorporate it? And does it automatically recognize all the features in the nodes too?

Thank you very much

mufeili · April 7, 2020, 3:17pm

I understand. You will need to create a dataset class/instance yourself, as datasets can come in many different formats and there are also many different tasks. Let’s use networkx as an example. You can load the graph topology as in my previous reply. For loading node features, you can refer to this thread.

qillbel · April 7, 2020, 7:42pm

Thanks @mufeili, I’m looking at that thread now and studying it.

My dataset is actually in the JSON format below, and I managed to build a graph from it. As my aim is to predict which node belongs to which label in an unlabelled graph, maybe I can build a heterogeneous graph in DGL directly and feed it to GCN. Do you think that is a better approach than building a networkx graph and converting it to a DGL heterogeneous graph?

```json
{
  "1": {
    "label": "lab1",
    "fea1": 0,
    "fea2": 1,
    "fea3": 5
  },
  "2": {
    "label": "lab3",
    "fea1": 1,
    "fea2": 0,
    "fea3": 2
  }
}
```
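Whichever graph library ends up being used, the labels in this JSON are strings, while a classifier needs integer class IDs. A minimal sketch with the standard `json` module, assuming the node IDs are the numeric top-level keys as in the snippet above; the helper name `load_labels` is hypothetical.

```python
import json


def load_labels(text):
    """Parse the node JSON and map string labels to integer class IDs.

    Returns (node_ids, label_ids, classes), where classes[label_ids[i]]
    is the original label string of node_ids[i].
    """
    records = json.loads(text)
    node_ids = sorted(records, key=int)              # assumes keys like "1", "2", ...
    classes = sorted({records[n]["label"] for n in node_ids})
    class_to_id = {c: i for i, c in enumerate(classes)}
    label_ids = [class_to_id[records[n]["label"]] for n in node_ids]
    return node_ids, label_ids, classes


example = """
{
  "1": {"label": "lab1", "fea1": 0, "fea2": 1, "fea3": 5},
  "2": {"label": "lab3", "fea1": 1, "fea2": 0, "fea3": 2}
}
"""
node_ids, label_ids, classes = load_labels(example)
```

For the unlabelled graph that the model should predict on, the `label` key would be absent, so only the features would be read from it.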

But once I have a heterogeneous graph in DGL, I find it difficult to apply the code to it. Could you perhaps point me to a simple example where a DGL graph is constructed and the GCN code is applied to it?

Thanks so much.

mufeili · April 8, 2020, 7:57am

One question is whether you really want to use a heterogeneous graph. It seems that each node has three features, fea1, fea2, and fea3, which are just scalars. In this case, why not just concatenate the three features and work with a homogeneous graph?
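Concretely, concatenating the three scalar features means stacking them into one row per node, which gives an N × 3 feature matrix for a homogeneous graph. A NumPy sketch, assuming the feature keys fea1, fea2, and fea3 from the JSON in the previous post and a fixed node order; the function name `feature_matrix` is hypothetical.

```python
import numpy as np

FEATURE_KEYS = ("fea1", "fea2", "fea3")   # assumed feature names from the JSON


def feature_matrix(records, node_ids, keys=FEATURE_KEYS):
    """Stack scalar node features into an (N, len(keys)) float32 matrix.

    Row i holds the features of node_ids[i], in the column order given by keys.
    """
    return np.array(
        [[float(records[n][k]) for k in keys] for n in node_ids],
        dtype=np.float32,
    )


records = {
    "1": {"label": "lab1", "fea1": 0, "fea2": 1, "fea3": 5},
    "2": {"label": "lab3", "fea1": 1, "fea2": 0, "fea3": 2},
}
x = feature_matrix(records, ["1", "2"])
```

Features on very different scales (e.g. symbol counts vs. font sizes) are often standardized column-wise before training.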

qillbel · April 9, 2020, 1:18pm

Thanks @mufeili, I did a bit of reading, and a homogeneous graph is what I need. In that case, train.py in the GCN example loads the data with this line:

```python
data = load_data(args)
```

So can I just pass my DGL graph to load_data (assuming I managed to convert my networkx graph, with its node and edge features, into a DGL graph)?

Many thanks

mufeili · April 9, 2020, 5:13pm

That function uses pre-defined data-loading functions for several specific datasets. For a new dataset, you will need to write a dedicated data class for it, following load_data.
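One possible shape for such a class is sketched below in plain NumPy. It assumes that the GCN training script reads attributes along the lines of `graph`, `features`, `labels`, `train_mask`, `val_mask`, `test_mask`, and `num_labels` from the object returned by `load_data`; please check your copy of `train.py` for the exact names it uses. The class name `MyNodeDataset`, the random 60/20/20 split, and the placeholder edge list standing in for the graph are all hypothetical; in practice `graph` would be the DGL graph built from networkx as shown earlier in the thread.

```python
import numpy as np


class MyNodeDataset:
    """Hypothetical container mirroring attributes a GCN training script may read."""

    def __init__(self, graph, features, labels, train_frac=0.6, val_frac=0.2, seed=0):
        self.graph = graph                                       # e.g. a DGL graph
        self.features = np.asarray(features, dtype=np.float32)   # (N, F)
        self.labels = np.asarray(labels, dtype=np.int64)         # (N,)
        self.num_labels = int(self.labels.max()) + 1

        # Random node split into disjoint train / validation / test masks.
        n = len(self.labels)
        order = np.random.default_rng(seed).permutation(n)
        n_train = int(train_frac * n)
        n_val = int(val_frac * n)
        self.train_mask = np.zeros(n, dtype=bool)
        self.val_mask = np.zeros(n, dtype=bool)
        self.test_mask = np.zeros(n, dtype=bool)
        self.train_mask[order[:n_train]] = True
        self.val_mask[order[n_train:n_train + n_val]] = True
        self.test_mask[order[n_train + n_val:]] = True


data = MyNodeDataset(
    graph=[(i, i + 1) for i in range(9)],    # placeholder edge list for 10 nodes
    features=np.eye(10, 3),
    labels=[0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
)
```

Under these assumptions, constructing such an object directly could take the place of the `load_data(args)` call for a custom dataset.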

qillbel · April 10, 2020, 2:22pm

Thanks @mufeili,

Do you have an example or tutorial on creating a data class, or a page I can look at?

Thanks

mufeili · April 10, 2020, 5:43pm

This is the dataset class we use for Cora, CiteSeer, and PubMed. Could you adapt that code to create a class for your own dataset?

qillbel · April 10, 2020, 7:03pm

Thanks @mufeili,

I can convert my dataset to the format of cora.cites and cora.content. But without step-by-step instructions, I find it technically very difficult to go further and adapt the code so that train.py in the GCN example works with my dataset.
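For anyone taking the same route, the sketch below produces lines in a Cora-like layout. It assumes the layout of the original Cora release: each `.content` line is a node ID, its feature values, and its class label, tab-separated, and each `.cites` line is a tab-separated pair of node IDs describing one edge. The function name `to_cora_lines` and its input structures are hypothetical; it is also worth checking whether the loader being adapted expects 0/1 features, since Cora's features are binary word indicators.

```python
def to_cora_lines(records, edges, keys=("fea1", "fea2", "fea3")):
    """Return (content_lines, cites_lines) in a Cora-like tab-separated layout.

    records: {node_id: {"label": ..., "fea1": ..., ...}}
    edges:   iterable of (node_id, node_id) pairs
    """
    content_lines = [
        "\t".join([str(node)] + [str(attrs[k]) for k in keys] + [str(attrs["label"])])
        for node, attrs in records.items()
    ]
    cites_lines = ["\t".join([str(u), str(v)]) for u, v in edges]
    return content_lines, cites_lines


records = {
    "1": {"label": "lab1", "fea1": 0, "fea2": 1, "fea3": 5},
    "2": {"label": "lab3", "fea1": 1, "fea2": 0, "fea3": 2},
}
content, cites = to_cora_lines(records, [("1", "2")])

# Writing them out, e.g. to hypothetical files mydata.content / mydata.cites:
# with open("mydata.content", "w") as f:
#     f.write("\n".join(content) + "\n")
```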

I appreciate your help so far, and I hope there will be documentation in the future on how to use the code with another dataset.

Thanks a lot.