Question about dataset

liulangmagong · April 6, 2020, 9:42am

Hello, how do I use my own dataset?

mufeili · April 6, 2020, 10:12am

Can you elaborate a bit more about your scenario?

What task is this dataset for?
What’s the format of your dataset?
Do you have one graph or multiple graphs?
Do you have node/edge features?

liulangmagong · April 6, 2020, 11:42am

Suppose I have a new dataset with no preprocessing and I want to feed it into a model. Is there a tutorial on preprocessing data into a form that the model can accept?

mufeili · April 6, 2020, 12:58pm

Currently we don’t have something like that. If you have a particular dataset that you want to work on, you can share more information on it.

liulangmagong · April 6, 2020, 1:11pm

Ok thank you very much

qillbel · April 8, 2020, 3:23am

Hi mufeili,

Imagine I have many small graphs (approximately 1000 nodes each), and every node in every graph is labelled. Each node is assigned one of 6 labels. Along with the label, each node has 5 features: 2 binary features (Yes/No) and 3 multi-class features with values in the range 0-20.

The edges in each graph are undirected and have 2 features: weight (float) and feel (binary integer).

My aim is to predict labels for unlabelled/unannotated graphs after training. Is node classification what I need to look at, or do you have another suggestion? A simple implementation would help me a lot.

Thanks all

mufeili · April 8, 2020, 7:50am

It sounds like you want to perform graph classification. See if this tutorial helps.

qillbel · April 9, 2020, 1:13pm

Thanks for the reply, and sorry, I worded my question badly. What I want to do is node classification. I came across GCN, but its implementation seems to take one big graph rather than multiple disconnected graphs like mine.

So, could you perhaps recommend a suitable technique for me?

Many thanks

mufeili · April 9, 2020, 5:11pm

You can use dgl.batch to batch multiple disconnected graphs into one big graph per mini-batch. Essentially, you are combining the batched graph training from the graph classification tutorial with the node classification example.
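
To make the idea concrete, here is a minimal, library-free sketch of what batching does conceptually. It assumes each small graph is stored as a node count, an edge list, and one label per node; the function name `batch_graphs` and the toy data are hypothetical and are not part of DGL's API.

```python
from typing import List, Tuple

Edge = Tuple[int, int]
Graph = Tuple[int, List[Edge], List[int]]  # (num_nodes, edges, node_labels)


def batch_graphs(graphs: List[Graph]) -> Tuple[int, List[Edge], List[int], List[int]]:
    """Merge small graphs into one big graph made of disconnected components.

    Node IDs start at 0 inside every input graph, so each graph's IDs are
    shifted by the number of nodes that came before it.
    """
    offset = 0
    all_edges: List[Edge] = []
    all_labels: List[int] = []
    sizes: List[int] = []
    for num_nodes, edges, labels in graphs:
        if len(labels) != num_nodes:
            raise ValueError("expected one label per node")
        # Shift node IDs so they do not collide with earlier graphs.
        all_edges.extend((u + offset, v + offset) for u, v in edges)
        # Labels are concatenated in the same order as the shifted node IDs.
        all_labels.extend(labels)
        sizes.append(num_nodes)
        offset += num_nodes
    return offset, all_edges, all_labels, sizes


# Two toy graphs (hypothetical data): a 3-node path and a single 2-node edge.
g1 = (3, [(0, 1), (1, 2)], [0, 2, 1])
g2 = (2, [(0, 1)], [5, 3])

total_nodes, edges, labels, sizes = batch_graphs([g1, g2])
print(total_nodes)  # 5
print(edges)        # [(0, 1), (1, 2), (3, 4)]
print(labels)       # [0, 2, 1, 5, 3]
print(sizes)        # [3, 2]
```

Because the components stay disconnected, message passing never mixes information between the original graphs, and a node classification loss can be computed over all nodes of the batch at once. With DGL graph objects, the corresponding step is a call to dgl.batch on a list of graphs, which concatenates the node and edge features stored on them in the same order.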