Question about dataset

Hello, I want to know how to use my own dataset.

Can you elaborate a bit more about your scenario?

  • What task is this dataset for?
  • What’s the format of your dataset?
  • Do you have one graph or multiple graphs?
  • Do you have node/edge features?

Suppose I have a new dataset with no preprocessing and I want to feed it into our model. Is there a tutorial on preprocessing data into a form that the model can accept?

Currently we don’t have something like that. If you have a particular dataset that you want to work on, you can share more information on it.

OK, thank you very much.

Hi mufeili,

Imagine I have many small graphs (approximately 1000 nodes each), and every node in every graph is labelled or annotated. Each node is assigned 1 of 6 labels. Along with the label, each node has 5 features: 2 binary features (Yes or No) and 3 multi-class features with values in the range 0-20.

For the edges, each graph has undirected edges with 2 features: weight (float) and feel (binary integer).

My aim is to predict unlabelled/unannotated graphs after training. Is node classification what I need to look at, or do you have any other suggestion? A simple implementation would help me a lot.

Thanks all

It sounds like you want to perform graph classification. See if this tutorial helps.

Thanks for the reply, and I’m sorry, I worded my question badly. What I meant is node classification. I came across GCN, but its implementation seems to take one big graph rather than multiple disconnected graphs like mine.

So, could you perhaps recommend a suitable technique for me?

Many thanks

You can use dgl.batch to batch multiple disconnected graphs into one big graph per mini-batch. Essentially, you are combining the batched graph training from the graph classification example with the node classification example.