Help with a Binary Classification model

Hello DGL Community!
I’m sorry if this post is too specific, but I’m trying to get my binary classification GNN model to learn from my small dataset and I seem to have a serious problem. I would really appreciate any help or insight on how to deal with it.
I have a dataset of 48 graphs, each with 4 node features. Each graph contains about 150 nodes, and I’m trying to do graph classification with binary labels. The problem is that every time I run the code, I get a different set of predictions.
I have two guesses here:

  1. I have a bug in my code that causes this issue - I already checked that the training set is not shuffled, and it isn’t.
  2. The model is too unstable because the dataset is bad (too small).

Here is a link to my code

The objective is a proof of concept, so high accuracy is not necessary, but I do have to make it work. Does anyone have any suggestions?

Many thanks!
David

Hi David:

You could try a controlled-variable approach (change one thing at a time) to debug your code:

  1. If you suspect something is wrong with the dataset, replace it with a built-in dataset to see whether the model can train on it.
  2. Try your dataset with a built-in model to see whether it causes the same problem.

Due to random parameter initialization and the random iteration order in the dataloader, it is natural that the model converges to a different point on every run, hence producing different predictions.
That being said, it’s hard to tell what may be going wrong by inspecting the code alone (especially since your code does not crash), and we need more information. For instance, does your training loss decrease? Does your training accuracy increase? Different answers to these questions will lead to different debugging processes (e.g. if your training loss decreases but your validation/test accuracy is horrible, then overfitting is the problem; if your training loss doesn’t decrease at all, then you need to check whether the parameter gradients make sense).
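As a minimal sketch of that kind of check (plain PyTorch, with a stand-in linear model rather than your GNN, and made-up toy data), you can log the training loss and gradient norms each epoch:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the GNN: any model with trainable parameters works the same way.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Toy data: 48 feature vectors with binary labels (mirrors the dataset size above).
X = torch.randn(48, 4)
y = (X.sum(dim=1) > 0).long()

losses = []
for epoch in range(50):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(X), y)
    loss.backward()
    # Sanity check: gradients should be finite and (usually) nonzero.
    grad_norm = sum(p.grad.norm().item() for p in model.parameters())
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

If the loss curve is flat, printing `grad_norm` per epoch tells you whether gradients are reaching the parameters at all.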

Thank you very much for your response!

  1. Actually, I thought about doing that, but I load my graphs with the load_graphs function from a .bin file I saved them all in, and most datasets I found are in a completely different format, so this test is a bit tricky.
  2. This one is a little easier: I based my code on a built-in model from the DGL documentation and only changed a few things to make it work with my dataset.

Maybe I’ll try to be more specific on my problem in a reply below.

Thanks.

Hi and many thanks for the response!
What part of the dataloader is randomized? I actually removed the random part from the sampler, because the labels are stored in a separate list and I didn’t want them to get mixed up with the wrong graphs. From what I checked, the graph objects can hold the graph labels as well, so I will add shuffling back later to make sure the model learns correctly.

Right now I’m not sure about a few things and I would like your insights:

  1. If I want to do binary classification instead of multi-class classification, is what I did there OK - using binary_cross_entropy and adding an argmax for the prediction? The thing is, the fully connected layer returns a tensor of two values, and I need a prediction of 1/0. I’m not sure I did it the correct way.
  2. During debugging, I got an error about the loss not requiring gradients. I’m guessing it happened because of the argmax function, so I added the line “loss = torch.autograd.Variable(loss, requires_grad = True)”. I wanted to make sure that this is the right way to do it and that I don’t have to implement gradient descent on my own.
  3. Is training with a batch size of only 1 graph at a time OK, or does it have to be more than that? From what I read in the documentation, I understood that batch size only affects the performance of the code and not the results. I hope I’m not making a mistake there. My dataset is quite small, so looping over each graph separately is not an issue.
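For reference, here is a minimal sketch (plain PyTorch, with random tensors standing in for a real classifier head) of the two common ways to set up binary classification: keep two logits and use cross_entropy, or switch to a single logit with binary_cross_entropy_with_logits. In both cases argmax/thresholding is used only for the final 1/0 prediction, never inside the loss:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
labels = torch.randint(0, 2, (8,))

# Option A: two logits per example + cross_entropy (targets are class indices 0/1).
logits2 = torch.randn(8, 2, requires_grad=True)  # stand-in for a 2-output FC layer
loss_a = F.cross_entropy(logits2, labels)
pred_a = logits2.argmax(dim=1)                   # argmax only for the 0/1 prediction

# Option B: one logit per example + BCE-with-logits (targets are floats 0.0/1.0).
logit1 = torch.randn(8, 1, requires_grad=True)   # stand-in for a 1-output FC layer
loss_b = F.binary_cross_entropy_with_logits(logit1.squeeze(1), labels.float())
pred_b = (torch.sigmoid(logit1.squeeze(1)) > 0.5).long()

loss_a.backward()
loss_b.backward()
print(pred_a.shape, pred_b.shape)
```

Both losses are computed on the raw logits, so gradients flow back to the layer that produced them.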

Here is a link to my code again: AIPDORCS/InitialGNN.py at master · Alonf4/AIPDORCS (github.com)

I’m sorry about how long this reply is and I’d really appreciate your help as I’m not an expert.
Thank you very much!
Alon

It seems that you are applying argmax during the loss computation. That won’t work, because argmax is not a continuous function and therefore has no gradients.

This is also not correct, because wrapping the loss in a new Variable means you are essentially computing the gradient of the loss with respect to the loss itself; the rest of the model parameters will not receive any gradients. I would recommend going through the PyTorch tutorials. There are also a couple of binary classification tutorials online.
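A quick sketch of the failure mode (plain PyTorch, with a toy linear model standing in for the GNN): wrapping an argmax-based loss in a new Variable makes backward() run without an error, but the model parameters never receive gradients, whereas computing the loss directly on the logits does populate them:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 2)  # stand-in for the GNN classifier
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))

# Broken: argmax detaches the computation graph; re-wrapping the loss in a
# Variable only silences the "no gradients" error instead of fixing it.
preds = model(x).argmax(dim=1)
bad_loss = torch.autograd.Variable(
    F.binary_cross_entropy(preds.float(), y.float()), requires_grad=True)
bad_loss.backward()
bad_grads = [p.grad for p in model.parameters()]  # all None: nothing reached the model

# Correct: compute the loss on the logits themselves.
model.zero_grad()
good_loss = F.cross_entropy(model(x), y)
good_loss.backward()
print(all(p.grad is not None for p in model.parameters()))
```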

This should be fine.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.