Inductive binary node classification setup

Hello everyone!
I have a question regarding the inductive binary node classification setup.
To be clear, let's consider the following setup:
I have a training dataset that consists of N graphs G_i, i = 1, 2, …, N.

  1. For each graph, the aim is to predict a binary label for its nodes. Is this the same task you presented with GAT on the PPI dataset (that case was multiclass)?
  2. Can we use GraphSAGE the same way you used GAT in that example?
  3. The graphs in the dataset are independent. To generate the node-level embeddings I will batch them; the cross-entropy loss will not be affected (since when we batch we just construct one giant DGLGraph and then compute the loss for all nodes).

Thanks in advance.
  1. For each graph, the aim is to predict a binary label for its nodes. Is this the same task you presented with GAT on the PPI dataset (that case was multiclass)?

Yes, it's the same, except that PPI is a multi-label binary classification task.

  1. Can we use GraphSAGE the same way you used GAT in that example?

Yes, you only need to change the loss function and evaluation metric.
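To make the "only change the loss and metric" point concrete, here is a minimal sketch of a GraphSAGE-style binary node classifier. The SAGE layer is written in plain PyTorch with a dense adjacency matrix so the snippet is self-contained; in practice you would use `dgl.nn.SAGEConv` on a DGLGraph instead. The toy graph, feature sizes, and class names are all made up for illustration.

```python
import torch
import torch.nn as nn

class MeanSAGELayer(nn.Module):
    """Minimal GraphSAGE layer with mean aggregation, sketched in plain
    PyTorch (roughly what dgl.nn.SAGEConv with aggregator_type='mean' does)."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.fc_self = nn.Linear(in_feats, out_feats)
        self.fc_neigh = nn.Linear(in_feats, out_feats)

    def forward(self, adj, h):
        # adj: (N, N) dense adjacency; mean over each node's neighbors.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h_neigh = (adj @ h) / deg
        return self.fc_self(h) + self.fc_neigh(h_neigh)

class SAGENodeClassifier(nn.Module):
    """Two SAGE layers ending in a 1-dim head: one logit per node."""
    def __init__(self, in_feats, hidden_feats):
        super().__init__()
        self.layer1 = MeanSAGELayer(in_feats, hidden_feats)
        self.layer2 = MeanSAGELayer(hidden_feats, 1)

    def forward(self, adj, feats):
        h = torch.relu(self.layer1(adj, feats))
        return self.layer2(adj, h).squeeze(-1)

# Toy 4-node ring graph with 8-dim features and binary node labels.
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])
feats = torch.randn(4, 8)
labels = torch.tensor([0., 1., 1., 0.])

model = SAGENodeClassifier(in_feats=8, hidden_feats=16)
logits = model(adj, feats)                      # shape (4,): one logit per node
loss = nn.BCEWithLogitsLoss()(logits, labels)   # binary loss on raw logits
```

The GAT example's model and multi-label head would be swapped for this single-logit head, with the training loop otherwise unchanged.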

  1. The graphs in the dataset are independent. To generate the node-level embeddings I will batch them; the cross-entropy loss will not be affected (since when we batch we just construct one giant DGLGraph and then compute the loss for all nodes).

Right.
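This can be checked numerically: with the default `reduction='mean'`, computing BCE over the concatenated node logits of all graphs (which is what the batched giant graph gives you) equals the per-node average over all graphs together. A toy check in plain PyTorch, with made-up logits and labels for two small graphs:

```python
import torch
from torch.nn.functional import binary_cross_entropy_with_logits as bce

# Per-node logits/labels for two independent graphs (3 and 5 nodes).
logits_g1, labels_g1 = torch.randn(3), torch.tensor([0., 1., 0.])
logits_g2, labels_g2 = torch.randn(5), torch.tensor([1., 1., 0., 0., 1.])

# Batched: one giant graph -> one concatenated node set.
batched_loss = bce(torch.cat([logits_g1, logits_g2]),
                   torch.cat([labels_g1, labels_g2]))

# Same quantity computed per graph, then averaged per node (3 + 5 = 8 nodes).
per_node_sum = (bce(logits_g1, labels_g1, reduction='sum')
                + bce(logits_g2, labels_g2, reduction='sum'))
manual_loss = per_node_sum / 8
assert torch.allclose(batched_loss, manual_loss)
```

One subtlety: this weights every node equally, so larger graphs contribute more to the loss than smaller ones; averaging the two per-graph means instead would weight graphs equally.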


Thanks @mufeili for your reply.
Regarding question 2, you mean I just have to use a BCELoss, is that it?

Yes, or BCEWithLogitsLoss as in the GAT example.
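The two losses compute the same quantity; the difference is that `BCEWithLogitsLoss` applies the sigmoid internally (in a numerically stabler way), so the model should output raw logits. A quick illustration with made-up values:

```python
import torch

logits = torch.tensor([1.5, -0.3, 0.8])   # raw model outputs (made up)
labels = torch.tensor([1., 0., 1.])

# Option 1: BCEWithLogitsLoss takes raw logits (preferred, more stable).
loss_a = torch.nn.BCEWithLogitsLoss()(logits, labels)

# Option 2: BCELoss expects probabilities, so apply the sigmoid yourself.
loss_b = torch.nn.BCELoss()(torch.sigmoid(logits), labels)

assert torch.allclose(loss_a, loss_b)  # same value, different stability
```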


Hi @mufeili,
I just implemented my approach according to the setup I described before, but the loss is not decreasing.
I checked the gradients and found that they are on the order of 10**-18, which means they are close to zero.
Any suggestions, please?

How many data points do you have for the positive and the negative class? If there is a class imbalance issue, you may want to perform some upsampling/downsampling in data loading or reweight the data points in the loss computation.
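For the sampling route, one generic mechanism in PyTorch is `WeightedRandomSampler`, which oversamples the minority class at data-loading time. The sketch below uses a made-up flat dataset for brevity; in the graph setting, the items drawn by the sampler would be whole graphs (weighted, say, by their fraction of positive nodes) rather than individual nodes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Made-up imbalanced toy dataset: 90 negative items, 10 positive items.
labels = torch.cat([torch.zeros(90), torch.ones(10)])
data = torch.randn(100, 4)
dataset = TensorDataset(data, labels)

# Weight each item inversely to its class frequency, so the sampler
# draws positives and negatives roughly equally often.
class_counts = torch.tensor([90., 10.])
item_weights = 1.0 / class_counts[labels.long()]
sampler = WeightedRandomSampler(item_weights, num_samples=100, replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Fraction of positives per epoch: roughly 0.5 instead of the raw 0.1.
batch_pos_fraction = torch.stack([y.mean() for _, y in loader]).mean()
```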


Exactly, the dataset is imbalanced. Do you think focal loss can solve the problem?

I would start with re-weighting the loss for each data point or up/down-sampling. Focal loss might also be worth trying.
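If focal loss ends up on the table, here is one possible sketch of its binary form on raw logits (following the Lin et al. RetinaNet formulation); the gamma/alpha values are the commonly cited defaults, not something prescribed by this thread:

```python
import torch

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss on raw logits: per-node BCE scaled by
    (1 - p_t)^gamma, so easy, confidently classified nodes contribute
    less and training focuses on the hard (often minority-class) ones."""
    bce = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, targets, reduction='none')
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)          # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.0, 0.5])   # made-up node logits
targets = torch.tensor([1., 0., 1.])
loss = binary_focal_loss(logits, targets)
```

With `gamma=0` and `alpha=0.5` this reduces to half the plain BCE loss, which is a useful sanity check when implementing it.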


I defined the loss function as follows:
loss_fcn = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(len(neg) / len(pos)))
where len(neg) is the number of nodes with label 0 and len(pos) is the number of nodes with label 1. But the loss decreases only from the first epoch to the second and then remains unchanged.