PinSAGE - hit rate dropping

MLK · September 29, 2021, 8:29am

Hello,

I’m building a recommendation system using the PinSAGE implementation.
When I train my model I see strange behaviour in the hit rate. It rises for about 20 epochs, then drops sharply (compare the hit rate at epoch 20 with the one at epoch 30), and afterwards it climbs slowly but never reaches that first peak again. I compute the hit rate every 10 epochs and also print the average loss.

It looks like this:

Using backend: pytorch
Epoch 0 Loss: 122.11399285888672

Epoch: 0
Hit rate: 0.3664026345325731

Epoch 1 Loss: 59.51660490036011
Epoch 2 Loss: 39.41987954711914
Epoch 3 Loss: 28.36335273361206
Epoch 4 Loss: 20.957138288497926
Epoch 5 Loss: 15.979142642974853
Epoch 6 Loss: 12.33058010673523
Epoch 7 Loss: 9.658337178230285
Epoch 8 Loss: 7.630575436592102
Epoch 9 Loss: 6.068511297225952
Epoch 10 Loss: 4.863150416135788

Epoch: 10
Hit rate: 0.39968226096408715

Epoch 11 Loss: 3.9031551814079286
Epoch 12 Loss: 3.154141663789749
Epoch 13 Loss: 2.5607559390068055
Epoch 14 Loss: 2.0545004740953448
Epoch 15 Loss: 1.6670999923944474
Epoch 16 Loss: 1.3494259605407715
Epoch 17 Loss: 1.082959036052227
Epoch 18 Loss: 0.8667914329767227
Epoch 19 Loss: 0.6906157224774361
Epoch 20 Loss: 0.5525875874459744

Epoch: 20
Hit rate: 0.4047609399672613

Epoch 21 Loss: 0.4435155667066574
Epoch 22 Loss: 0.35093367248773577
Epoch 23 Loss: 0.28542937837541105
Epoch 24 Loss: 0.23810184639692306
Epoch 25 Loss: 0.20522870375216007
Epoch 26 Loss: 0.18567989604175092
Epoch 27 Loss: 0.1728948667049408
Epoch 28 Loss: 0.16246438314020634
Epoch 29 Loss: 0.15602478300035
Epoch 30 Loss: 0.1513732394874096

Epoch: 30
Hit rate: 0.3332715864761519

Epoch 31 Loss: 0.14752700643241407
Epoch 32 Loss: 0.14355326479673386
Epoch 33 Loss: 0.1410906110405922
Epoch 34 Loss: 0.13842755448818206
Epoch 35 Loss: 0.13563246604055165
Epoch 36 Loss: 0.13422372179478406
Epoch 37 Loss: 0.13225355531275274
Epoch 38 Loss: 0.13025877215713264
Epoch 39 Loss: 0.12920271898061036
Epoch 40 Loss: 0.12738460614532232

Epoch: 40
Hit rate: 0.33801580333625986

Epoch 41 Loss: 0.12646480268985033
Epoch 42 Loss: 0.12467606457322836
Epoch 43 Loss: 0.12327845005691052
Epoch 44 Loss: 0.12198871609568596
Epoch 45 Loss: 0.12085428339242935
Epoch 46 Loss: 0.11977985402941704
Epoch 47 Loss: 0.11893932873010635
Epoch 48 Loss: 0.11785990198701621
Epoch 49 Loss: 0.11675936836749315
Epoch 50 Loss: 0.11487281288206577

Epoch: 50
Hit rate: 0.3439139145899462

Is this normal behaviour, or could something be wrong in my implementation?

Best regards

BarclayII · September 30, 2021, 6:26am

It seems that the training loss keeps decreasing while the validation hit rate drops after some time. This may be related to overfitting. Did you monitor the validation hit rate every epoch? I would suggest tuning the hyperparameters a bit (dropout, learning rate, etc.).
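If overfitting is the cause, one common safeguard is to keep the checkpoint with the best validation hit rate and stop once it has not improved for a few evaluations. The sketch below is only an illustration of that idea: `BestMetricTracker`, `patience` and `min_delta` are hypothetical names that are not part of the PinSAGE example, and the hit rates are the rounded values from the log above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestMetricTracker:
    """Tracks the best validation metric seen so far (higher is better)."""
    patience: int = 3          # evaluations without improvement before stopping
    min_delta: float = 0.0     # minimum gain that counts as an improvement
    best_value: float = float("-inf")
    best_epoch: Optional[int] = None
    bad_evals: int = 0

    def update(self, epoch: int, value: float) -> bool:
        """Record one evaluation; return True if it is a new best."""
        if value > self.best_value + self.min_delta:
            self.best_value = value
            self.best_epoch = epoch
            self.bad_evals = 0
            return True
        self.bad_evals += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_evals >= self.patience


# Hit rates from the first log above (rounded), evaluated every 10 epochs.
logged = {0: 0.3664, 10: 0.3997, 20: 0.4048, 30: 0.3333, 40: 0.3380, 50: 0.3439}

tracker = BestMetricTracker(patience=2)
saved_at = []       # epochs at which a real loop would save a checkpoint
stopped_at = None
for epoch, hit_rate in logged.items():
    if tracker.update(epoch, hit_rate):
        saved_at.append(epoch)  # e.g. save the model weights here
    if tracker.should_stop:
        stopped_at = epoch
        break

print(tracker.best_epoch, tracker.best_value, stopped_at)
```

With a patience of 2 evaluations, this run would keep the epoch-20 checkpoint and stop at epoch 40 instead of continuing to train.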

MLK · September 30, 2021, 7:14am

Thank you for your answer. I didn’t monitor the validation hit rate every epoch because it takes a lot of additional time, but I can do that for testing.
OK, I’ll try tuning the hyperparameters. You mentioned dropout; did you mean this line?

github.com
dmlc/dgl/blob/master/examples/pytorch/pinsage/layers.py#L119

            return torch.stack(projections, 1).sum(1)


    class WeightedSAGEConv(nn.Module):
        def __init__(self, input_dims, hidden_dims, output_dims, act=F.relu):
            super().__init__()

            self.act = act
            self.Q = nn.Linear(input_dims, hidden_dims)
            self.W = nn.Linear(input_dims + hidden_dims, output_dims)
            self.reset_parameters()
            self.dropout = nn.Dropout(0.5)

        def reset_parameters(self):
            gain = nn.init.calculate_gain('relu')
            nn.init.xavier_uniform_(self.Q.weight, gain=gain)
            nn.init.xavier_uniform_(self.W.weight, gain=gain)
            nn.init.constant_(self.Q.bias, 0)
            nn.init.constant_(self.W.bias, 0)

        def forward(self, g, h, weights):
            """
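On the cost of per-epoch validation raised in this post: one option is to score only a fixed random subset of validation users each epoch and run the full evaluation less often. The sketch below assumes that hit rate@K means the fraction of users whose held-out item appears in their top-K recommendations. The functions `hit_rate_at_k` and `sample_eval_users` and the toy data are hypothetical and not taken from the PinSAGE example.

```python
import random
from typing import Dict, Iterable, List, Sequence


def hit_rate_at_k(recs: Dict[int, Sequence[int]],
                  held_out: Dict[int, int],
                  users: Sequence[int],
                  k: int) -> float:
    """Fraction of `users` whose held-out item is in their top-k list."""
    hits = sum(1 for u in users if held_out[u] in recs[u][:k])
    return hits / len(users)


def sample_eval_users(all_users: Iterable[int], n: int, seed: int = 0) -> List[int]:
    """Pick the same random subset every time so epochs stay comparable."""
    pool = list(all_users)
    rng = random.Random(seed)
    return sorted(rng.sample(pool, min(n, len(pool))))


# Toy data: ranked recommendations and one held-out item per user.
recs = {0: [5, 3, 9], 1: [2, 7, 1], 2: [6, 0, 4], 3: [1, 2, 0]}
held_out = {0: 3, 1: 8, 2: 6, 3: 0}

all_users = sorted(recs)
print(hit_rate_at_k(recs, held_out, all_users, k=2))
print(hit_rate_at_k(recs, held_out, all_users, k=3))

# In a real loop only the sampled users would be passed through the model.
subset = sample_eval_users(range(100), n=10, seed=0)
print(subset)
```

A subset estimate is noisier than the full hit rate, so it is best used to spot the epoch where the trend turns, with a full evaluation run around that point.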

MLK · October 1, 2021, 7:24am

I decreased the learning rate a little, and I think the situation looks even worse. The behaviour is similar to the post above, but it happens later and more slowly: the hit rate starts dropping after about 80 epochs, and after about 120 epochs it starts to increase slowly:

Epoch: 80
Hit rate: 0.4032423859350381

Epoch 81 Loss: 0.2581634405851364
Epoch 82 Loss: 0.2428457249403
Epoch 83 Loss: 0.23148725454509259
Epoch 84 Loss: 0.22190409618616105
Epoch 85 Loss: 0.21298118795454501
Epoch 86 Loss: 0.20520291109383107
Epoch 87 Loss: 0.19696748492121696
Epoch 88 Loss: 0.1918014925122261
Epoch 89 Loss: 0.18545100907981396
Epoch 90 Loss: 0.1799795409142971

Epoch: 90
Hit rate: 0.38402259884072565

Epoch 91 Loss: 0.17612866361439228
Epoch 92 Loss: 0.17288435357809068
Epoch 93 Loss: 0.16822114707529545
Epoch 94 Loss: 0.1663521445095539
Epoch 95 Loss: 0.162057680696249
Epoch 96 Loss: 0.15988175658881665
Epoch 97 Loss: 0.15793034437298775
Epoch 98 Loss: 0.15561307513713837
Epoch 99 Loss: 0.15429003842175007
Epoch 100 Loss: 0.15292980536073447

Epoch: 100
Hit rate: 0.3525585282978707

Epoch 101 Loss: 0.1505687129944563
Epoch 102 Loss: 0.1489531233087182
Epoch 103 Loss: 0.1476532801836729
Epoch 104 Loss: 0.146007649526
Epoch 105 Loss: 0.14482088580727578
Epoch 106 Loss: 0.14373772313445807
Epoch 107 Loss: 0.14308841288834812
Epoch 108 Loss: 0.14182694736868143
Epoch 109 Loss: 0.14015855582058429
Epoch 110 Loss: 0.14003006000071763

Epoch: 110
Hit rate: 0.3334166145347977

Epoch 111 Loss: 0.13900795098394156
Epoch 112 Loss: 0.13840806238353254
Epoch 113 Loss: 0.13697638155519962
Epoch 114 Loss: 0.1364779077321291
Epoch 115 Loss: 0.13613245136290789
Epoch 116 Loss: 0.13490713879466057
Epoch 117 Loss: 0.13382226914912462
Epoch 118 Loss: 0.13362179309129715
Epoch 119 Loss: 0.13275445359200239
Epoch 120 Loss: 0.1324360818490386

Epoch: 120
Hit rate: 0.33185395635459364

Epoch 121 Loss: 0.13163742046058177
Epoch 122 Loss: 0.13078813998401165
Epoch 123 Loss: 0.13059430740028619
Epoch 124 Loss: 0.12974424210190774
Epoch 125 Loss: 0.12964461026340723
Epoch 126 Loss: 0.12887703198194503
Epoch 127 Loss: 0.12836620657145978
Epoch 128 Loss: 0.12777837323397398
Epoch 129 Loss: 0.12724771236628293
Epoch 130 Loss: 0.12627794947475196

Epoch: 130
Hit rate: 0.3394316804286148

Epoch: 140
Hit rate: 0.34672062423262323

Epoch: 150
Hit rate: 0.35297450167557454

MLK · October 7, 2021, 11:24am

I tried tuning parameters such as “random_walk_length”, “num_random_walks”, “num_neighbors” and learning_rate, but the situation is always similar. The learning rate determines when the drop starts: with a lower learning rate it occurs later, and with a higher learning rate it occurs sooner. In both cases it happens at a similar hit_rate value.
Is it possible that this is the maximum value the graph can achieve? Or could something be wrong with my dataset?

Best regards

MLK · November 2, 2021, 10:41am

I tried tuning different parameters, but all of my tests show similar behaviour. Is it possible that this maximum value (before the drop) is the maximum this graph can achieve?

Best regards

BarclayII · November 8, 2021, 6:56am

I think so.

In general, if your dataset is not large and the features are not rich, GNNs like PinSAGE may not gain much over traditional methods like matrix factorization (MF) or factorization machines (FM).
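To check whether the GNN is actually gaining anything on a given dataset, a small MF baseline evaluated with the same hit rate metric can serve as a reference point. The sketch below is a generic implicit-feedback MF trained with a logistic loss and one sampled negative per positive. It is not part of DGL, and every name in it (`train_mf`, `hit_rate_at_k`, the toy interactions) is hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_seen(pairs):
    seen = {}
    for u, i in pairs:
        seen.setdefault(u, set()).add(i)
    return seen


def train_mf(pairs, n_users, n_items, dim=8, epochs=200, lr=0.05, reg=0.01, seed=0):
    """SGD on a logistic loss; each user must have at least one unseen item."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    seen = build_seen(pairs)

    def sgd_step(u, i, label):
        pred = 1.0 / (1.0 + math.exp(-dot(P[u], Q[i])))
        grad = pred - label  # derivative of the log loss w.r.t. the score
        for f in range(dim):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] -= lr * (grad * qi + reg * pu)
            Q[i][f] -= lr * (grad * pu + reg * qi)

    order = list(pairs)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i in order:
            sgd_step(u, i, 1.0)
            j = rng.randrange(n_items)
            while j in seen[u]:  # resample until the item is unseen by u
                j = rng.randrange(n_items)
            sgd_step(u, j, 0.0)
    return P, Q


def hit_rate_at_k(P, Q, train_pairs, held_out, k):
    """Rank each user's unseen items and check for the held-out one."""
    seen = build_seen(train_pairs)
    hits = 0
    for u, target in held_out.items():
        candidates = [i for i in range(len(Q)) if i not in seen[u]]
        ranked = sorted(candidates, key=lambda i: -dot(P[u], Q[i]))
        hits += target in ranked[:k]
    return hits / len(held_out)


# Toy data: users 0-2 interact with items 0-3, users 3-5 with items 4-7.
n_users, n_items = 6, 8
block_items = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
train, held_out = [], {}
for u in range(n_users):
    items = block_items[0 if u < 3 else 1]
    held_out[u] = items[u % 4]  # hold out one item per user
    train += [(u, i) for i in items if i != held_out[u]]

P, Q = train_mf(train, n_users, n_items)
hr = hit_rate_at_k(P, Q, train, held_out, k=2)
print("MF hit rate@2:", hr)
```

If a baseline like this reaches a hit rate close to the PinSAGE peak on the same split, that would be consistent with the dataset, rather than the implementation, being the limiting factor.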
