If I do batch training with dgl.batch() and GATConv, will attention coefficients be computed across the edges of unrelated subgraphs and affect the subsequent aggregation? If so, how can I avoid this problem in batch training? I would appreciate any help clearing up my confusion!
If I understand you correctly, you suspect that the softmax for the attention coefficients is normalized across all the edges in the batched graph. It is not: the softmax in GAT is normalized only over each node's neighbors (i.e. the coefficients on the incoming edges of any node sum to 1). The graphs in a batch remain disconnected components with no edges between them, so batching them together will not affect the subsequent aggregation in any way.
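To make the per-node normalization concrete, here is a minimal pure-Python sketch (not DGL's actual implementation). The helper `edge_softmax`, the toy edge lists, and the raw scores are all made up for illustration. Batching is mimicked by shifting the second graph's node IDs, which is roughly what merging graphs into one disconnected graph amounts to.

```python
import math
from collections import defaultdict

def edge_softmax(edges, scores):
    """Softmax over raw edge scores, grouped by destination node.

    Hypothetical helper for illustration: `edges` is a list of (src, dst)
    pairs and `scores` holds one raw attention score per edge.
    """
    by_dst = defaultdict(list)
    for idx, (_, dst) in enumerate(edges):
        by_dst[dst].append(idx)

    alpha = [0.0] * len(edges)
    for idxs in by_dst.values():
        # Subtract the per-node maximum for numerical stability.
        m = max(scores[i] for i in idxs)
        exps = [math.exp(scores[i] - m) for i in idxs]
        total = sum(exps)
        for i, e in zip(idxs, exps):
            alpha[i] = e / total
    return alpha

# Two toy graphs with made-up raw attention scores.
g1_edges, g1_scores = [(0, 1), (2, 1), (1, 0)], [0.5, -1.0, 2.0]
g2_edges, g2_scores = [(0, 2), (1, 2)], [1.5, 0.3]

# Mimic batching: g1 has 3 nodes, so g2's node IDs are shifted by 3.
# No edge connects the two components.
offset = 3
batched_edges = g1_edges + [(s + offset, d + offset) for s, d in g2_edges]
batched_scores = g1_scores + g2_scores

separate = edge_softmax(g1_edges, g1_scores) + edge_softmax(g2_edges, g2_scores)
batched = edge_softmax(batched_edges, batched_scores)

# The coefficients match whether the graphs are processed alone or batched.
print(all(math.isclose(a, b) for a, b in zip(separate, batched)))
```

Because every edge is grouped by its destination node, an edge in one graph never shares a normalization group with an edge from another graph, so nothing special is needed to keep the graphs independent.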
I understand, thank you for the answer!