so I would like to know how the batching for graph classification works.
I understand what is achieved by block diagonalizing the batch. But I dont understand how you apply the matrix to that block matrix.
For example I have A (adjacency matrix of size [n x n] ) and my feature matrix X [n x p] where p is the number of features. So my weight matrix W has to be size [p x f] where f is an arbitrary size.
In that sense A x X x W would give me a [n x f]-sized matrix.
So now when I want to batch multiple graphs. I create a sparse block diagonal matrix for each A and X of the graphs in my batch. I can still multiply these two matrices. However , how would then the weight matrix be applied to that sparse matrix.
For example lets say I have a batch size of three and all three graphs have the same number of nodes n.
Then the new A is of size [3n x 3n] X is then of size [3n x p] and when multiplying these two matrices. I get a matrix of size [3n x p]. And my W matrix is [pxf] The resulting matrix is then [3n x f]. how do I then get then three (for each graph one ) prediction from that output matrix