Hello! I am a computer science student working on explanations for a GNN, and I came across the DGL implementation of SubgraphX. It is of great interest to me, but unfortunately I am having trouble understanding the code. I would be very grateful if someone could explain it to me, especially the following points:
Line 139 in the subgraphx.py file reads:
split_point = num_nodes
and I don’t quite grasp why split_point is assigned like this. Isn’t this just the “last node”?
In lines 204-208:
# Get the largest weakly connected component in the subgraph.
nx_graph = to_networkx(new_subg.cpu())
largest_cc_nids = list(
    max(nx.weakly_connected_components(nx_graph), key=len)
)
only the largest weakly connected component of the subgraph that remains after removing a particular chosen node is kept. Why shouldn’t one look at all components?
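To make the second point concrete, here is a minimal sketch of what that selection does on a toy graph. It is plain Python rather than DGL code: the helper `weakly_connected_components` is a hypothetical stand-in for the networkx function of the same name, and the edge lists are made up for illustration. The stand-in yields components in order of their lowest-numbered node; whether networkx yields them in the same order is an assumption worth checking.

```python
from collections import deque


def weakly_connected_components(num_nodes, edges):
    """Hypothetical stand-in for nx.weakly_connected_components.

    Edge direction is ignored. Components are yielded in order of their
    lowest-numbered node (an assumption of this sketch).
    """
    neighbors = {n: set() for n in range(num_nodes)}
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    for start in range(num_nodes):
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in neighbors[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        yield component


# Toy directed graph (made up) that falls apart into three pieces.
edges = [(0, 1), (2, 3), (3, 4), (5, 6), (6, 7)]
components = list(weakly_connected_components(8, edges))

# Same pattern as the snippet above: keep only the largest component.
largest_cc_nids = sorted(max(components, key=len))

# Tie case: four isolated nodes, so every component has size 1.
tie_components = list(weakly_connected_components(4, []))
tie_winner = sorted(max(tie_components, key=len))
```

In the first case two components have size 3 and only the first of them is kept, so nodes 5, 6 and 7 are dropped along with nodes 0 and 1. In the tie case, Python's `max` returns the first maximal item it sees, so which component wins depends only on iteration order.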
Furthermore, in my tests on many different graphs, the solution [0] appears disproportionately often, even though the node labels are assigned arbitrarily (the nodes are just numbered somehow), and in the graphical representation there doesn’t seem to be anything special about node 0 either. Am I overlooking something obvious here?
Any help would be really appreciated. I don’t know why this is so confusing to me, but despite looking at both the original paper and the code, I just don’t get how exactly one translates to the other, or why node 0 seems to be so special in my experiments.
Thank you!