Questions about the SubgraphX implementation

karstean · September 26, 2023, 2:17pm

Hello! I am a computer science student working on an explanation for a GNN and I stumbled over the DGL implementation of the SubgraphX which is of great interest to me but unfortunately, I have some trouble understanding the code. I would be very grateful if someone could be so kind as to explain it to me, especially the following points:

Line 139 in the subgraphx.py file reads:

        split_point = num_nodes

and I don’t quite grasp why the split_point is assigned like this? Isn’t this just the “last node”?

In the lines 204 - 208:

            # Get the largest weakly connected component in the subgraph.
            nx_graph = to_networkx(new_subg.cpu())
            largest_cc_nids = list(
                max(nx.weakly_connected_components(nx_graph), key=len)
            )

only the largest weakly connected component in the subgraph without a particular chosen node is selected but why shouldn’t one look at all components?

Furthermore, during my tests on many different graphs, the solution [0] appears disproportionally often but the node labels are randomly assigned (numbered somehow) and looking at the graphical representation there doesn’t seem to be anything special about node 0 either. Am I maybe overlooking something obvious here?

Any help would be really appreciated, I don’t know why this is so confusing to me but despite looking at both the original paper and the code I just don’t get how exactly one translates to the other or why node 0 seems to be so special in my experiments.

Thank you!

kunmukh · October 6, 2023, 9:22pm

Hey Karstean,

I ported the SubgraphX module to DGL from DIG (author’s original repo). Let me try to answer your questions:

line 139: This was a shapely implementation that I found from the author’s original repo, (https://github.com/divelab/DIG/blob/dig-stable/dig/xgraph/method/shapley.py#L198). I understood it is a placeholder updated after the first iteration.
line 204-208: I found this heuristic also from the author’s original repo, (https://github.com/DS3Lab/GraphFramEx/blob/main/code/explainer/subgraphx.py#L264). I think they did this because in a large graph there can be many smaller subgraphs; therefore they chose the largest one as that has the highest probability of having the highest shapely value.
For the node[0] issue, I have not encountered it. I would recommend testing the output on a much simpler MUTAG dataset. I created a visualization script to visualize the mutag dataset (external ref: mutag-subgraphx.py · GitHub).

Best!
Kunal Mukherjee
PhD Student @ UTD

karstean · October 12, 2023, 11:02am

Hey Kunal,

thank you so much for your answer! This helps me a lot!

system · November 11, 2023, 11:02am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.