Wrong nodes removed from graph

gudeh · March 14, 2023, 10:02pm

Hi guys.

My implementation misbehaves when removing nodes with “graph.remove_nodes()”. I am trying to remove every node that has a feature or label value of “-1”, but in some cases I still see nodes with -1 in their feature values afterwards.

Maybe the association between node IDs and the ndata I am supplying is wrong. Are the ndata values supposed to be sorted by node ID? I thought that having an “id” column in my pandas dataframe would be enough.

It is hard to determine which nodes were removed, since DGL renumbers the nodes.
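
One way to see which nodes were removed is to rebuild the mapping from new IDs to original IDs. The sketch below is plain Python rather than DGL code. It assumes the surviving nodes keep their relative order and are relabeled 0..k-1, and the helper name `surviving_original_ids` is made up for illustration. DGL's `remove_nodes` also accepts `store_ids=True`, which is documented to keep the original IDs in `ndata` under `dgl.NID`, and that may be the simpler route.

```python
def surviving_original_ids(num_nodes, ids_to_remove):
    """Return the original IDs of the nodes that remain, in their new order."""
    removed = set(ids_to_remove)
    return [nid for nid in range(num_nodes) if nid not in removed]

# new_to_old[new_id] is the ID the node had before removal.
new_to_old = surviving_original_ids(6, [1, 4])
print(new_to_old)  # [0, 2, 3, 5]
```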

gudeh · March 14, 2023, 10:04pm

As you can see, I had some -1 values in the placementHeat label, and I no longer see them after removing the nodes. However, I still see features with -1.

gudeh · March 14, 2023, 10:24pm

From this post: Graph Node Ordering, it seems the source of the ndata values is expected to be ordered by node ID (so I suppose it doesn’t matter whether the dataframe has an ‘id’ column).

I hope someone can clarify whether my interpretation is correct. If so, it would be nice for the documentation to make this clearer; I see no mention of the ordering of input data here: dgl.DGLGraph.ndata — DGL 1.1 documentation

I just sorted the nodes_data dataframe by ID and I don’t see any negative values after remove_nodes. I will investigate further to confirm.
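
A small plain-Python illustration of why the ordering matters: a tensor assigned to ndata is matched to nodes by position (row i goes to node i), so an “id” column is not consulted. The rows and values below are invented, and the lists stand in for the dataframe and the feature tensor.

```python
# Rows as they might appear in an unsorted CSV: (id, feature). Values are invented.
rows = [(2, 0.7), (0, -1.0), (3, 0.1), (1, 0.4)]

# Positional assignment: row i is attached to node i, regardless of the id column.
by_position = [feat for _, feat in rows]

# Sorting by id first makes row i describe node i.
by_id = [feat for _, feat in sorted(rows)]

# The node that really carries -1 is node 0 ...
bad_ids = [node_id for node_id, feat in rows if feat == -1.0]
# ... but with positional assignment the -1 ends up on node 1.
bad_positions = [i for i, feat in enumerate(by_position) if feat == -1.0]

print(bad_ids, bad_positions)  # [0] [1]
```

Removing node 0 from the unsorted version would drop the node that carries 0.7 and leave the -1 in the graph, which matches the symptom described above.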

Rhett-Ying · March 15, 2023, 12:11am

Could you share how you generate the IDs of the nodes whose feature or label is -1, and how you remove them from the graph?

gudeh · March 15, 2023, 1:49am

Here is the code where I select the values with -1, they end up in “idsToRemove”.

Furthermore, I attempted to use the “isolated_nodes” approach from: How to remove isolated nodes in a DGL graph?, but the code still only runs if I keep “allow_zero_in_degree=True” in my model definition. So it seems I have two problems.

def _process_single( self, designPath ):
    # Load the node data indexed by node ID and sort it by ID.
    nodes_data = pd.read_csv( designPath / 'preProcessedGatesToHeat.csv', index_col = 'id' )
    nodes_data = nodes_data.sort_index()
    edges_data = pd.read_csv( designPath / 'DGLedges.csv' )
    edges_src  = torch.from_numpy( edges_data['Src'].to_numpy() )
    edges_dst  = torch.from_numpy( edges_data['Dst'].to_numpy() )

    # df_wanted is True where the raw feature is > 0 and the label is >= 0;
    # after the inversion it marks the nodes to remove.
    df = nodes_data[ listFeats + [ labelName ] ]
    df_wanted = np.logical_and( np.where( df[ rawFeatName ] > 0, True, False ), np.where( df[ labelName ] >= 0, True, False ) )
    df_wanted = np.invert( df_wanted )
    removedNodesMask = torch.tensor( df_wanted )
    idsToRemove = torch.tensor( nodes_data.index )[ removedNodesMask ]

    # Build the graph and attach features and labels (row i goes to node i).
    self.graph = dgl.graph( ( edges_src, edges_dst ), num_nodes = nodes_data.shape[0] )
    self.graph.ndata[ featName ]    = torch.tensor( nodes_data[ listFeats ].values )
    self.graph.ndata[ labelName ]   = torch.from_numpy( nodes_data[ labelName ].to_numpy() )
    self.graph.ndata[ secondLabel ] = torch.from_numpy( nodes_data[ secondLabel ].to_numpy() )

    print( "\n---> BEFORE REMOVED NODES:" )
    print( "\tself.graph.nodes()", self.graph.nodes().shape, "\n", self.graph.nodes() )
    print( "\tself.graph.ndata\n", self.graph.ndata )

    self.graph.remove_nodes( idsToRemove )

    ################################################################################################
    # Nodes with no incoming edges OR no outgoing edges, using the IDs of the graph after the first removal.
    isolated_nodes = ( ( self.graph.in_degrees() == 0 ) | ( self.graph.out_degrees() == 0 ) ).nonzero().squeeze(1)
    print( "isolated_nodes:", isolated_nodes.shape, "\n", isolated_nodes )
    self.graph.remove_nodes( isolated_nodes )
    ################################################################################################

    print( "\n---> AFTER REMOVED NODES:" )
    print( "\tself.graph.nodes()", self.graph.nodes().shape, "\n", self.graph.nodes() )
    print( "\tself.graph.ndata\n", self.graph.ndata )

    return self.graph
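
One possible reason “allow_zero_in_degree=True” is still needed (an assumption, not something confirmed in this thread): removing nodes can leave other nodes without incoming edges, so a single pass may not be enough. The plain-Python sketch below repeats the removal until no such node remains; `prune_zero_in_degree` is a hypothetical helper, not a DGL function. DGL's own error message for zero-in-degree nodes also suggests adding self-loops with `dgl.add_self_loop` as an alternative to removing nodes.

```python
def prune_zero_in_degree(num_nodes, edges):
    """Repeatedly drop nodes with no incoming edge; return the surviving original IDs."""
    alive = set(range(num_nodes))
    while True:
        # Only edges between surviving nodes still count.
        live_edges = [(s, d) for s, d in edges if s in alive and d in alive]
        has_incoming = {d for _, d in live_edges}
        zero_in = alive - has_incoming
        if not zero_in:
            return sorted(alive)
        alive -= zero_in

# Chain 0 -> 1 -> 2 plus a cycle 2 <-> 3.
edges = [(0, 1), (1, 2), (2, 3), (3, 2)]
print(prune_zero_in_degree(4, edges))  # [2, 3]
```

In this example a single pass would remove only node 0, after which node 1 has no incoming edge left.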

gudeh · March 15, 2023, 1:53am

As I mentioned, after sorting the ndata source I couldn’t see any -1 values once the nodes were removed, so it seems to have worked. That may solve one of my problems.

Do the ndata source values need to be sorted by node ID before doing “self.graph.ndata[ … ] = torch.tensor(…)”? Just to be sure.

Rhett-Ying · March 16, 2023, 12:37am

Why do you sort the original ndata before constructing the DGLGraph? I think you could just construct the graph first, then remove the nodes whose feat/label is -1. After that, you could find the isolated nodes in the new graph and remove them as well. No sorting at all.
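
The order of operations described here can be sketched in plain Python, with lists standing in for the graph. The sketch assumes feats[i] already belongs to node i, and the `remove_nodes` function below is a stand-in written for this example, not DGL's method. The point it illustrates is that the isolated nodes are looked up with the IDs of the graph produced by the first removal.

```python
def remove_nodes(feats, edges, ids_to_remove):
    """Drop nodes, relabel survivors 0..k-1 in order, and carry features and edges along."""
    removed = set(ids_to_remove)
    kept = [i for i in range(len(feats)) if i not in removed]
    new_id = {old: new for new, old in enumerate(kept)}
    new_feats = [feats[old] for old in kept]
    new_edges = [(new_id[s], new_id[d]) for s, d in edges if s in new_id and d in new_id]
    return new_feats, new_edges

feats = [0.5, -1.0, 0.2, 0.9, -1.0]          # feats[i] belongs to node i
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

# Step 1: remove nodes whose feature is -1 (IDs refer to the original graph).
bad = [i for i, f in enumerate(feats) if f == -1.0]
feats, edges = remove_nodes(feats, edges, bad)

# Step 2: find nodes with no edges at all, using the new graph's IDs, and remove them.
touched = {n for edge in edges for n in edge}
isolated = [i for i in range(len(feats)) if i not in touched]
feats, edges = remove_nodes(feats, edges, isolated)

print(feats, edges)  # [0.2, 0.9] [(0, 1)]
```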

Another thing I noticed is that you used | instead of & when filtering isolated nodes. Is this what you meant to do?
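
To make the difference concrete, here is a plain-Python sketch on an invented four-node graph. With “or” (the | in the posted code) every node that lacks incoming edges or lacks outgoing edges is selected, while “and” (&) selects only nodes with no edges at all.

```python
edges = [(0, 1), (1, 2)]   # node 3 has no edges at all
num_nodes = 4

in_deg = [0] * num_nodes
out_deg = [0] * num_nodes
for src, dst in edges:
    out_deg[src] += 1
    in_deg[dst] += 1

# OR: no incoming edges or no outgoing edges (also catches sources and sinks).
either = [n for n in range(num_nodes) if in_deg[n] == 0 or out_deg[n] == 0]
# AND: no incoming and no outgoing edges (truly isolated).
both = [n for n in range(num_nodes) if in_deg[n] == 0 and out_deg[n] == 0]

print(either)  # [0, 2, 3]
print(both)    # [3]
```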

gudeh · March 16, 2023, 1:12am

I sorted the source data because the values are unordered with relation to their ID. Isn’t this a requirement to build the graph properly?

gudeh · March 16, 2023, 3:49pm

I mean, even to construct the graph (your first tip) I need to sort my data either way, since it is not sorted by ID. Correct?

You can see here how my data looks when unsorted. Notice how it already starts with the node with ID 1857.

Rhett-Ying · March 23, 2023, 2:25am

gudeh wrote: “I sorted the source data because the values are unordered with relation to their ID. Isn’t this a requirement to build the graph properly?”

By “values”, do you mean the node data/features? So the id field identifies the node IDs, the edges are constructed from these IDs, and every other field in your table is treated as node data/features. Is this what you want?

Could you share a minimal repro, including the table and a code snippet?
