Am I missing something about fn.copy_u?

florpi · February 21, 2020, 4:43pm

Hi everyone,

I am trying to check that copy_u does what I want, but it doesn’t seem to. I’d like to sum the features of the 5 nearest neighbours of a given node. However, although the edges seem to be correct, when I sum up the features I don’t get the same answer as when I do it in a for loop (see my dirty_graph function). Should I be using a different function instead of copy_u?

import numpy as np                                                                                                   
import dgl                                                                                                           
import dgl.function as fn                                                                                            
from scipy.spatial import cKDTree                                                                                    
                                                                                                                    
                                                                                                                    
def generate_fake_data(n_particles):                                                                                 
   pos  = np.random.random((n_particles,3)).astype(np.float32)                                                      
   mass = np.random.normal(loc=13, scale = 3, size=n_particles).astype(np.float32)                                  
   return pos , mass                                                                                                
                                                                                                                    
def build_test_graph_distance(positions,                                                                             
       features,                                                                                                    
       distance = 1.,                                                                                               
       n_neighbors = 5                                                                                              
       ):                                                                                                           
   tree = cKDTree(positions,)                                                                                       
   distances, edges = tree.query(positions,                                                                         
           k=n_neighbors+1,                                                                                         
           )                                                                                                        
   edges = [(idx, dest) for idx, destination in enumerate(edges) for dest in destination \                          
                                           if ((idx!=dest)&(dest<len(positions)))]                                  
   print(edges)                                                                                                     
   src, dst = zip(*edges)                                                                                           
   G = dgl.DGLGraph()                                                                                               
   G.add_nodes(len(positions))                                                                                      
   G.add_edges(src, dst)                                                                                            
   G.ndata["masses"] = features                                                                                     
   G.update_all(fn.copy_u('masses', 'm'), fn.sum('m', 'sum_mass'))                                                  
   labels = G.ndata['sum_mass']                                                                                     
   return G, labels                                                                                       
                                                                                                                    
def dirty_graph(positions,                                                                                           
       features,                                                                                                    
       n_neighbors=5                                                                                                
       ):                                                                                                           
                                                                                                                    
   labels = []                                                                                                      
   for i in range(len(positions)):                                                                                  
       difference = positions - positions[i]                                                                        
       distance = np.linalg.norm(difference, axis=-1)                                                               
       idx = np.argsort(distance)[1:n_neighbors+1]                                                                  
       labels.append(np.sum(features[idx]))                                                                         
   return labels                                                                                                    
                                                                                                                    
                                                                                                                    
if __name__=='__main__':                                                                                             
                                                                                                                    
   pos, mass = generate_fake_data(10)                                                                               
                                                                                                                    
   graph, labels= build_test_graph_distance(pos, mass, )                                                            
   dirty_labels = dirty_graph(pos, mass)                                                                            
   np.testing.assert_almost_equal(labels.numpy(), dirty_labels, decimal=1)

mufeili · February 22, 2020, 11:51am

Hi, I just tried your code and cannot reproduce the warning. Did you check the values? Is it possible that the warnings are due to numerical issues?

florpi · February 22, 2020, 12:11pm

Interesting. I get the following output

Mismatch: 80%
Max absolute difference: 61.832886
Max relative difference: 0.91820663
 x: array([ 70.4,  58. ,  52.2,  75. ,  29.1,  62.9, 129.2,  62.2, 125.5,
        40.4], dtype=float32)
 y: array([70.4, 73.4, 65.9, 75. , 71.6, 59.5, 67.3, 74.1, 69.9, 67.7],
      dtype=float32)

mufeili · February 23, 2020, 9:52am

Weird. Can you update the code snippet and include the code used to print the output above?

florpi · February 23, 2020, 9:54am

It’s the same code snippet. This bit in the main,

np.testing.assert_almost_equal(labels.numpy(), dirty_labels, decimal=1)

is doing the printing when they are not equal. If they are equal it shouldn’t print anything, which I assume is what happens for you? Could you please try using more samples? (That means changing the value of 10 passed to generate_fake_data.)
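For reference, here is a minimal sketch of that behaviour, using only the first two values from the output above rather than the full arrays. The names graph_sums and loop_sums are made up for the illustration. The check stays silent when the arrays agree to the requested number of decimals and raises an AssertionError carrying the mismatch report otherwise.

```python
import numpy as np

# First two entries of the x and y arrays reported above (illustrative subset).
graph_sums = np.array([70.4, 58.0], dtype=np.float32)
loop_sums = np.array([70.4, 73.4], dtype=np.float32)

# Arrays that agree: the call returns None and prints nothing.
np.testing.assert_almost_equal(graph_sums, graph_sums, decimal=1)

# Arrays that disagree: an AssertionError is raised, and its message
# is the mismatch report that shows up on the console.
try:
    np.testing.assert_almost_equal(graph_sums, loop_sums, decimal=1)
    mismatch_report = None
except AssertionError as err:
    mismatch_report = str(err)

print(mismatch_report)
```

So seeing no output at all only means that every entry matched to one decimal place for that particular random draw.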

mufeili · February 23, 2020, 10:29am

Ok. I know what’s going on here.

edges = [(idx, dest) for idx, destination in enumerate(edges) for dest in destination if ((idx!=dest)&(dest<len(positions)))]

should be changed to

edges = [(src, idx) for idx, sources in enumerate(edges) for src in sources if ((idx != src) & (src<len(positions)))]

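Since we want to sum the features of each node’s nearest neighbors, the nearest neighbors should be the source nodes of messages rather than the destination nodes: copy_u sends the source node’s feature along each edge, and fn.sum adds up the messages arriving at the destination node.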
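To make the direction issue concrete, here is a small NumPy-only sketch. It is an emulation of what update_all(fn.copy_u(...), fn.sum(...)) computes, not DGL's actual implementation. The helper sum_from_sources and the toy masses and neighbour table are made up for the illustration.

```python
import numpy as np

def sum_from_sources(src, dst, feat):
    """Emulate copy_u + sum: every edge src -> dst carries feat[src],
    and each destination node adds up the messages it receives."""
    out = np.zeros_like(feat)
    np.add.at(out, dst, feat[src])  # unbuffered add, handles repeated dst
    return out

# Hypothetical toy data: 4 nodes, 2 nearest neighbours each.
mass = np.array([1.0, 10.0, 100.0, 1000.0])
neighbours = np.array([[1, 2], [0, 2], [1, 3], [2, 1]])  # row i: neighbours of node i

node = np.repeat(np.arange(len(mass)), neighbours.shape[1])
nbr = neighbours.ravel()

# Reference answer, as in the for loop: sum the masses of each node's neighbours.
expected = mass[neighbours].sum(axis=1)

# Original edge direction: node -> neighbour.
wrong = sum_from_sources(src=node, dst=nbr, feat=mass)
# Corrected edge direction: neighbour -> node.
right = sum_from_sources(src=nbr, dst=node, feat=mass)

print(expected, wrong, right)
```

With node -> neighbour edges, each node ends up summing the masses of the nodes that list it as a neighbour. Since the nearest-neighbour relation is not symmetric, that generally differs from the sum over its own neighbours, although individual entries can still coincide.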

florpi · February 23, 2020, 10:35am

That’s great! I completely misunderstood the meaning of source and destination. Thank you