Are there benchmarks on the speed of user-defined message/reduce functions?

Besides the built-in functions that invoke g-SDDMM or g-SpMM for message passing, I'm curious about the performance of running user-defined message and reduce functions in DGL. How well do UDFs that operate on NodeBatch and EdgeBatch objects perform under different conditions, such as sparse vs. dense graphs and large vs. small graphs? Are there statistics you can share?
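For concreteness, here is a minimal sketch of the two styles I am comparing (toy random graph; feature names are made up):

```python
import torch
import dgl
import dgl.function as fn

g = dgl.rand_graph(1000, 10000)  # toy graph: 1000 nodes, 10000 edges
g.ndata['fv'] = torch.randn(g.num_nodes(), 16)

# Built-in functions: DGL lowers these to fused g-SpMM kernels.
g.update_all(fn.copy_u('fv', 'm'), fn.sum('m', 'h_builtin'))

# Equivalent user-defined functions: the message UDF receives an
# EdgeBatch, and the reduce UDF receives a NodeBatch whose mailbox
# is grouped by in-degree (degree bucketing).
def message_udf(edges):
    return {'m': edges.src['fv']}

def reduce_udf(nodes):
    return {'h_udf': nodes.mailbox['m'].sum(dim=1)}

g.update_all(message_udf, reduce_udf)
```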

As GNN (Graph Neural Network) research progresses, ever more varied message functions are appearing. I believe that having access to such statistics would be highly beneficial for the community: it would help in understanding the performance characteristics of DGL under various conditions, and it would also help us optimize our own implementations more effectively.

I would also really appreciate a guide on how to write efficient DGL code. For example, is it better to avoid node-level computations (like linear(edges.src['fv'])) inside an edge UDF? And similarly, does performing computations (e.g., linear(nodes.data['fv'])) in a node UDF risk a loss of efficiency because of degree bucketing?
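To make the question concrete, these are the two patterns I mean (linear is a made-up nn.Linear layer, 'fv' an illustrative feature name):

```python
import torch.nn as nn

linear = nn.Linear(16, 16)  # hypothetical projection layer

# Pattern 1: node-level computation inside an edge UDF; linear runs
# once per edge even though its input is a source-node feature.
def edge_udf(edges):
    return {'m': linear(edges.src['fv'])}

# Pattern 2: extra computation inside a node-side UDF used during
# update_all; reduce-side UDFs are executed per degree bucket.
def node_udf(nodes):
    return {'h': linear(nodes.data['fv'])}
```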

There is no such benchmark, unfortunately.

To make message passing more efficient, a typical practice is to reduce the number of operations performed on edges, since the number of edges is usually much larger than the number of nodes. The Graphiler paper summarizes a couple of common patterns. Although the paper aims to automate such optimizations, manually rewriting the code should give a similar speedup.
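A sketch of this kind of manual rewrite, assuming a simple linear projection (names are illustrative): move the per-edge computation onto the nodes, then let a built-in message function handle the edges.

```python
import torch
import torch.nn as nn
import dgl
import dgl.function as fn

linear = nn.Linear(16, 16)  # illustrative projection
g = dgl.rand_graph(1000, 10000)
g.ndata['fv'] = torch.randn(g.num_nodes(), 16)

# Before: the projection runs once per edge inside the UDF.
def edge_udf(edges):
    return {'m': linear(edges.src['fv'])}

g.update_all(edge_udf, fn.sum('m', 'h'))

# After: project once per node, then use built-in functions so
# DGL can fuse the message passing into a single g-SpMM kernel.
g.ndata['fv_proj'] = linear(g.ndata['fv'])
g.update_all(fn.copy_u('fv_proj', 'm'), fn.sum('m', 'h'))
```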


Thanks, @minjie. Your response was really helpful, and I will definitely look into your paper. BTW, I've started to use the dgl.ops API, and everything has become much clearer. :smiley:
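For anyone else reading, a couple of the ops I mean (a sketch; the graph and feature shapes are illustrative):

```python
import torch
import dgl
import dgl.ops as ops

g = dgl.rand_graph(1000, 10000)
x = torch.randn(g.num_nodes(), 16)   # node features
w = torch.randn(g.num_edges(), 16)   # edge features

# g-SpMM: multiply source-node features by edge features,
# then sum the results into each destination node.
h = ops.u_mul_e_sum(g, x, w)

# g-SDDMM: per-edge dot product of source and destination features.
s = ops.u_dot_v(g, x, x)
```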
