Question about the METIS algorithm for distributed graph partitioning

Hello, I am curious whether the METIS algorithm used in distributed training has been specially processed or tuned. When I used the graph partition function on the product dataset, which contains 100 million edges, partitioning the graph structure took only about 70s. That is much faster than what I measured with the C++ version of METIS, and I am curious how this is done.

DGL’s METIS partition uses the same C++ METIS implementation, and I don’t think we have any specific tuning. How did you measure the run time for both implementations? Maybe you included the I/O time in the C++ measurement?

@zhengda1936 and @Rhett-Ying can correct me if I’m wrong.
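
For a like-for-like comparison, it may help to time each phase separately so that loading and I/O are not counted as partitioning time. Below is a minimal sketch of that idea. The `timed` helper, `load_graph`, and `partition` are hypothetical stand-ins written for illustration; they are not DGL or METIS functions, and in a real run you would replace them with your actual loading code and partitioning call.

```python
import time
from contextlib import contextmanager

# Wall-clock seconds per phase, filled in by the `timed` helper below.
timings = {}

@contextmanager
def timed(phase):
    """Record how long the enclosed block takes under the given phase name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start

def load_graph():
    # Hypothetical stand-in for reading the graph from disk (the I/O step).
    return [(i, (i + 1) % 1000) for i in range(1000)]

def partition(edges, num_parts):
    # Hypothetical stand-in for the partitioning call; this is NOT METIS,
    # it just assigns each source node to a part by modulo.
    return {u: u % num_parts for u, _ in edges}

with timed("load"):
    edges = load_graph()

with timed("partition"):
    parts = partition(edges, 4)

# Report each phase on its own line so the two implementations
# can be compared on the partitioning step alone.
for phase, seconds in timings.items():
    print(f"{phase}: {seconds:.4f}s")
```

If the C++ measurement wraps the whole program (file reading, format conversion, writing results) while the Python measurement wraps only the partition call, the two numbers are not comparable.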

Thanks for your reply. I checked, and our C++ measurement was indeed longer because it included more steps than the partitioning itself.