Hello, and thank you for reading this question. I have recently been using DGL's distributed training and ran into a few things I don't understand. I hope someone can help clarify them.
- When partitioning with the METIS and random algorithms, I saw the output below. As I understand it, the "local" counts indicate vertices and samples that fall entirely within that partition. What I don't understand is why the random algorithm, which here seems to have the better hit rate, takes longer to compute.
```
part 0, train: 38358 (local: 34303), val: 5958 (local: 5958), test: 13926 (local: 13926)
part 1, train: 38358 (local: 36497), val: 5958 (local: 5314), test: 13926 (local: 12176)
part 0, train: 38358 (local: 38358), val: 5958 (local: 5844), test: 13926 (local: 13920)
part 2, train: 38358 (local: 38152), val: 5958 (local: 5958), test: 13926 (local: 13750)
```
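To make the comparison concrete, the local hit ratio can be computed directly from the counts printed above. This is plain arithmetic on the log values, not DGL API code:

```python
# Local hit ratio = local / total, taken from the log lines above.
def hit_ratio(local, total):
    return local / total

# (partition, split, local, total) as printed in the first two log lines
log = [
    ("part 0", "train", 34303, 38358),
    ("part 0", "val",    5958,  5958),
    ("part 0", "test",  13926, 13926),
    ("part 1", "train", 36497, 38358),
    ("part 1", "val",    5314,  5958),
    ("part 1", "test",  12176, 13926),
]

for part, split, local, total in log:
    print(f"{part} {split}: {hit_ratio(local, total):.1%}")
```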
- After running METIS partitioning and GraphSAGE training several times, I found that the vertex hit rate differs between runs (and, naturally, time efficiency is better when the hit rate is higher). I am curious what causes this variation in the partition hit ratio. In my understanding, the METIS algorithm and the training and validation sets are all fixed, so shouldn't the partitioning result be identical every time?
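One possible source of the run-to-run variation (this is my guess, not a statement about DGL internals) is that neighbor sampling in GraphSAGE is stochastic: even with a fixed partition and fixed train/validation sets, each run draws a different random set of neighbors, so the fraction of sampled nodes that happen to be local can change between runs unless the random seed is fixed. A minimal sketch of the idea in plain Python, using a hypothetical `sample_neighbors` helper:

```python
import random

def sample_neighbors(neighbors, fanout, seed=None):
    # Draw `fanout` neighbors at random. With a fixed seed the draw is
    # reproducible; without one it differs between runs, so the local
    # hit rate of the sampled nodes also differs.
    rng = random.Random(seed)
    return rng.sample(neighbors, fanout)

neighbors = list(range(100))

a = sample_neighbors(neighbors, 5, seed=42)
b = sample_neighbors(neighbors, 5, seed=42)
assert a == b  # same seed -> identical sample, reproducible hit rate
```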
- Node sampling: I always assumed that sampling and feature extraction happen when constructing `dataloader = DistDataLoader(args)`, but from the observed behavior it seems that feature extraction happens during `enumerate(dataloader)`? I probably don't know much about the internal implementation here.
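On the last point: PyTorch-style dataloaders are generally lazy, and as far as I understand DGL's `DistDataLoader` follows the same pattern: constructing the loader only stores the sampling configuration, while the actual sampling and feature fetching happen as the iterator is consumed, i.e. inside `enumerate(dataloader)`. A toy illustration of the pattern (a hypothetical `LazyLoader` class, not the real DGL implementation):

```python
class LazyLoader:
    """Toy stand-in for a lazy dataloader: nothing is fetched at
    construction time; work happens only while iterating."""

    def __init__(self, batches):
        self.batches = batches   # just stores the plan
        self.fetched = []        # records when fetching actually happens

    def __iter__(self):
        for batch in self.batches:
            # Sampling / feature extraction would happen here, per batch.
            self.fetched.append(batch)
            yield batch

loader = LazyLoader([0, 1, 2])
assert loader.fetched == []          # construction did no work
for step, batch in enumerate(loader):
    pass
assert loader.fetched == [0, 1, 2]   # work happened during enumerate()
```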