How to effeciently implement metapath-based random walk?

bean · November 4, 2020, 2:23am

I want to implement a self-defined random walk based on a given metapath. It seems DGL uses some C code to implement the actual process. Is there any way I can do this?

I have tried random walk based on networkx, but it super slow.

BarclayII · November 4, 2020, 5:24am

dgl.sampling.random_walk supports metapath-based random walks and you can also supply transition probabilities.

Do you have any further requirements?

bean · November 4, 2020, 3:47pm

Yeah, I need to give each node a different weight.
In some cases, the unnormalized transition probability will be all zero, I need the random walk to stop here. Can this be supported by dgl.sampling.random_walk?

BTW. Have you tested the performance of the implementation on some giant graph? (e.g. 10^8 nodes)

BarclayII · November 5, 2020, 2:02am

No. The sum of the transition probabilities from a node must be positive.

From what I can tell uniform random walks are very efficient while non-uniform random walks will be slower than that. There’s still space to optimize for non-uniform random walks.

bean · November 5, 2020, 2:08am

So how can I change the code to fufill my need? Is there any advice? thank you

BarclayII · November 5, 2020, 2:15am

If you know which node has the sum of transition probabilities 0, you can replace the probabilities with some other value, and after calling dgl.sampling.random_walk you can manually terminate the generated trace afterwards if a sampled trace visits one of those nodes. You can use np.isin to identify whether the elements in an array appears in another array.

bean · November 5, 2020, 2:19am

Thank you, I’ll try it.

bean · November 8, 2020, 12:35pm

I got another question, in random_walk, do we have to ensure the start node and the end node of a metapath are the same? (so it can extend to any length)

BarclayII · November 9, 2020, 3:22am

You don’t have to. However, the metapath should connect. Say you have this graph:

g = dgl.heterograph({
    ('A', 'AB', 'B'): (ss, dd),
    ('B', 'BA', 'A'): (dd, ss)})

Then

# works
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB'])
# works
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB', 'BA'])
# works; you extend your metapath to any length like this.
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB', 'BA', 'AB', 'BA', 'AB', 'BA'])
# fails; edge type AB does not connect with AB.
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB', 'AB'])

stupidoge · August 28, 2023, 4:00pm

@BarclayII Hi Barclay, I still cannot figure the function out. In the example in DGL document

g2 = dgl.heterograph({
    ('user', 'follow', 'user'): ([0, 1, 1, 2, 3], [1, 2, 3, 0, 0]),
    ('user', 'view', 'item'): ([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 2, 1]),
    ('item', 'viewed-by', 'user'): ([0, 1, 1, 2, 2, 1], [0, 0, 1, 2, 3, 3])

dgl.sampling.random_walk(
    g2, [0, 1, 2, 0], metapath=['follow', 'view', 'viewed-by'] * 2)

What is the second argument [0, 1, 2, 0] means? the [0, 1, 2, 0] is from the item type, or from user type? Both of them have the 0, 1, 2 nodes as their id.

minjie · August 30, 2023, 2:55pm

Duplicate of Random Walk document is not clear - #4

minjie · August 30, 2023, 2:55pm