How to effeciently implement metapath-based random walk?

I want to implement a self-defined random walk based on a given metapath. It seems DGL uses some C code to implement the actual process. Is there any way I can do this?

I have tried random walk based on networkx, but it super slow.

dgl.sampling.random_walk supports metapath-based random walks and you can also supply transition probabilities.

Do you have any further requirements?

Yeah, I need to give each node a different weight.
In some cases, the unnormalized transition probability will be all zero, I need the random walk to stop here. Can this be supported by dgl.sampling.random_walk?

BTW. Have you tested the performance of the implementation on some giant graph? (e.g. 10^8 nodes)

No. The sum of the transition probabilities from a node must be positive.

From what I can tell uniform random walks are very efficient while non-uniform random walks will be slower than that. There’s still space to optimize for non-uniform random walks.

So how can I change the code to fufill my need? Is there any advice? thank you

If you know which node has the sum of transition probabilities 0, you can replace the probabilities with some other value, and after calling dgl.sampling.random_walk you can manually terminate the generated trace afterwards if a sampled trace visits one of those nodes. You can use np.isin to identify whether the elements in an array appears in another array.

Thank you, I’ll try it.

I got another question, in random_walk, do we have to ensure the start node and the end node of a metapath are the same? (so it can extend to any length)

You don’t have to. However, the metapath should connect. Say you have this graph:

g = dgl.heterograph({
    ('A', 'AB', 'B'): (ss, dd),
    ('B', 'BA', 'A'): (dd, ss)})

Then

# works
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB'])
# works
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB', 'BA'])
# works; you extend your metapath to any length like this.
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB', 'BA', 'AB', 'BA', 'AB', 'BA'])
# fails; edge type AB does not connect with AB.
dgl.sampling.random_walk(g, [0, 0, 0], metapath=['AB', 'AB'])

@BarclayII Hi Barclay, I still cannot figure the function out. In the example in DGL document

g2 = dgl.heterograph({
    ('user', 'follow', 'user'): ([0, 1, 1, 2, 3], [1, 2, 3, 0, 0]),
    ('user', 'view', 'item'): ([0, 0, 1, 2, 3, 3], [0, 1, 1, 2, 2, 1]),
    ('item', 'viewed-by', 'user'): ([0, 1, 1, 2, 2, 1], [0, 0, 1, 2, 3, 3])

dgl.sampling.random_walk(
    g2, [0, 1, 2, 0], metapath=['follow', 'view', 'viewed-by'] * 2)

What is the second argument [0, 1, 2, 0] means? the [0, 1, 2, 0] is from the item type, or from user type? Both of them have the 0, 1, 2 nodes as their id.

Duplicate of Random Walk document is not clear - #4