I generated a bipartite graph using the stochastic block model generator from NetworkX. In this graph, users of type A have an 80% probability of having an edge to an item of type X and a 20% probability of having an edge to an item of type Y; conversely, users of type B have a 20% probability of having an edge to an item of type X and an 80% probability of having an edge to an item of type Y.
IDs are sequential: 0-199 (users A), 200-399 (users B), 400-429 (items X), and 430-479 (items Y).
import dgl
import networkx as nx
import torch

n_user_groups = 2
sizes = [200, 200, 30, 50]  # users A, users B, items X, items Y
probs = [
    [0, 0, 0.8, 0.2],
    [0, 0, 0.2, 0.8],
    [0.8, 0.2, 0, 0],
    [0.2, 0.8, 0, 0],
]
g = nx.stochastic_block_model(sizes, probs, seed=0)

# Shift item IDs down by the number of users (400), so items get their
# own ID space starting at 0.
n_users = sum(sizes[:n_user_groups])
normalize = lambda edge: (edge[0], edge[1] - n_users)
user_edges = list(map(normalize, g.edges()))
swap = lambda edge: (edge[1], edge[0])
item_edges = list(map(swap, user_edges))
graph = dgl.heterograph({
    ('user', 'watched', 'item'): user_edges,
    ('item', 'watched-by', 'user'): item_edges,
})
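As a quick sanity check (this snippet is my own addition, not part of the pipeline), the generated graph does follow the intended block structure. After the ID shift, X items occupy IDs 0-29 and Y items occupy IDs 30-79, so user 0 should have about 0.8 * 30 = 24 edges to X items and 0.2 * 50 = 10 edges to Y items:

items = graph.successors(0, etype='watched')  # items watched by user 0
# roughly 24 X items (IDs 0-29) vs 10 Y items (IDs 30-79)
print((items < 30).sum().item(), (items >= 30).sum().item())

So the input graph itself looks correct; the surprising part is only the sampler output below.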
Using PinSAGESampler on this graph, I noticed that the neighbors it selects are not what I expected. Take node 0 as an example: I expected around 80% of the neighbors chosen by the sampler to belong to user group A (IDs 0-199), but that is not what happened. Across many empirical runs, the percentage was only around 30-50%.
sampler = dgl.sampling.PinSAGESampler(
    graph,
    'user',  # ntype: target node type
    'item',  # other_type: auxiliary node type
    3,       # num_traversals: max random-walk length / 2
    0.1,     # termination_prob: restart probability
    100,     # num_random_walks
    100      # num_neighbors
)
seeds = torch.LongTensor([0])
frontier = sampler(seeds)
print(frontier.all_edges(form='uv'))
# count sampled neighbors with IDs above 200, i.e. in user group B
sum(frontier.all_edges(form='uv')[0] > 200)
Output:
(tensor([234, 0, 96, 106, 66, 165, 49, 221, 224, 311, 313, 390, 7, 369,
132, 160, 152, 149, 143, 141, 139, 392, 125, 119, 114, 394, 101, 100,
1, 92, 90, 239, 312, 346, 297, 285, 269, 264, 263, 252, 164, 235,
350, 233, 377, 213, 200, 173, 322, 37, 30, 42, 31, 21, 78, 80,
82, 84, 9, 26, 341, 286, 287, 288, 292, 296, 29, 338, 25, 22,
280, 327, 335, 330, 284, 283, 281, 20, 279, 278, 270, 32, 33, 262,
261, 257, 256, 34, 250, 385, 373, 375, 376, 16, 378, 379, 382, 384,
371, 386]), tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0]))
tensor(56)
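For completeness, this is roughly how I measured the 30-50% figure (the loop below is my own test code, written for this post):

# fraction of sampled neighbors in user group A (IDs 0-199),
# averaged over repeated sampler calls on the same seed node
fractions = []
for _ in range(20):
    src, _ = sampler(torch.LongTensor([0])).all_edges(form='uv')
    fractions.append((src < 200).float().mean().item())
print(sum(fractions) / len(fractions))  # stays around 0.3-0.5 rather than 0.8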
Does anyone know why this is happening?