I’m reading the GraphSAGE example code and trying to understand how EdgeDataLoader behaves
with negative sampling, so I tried it on a small test graph. However, I’m confused by the output this data loader generates.
The graph for testing:
import networkx as nx
import torch as th
import dgl
import numpy as np
def build_karate_club_graph():
    # Edge list of Zachary's karate club, given as (src, dst) pairs.
    src = np.array([1, 2, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 9, 10, 10,
                    10, 11, 12, 12, 13, 13, 13, 13, 16, 16, 17, 17, 19, 19, 21, 21,
                    25, 25, 27, 27, 27, 28, 29, 29, 30, 30, 31, 31, 31, 31, 32, 32,
                    32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33, 33, 33,
                    33, 33, 33, 33, 33, 33, 33, 33, 33, 33])
    dst = np.array([0, 0, 1, 0, 1, 2, 0, 0, 0, 4, 5, 0, 1, 2, 3, 0, 2, 2, 0, 4,
                    5, 0, 0, 3, 0, 1, 2, 3, 5, 6, 0, 1, 0, 1, 0, 1, 23, 24, 2, 23,
                    24, 2, 23, 26, 1, 8, 0, 24, 25, 28, 2, 8, 14, 15, 18, 20, 22, 23,
                    29, 30, 31, 8, 9, 13, 14, 15, 18, 19, 20, 22, 23, 26, 27, 28, 29, 30,
                    31, 32])
    # Add every edge in both directions so the graph is effectively undirected.
    u = np.concatenate([src, dst])
    v = np.concatenate([dst, src])
    return dgl.DGLGraph((u, v))
def plotGraph(G):
    nx_G = G.to_networkx().to_directed()
    pos = nx.kamada_kawai_layout(nx_G)
    nx.draw(nx_G, pos, with_labels=True, node_color=[[.7, .7, .7]])

# show the graph so the node relations are easy to check
import matplotlib.pyplot as plt
plt.figure(figsize=(8,8))
G = build_karate_club_graph()
plotGraph(G)
Here is the negative sampler (the one in the graphsage unsupervised code):
class NegativeSampler(object):
    def __init__(self, g, k, neg_share=False):
        # Sampling weights proportional to in-degree ** 0.75.
        self.weights = g.in_degrees().float() ** 0.75
        self.k = k
        self.neg_share = neg_share

    def __call__(self, g, eids):
        src, _ = g.find_edges(eids)
        n = len(src)
        if self.neg_share and n % self.k == 0:
            dst = self.weights.multinomial(n, replacement=True)
            dst = dst.view(-1, 1, self.k).expand(-1, self.k, -1).flatten()
        else:
            dst = self.weights.multinomial(n*self.k, replacement=True)
        # Each positive source node is paired with k sampled destinations.
        src = src.repeat_interleave(self.k)
        return src, dst
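To check my reading of the sampler, here is a minimal pure-Python sketch of what I believe the non-shared branch (the `else` case) does: draw `n * k` destination nodes with probability proportional to the weights, and repeat each source node `k` times. The function `sample_negatives` and the toy `in_degrees` list are hypothetical stand-ins I made up for illustration; `random.choices` replaces `torch.multinomial`, and the `neg_share` branch is not covered.

```python
import random

def sample_negatives(src, weights, k):
    """Hypothetical stand-in for the else-branch of NegativeSampler.__call__."""
    n = len(src)
    # Draw n * k destination node IDs with replacement, weighted by `weights`.
    dst = random.choices(range(len(weights)), weights=weights, k=n * k)
    # Same effect as src.repeat_interleave(k): each source appears k times in a row.
    rep_src = [s for s in src for _ in range(k)]
    return rep_src, dst

# Toy in-degrees for a 4-node graph (made-up numbers, not the karate club graph).
in_degrees = [3, 1, 2, 0]
weights = [d ** 0.75 for d in in_degrees]

neg_src, neg_dst = sample_negatives([0, 2], weights, k=2)
print(neg_src)       # [0, 0, 2, 2]
print(len(neg_dst))  # 4
```

If this reading is right, the sampler only ever sees source nodes of the positive edges and returns node IDs in the numbering of whatever graph `g` it is given.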
I created the dataloader by the following code:
# edges to compute output
train_eids = th.tensor([1, 2])
print("edges to compute output, src, dst are: ", G.find_edges(train_eids))
fanouts = [1]  # Number of neighbors to sample for each GNN layer; here, one layer and one neighbor.
sampler = dgl.dataloading.MultiLayerNeighborSampler(fanouts)
negative_sampler = NegativeSampler(G, 2)  # 2 negative samples per positive
# define the dataloader:
dataloader = dgl.dataloading.EdgeDataLoader(
    G,
    train_eids,
    sampler,
    exclude=None,
    negative_sampler=negative_sampler,
    batch_size=1,
    shuffle=True,
    drop_last=False,
    pin_memory=True)
I’m testing the dataloader by:
for step, (input_nodes, pos_graph, neg_graph, blocks) in enumerate(dataloader):
    assert sum(th.eq(pos_graph.nodes(), neg_graph.nodes())) == \
        neg_graph.number_of_nodes() == \
        pos_graph.number_of_nodes()
    print("************ step-{} **********".format(step))
    print("input_nodes: ", input_nodes)
    print("pos_graph {} edges: ".format(pos_graph.number_of_edges()), pos_graph.edges())
    print("pos_graph {} nodes: ".format(pos_graph.number_of_nodes()), pos_graph.nodes())
    # neg_graph edges number == number defined in NegativeSampler()
    print("neg_graph {} edges: ".format(neg_graph.number_of_edges()), neg_graph.edges())
    print("neg_graph {} nodes: ".format(neg_graph.number_of_nodes()), neg_graph.nodes())
    for b in blocks:
        print("\tblock nodes number: ", b.number_of_nodes())
        print("\tblock nodes: ", b.nodes("_U"))
I’ve run the last code block several times. One of the outputs is as follows:
************ step-0 **********
input_nodes: tensor([ 2, 1, 21, 26, 32, 19, 29])
pos_graph 1 edges: (tensor([0]), tensor([1]))
pos_graph 4 nodes: tensor([0, 1, 2, 3])
neg_graph 2 edges: (tensor([0, 0]), tensor([2, 3]))
neg_graph 4 nodes: tensor([0, 1, 2, 3])
block nodes number: 11
block nodes: tensor([0, 1, 2, 3, 4, 5, 6])
************ step-1 **********
input_nodes: tensor([ 2, 0, 6, 24, 28, 17, 5, 27])
pos_graph 1 edges: (tensor([0]), tensor([1]))
pos_graph 4 nodes: tensor([0, 1, 2, 3])
neg_graph 2 edges: (tensor([0, 0]), tensor([2, 3]))
neg_graph 4 nodes: tensor([0, 1, 2, 3])
block nodes number: 12
block nodes: tensor([0, 1, 2, 3, 4, 5, 6, 7])
There are some results that confuse me:
- From which graph are pos_graph and neg_graph sampled: G, or the blocks in each step?
- Why are pos_graph.edges() always (0, 1)? The edge between 0 and 1 is surely not the only positive edge in the given block or the graph.
- The nodes in neg_graph are always the same as those in pos_graph. Does this mean that the eids parameter of NegativeSampler's __call__ function holds the sampled positive eids? And what about the g parameter: is it the entire original graph G, or just the sampled blocks?
- For the sampled blocks, why does the number of nodes in b.nodes("_U") not equal b.number_of_nodes()?
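My current guess for the second and third questions is that pos_graph and neg_graph are relabeled onto one shared, compact node set, so their edges are printed with local IDs rather than the IDs in G. I have not confirmed this in the DGL source, so treat it as an assumption. The sketch below uses a hypothetical helper, `compact`, which is not a DGL function. I feed it the positive edge (2, 1), which is edge ID 2 in G, and two negative destinations, 21 and 26, chosen to match the first four input_nodes of step-0 above.

```python
def compact(pos_edges, neg_edges):
    """Relabel global node IDs to local IDs 0..n-1 in order of first appearance.

    Hypothetical helper for illustration only; not part of DGL.
    """
    local = {}

    def local_id(node):
        if node not in local:
            local[node] = len(local)
        return local[node]

    pos_local = [(local_id(u), local_id(v)) for u, v in pos_edges]
    neg_local = [(local_id(u), local_id(v)) for u, v in neg_edges]
    # global_ids[i] is the original ID of local node i.
    global_ids = sorted(local, key=local.get)
    return pos_local, neg_local, global_ids

pos_local, neg_local, global_ids = compact([(2, 1)], [(2, 21), (2, 26)])
print(pos_local)   # [(0, 1)]
print(neg_local)   # [(0, 2), (0, 3)]
print(global_ids)  # [2, 1, 21, 26]
```

Under that assumption, both graphs would report the nodes 0 to 3, and the positive edge would always print as (0, 1) with batch_size=1, whatever the original endpoints were. The block sizes in the output are also consistent with b.number_of_nodes() counting the input nodes plus these 4 nodes (7 + 4 = 11 and 8 + 4 = 12), but that is a guess as well.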
Could somebody give me some explanation? Thanks for your attention!