Changing Graph Structure after Construction and Running Models with Partial Edges

Isn’t find_edges part of the normal Uniform Negative Sampler implementation?

My bad. You are right. Also, for your implementation, the negative edges do not necessarily need to have the same source nodes as the positive edges.

By this, are you now saying that we should choose negative edges and aim to minimize their score, with the positive edges serving as the new “negative examples”? Basically doing the opposite of the link-prediction negative sampler? Would this have a benefit over the previous method?

No. There will be two EdgeDataLoader instances, both sampling positive edges only. It’s just that the first EdgeDataLoader treats the real edges as positive edges and the second EdgeDataLoader treats the specific edges as positive edges. I’m not sure which approach will be cleaner or more efficient.
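
For concreteness, a rough sketch of that two-loader setup (the neighbor sampler choice, batch size, and variable names here are my assumptions, not code from this thread; it also assumes the edges to be pushed apart exist as edges with known IDs in a graph):

import dgl.dataloading as dl

sampler = dl.MultiLayerFullNeighborSampler(2)

# First loader: the graph's real edges are the positive edges,
# with uniformly sampled negatives.
real_loader = dl.EdgeDataLoader(
    g, real_eids, sampler,
    negative_sampler=dl.negative_sampler.Uniform(5),
    batch_size=1024, shuffle=True)

# Second loader: the specific edges to be pushed apart are treated as
# "positive" edges here, but the training loop minimizes their scores.
push_loader = dl.EdgeDataLoader(
    g, push_eids, sampler,
    batch_size=1024, shuffle=True)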


Oh, isn’t it done like that in the default implementation, though?
dst = F.randint(shape, dtype, ctx, 0, g.number_of_nodes(vtype))

In principle, you can do similar things for the source nodes as well so that the negative edges no longer depend on the particular positive edges.
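
For instance, a sampler along these lines (a sketch with my own naming, assuming a homogeneous graph and a PyTorch backend) draws both endpoints uniformly, so the negative pairs no longer depend on the positive edges in the batch:

import torch

class GlobalUniformNegativeSampler:
    def __init__(self, k):
        self.k = k  # number of negative pairs per positive edge

    def __call__(self, g, eids):
        n = len(eids) * self.k
        src = torch.randint(0, g.number_of_nodes(), (n,))
        dst = torch.randint(0, g.number_of_nodes(), (n,))
        return src, dst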

I’m sorry, I still don’t understand. The second EdgeDataLoader treats the edges that should be pushed apart as positive edges. And then what, is the objective for those modified? Do they receive a different loss? How will this work together?
Would there be a performance difference in terms of learning the GNN? It seems to me that both methods should learn the same thing.

This depends on how you implement them. They can do similar things in principle.

I see.

Assuming I decide to go with the first option (updating the negative sampler to give me the negative edges that need to be pushed apart), how would I actually do this? In the sample code I posted earlier, I used ids_to_push as a dictionary storing, for each source node, the destination node it must be pushed apart from (so if node 5 was in the dictionary and needed to be pushed apart from node 6, then the negative edge for node 5 would have node 6 as its destination). If I do this, would I just call EdgeDataLoader with my custom sampler and pass in the extra ids_to_push dict as an argument? Would this be the best and most efficient way?

You can define a class that takes in ids_to_push for __init__.
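
For example (a sketch; the class name is mine), you could subclass the built-in uniform sampler so the dictionary travels with the sampler object:

import dgl.dataloading as dl

class PushApartUniform(dl.negative_sampler.Uniform):
    def __init__(self, k, ids_to_push):
        super().__init__(k)
        # maps a source node ID to the destination node ID
        # it must be pushed apart from
        self.ids_to_push = ids_to_push

You would then pass negative_sampler=PushApartUniform(k, ids_to_push) to EdgeDataLoader and override _generate, as in the code further down.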

After that, I need to figure out in which cases the destination nodes were chosen based on ids_to_push, so that I can minimize the score for those pairs of nodes and they end up getting pushed apart. How do I do this?

Why do you want to know that? Can’t you directly follow the user guide and minimize scores on the negative samples?

Are you saying that I change nothing about my model except for adding a negative sampler that specifically samples certain destination nodes based on whether they need to be pushed apart? I guess this will work, but shouldn’t I add a penalty or something to the loss to emphasize those pairs in particular? Kind of like how you did with the delta term at the start of our discussion?

@mufeili

Also, it seems like the code I gave earlier doesn’t work directly, as src is actually a tensor with many node IDs, so I can’t just do a direct lookup. I could iterate and then build the return tensor (maybe by converting everything to NumPy first), but do you have any ideas for a more efficient way?

def _generate(self, g, eids, canonical_etype):
    # self.ids_to_push maps a source node ID to the destination node ID
    # that should be selected whenever the source node appears as a key
    _, _, vtype = canonical_etype
    shape = F.shape(eids)
    dtype = F.dtype(eids)
    ctx = F.context(eids)
    shape = (shape[0] * self.k,)
    src, _ = g.find_edges(eids, etype=canonical_etype)
    src = F.repeat(src, self.k, 0)
    if self.ids_to_push is not None and src in self.ids_to_push:
        # BUG: src is a tensor of node IDs, not a single ID, so this
        # membership test and lookup are not performed element-wise
        print("in the push")
        dst = self.ids_to_push[src]
    else:
        dst = F.randint(shape, dtype, ctx, 0, g.number_of_nodes(vtype))
    return src, dst
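
One way to avoid a Python-level loop (my own sketch, assuming a PyTorch backend, integer node IDs, and one forced destination per source; not code from this thread) is to turn the dictionary into a dense lookup tensor and override the uniformly drawn destinations with masked indexing:

import torch

def _generate(self, g, eids, canonical_etype):
    utype, _, vtype = canonical_etype
    src, _ = g.find_edges(eids, etype=canonical_etype)
    src = src.repeat_interleave(self.k)
    dst = torch.randint(0, g.number_of_nodes(vtype), (len(src),),
                        device=src.device)
    if self.ids_to_push is not None:
        # Dense map: lookup[s] is the forced destination for source s,
        # or -1 if s has no entry. Could be precomputed once in __init__.
        lookup = torch.full((g.number_of_nodes(utype),), -1,
                            dtype=torch.int64, device=src.device)
        keys = torch.tensor(list(self.ids_to_push.keys()), device=src.device)
        vals = torch.tensor(list(self.ids_to_push.values()), device=src.device)
        lookup[keys] = vals
        forced = lookup[src]
        mask = forced >= 0
        dst[mask] = forced[mask]
    return src, dst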

You can do something similar to the loss function here.
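
As a reference point, the user guide's link-prediction objective amounts to something like the following (a sketch; the optional push_weight knob is my addition for the penalty idea discussed above, not part of the guide):

import torch

def compute_loss(pos_score, neg_score, push_weight=1.0):
    # Standard objective: positive edges -> label 1, negative edges -> label 0.
    scores = torch.cat([pos_score, neg_score])
    labels = torch.cat([torch.ones_like(pos_score),
                        torch.zeros_like(neg_score)])
    # Optionally upweight the negatives that encode pairs to push apart.
    weights = torch.cat([torch.ones_like(pos_score),
                         torch.full_like(neg_score, push_weight)])
    return torch.nn.functional.binary_cross_entropy_with_logits(
        scores, labels, weight=weights)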

I don’t think you need to push nodes away from the source nodes specifically; you just want to separate some particular pairs of nodes, right? In that case, you don’t need to sample negative destination nodes from a distribution conditioned on the source nodes. You can simply sample the negative pairs of nodes directly.
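
That could be as simple as (a sketch with my own naming, assuming the pairs to separate are known up front as a tensor):

import torch

class FixedPairNegativeSampler:
    def __init__(self, pairs):
        # pairs: (N, 2) int64 tensor of (src, dst) node pairs to separate
        self.pairs = pairs

    def __call__(self, g, eids):
        # draw one negative pair per positive edge in the batch,
        # independent of which positive edges were sampled
        idx = torch.randint(0, self.pairs.shape[0], (len(eids),))
        chosen = self.pairs[idx]
        return chosen[:, 0], chosen[:, 1]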
