Define custom negative samplers when UVA enabled

Hi, I am trying to train a SAGE model on a large heterogeneous graph (with a billion+ edges).

The positive samples are generated using dgl.dataloading.EdgeDataLoader, and use_uva
was set to True to boost the sampling process.

For my use case, a custom negative sampler is needed. But when I add it, the training time increases sharply (almost 10x). Could something be wrong with the negative sampler? What should I do in the negative sampler to support UVA sampling?


There may be something wrong with, for example, the device of the graph, features, or indices. Could you share more details about the customized negative sampler and how you use it in EdgeDataLoader?

BTW, could you try dgl.dataloading.DataLoader instead? dgl.dataloading.EdgeDataLoader has been deprecated since 0.8.

In fact, I use dgl.dataloading.DataLoader for edge sampling in the training script.

The customized negative sampler needs to pick specific out-edges of the source nodes that meet a condition (e.g., edge feature weights < 1). The code looks something like this:

def get_negative_edges(g, seed_edges):

    # get the device of edge id tensor
    device = seed_edges.device

    # source nodes
    src, _ = g.find_edges(seed_edges, etype=etype)

    # find all out edges of source nodes
    out_edges = [g.out_edges(s, form='eid', etype=etype) for s in src] # list of tensors, length = batch_size

    # filter the edges
    neg_edges = [out_e[g.edata['weights'][canonical_etype][out_e] < 1] for out_e in out_edges]

    # randomly sample one negative edge
    neg_list = []
    for neg_e in neg_edges:
        if neg_e.shape[0] > 0: 
            tar_ntype = canonical_etype[-1]
            neg_list.append(g.nodes(tar_ntype)[torch.randint(0, g.num_nodes(tar_ntype), (1, ))].to(device))

    # edge form: eid -> uv
    ret = g.find_edges(, etype=etype)

    return ret

It seems that some operations used in the negative sampler could not directly be done with UVA enabled, which means that there might be unnecessary data transport between CPU and GPU.
Now I am trying to rewrite this sampling function. Could you please provide some suggestions for it? Thanks!

How did you confirm that the slow-down is caused by UVA? The negative sampler makes many non-trivial calls: find_edges, indexing, and an explicit for loop.
Intermediate variables such as src, out_edges and so on are probably not pinned and therefore do not benefit from UVA. You can check this via .is_pinned().
To check for data transfer between CPU and GPU, you could use a profiling tool such as Nsight.
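As a quick first check before reaching for a profiler, something like the following could be called inside the sampler. This is only a minimal sketch: describe_tensors is a hypothetical helper name, not a DGL or PyTorch API, and it relies only on the standard .device and .is_pinned() tensor attributes.

```python
import torch

def describe_tensors(**tensors):
    """Return {name: (device, is_pinned)} for each tensor passed in.

    Hypothetical debugging helper: it only reads standard tensor attributes.
    """
    report = {}
    for name, t in tensors.items():
        # is_pinned() is only meaningful for CPU tensors; GPU tensors report False.
        report[name] = (str(t.device), t.is_pinned())
    return report

# Example: an ordinary CPU tensor is neither on the GPU nor pinned.
src = torch.arange(4)
print(describe_tensors(src=src))
```

If intermediate tensors show up as unpinned CPU tensors while the seed edges live on the GPU, each indexing step may trigger a host/device copy.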

Your code seems to be looping over tensors with Python loops. For instance,

    out_edges = [g.out_edges(s, form='eid', etype=etype) for s in src] # list of tensors, length = batch_size

I think that’s why your function runs slow. You will need to avoid using Python loops to get reasonable performance.
For instance,

[g.out_edges(s, form='eid', etype=etype) for s in src]

could be replaced with

g.out_edges(src, form='eid', etype=etype).split(
    g.out_degrees(src, etype=etype).tolist())

i.e. get all out-edges from all source nodes with a single API, and then split it up afterwards.
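The filtering and "pick one at random" steps can be vectorized in the same spirit. Below is a minimal sketch using plain PyTorch only; sample_one_per_group and its arguments are hypothetical names, and it assumes you already have the flat out-edge ID tensor, a boolean eligibility mask (e.g. weights < 1), and the per-source out-degrees as a tensor. It needs a PyTorch version that provides Tensor.scatter_reduce.

```python
import torch

def sample_one_per_group(values, mask, sizes):
    """Pick one random eligible element per group from a flat tensor.

    values: flat tensor (e.g. edge IDs), concatenated group by group
    mask:   bool tensor, True where the element is eligible
    sizes:  1-D long tensor with the length of each group
    Returns (picked, has_pick); picked[i] is -1 if group i has no eligible element.
    """
    num_groups = sizes.numel()
    device = values.device
    # Group index of every element, e.g. sizes [3, 2] -> [0, 0, 0, 1, 1].
    group = torch.repeat_interleave(torch.arange(num_groups, device=device), sizes)
    # Random score in [0, 1) for eligible elements, -1 for ineligible ones.
    score = torch.rand(values.shape[0], device=device)
    score = torch.where(mask, score, torch.full_like(score, -1.0))
    # Highest score within each group; stays at -1 if nothing is eligible.
    best = torch.full((num_groups,), -1.0, device=device)
    best = best.scatter_reduce(0, group, score, reduce="amax", include_self=True)
    has_pick = best >= 0
    # The eligible element holding its group's best score is the sample.
    winner = mask & (score == best[group])
    picked = torch.full((num_groups,), -1, dtype=values.dtype, device=device)
    picked[group[winner]] = values[winner]
    return picked, has_pick

# Toy data: three source nodes with 3, 2 and 1 out-edges.
eids = torch.tensor([10, 11, 12, 20, 21, 30])
weights = torch.tensor([0.5, 2.0, 0.1, 3.0, 4.0, 0.9])
degrees = torch.tensor([3, 2, 1])
picked, has_pick = sample_one_per_group(eids, weights < 1, degrees)
```

Taking the maximum of independent random scores picks uniformly among the eligible elements of each group, and everything stays on whatever device the inputs are on.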

Yes, I agree with you. The use of list comprehension is not an appropriate choice in this case.

As your code shows (thanks!), when operating on tensors it is better to use PyTorch built-in APIs to get good performance.

As for my original question, another solution is to store one of the negative candidates for each node as a node feature. By randomly resampling and updating this feature before every epoch starts, training is even faster than before.
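For anyone who wants to try the same idea, here is a rough sketch of the per-epoch resampling step. It is not my exact code: it assumes the negative candidates of every node have been precomputed once into a CSR-like layout (indptr, candidates), and resample_negatives is a hypothetical name.

```python
import torch

def resample_negatives(indptr, candidates):
    """Draw one random candidate per node from a CSR-like candidate list.

    indptr:     long tensor of length num_nodes + 1 (row offsets)
    candidates: long tensor with all candidate IDs, concatenated per node
    Returns a long tensor of length num_nodes; -1 where a node has no candidate.
    """
    counts = indptr[1:] - indptr[:-1]
    has = counts > 0
    # Random offset in [0, count) for every node that has candidates.
    offsets = (torch.rand(int(has.sum()), device=counts.device) * counts[has]).long()
    # Guard against float rounding pushing an offset up to count.
    offsets = torch.minimum(offsets, counts[has] - 1)
    neg = torch.full_like(counts, -1)
    neg[has] = candidates[indptr[:-1][has] + offsets]
    return neg

# Toy data: node 0 has candidates {7, 8}, node 1 has none, node 2 has {1, 2, 3}.
indptr = torch.tensor([0, 2, 2, 5])
candidates = torch.tensor([7, 8, 1, 2, 3])
neg = resample_negatives(indptr, candidates)
```

The result can then be written to a node feature before each epoch, so the sampler only has to look up a precomputed value per source node instead of filtering edges in every batch.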
