Using node embeddings after distributed training for downstream task

rbrugaro · February 16, 2023, 11:56pm

I have a DistTensor, node_emb, defined as follows. It contains the node embeddings I want to use for a downstream task.

    node_emb = dgl.distributed.DistTensor(
        (
            g.num_nodes(),
            emb_layer.module.emb_size
            if isinstance(emb_layer, th.nn.parallel.DistributedDataParallel)
            else emb_layer.emb_size,
        ),
        th.float32,
        "eval_embs",
        persistent=True,
    )

As suggested in "Distributed Node Classification — DGL 1.0.1 documentation", I saved the ID mapping at partition time, but the assignment

    orig_node_emb = torch.zeros(node_emb_tensor.shape, dtype=node_emb.dtype)
    orig_node_emb[nmap] = node_emb

gives this error:

    orig_node_emb[nmap] = node_emb
    TypeError: can't assign a DistTensor to a torch.FloatTensor

How do I cast the DistTensor to a torch.FloatTensor for the downstream task?

peizhou001 · February 27, 2023, 6:59am

Hi @rbrugaro, index the DistTensor with [<Indexes>] to convert it to a FloatTensor; this collects the data corresponding to <Indexes>. For your case:

    orig_node_emb[nmap] = node_emb[:]
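To make the remapping concrete, here is a minimal sketch of what that assignment does, written with plain Python lists rather than tensors so it runs without DGL or a cluster. It assumes nmap[new_id] holds the original node ID for the node that received new_id during partitioning; the helper name restore_original_order and the toy data are hypothetical and only for illustration.

```python
# Toy illustration of the ID remapping, using plain Python lists instead of tensors.
# Assumption: nmap[new_id] == original node ID, as saved at partition time.

def restore_original_order(shuffled_rows, nmap):
    """Return rows reordered so that row i belongs to original node ID i."""
    if len(shuffled_rows) != len(nmap):
        raise ValueError("expected one mapping entry per embedding row")
    restored = [None] * len(nmap)
    for new_id, orig_id in enumerate(nmap):
        # Same effect as the tensor assignment orig_node_emb[nmap] = node_emb[:]
        restored[orig_id] = shuffled_rows[new_id]
    return restored

# Hypothetical data: 4 nodes with 2-dimensional embeddings, stored by partition ID.
shuffled_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
nmap = [2, 0, 3, 1]  # partition ID 0 was original node 2, and so on

orig_emb = restore_original_order(shuffled_emb, nmap)
```

After this, orig_emb[2] holds the embedding that was stored at partition ID 0. Note that node_emb[:] pulls every row to the calling process, so the full embedding matrix has to fit in that machine's memory.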
