Access node features from dgl.dataloading.EdgeDataLoader

I am training with a `dgl.dataloading.EdgeDataLoader`, which yields a list of blocks (one message-flow graph per GNN layer) for each minibatch. In every step I update the node representations, which are stored in `outputs` for every node type. I now want to access the updated representations of the source and target nodes of each edge to create an edge embedding.
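Here is a minimal, framework-free sketch of that idea. It assumes the edge embedding is simply the concatenation of the source and destination representations; other choices, such as an elementwise product, would work just as well. Plain Python lists stand in for the `th.Tensor` values in `outputs`. It also assumes that `src_ids`/`dst_ids` are node indices in the same ID space as the rows of `outputs`. The helper name `edge_embeddings` is hypothetical. With PyTorch tensors the same gather would be `th.cat([outputs[src_type][src_ids], outputs[dst_type][dst_ids]], dim=1)`.

```python
from typing import Dict, List

Vector = List[float]


def edge_embeddings(
    outputs: Dict[str, List[Vector]],
    src_type: str,
    dst_type: str,
    src_ids: List[int],
    dst_ids: List[int],
) -> List[Vector]:
    """Hypothetical helper: concatenate the representations of each edge's endpoints.

    outputs maps a node type to a list of row vectors (one row per node);
    src_ids[k] and dst_ids[k] are the row indices of edge k's endpoints.
    """
    if len(src_ids) != len(dst_ids):
        raise ValueError("src_ids and dst_ids must have the same length")
    src_rows = outputs[src_type]
    dst_rows = outputs[dst_type]
    # Row lookup by index, then list concatenation (torch.cat along dim 1).
    return [src_rows[u] + dst_rows[v] for u, v in zip(src_ids, dst_ids)]


# Toy data: 2 drugs and 3 proteins with 2-dimensional representations.
outputs = {
    "drug": [[0.1, 0.2], [0.3, 0.4]],
    "protein": [[1.0, 1.1], [2.0, 2.1], [3.0, 3.1]],
}
emb = edge_embeddings(outputs, "drug", "protein", src_ids=[0, 1], dst_ids=[2, 0])
print(emb)  # [[0.1, 0.2, 3.0, 3.1], [0.3, 0.4, 1.0, 1.1]]
```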

I am confused about how the positive graph, the blocks, and the entire graph relate to each other.

How can I safely access the feature representations of an edge's endpoint nodes here? I am unsure whether I should use `positive_graph` or `blocks[0]`.
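As far as I understand DGL's link-prediction minibatching, the three graphs use different node ID spaces:

- The full graph uses global IDs.
- Each block has its own local source and destination IDs, with the parent IDs stored in `srcdata[dgl.NID]` and `dstdata[dgl.NID]`.
- `positive_graph` and `negative_graph` are compacted so that their nodes are exactly the destination nodes of `blocks[-1]`, in the same order.

The framework-free sketch below models that last relationship with plain lists. All names and values are hypothetical. It shows that a positive-graph node ID already indexes the model output, and that the parent ID is only needed to refer back to the full graph.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for blocks[-1].dstdata[dgl.NID]: for each node type,
# the global (full-graph) ID of every output node, indexed by local ID.
last_block_dst_nid: Dict[str, List[int]] = {
    "drug": [17, 4],
    "protein": [230, 12, 88],
}

# Hypothetical positive-graph edges of type 'drug-protein', in LOCAL IDs.
# Assumption: these local IDs share the node space of last_block_dst_nid.
pos_src_local = [0, 1]
pos_dst_local = [2, 0]


def to_global(ntype: str, local_ids: List[int]) -> List[int]:
    """Translate local node IDs of the positive graph to full-graph IDs."""
    nid = last_block_dst_nid[ntype]
    return [nid[i] for i in local_ids]


def global_edges() -> List[Tuple[int, int]]:
    """Return the 'drug-protein' edges of the positive graph in full-graph IDs."""
    return list(zip(to_global("drug", pos_src_local),
                    to_global("protein", pos_dst_local)))


print(global_edges())  # [(17, 88), (4, 230)]
```

Under this assumption, row `pos_src_local[k]` of `outputs['drug']` is the representation of full-graph drug `to_global('drug', pos_src_local)[k]`. The global ID is useful for looking things up in the full graph, but not for indexing `outputs`.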

Here is a snippet showing the relevant part of the code:

```python
import dgl

train_loader = dgl.dataloading.EdgeDataLoader(...)

for step, (input_nodes, positive_graph, negative_graph, blocks) in enumerate(train_loader):
    # create input features for source and destination nodes
    input_features = (blocks[0].srcdata['feature'], blocks[0].dstdata['feature'])
    # forward pass through the model
    # outputs are feature representations for every node type
    outputs = model(input_features, blocks)
```

My assumption is that I would have to access both `positive_graph` and `negative_graph` and generate edge embeddings for each of them.
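That matches how I understand the loader. `positive_graph` and `negative_graph` are compacted together, so they share one node ID space, and the same `outputs` can be indexed for both. Here is a framework-free sketch under that assumption. It builds concatenated edge embeddings for both graphs and pairs them with 1/0 labels for a binary classifier. The helper names, the toy data, and the labeling scheme are illustrative choices, not part of the DGL API.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Edges = Tuple[List[int], List[int]]  # (src local IDs, dst local IDs)


def concat_endpoints(outputs: Dict[str, List[Vector]], src_type: str,
                     dst_type: str, edges: Edges) -> List[Vector]:
    """Concatenate source and destination rows for every edge."""
    src, dst = edges
    return [outputs[src_type][u] + outputs[dst_type][v] for u, v in zip(src, dst)]


def labeled_edge_batch(outputs: Dict[str, List[Vector]], src_type: str,
                       dst_type: str, pos: Edges,
                       neg: Edges) -> Tuple[List[Vector], List[int]]:
    """Embed positive and negative edges with the SAME outputs; label 1 / 0."""
    pos_emb = concat_endpoints(outputs, src_type, dst_type, pos)
    neg_emb = concat_endpoints(outputs, src_type, dst_type, neg)
    labels = [1] * len(pos_emb) + [0] * len(neg_emb)
    return pos_emb + neg_emb, labels


outputs = {"drug": [[1.0], [2.0]], "protein": [[10.0], [20.0], [30.0]]}
positive = ([0, 1], [2, 0])   # stand-in for positive_graph.edges(etype=...)
negative = ([0, 0], [0, 1])   # stand-in for negative_graph.edges(etype=...)
x, y = labeled_edge_batch(outputs, "drug", "protein", positive, negative)
print(x)  # [[1.0, 30.0], [2.0, 10.0], [1.0, 10.0], [1.0, 20.0]]
print(y)  # [1, 1, 0, 0]
```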

My second question: if I know the index of a node in the positive graph, how can I access the corresponding node feature in `outputs`?

For clarification, `outputs` has the following format: it maps each node type to a tensor holding the feature vectors of all nodes of that type.

```python
outputs = {node_type: th.Tensor()}
```
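This assumes `positive_graph` is compacted onto the destination nodes of `blocks[-1]`, which is my understanding of `EdgeDataLoader`. Under that assumption, a node index taken from `positive_graph` should already be the row index into `outputs[node_type]`. Such an index might come, for example, from `positive_graph.edges(etype='drug-protein')`. If what you have is a full-graph ID instead, you first need to invert the parent-ID mapping.

Here is a framework-free sketch of both lookups. `parent_nid` is a hypothetical stand-in for `blocks[-1].dstnodes[ntype].data[dgl.NID]`, and the toy values are made up.

```python
from typing import Dict, List

Vector = List[float]

# Toy outputs for one node type: row i = representation of local node i.
outputs: Dict[str, List[Vector]] = {"drug": [[0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]}

# Hypothetical stand-in for blocks[-1].dstnodes[ntype].data[dgl.NID]:
# parent_nid[ntype][i] is the full-graph ID of local node i.
parent_nid: Dict[str, List[int]] = {"drug": [42, 7, 19]}


def row_for_local(ntype: str, local_id: int) -> Vector:
    """A positive-graph node ID is used directly as the row index."""
    return outputs[ntype][local_id]


def row_for_global(ntype: str, global_id: int) -> Vector:
    """Map a full-graph ID back to its local row first.

    Raises KeyError if the node is not among this minibatch's output nodes.
    The inverse map is rebuilt on every call for simplicity; in a training
    loop you would build it once per minibatch.
    """
    global_to_local = {g: i for i, g in enumerate(parent_nid[ntype])}
    return outputs[ntype][global_to_local[global_id]]


print(row_for_local("drug", 1))    # [0.7, 0.8]
print(row_for_global("drug", 19))  # [0.9, 1.0]
```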

Can you tell me whether the following is the right approach, and why it does not work?

Following this idea, I first try to get the edge IDs for each edge type and then find the corresponding node IDs.

```
edge_ids = positive_graph.edges['drug-protein']
Out[32]: EdgeSpace(data={'_ID': tensor([ 94, 106,  51,  96])})
```
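`positive_graph.edges['drug-protein']` returns a view of the edge type (printed as `EdgeSpace`), not a tensor of IDs. The tensor lives in its `data` field. The `_ID` entry there (`dgl.EID`) holds the IDs of the seed edges in the full graph, not the positive graph's own edge IDs. The local edge IDs simply run from 0 to the number of edges minus 1.

Here is a framework-free sketch of the two ID spaces, using the values printed above. The variable and function names are hypothetical.

```python
from typing import List

# Values printed above for positive_graph.edges['drug-protein'].data['_ID']:
# the full-graph IDs of the 4 seed edges in this minibatch.
parent_eid: List[int] = [94, 106, 51, 96]

# The positive graph's own (local) edge IDs are 0 .. num_edges - 1,
# i.e. what torch.arange(positive_graph.num_edges('drug-protein')) would give.
local_eid: List[int] = list(range(len(parent_eid)))


def parent_of(local_id: int) -> int:
    """Full-graph edge ID of a positive-graph edge."""
    return parent_eid[local_id]


print(local_eid)                          # [0, 1, 2, 3]
print([parent_of(e) for e in local_eid])  # [94, 106, 51, 96]
```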

However, when I try to find the node indices of these edges, I get an error:

```
positive_graph.find_edges(eid=edge_ids, etype='drug-protein')
Traceback (most recent call last):
...
  File "<ipython-input-34-a0205889f04f>", line 1, in <module>
    positive_graph.find_edges(eid=edge_ids, etype='drug-protein')
  File ".../python3.8/site-packages/dgl/heterograph.py", line 2896, in find_edges
    eid = utils.prepare_tensor(self, eid, 'eid')
  File "/.../python3.8/site-packages/dgl/utils/checks.py", line 37, in prepare_tensor
    data = F.tensor(data)
  File "/.../python3.8/site-packages/dgl/backend/pytorch/tensor.py", line 40, in tensor
    return th.as_tensor(data, dtype=dtype)
ValueError: could not determine the shape of object type 'HeteroEdgeDataView'
/.../python3.8/site-packages/dgl/base.py:45: DGLWarning: DGLGraph.__len__ is deprecated.Please directly call DGLGraph.number_of_nodes.
  return warnings.warn(message, category=category, stacklevel=1)
/.../lib/python3.8/site-packages/dgl/base.py:45: DGLWarning: DGLGraph.__len__ is deprecated.Please directly call DGLGraph.number_of_nodes.
  return warnings.warn(message, category=category, stacklevel=1)
/.../lib/python3.8/site-packages/dgl/base.py:45: DGLWarning: DGLGraph.__len__ is deprecated.Please directly call DGLGraph.number_of_nodes.
  return warnings.warn(message, category=category, stacklevel=1)
```
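The traceback shows `find_edges` trying to convert its `eid` argument into a tensor. That fails because `edge_ids` is a view object, not a tensor of IDs. Passing the `_ID` tensor instead would not help either. `find_edges` expects the positive graph's own edge IDs, and values such as 94 or 106 are out of range for a graph with four 'drug-protein' edges. To my knowledge, the direct route is `positive_graph.edges(etype='drug-protein')`, which returns the `(src, dst)` local node IDs of all edges of that type.

The framework-free sketch below models the `find_edges` lookup on plain lists to show why local IDs work and parent IDs do not. The function and data are hypothetical stand-ins, and the `IndexError` is this model's choice, not a claim about DGL's error type.

```python
from typing import List, Tuple

# Hypothetical 'drug-protein' edges of the positive graph in local node IDs,
# i.e. what positive_graph.edges(etype='drug-protein') returns as (src, dst).
SRC: List[int] = [0, 1, 1, 2]
DST: List[int] = [3, 0, 2, 1]
PARENT_EID: List[int] = [94, 106, 51, 96]  # the '_ID' values shown above


def find_edges(eids: List[int]) -> Tuple[List[int], List[int]]:
    """Model of find_edges: look up endpoints by LOCAL edge ID."""
    for e in eids:
        if not 0 <= e < len(SRC):
            raise IndexError(f"edge ID {e} is not a local ID of this graph")
    return [SRC[e] for e in eids], [DST[e] for e in eids]


# Local IDs 0..3 recover exactly the (src, dst) pairs of edges(etype=...).
print(find_edges([0, 1, 2, 3]))  # ([0, 1, 1, 2], [3, 0, 2, 1])

# Passing the parent IDs instead fails, because 94 is not a local edge ID.
try:
    find_edges(PARENT_EID)
except IndexError as err:
    print(err)  # edge ID 94 is not a local ID of this graph
```

The resulting local source and destination IDs can then index `outputs['drug']` and `outputs['protein']` directly.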

It seems that you are already discussing this with @BarclayII in Slack. It would be great if you could add an answer here once it is resolved, so others can benefit from it as well.
