How to extract embedding from unsupervised GraphSage model?

Hello, I’m new to DGL.

I ran the unsupervised GraphSage implementation and I want to extract the hidden representations after the last epoch.

Is there any code example to extract hidden representations for each input node? (I’m new to PyTorch >_<).

Thanks!

I think you just need to take the output of the inference layer here. To save PyTorch tensors, this post might help.
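A minimal sketch of saving and reloading the embeddings with PyTorch, assuming `embeddings` stands in for the tensor returned by the inference step. The random tensor and the file name are placeholders, not taken from the example script:

```python
import os
import tempfile

import torch

# Placeholder for the (num_nodes, hidden_dim) output of the inference step.
embeddings = torch.randn(5, 16)

# Hypothetical file name; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "node_embeddings.pt")

# Detach from the autograd graph and move to CPU before saving.
torch.save(embeddings.detach().cpu(), path)

# Reload later; row i holds the embedding of node i.
restored = torch.load(path)
```

Saving the CPU copy keeps the file loadable on machines without a GPU.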

Your answer is incredibly helpful! Thx!

Hello Mufei,

If I want to extract graph embeddings from a GAT or a GIN network, is it the same process, i.e., using the output of the inference layer?
I’m getting:

    if len(g.ntypes) > 1:
AttributeError: 'DGLGraph' object has no attribute 'ntypes'

Here’s the inference layer I’m trying to use in GAT:

def inference(self, g, x, batch_size, device):
        # Layer-wise inference: compute one layer's output for all nodes
        # before moving on to the next layer.
        nodes = torch.arange(g.number_of_nodes())
        for l, layer in enumerate(self.gat_layers):
            y = torch.zeros(g.number_of_nodes(), self.num_hidden if l != len(self.gat_layers) - 1 else num_classes)
            for start in tqdm.trange(0, len(nodes), batch_size):
                end = start + batch_size
                batch_nodes = nodes[start:end]
                # Build a block containing the in-edges of the current batch.
                block = dgl.to_block(dgl.in_subgraph(g, batch_nodes), batch_nodes)
                input_nodes = block.srcdata[dgl.NID]
                h = x[input_nodes].to(device)
                # Destination nodes come first among the block's source nodes.
                h_dst = h[:block.number_of_dst_nodes()]
                h = layer(block, (h, h_dst))
                # Use self.gat_layers here, the same list the loop iterates over.
                if l != len(self.gat_layers) - 1:
                    h = self.activation(h)
                    h = self.dropout(h)
                y[start:end] = h.cpu()
            x = y
        return y

Thanks so much in advance.

Yes, the process will be similar.

The issue you encountered has nothing to do with embedding extraction, and I’m sorry for that. We are in the process of merging DGLGraph and DGLHeteroGraph, and support for DGLGraph is currently broken in the nightly build. Could you switch to the latest stable version of DGL? Note that you will first need to uninstall DGL and verify the uninstall by confirming that importing DGL fails.
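One way to check whether a package is still importable, sketched with the standard library only. The helper name `is_installed` is made up for this illustration:

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate `package_name`."""
    return importlib.util.find_spec(package_name) is not None


# After `pip uninstall dgl`, this should report False until DGL is reinstalled.
print("dgl installed:", is_installed("dgl"))
```

If this still reports True after uninstalling, a second copy of the package is probably present elsewhere on the Python path.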

Did that and the error remains.
Could you please confirm if the version of DGL that I’m using is the latest and if the error message is pertinent to this version?

Here’s my dgl version:

dgl.__version__
'0.4.3post2'

The error says:

File "/home/balakkvj/anaconda3/lib/python3.7/site-packages/dgl/transform.py", line 976, in in_subgraph
    if len(g.ntypes) > 1:
AttributeError: 'DGLGraph' object has no attribute 'ntypes'

I see. in_subgraph is only supported for DGLHeteroGraph. Can you construct your graph as a DGLHeteroGraph with dgl.graph?

OK. Did that and got past that point.
However, now the error message is

File "/home/balakkvj/anaconda3/lib/python3.7/site-packages/dgl/nn/pytorch/conv/gatconv.py", line 125, in forward
    feat_src = self.fc_src(h_src).view(-1, self._num_heads, self._out_feats)

  File "/home/balakkvj/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'GATConv' object has no attribute 'fc_src'

It seemed that GATConv needed in_feats to define self.fc_src, so I edited the main function in gatconv.py and redefined fc_src, but then the error message changed to

Expected object of scalar type Double but got scalar type Float for argument #2 'mat2' in call to _th_mm

Should I use the same GATConv for heterographs, or is there a different version?

By the way, thanks so much for all your help and suggestions.
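A Double/Float mismatch of this kind is what PyTorch typically raises when float64 inputs meet float32 layer weights. The sketch below assumes that is the cause here. `nn.Linear` is only a stand-in for the layer's internal projection, and the shapes are illustrative:

```python
import torch
import torch.nn as nn

# Assumed: node features stored as float64.
feats = torch.zeros(4, 64, dtype=torch.float64)

# Stand-in for the layer's projection; weights default to float32.
fc = nn.Linear(64, 8)

try:
    fc(feats)
    mismatch_raised = False
except RuntimeError:
    # Input and weight dtypes disagree.
    mismatch_raised = True

# Casting the features to float32 resolves the mismatch.
out = fc(feats.float())
```

Casting the features once when loading the data is usually simpler than converting the model to float64.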

I printed the graph before and after conversion to a heterograph, and it seems the node and edge attributes go missing in the conversion.

Here’s the DGLGraph

DGLGraph(num_nodes=2238, num_edges=7264,
         ndata_schemes={'node_attributes': Scheme(shape=(64,), dtype=torch.float64)}
         edata_schemes={'edge_attributes': Scheme(shape=(64,), dtype=torch.float64)})

And here’s the graph after conversion to Heterograph

Graph(num_nodes=2238, num_edges=7264,
      ndata_schemes={}
      edata_schemes={})

I fixed this ndata_schemes problem.
Is the following conversion from DGLGraph to DGLHeterograph erroneous?

subgraph1, feats, labels1 = data1
print(subgraph1)
nxg = subgraph1.to_networkx(node_attrs=['node_attributes'], edge_attrs=['edge_attributes'])
subgraph = dgl.graph(nxg)
subgraph.ndata['node_attributes'] = feats
subgraph.edata['edge_attributes'] = subgraph1.edata['edge_attributes']
print(subgraph)

Now the output of converted graph looks like

Graph(num_nodes=2238, num_edges=7264,
      ndata_schemes={'node_attributes': Scheme(shape=(64,), dtype=torch.float64)}
      edata_schemes={'edge_attributes': Scheme(shape=(64,), dtype=torch.float64)})

However, after all this, I still get the fc_src error:

AttributeError: 'GATConv' object has no attribute 'fc_src'

  1. We are working on merging DGLGraph and DGLHeteroGraph and there will be no differences between them in the future.
  2. How did you initialize a GATConv instance? With layer(block, (h, h_dst)), you need to initialize GATConv by passing a 2-tuple of int to in_feats.

I’m calling the following (from one of the tutorial scripts) to initialize the GATConv:

self.gat_layers.append(GATConv(in_dim, num_hidden, heads[0],feat_drop, attn_drop, negative_slope, False, self.activation))

It runs very smoothly without the inference function. The errors appear only when I call `layer(block, (h, h_dst))` inside the inference function.

As I said, you need to use a 2-tuple of int for in_dim, representing the number of features for source nodes and destination nodes. Since you did not have any issues during training, did you use sampling during training? If you did not use sampling-based training, you should not sample during inference either.
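To illustrate why the layer must be constructed with a pair when it is called with a pair, here is a simplified stand-in in plain PyTorch. `PairProjection` is hypothetical and is not DGL's implementation; it only mimics the behaviour the traceback suggests, where `fc_src` exists only if `in_feats` was a tuple:

```python
import torch
import torch.nn as nn


class PairProjection(nn.Module):
    """Hypothetical stand-in for the projection step of an attention layer."""

    def __init__(self, in_feats, out_feats, num_heads):
        super().__init__()
        self.out_feats, self.num_heads = out_feats, num_heads
        if isinstance(in_feats, tuple):
            # Separate weights for source and destination features.
            self.fc_src = nn.Linear(in_feats[0], out_feats * num_heads, bias=False)
            self.fc_dst = nn.Linear(in_feats[1], out_feats * num_heads, bias=False)
        else:
            self.fc = nn.Linear(in_feats, out_feats * num_heads, bias=False)

    def forward(self, feat):
        if isinstance(feat, tuple):
            h_src, h_dst = feat
            src = self.fc_src(h_src).view(-1, self.num_heads, self.out_feats)
            dst = self.fc_dst(h_dst).view(-1, self.num_heads, self.out_feats)
            return src, dst
        h = self.fc(feat).view(-1, self.num_heads, self.out_feats)
        return h, h


h = torch.randn(6, 10)  # features of all source nodes in a block
h_dst = h[:2]           # destination nodes come first

# An int in_feats combined with a tuple input has no fc_src to call.
try:
    PairProjection(10, 8, 4)((h, h_dst))
    failed = False
except AttributeError:
    failed = True

# A pair for in_feats defines both projections, so the tuple input works.
src, dst = PairProjection((10, 10), 8, 4)((h, h_dst))
```

In this sketch the pair input yields one projected tensor per node set, each shaped (num_nodes, num_heads, out_feats).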

Hi @balakkvj, I’ve fixed that in the master branch (there is no longer an fc_src and fc_dst; we now use fc for both homographs and heterographs). Please try our nightly build and see if the problem persists.
