How to extract embeddings from an unsupervised GraphSAGE model?

ask06 · June 18, 2020, 9:17am

Hello, I’m new to DGL.

I ran the unsupervised GraphSAGE implementation and want to extract the hidden representations after the last epoch.

Is there any code example to extract hidden representations for each input node? (I’m new to PyTorch >_<).

Thanks!

mufeili · June 22, 2020, 9:30am

I think you just need to take the output of the inference layer here. To save PyTorch tensors, this post might help.
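
A minimal sketch of that idea, assuming the inference step returns one row per node as a PyTorch tensor. The `encoder` below is a hypothetical stand-in for the trained GraphSAGE model (it is not the class from the DGL example), and the feature sizes and file name are made up for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for a trained model; in practice this would be the
# GraphSAGE model and its inference method from the training script.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
features = torch.randn(100, 16)  # 100 nodes with 16 input features (made up)

encoder.eval()          # switch off dropout and other training-only behavior
with torch.no_grad():   # gradients are not needed when extracting embeddings
    embeddings = encoder(features)

# Save the embeddings to disk, then load them back.
path = os.path.join(tempfile.mkdtemp(), "node_embeddings.pt")
torch.save(embeddings, path)
restored = torch.load(path)

print(restored.shape)  # one embedding vector per node
```

Row `i` of the saved tensor is the hidden representation of node `i`, provided the inference step writes its outputs in node-ID order.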

ask06 · June 23, 2020, 8:03am

Your answer is incredibly helpful! Thx!

balakkvj · July 27, 2020, 4:12am

Hello Mufei,

If I want to extract graph embeddings from a GAT or a GIN network, is the process the same, i.e., do I use the output of the inference layer?
I’m getting:

    if len(g.ntypes) > 1:
AttributeError: 'DGLGraph' object has no attribute 'ntypes'

Here’s the inference layer I’m trying to use in GAT:

def inference(self, g, x, batch_size, device):
    nodes = torch.arange(g.number_of_nodes())
    for l, layer in enumerate(self.gat_layers):
        y = torch.zeros(g.number_of_nodes(), self.num_hidden if l != len(self.gat_layers) - 1 else num_classes)
        for start in tqdm.trange(0, len(nodes), batch_size):
            end = start + batch_size
            batch_nodes = nodes[start:end]
            block = dgl.to_block(dgl.in_subgraph(g, batch_nodes), batch_nodes)
            input_nodes = block.srcdata[dgl.NID]
            h = x[input_nodes].to(device)
            h_dst = h[:block.number_of_dst_nodes()]
            h = layer(block, (h, h_dst))
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
            y[start:end] = h.cpu()
        x = y
    return y

Thanks so much in advance.

mufeili · July 27, 2020, 5:44am

Yes, the process will be similar.
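
The layer-wise pattern used by that kind of inference function can be sketched without DGL. This is only an illustration under stated assumptions: the graph is a made-up dense, row-normalized adjacency matrix, and plain `nn.Linear` layers stand in for the GAT or GIN layers. It shows that computing each layer for all nodes in batches gives the same result as a full-graph pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, in_dim, hid_dim, out_dim = 10, 4, 6, 3

# Made-up graph: random dense adjacency with self-loops, row-normalized.
adj = (torch.rand(num_nodes, num_nodes) < 0.3).float()
adj = adj + torch.eye(num_nodes)
adj = adj / adj.sum(dim=1, keepdim=True)

# Stand-in layers (not DGL layers): aggregate neighbors, then apply Linear.
layers = nn.ModuleList([nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, out_dim)])
x = torch.randn(num_nodes, in_dim)

def layerwise_inference(adj, x, layers, batch_size):
    """Compute each layer for all nodes, one batch of output nodes at a time."""
    with torch.no_grad():
        for l, layer in enumerate(layers):
            y = torch.zeros(x.shape[0], layer.out_features)
            for start in range(0, x.shape[0], batch_size):
                end = start + batch_size
                # Aggregate neighbor features only for the nodes in this batch.
                h = layer(adj[start:end] @ x)
                if l != len(layers) - 1:
                    h = torch.relu(h)
                y[start:end] = h
            x = y  # the next layer reads the full output of this layer
    return y

def full_graph_inference(adj, x, layers):
    """Reference computation over the whole graph at once."""
    with torch.no_grad():
        h = torch.relu(layers[0](adj @ x))
        return layers[1](adj @ h)

batched = layerwise_inference(adj, x, layers, batch_size=4)
full = full_graph_inference(adj, x, layers)
print(torch.allclose(batched, full, atol=1e-5))
```

The returned `y` holds one row per node, and that is the tensor to keep as the embedding matrix.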

The issue you encountered has nothing to do with embedding extraction, and I’m sorry for that. We are in the process of merging DGLGraph and DGLHeteroGraph, and support for DGLGraph is currently broken in the nightly build. Could you switch to the latest stable version of DGL? Note that you need to uninstall the existing DGL first and verify the uninstall by trying to import DGL, which should fail, before installing the stable version.
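
One way to check what is installed, sketched with the standard library only. It assumes Python 3.8 or later (for `importlib.metadata`) and that the package was installed under the distribution name `dgl`; CUDA builds may be published under a different distribution name. `installed_version` is a hypothetical helper, not part of DGL.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# None here means the uninstall succeeded; a version string means it is installed.
print("dgl:", installed_version("dgl"))
```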

balakkvj · July 28, 2020, 12:13am

Did that and the error remains.
Could you please confirm if the version of DGL that I’m using is the latest and if the error message is pertinent to this version?

Here’s my dgl version:

dgl.__version__
'0.4.3post2'

The error says:

File "/home/balakkvj/anaconda3/lib/python3.7/site-packages/dgl/transform.py", line 976, in in_subgraph
    if len(g.ntypes) > 1:
AttributeError: 'DGLGraph' object has no attribute 'ntypes'

mufeili · July 28, 2020, 5:43am

I see. in_subgraph is only supported for DGLHeteroGraph. Can you construct your graph as a DGLHeteroGraph with dgl.graph?

balakkvj · July 29, 2020, 12:00am

OK. Did that and got past that point.
However, now the error message is

File "/home/balakkvj/anaconda3/lib/python3.7/site-packages/dgl/nn/pytorch/conv/gatconv.py", line 125, in forward
    feat_src = self.fc_src(h_src).view(-1, self._num_heads, self._out_feats)

  File "/home/balakkvj/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 594, in __getattr__
    type(self).__name__, name))
AttributeError: 'GATConv' object has no attribute 'fc_src'

It seemed that GATConv needed in_feats to define self.fc_src, so I edited gatconv.py and redefined fc_src, but then the error message changed to

Expected object of scalar type Double but got scalar type Float for argument #2 'mat2' in call to _th_mm

Should I use the same GATConv for heterographs, or is there a different version?

By the way, thanks so much for all your help and suggestions.
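
The Double/Float message above can be reproduced and fixed in plain PyTorch. This sketch assumes the node features are float64 (which the graph printout later in the thread suggests) while the layer parameters keep PyTorch's default float32; `nn.Linear` is used here as a stand-in for the linear projection inside GATConv.

```python
import torch
import torch.nn as nn

layer = nn.Linear(64, 8)                         # parameters are float32 by default
feats = torch.randn(5, 64, dtype=torch.float64)  # float64 input features

try:
    layer(feats)                                 # float64 input, float32 weights
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("dtype mismatch:", err)

out = layer(feats.float())                       # cast the features to float32
print(out.dtype)
```

Casting the features with `.float()` before calling the layer is usually the simpler fix; calling `.double()` on the model is the alternative if float64 precision is actually needed.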

balakkvj · July 29, 2020, 12:18am

I printed the graph before and after conversion to a heterograph, and the node and edge attributes seem to be missing after the conversion.

Here’s the DGLGraph

DGLGraph(num_nodes=2238, num_edges=7264,
ndata_schemes={'node_attributes': Scheme(shape=(64,), dtype=torch.float64)}
edata_schemes={'edge_attributes': Scheme(shape=(64,), dtype=torch.float64)})

And here’s the graph after conversion to Heterograph

Graph(num_nodes=2238, num_edges=7264,
      ndata_schemes={}
      edata_schemes={})

I fixed this ndata_schemes problem.
Is the following conversion from DGLGraph to DGLHeteroGraph wrong?

subgraph1, feats, labels1 = data1
print(subgraph1)
nxg = subgraph1.to_networkx(node_attrs=['node_attributes'], edge_attrs=['edge_attributes'])
subgraph = dgl.graph(nxg)
subgraph.ndata['node_attributes'] = feats
subgraph.edata['edge_attributes'] = subgraph1.edata['edge_attributes']
print(subgraph)

Now the output of converted graph looks like

Graph(num_nodes=2238, num_edges=7264,
ndata_schemes={'node_attributes': Scheme(shape=(64,), dtype=torch.float64)}
edata_schemes={'edge_attributes': Scheme(shape=(64,), dtype=torch.float64)})

However, after all this, I still get the fc_src error

AttributeError: 'GATConv' object has no attribute 'fc_src'

mufeili · July 29, 2020, 5:58am

We are working on merging DGLGraph and DGLHeteroGraph and there will be no differences between them in the future.
How did you initialize a GATConv instance? With layer(block, (h, h_dst)), you need to initialize GATConv by passing a 2-tuple of int to in_feats.

balakkvj · July 29, 2020, 6:24am

I’m calling the following (from one of the tutorial scripts) to initialize the GATConv:

self.gat_layers.append(GATConv(in_dim, num_hidden, heads[0],feat_drop, attn_drop, negative_slope, False, self.activation))

It runs smoothly without the inference function. The errors appear only when I call layer(block, (h, h_dst)) inside the inference function.

mufeili · July 29, 2020, 10:15am

As I said, you need to pass a 2-tuple of ints for in_dim, representing the number of features for the source nodes and the destination nodes. Since you did not have any issues during training, did you use sampling during training? If you did not use sampling-based training, you should not use sampling for inference either.

zihao · August 3, 2020, 7:25am

Hi @balakkvj, I’ve fixed that in the master branch (there is no longer an fc_src or fc_dst; we now use fc for both homographs and heterographs). Please try our nightly build and see if the problem persists.