I’ve been using pytorch_geometric and found that it doesn’t support a sparse node feature matrix. Does dgl support a sparse tensor as the node feature matrix, for operations such as batch_norm, gcn, etc.?

What do you mean by `For batch_norm, gcn, etc. operations`? Those torch operators don’t support sparse tensors either. Why do you want a sparse tensor?

Because my graph is bipartite, so some columns (features) apply to one set of nodes and the remaining columns (features) apply to the other set, which means there are many zeros in the node feature matrix.

I want to increase my batch_size to accelerate the training.

Could you build a heterogeneous graph with your sparse node feature matrix as another edge type?

Like, if your node feature matrix is a sparse NxM matrix, then you can build a heterogeneous graph with two edge types: one for the original graph, and another edge type connecting your N nodes with M nodes of another type.

Sorry, I don’t know exactly what you mean. I guess you think I can build my graph as a bipartite graph with shape `NxM`? However, I have no idea how to use this graph with `dgl`. And I think most operations, like `batch_norm, gcn`, will change their behavior, right? Since the graph has a different layout.

Essentially, say you have a graph with N nodes whose features form an NxM sparse matrix. You can convert it into a heterogeneous graph with two node types: one node type, say A, with N nodes, and another, say B, with M nodes. There will also be two edge types: one, say “AA”, connecting node type A to node type A as usual, and another, say “BA”, connecting node type B to node type A, whose adjacency matrix is your sparse feature matrix.

After you do this, you can apply whatever message passing on both edge types simultaneously with a heterogeneous graph model (say with RGCN).

If you still want to apply a homogeneous graph model (say GCN) and you don’t want to go heterogeneous, then you can first perform a message passing on the edge type “BA” so that the messages are gathered from node type B to node type A, using:

```
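import dgl.function as fn
g.nodes['B'].data['x'] = node_embedding  # dense (M, d) embeddings for the B nodes
# Multiply each B feature by its edge weight and sum into the A nodes as 'y'.
g.update_all(fn.u_mul_e('x', 'edge_weight', 'm'), fn.sum('m', 'y'), etype='BA')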
```

This is equivalent to performing a sparse-dense matrix multiplication. After that, you can take an edge type subgraph and run GCN on it, keeping the semantics of batch norm, GCN, etc. the same as they are right now:

```
import dgl
# Keep only the 'AA' edges (and thus only the A nodes), then run the homogeneous model.
g_aa = dgl.edge_type_subgraph(g, ['AA'])
pred = model(g_aa, g_aa.nodes['A'].data['y'])
```
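To see why the “BA” message passing step matches a sparse-dense matrix multiplication, here is a small library-free sketch. The toy values and the helper names `message_passing` and `dense_matmul` are made up for illustration and are not part of DGL; the first helper mimics multiplying by the edge weight and summing at the destination, the second materializes the N×M matrix and multiplies densely.

```python
# Sparse N x M feature matrix in COO form (toy values, assumed for illustration):
# each (row, col, value) triple plays the role of one "BA" edge from B node
# `col` to A node `row`, with edge weight `value`.
N, M, d = 3, 4, 2
triples = [(0, 0, 1.0), (1, 3, 2.0), (2, 1, 3.0), (2, 3, 0.5)]

# Dense M x d embedding table for the B nodes.
embedding = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]


def message_passing(triples, embedding, num_dst, dim):
    """Scale each source row by its edge weight and sum at the destination."""
    out = [[0.0] * dim for _ in range(num_dst)]
    for row, col, weight in triples:
        for k in range(dim):
            out[row][k] += weight * embedding[col][k]
    return out


def dense_matmul(triples, embedding, num_dst, num_src, dim):
    """Reference result: materialize the N x M matrix and multiply densely."""
    dense = [[0.0] * num_src for _ in range(num_dst)]
    for row, col, weight in triples:
        dense[row][col] += weight
    return [
        [sum(dense[i][j] * embedding[j][k] for j in range(num_src)) for k in range(dim)]
        for i in range(num_dst)
    ]


y = message_passing(triples, embedding, N, d)
y_ref = dense_matmul(triples, embedding, N, M, d)
print(y == y_ref)
```

The message passing version only touches the nonzero entries, which is where the savings come from when the feature matrix is mostly zeros.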