Questions about DGL HeteroGraphs that you never dared to ask

(Also for @qillbel)

The tensors assigned to dgl.NTYPE and dgl.ETYPE should be int64 vectors, that is, they must be one-dimensional. Moreover, the values should be indices into the lists of node type names and edge type names passed to dgl.to_hetero.

So let’s say that you have node types and edge types as follows:

import dgl
import networkx
import torch

g_nx_pet = networkx.Graph([(1, 2), (1, 3)])
g_dgl_pet = dgl.graph(g_nx_pet)
ntypes = ['tit', 'hea']
etypes = ['connects']

Then this doesn’t work, since the tensors are two-dimensional floats:

g_dgl_pet.ndata[dgl.NTYPE] = torch.tensor([[1.], [0.], [1.]])
g_dgl_pet.edata[dgl.ETYPE] = torch.tensor([[1.],[1.],[1.],[1.]])
hg_dgl_pet = dgl.to_hetero(g_dgl_pet, ntypes, etypes)

This also doesn’t work, since there is no second edge type in your edge type list (node and edge type IDs are labeled from 0):

g_dgl_pet.ndata[dgl.NTYPE] = torch.LongTensor([1, 0, 1])
g_dgl_pet.edata[dgl.ETYPE] = torch.LongTensor([1, 1, 1, 1])
hg_dgl_pet = dgl.to_hetero(g_dgl_pet, ntypes, etypes)

This will work:

g_dgl_pet.ndata[dgl.NTYPE] = torch.LongTensor([1, 0, 1])
g_dgl_pet.edata[dgl.ETYPE] = torch.LongTensor([0, 0, 0, 0])
hg_dgl_pet = dgl.to_hetero(g_dgl_pet, ntypes, etypes)
hg_dgl_pet.metagraph.edges()
# OutMultiEdgeDataView([('hea', 'tit'), ('hea', 'hea'), ('tit', 'hea')])
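As a quick sanity check, the type vector above marks nodes 0 and 2 as 'hea' and node 1 as 'tit', which you can confirm by counting nodes per type:

hg_dgl_pet.number_of_nodes('hea')
# 2
hg_dgl_pet.number_of_nodes('tit')
# 1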

Massive thanks @BarclayII, finally got the logic behind this and now it works perfectly!

Back with more questions. :slight_smile:

  1. Is it correct that dgl.nn modules cannot be used with DGLHeteroGraphs?

More generally, I would be very interested in learning how graph neural networks handle typed nodes. If someone can point me to a useful, relatively simple explanation, I would be very thankful.

  2. Is it possible to get a readout for a DGLHeteroGraph by averaging node features, if the node features are of different dimensionality?

Hi,
for the first question (well, its general version), RGCN or HAN (Heterogeneous Graph Attention Network) are good answers.
For the second question I have the same issue; it is not clear to me how we can apply readout on a heterograph or a batched heterograph.


To add to the answer from @ar795, currently DGL GNN modules support unidirectional bipartite graphs as well. Simply supply a unidirectional bipartite graph together with a pair of feature tensors for the source/destination node types and you should be good:

module = dgl.nn.SAGEConv(...)
g = dgl.bipartite(..., 'user', 'clicks', 'item')
result = module(g, (user_features, item_features))
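For instance, a fully spelled-out version might look like this (a sketch with a made-up edge list and feature sizes, assuming a DGL version where SAGEConv accepts a bipartite graph with a pair of feature tensors as described above):

import dgl
import torch

# Toy bipartite graph: 3 users, 2 items (the edges are arbitrary example data)
g = dgl.bipartite([(0, 0), (1, 0), (2, 1)], 'user', 'clicks', 'item')
user_features = torch.randn(3, 16)  # one 16-dim feature vector per user
item_features = torch.randn(2, 16)  # one 16-dim feature vector per item

module = dgl.nn.SAGEConv(16, 8, aggregator_type='mean')
# Output has one row per destination ('item') node: shape (2, 8)
result = module(g, (user_features, item_features))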

For the second problem, we currently don’t have a one-liner for batched-heterograph readout, so you may need to do that yourself. @mufeili could probably add his thoughts on this.


@thiippal See if the workaround here is good for you.


Thank you @thiippal for this thread! Perfect for my question:

Is it possible to have multiple feature matrices for one node type?

I have a heterograph consisting of, let’s say, node types A, B and C. There are n nodes of node type A.
I would like to add a feature matrix of shape (n x l) and another feature matrix of shape (n x k) for node type A. For node type B with m nodes, I would also like to add multiple feature matrices of different sizes, say one of shape (m x p) and one of shape (m x q).

How could I implement this?

Thank you in advance for your answer!

Hi @sopkri, does the example below help?

import dgl
import torch

g = dgl.heterograph({
    ('user', 'follows', 'user'): [(0, 1), (1, 2)],
    ('user', 'plays', 'game'): [(0, 0), (1, 0), (1, 1), (2, 1)],
    ('developer', 'develops', 'game'): [(0, 0), (1, 1)],
})
# Two feature matrices of different dimensionality on the same node type
g.nodes['user'].data['h1'] = torch.randn(3, 1)
g.nodes['user'].data['h2'] = torch.randn(3, 2)
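# The same pattern extends to the other node types with their own
# feature sizes (the shapes below are just illustrative):
g.nodes['game'].data['h1'] = torch.randn(2, 3)
g.nodes['game'].data['h2'] = torch.randn(2, 5)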

Hi,
I have a question regarding graph representation learning:
Is it possible to learn a graph-level representation for a heterograph?
Thanks.

If you have multiple heterogeneous graphs and you want to perform graph property prediction, then I think graph-level representations for heterogeneous graphs are natural.

Thanks for your reply.
In the case of a heterograph with multiple types of nodes and multiple types of edges, we can obtain node-level representations (using, for example, an RGCN model). Based on these representations, can we generate a vector representation for the whole graph (using readout functions, the same way we do with homogeneous graphs)?

In general, we could develop something like the following, modeled on our DGLHeteroGraph.multi_update_all API:

def multi_node_readout(g, nfeats_dict, readout_dict, cross_reducer=None):
    """Readout for heterogeneous graphs based on node features.

    Parameters
    ----------
    g : DGLHeteroGraph
    nfeats_dict : dict
        Mapping node type to the corresponding node features
    readout_dict : dict
        Mapping node type to the corresponding readout function for 
        node features
    cross_reducer : str or None
        The way to combine readout from different node types, which can be 
        sum, min, max, mean, stack. If None, simply return them in a list.
    """

We could also have a counterpart of the one above for edge features, dgl.multi_edge_readout(). What do you think?
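To make the idea concrete, here is a rough sketch of what multi_node_readout could do internally, in plain PyTorch (only an illustration of the proposed interface, not an existing DGL function; each function in readout_dict is assumed to map the (N, d) feature matrix of one node type to a single vector):

import torch

def multi_node_readout(g, nfeats_dict, readout_dict, cross_reducer=None):
    # g is unused in this sketch because the features are passed in explicitly.
    # Per-type readout: one vector per node type.
    per_type = [readout_dict[ntype](nfeats_dict[ntype]) for ntype in nfeats_dict]
    if cross_reducer is None:
        return per_type
    # Combining across types requires the per-type readouts to have the same size.
    stacked = torch.stack(per_type, dim=0)
    if cross_reducer == 'stack':
        return stacked
    if cross_reducer == 'sum':
        return stacked.sum(dim=0)
    if cross_reducer == 'mean':
        return stacked.mean(dim=0)
    if cross_reducer == 'max':
        return torch.max(stacked, dim=0)[0]
    if cross_reducer == 'min':
        return torch.min(stacked, dim=0)[0]
    raise ValueError('Unknown cross_reducer: {}'.format(cross_reducer))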


Thank you @mufeili for the reply. This is exactly what I was looking for! Sometimes it can be so easy and logical.


Is there a function to generate a train/validation/test split for a heterograph?

I am trying to do link prediction on a heterograph with 3 different node and 6 different edge types.
What I am considering here is that for a heterograph, it would be necessary to ensure that

  1. for every edge type, there is the same proportion of edges in the train and test sets, so that there is no bias towards any type of edges
  2. all nodes are connected after sampling and there are no nodes without edges

I was thinking of a function similar to the generate_sampled_graph_and_labels() function from the RGCN example.

Is there any function in the DGL library that could produce train and test heterographs fulfilling these two conditions (1. and 2.)?

Thank you a lot in advance!

Hi,
I was wondering what the recommended practice is for storing a large number (~600K) of HeteroGraphs to disk?
It seems that dgl.data.utils.save_graphs does not support HeteroGraph right now.

My current approach is to store the hetero edges as dicts and the corresponding node features as np.arrays. However, data loading becomes incredibly slow since I need to reconstruct every HeteroGraph from its edge dict. Moreover, I cannot pre-load the HeteroGraphs into main memory because the whole dataset is too large.

Thanks!

Sorry for the delay in answering. Regarding multi_node_readout, you wrote that we apply a readout function to each node type and then combine the resulting readouts using the cross_reducer parameter. This answers my question perfectly. However, by generating a representation of the whole graph this way, won’t we lose a lot of information? (Is this the best way to get a graph-level representation?)

Another question about heterograph graph-level representation learning:
Given a heterograph with N types of nodes and M types of edges, we can pass it through an RGCN-based architecture and get node-level representations for all node types (a dictionary of hidden representations).
To obtain the graph-level representation: can we just concatenate all the node representations? Do you think that applying another readout function (sum, average) would lead to a poor-quality representation?

You mean graph classification? You can try concatenating all the node representations, just like in homogeneous graph classification.
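For example (a small sketch with made-up dimensions), one way to do this is to apply a readout per node type and concatenate the per-type vectors:

import torch

# Hypothetical per-type node representations, e.g. the output of an RGCN
h_dict = {
    'user': torch.randn(3, 8),
    'game': torch.randn(2, 8),
    'developer': torch.randn(2, 8),
}
# Mean-readout per node type, then concatenate across types
graph_repr = torch.cat([h.mean(dim=0) for h in h_dict.values()])  # shape: (24,)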

For the first question, the easiest way is to split the raw input data by splitting the triplets according to edge type, and we can keep the split files.
If you follow generate_sampled_graph_and_labels(), adding reverse edges ensures that the nodes in the sampled graph have connected edges.
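A per-edge-type split could look roughly like this (a sketch, not a built-in DGL function; it only covers point 1, so reverse edges and connectivity from point 2 still need separate handling):

import numpy as np

def split_edges_per_type(hg, train_ratio=0.8, seed=0):
    """Split edge IDs per canonical edge type so that every edge type
    keeps the same train/test proportion."""
    rng = np.random.RandomState(seed)
    train_eids, test_eids = {}, {}
    for canonical_etype in hg.canonical_etypes:
        num_edges = hg.number_of_edges(canonical_etype)
        perm = rng.permutation(num_edges)
        cut = int(train_ratio * num_edges)
        train_eids[canonical_etype] = perm[:cut]
        test_eids[canonical_etype] = perm[cut:]
    return train_eids, test_eids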

Can you try the approach suggested in this issue?