Any Options for Non-numeric Graph Attributes?

Are there any options in DGL for binary classification of graphs with non-numeric attributes?

The graph, its nodes, and its edges have several attributes of type string, and the strings vary in length (some long, some short).

Anything useful in DGL?


Unfortunately, we don't support non-tensor types on DGLGraph, because we expect all data stored on the graph to be usable in computation.
You can attach extra fields, such as g.string_data = {...}, or keep the strings in a separate variable. You can then preprocess them with an RNN (LSTM), bag of words, or any other sentence-level representation, and set the resulting vectors as the node/edge features. Hope this helps!
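
For the bag-of-words route, here is a minimal sketch in plain Python. The sample strings and the helper names (build_vocab, bag_of_words) are made up for illustration and are not part of DGL; whitespace tokenization is also just an assumption.

```python
from collections import Counter

def build_vocab(texts):
    """Map every distinct lowercase token to a column index (first-seen order)."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def bag_of_words(text, vocab):
    """Return a fixed-length count vector for one string."""
    counts = Counter(text.lower().split())
    vec = [0.0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:  # tokens outside the vocabulary are ignored
            vec[vocab[token]] = float(n)
    return vec

# Hypothetical string attributes, one per node.
node_text = ["red fast car", "slow red truck", "car"]
vocab = build_vocab(node_text)
features = [bag_of_words(t, vocab) for t in node_text]
```

Every row in features has the same length regardless of how long the original string was, so, assuming the PyTorch backend, it could be stored with something like g.ndata["feat"] = torch.tensor(features).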

Thanks for the tip.
I’m familiar with bag-of-words and some others.
Can you point me to an example using RNN (LSTM) for this?
Are there any utilities in DGL for this?

No, DGL currently cannot handle non-numerical features.

If you need to store discrete attributes (e.g. user, word, …), you can first map them to integer indices.
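
A minimal sketch of that mapping in plain Python; encode_categories and the sample values are hypothetical names used only for illustration.

```python
def encode_categories(values):
    """Assign each distinct string an integer ID in first-seen order."""
    index = {}
    # setdefault returns the existing ID, or stores and returns a new one.
    ids = [index.setdefault(v, len(index)) for v in values]
    return ids, index

# Hypothetical discrete attribute, one value per node.
users = ["alice", "bob", "alice", "carol"]
ids, index = encode_categories(users)
```

The integer IDs can then be stored as a tensor on the graph and, if needed, passed through an embedding layer such as torch.nn.Embedding to get dense features.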

It seems that in your case each node has a variable-length sequence. My suggestion is that you do not store these features on the DGLGraph but process them outside of DGL (e.g. use an RNN/LSTM to get a fixed-length feature for each sequence).
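
As a rough sketch of the RNN/LSTM idea, assuming PyTorch: the StringEncoder class below is a hypothetical helper, not a DGL utility. It works at the character level, assumes non-empty strings, and lumps every non-ASCII character into a single bucket; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

class StringEncoder(nn.Module):
    """Encode variable-length strings into fixed-length vectors with an LSTM."""

    def __init__(self, vocab_size=128, embed_dim=16, hidden_dim=32):
        super().__init__()
        # Index 0 is reserved for padding, so character IDs are shifted by 1.
        self.embed = nn.Embedding(vocab_size + 1, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, strings):
        # Character codes clipped to 0..127, then shifted to 1..128.
        seqs = [torch.tensor([min(ord(c), 127) + 1 for c in s]) for s in strings]
        lengths = torch.tensor([len(s) for s in seqs])
        padded = pad_sequence(seqs, batch_first=True)  # pads with 0
        packed = pack_padded_sequence(
            self.embed(padded), lengths, batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        # Final hidden state of the last layer: one row per input string.
        return h_n[-1]

encoder = StringEncoder()
feats = encoder(["short", "a much longer attribute string", "mid length"])
```

Here feats has one row per string and hidden_dim columns, so it can be assigned as a node feature tensor. The encoder is untrained in this sketch; in practice it would be trained jointly with the graph model or replaced by a pretrained text encoder.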

If you need to explicitly model the interactions between items that belong to the sequences of different nodes, I recommend splitting each node into several nodes, where each new node represents a single item in the sequence, and building a new graph on these nodes.
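
One possible way to do the split, sketched in plain Python. The connectivity rule is only an assumption (chain consecutive items inside a node, and fully connect the items of two nodes that shared an edge), and expand_graph is a hypothetical helper.

```python
def expand_graph(sequences, edges):
    """sequences: one list of items per original node.
    edges: (u, v) pairs between original nodes.
    Returns the item labels plus src/dst lists over item-level node IDs."""
    offsets, items = [], []
    for seq in sequences:
        offsets.append(len(items))  # ID of the first item node of this sequence
        items.extend(seq)

    src, dst = [], []
    # Chain consecutive items within each original node.
    for start, seq in zip(offsets, sequences):
        for i in range(len(seq) - 1):
            src.append(start + i)
            dst.append(start + i + 1)
    # Connect every item of u to every item of v for each original edge.
    for u, v in edges:
        for i in range(len(sequences[u])):
            for j in range(len(sequences[v])):
                src.append(offsets[u] + i)
                dst.append(offsets[v] + j)
    return items, src, dst

# Hypothetical input: three original nodes and two original edges.
sequences = [["a", "b"], ["c"], ["d", "e", "f"]]
edges = [(0, 1), (1, 2)]
items, src, dst = expand_graph(sequences, edges)
```

The src and dst lists can then be used to construct the new graph, for example with dgl.graph((src, dst)), and each item can be turned into an integer index or embedding as its node feature.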