Timestamp and categorical vertex features

mbaddar · May 20, 2019, 6:23am

I am creating a link prediction model on a user identity graph. Nodes represent users' interactions with different components of a website, including cookies, page views, etc., so many of the features are timestamps and cookie ids.

As far as I understand we can only use tensor-like features in DGL. So is there a way to represent other types of features in the learning process?

zhengda1936 · May 20, 2019, 9:34am

Do you want to store features with different lengths? This might be difficult to handle.

In your use case, it seems that you can store timestamps and cookie ids as integers. If they are strings, you need to preprocess them.
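As a rough sketch of the preprocessing mentioned above, string cookie ids can be mapped to contiguous integer codes and timestamp strings converted to epoch seconds. The sample values and the helper name `build_index` are hypothetical, and the timestamps are assumed to be ISO-8601 strings in UTC:

```python
from datetime import datetime, timezone


def build_index(values):
    """Map each distinct string to a contiguous integer, in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


# Hypothetical raw features, one entry per node.
cookie_ids = ["a9f3", "77bc", "a9f3", "c001"]
iso_times = ["2019-05-20T06:23:00", "2019-05-20T09:34:00"]

# Cookie strings become integer codes; repeated cookies share a code.
cookie_index = build_index(cookie_ids)
cookie_codes = [cookie_index[c] for c in cookie_ids]

# Timestamps become integer epoch seconds (assuming the strings are UTC).
epoch_seconds = [
    int(datetime.fromisoformat(t).replace(tzinfo=timezone.utc).timestamp())
    for t in iso_times
]

print(cookie_codes)
print(epoch_seconds)
```

The resulting integer lists can then be converted to tensors. The cookie codes are only identifiers at this stage, so arithmetic on them is not meaningful.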

mbaddar · May 22, 2019, 8:39am

I understand that I need to preprocess them. So, from your answer, I gather that it doesn’t matter if the tensor contains numbers that represent non-numeric features? My concern is that cookie ids cannot meaningfully be added or multiplied.

minjie · May 22, 2019, 5:35pm

@mbaddar, that’s a good question. We currently don’t support node features other than tensor types. In the DL world, people usually embed different kinds of input into fixed-length vectors. For example, one approach is to train a Char-LSTM to model the cookie ids (if cookie ids are important features in your application). Timestamps and page views could also be treated as real-valued inputs (with proper normalization). Categorical values are similar to words, so word embedding techniques could be applied here. Once feature vectors are obtained, DGL can handle the rest of the pipeline, such as graph propagation.
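One possible reading of "proper normalization" for timestamps is min-max scaling, sketched below. This is only one option (standardization or cyclic time-of-day encodings are others), and the function name and sample values are hypothetical:

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, so return zeros.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical epoch-second timestamps, one per node.
timestamps = [1558300000, 1558350000, 1558400000]
scaled = min_max_scale(timestamps)

print(scaled)
```

Scaling keeps the raw epoch values, which are on the order of 10^9, from dominating other input features.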

mbaddar · May 23, 2019, 6:35am

Thank you very much @minjie for your answer. I did not fully get what a char-LSTM would do for cookie ids. My use case is just to predict whether two users have logged in from different devices, or at two different points in time, using different cookies/identifiers. I guess the cookie id would be treated as a categorical feature in that case. Please feel free to correct me.

zhengda1936 · May 24, 2019, 1:34am

For cookie ids, you can use one-hot encoding and multiply by an embedding matrix to get each id's embedding. You can then perform numeric operations on the embeddings of the cookie ids.
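To illustrate the idea, here is a minimal pure-Python sketch showing that multiplying a one-hot vector by an embedding matrix simply selects one row of that matrix. The sizes, the random initialization, and the helper names are assumptions for illustration. In a real model the matrix would be a trainable parameter.

```python
import random


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_times_matrix(vec, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(n_cols)
    ]


# Hypothetical setup: 3 distinct cookie ids, 4-dimensional embeddings.
num_cookies, dim = 3, 4
rng = random.Random(0)
embedding = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_cookies)]

cookie_code = 2  # integer code of one cookie id
via_one_hot = vec_times_matrix(one_hot(cookie_code, num_cookies), embedding)
via_lookup = embedding[cookie_code]

# Both routes give the same vector, so the one-hot product can be
# replaced by a direct row lookup.
print(via_one_hot == via_lookup)
```

In PyTorch, `torch.nn.Embedding` performs this row lookup with learnable weights. Its output is an ordinary float tensor, so it can be stored as a node feature and combined numerically with other features.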