Process_movielens1m code does not encode user_id and movie_id

rny · October 16, 2022, 7:33pm

Hello,

I am new to GNNs and to DGL. I was following the PinSAGE example folder to do link prediction on the MovieLens dataset (the code is here: dgl/examples/pytorch/pinsage at master · dmlc/dgl · GitHub).

I noticed that in process_movielens1m.py, user_id and movie_id are not label-encoded. Is there a reason for that?
A follow-up question: when recommendations are generated in evaluation.py, are they the original movie_ids? If not, how can I map them back to the original IDs? (See code: dgl/evaluation.py at master · dmlc/dgl · GitHub)

@BarclayII, maybe you can answer this question. Thanks!

BarclayII · October 17, 2022, 8:19am

user_id and movie_id themselves are not used in the model, so I didn’t encode them.

The recommendations are not the original IDs. However, you could add the original IDs as a feature during graph construction (e.g., in dgl/process_movielens1m.py at master · dmlc/dgl · GitHub, add another feature named original_id or similar, in the same fashion as gender or age).
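
To illustrate the idea of keeping the original IDs around, here is a minimal sketch of label encoding with an inverse lookup, so that contiguous node indices can be translated back to raw IDs. It is plain Python and not the code from process_movielens1m.py; the sample IDs and the names index_of, original_id, and recommended_nodes are made up for illustration.

```python
# Hypothetical raw movie IDs as they might appear in a ratings file.
# Raw IDs are not guaranteed to be contiguous or to start at zero.
raw_movie_ids = [1193, 661, 914, 3408, 661, 1193]

# Assign each distinct raw ID a contiguous index, in order of first appearance.
index_of = {}
for raw_id in raw_movie_ids:
    index_of.setdefault(raw_id, len(index_of))

# Inverse lookup: position i holds the raw ID of node i.
original_id = [None] * len(index_of)
for raw_id, idx in index_of.items():
    original_id[idx] = raw_id

# Encoded column, usable as node indices.
encoded = [index_of[raw_id] for raw_id in raw_movie_ids]

# Translate (hypothetical) recommended node indices back to raw movie IDs.
recommended_nodes = [2, 0]
recommended_raw = [original_id[i] for i in recommended_nodes]
```

A list like original_id is the kind of per-node data that could be attached as an extra feature at graph construction time, provided its order matches the order in which the nodes were created.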

rny · October 17, 2022, 10:40pm

Thanks for your quick response. One follow-up question: when I print h_item, computed via h_item = torch.cat(h_item_batches, 0), its shape is (number_of_unique_item_ids, embd_dim). So h_item corresponds to the item node embeddings, doesn't it?

BarclayII · October 20, 2022, 2:08am

Yes.
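
As a rough sketch of what that means in practice: concatenating the per-batch outputs along dimension 0 gives one row per item node, and row i can be read as the embedding of item node i, assuming the batches were produced in node-index order without shuffling. The tensors below are random stand-ins rather than real model outputs, and emb_dim, query, and the dot-product scoring are illustrative assumptions.

```python
import torch

emb_dim = 4

# Stand-ins for the per-batch embeddings a model would return.
# Three batches covering 3 + 3 + 2 = 8 item nodes.
h_item_batches = [
    torch.randn(3, emb_dim),
    torch.randn(3, emb_dim),
    torch.randn(2, emb_dim),
]

# Stack the batches row-wise: shape (number_of_items, emb_dim).
h_item = torch.cat(h_item_batches, 0)

# Score every item against one query item by dot product,
# then exclude the query itself before taking the top 3.
query = 5
scores = h_item @ h_item[query]
scores[query] = float("-inf")
top_k = torch.topk(scores, k=3).indices.tolist()
```

The entries of top_k are node indices, not original movie IDs, so they would still need to be mapped back through whatever index-to-ID lookup was saved at graph construction.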
