Heterogeneous graph and feature vector size equalization

PanMem · November 21, 2021, 8:56pm

Hi,

Based on the article https://docs.dgl.ai/en/latest/guide/training-link.html, I created a simple link prediction program. It uses a heterogeneous graph with two types of nodes (user, product).

Throughout the DGL documentation, the features are assigned as follows:

hetero_graph.nodes['user'].data['feature'] = torch.randn(n_users, n_hetero_features)
hetero_graph.nodes['item'].data['feature'] = torch.randn(n_items, n_hetero_features)

What should I do when the feature vectors of the user and the product have different sizes? For example, a product has 10 features (price, category, promotion, etc.) and the user has only 2 (gender, country). Of course, the data is numerical.

Can I preprocess the features somehow before training the network, e.g. with dimensionality reduction?

Do I have to extend or change the model? I am using the model from the DGL documentation:

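import torch.nn as nn

# SAGE and DotProductPredictor are the classes defined in the DGL guide linked above.
class Model(nn.Module):
    def __init__(self, in_features, hidden_features, out_features):
        super().__init__()
        self.sage = SAGE(in_features, hidden_features, out_features)
        self.pred = DotProductPredictor()

    def forward(self, g, neg_g, x):
        # Compute node representations, then score the positive and negative edges.
        h = self.sage(g, x)
        return self.pred(g, h), self.pred(neg_g, h)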

Thanks in advance for your answer.

BarclayII · November 22, 2021, 7:07am

Yes. For instance, you could project each feature into a hidden embedding with torch.nn.Embedding and sum them up before feeding into your Model.
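
A minimal sketch of the projection idea, not a definitive implementation. It assumes the features are already numeric float tensors, as stated in the question. Because torch.nn.Embedding looks up integer indices (which suits categorical features such as category or country), this sketch swaps in one torch.nn.Linear per node type to bring the numeric features to a common size. The class name FeatureProjector, the node counts, and hidden_size are hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn

class FeatureProjector(nn.Module):
    """Hypothetical helper: maps each node type's features to one shared size."""

    def __init__(self, in_sizes, hidden_size):
        super().__init__()
        # One linear layer per node type, e.g. {'user': 2, 'item': 10}.
        self.proj = nn.ModuleDict({
            ntype: nn.Linear(size, hidden_size)
            for ntype, size in in_sizes.items()
        })

    def forward(self, feats):
        # feats: dict mapping node type -> tensor of shape (num_nodes, in_size)
        return {ntype: self.proj[ntype](x) for ntype, x in feats.items()}

# Assumed sizes taken from the question: 2 user features, 10 item features.
n_users, n_items, hidden_size = 5, 7, 16
feats = {
    'user': torch.randn(n_users, 2),
    'item': torch.randn(n_items, 10),
}

projector = FeatureProjector({'user': 2, 'item': 10}, hidden_size)
x = projector(feats)

# Both node types now have feature vectors with hidden_size columns.
print({ntype: tuple(h.shape) for ntype, h in x.items()})
```

The resulting dict x could then be passed to the model in place of the raw features, with in_features set to hidden_size. The projector's parameters would need to be handed to the optimizer together with the model's so that both are trained jointly.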

PanMem · November 22, 2021, 10:19pm

Thank you very much! I’m already starting to analyze and learn torch.nn.Embedding (maybe I’ll come back with questions).

However, I have one more question. Suppose I create two types of users with features, e.g. [0,0,0,0,0,1,1,1,1,1] and [1,1,1,1,1,0,0,0,0,0], and two types of items with the features [0,0,1,1] and [1,1,0,0]. I then build a graph with the relations ('user', 'clicked', 'item') and ('item', 'clicked-by', 'user'), use torch.nn.Embedding, and compute the embeddings correctly.

How do I apply PCA to reduce the embedding dimensions and visualize the result (e.g. with scatter_matrix)? Would I see the expected distribution of users and items, for example users with identical features who click on items with similar features ending up close to each other? Is this a good way to check the code and algorithms?
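
One possible way to run such a check, sketched under assumptions. The learned embeddings are assumed to have been converted to NumPy arrays first (for a PyTorch tensor, via .detach().numpy()); random clustered data stands in for them here. PCA is done with a plain SVD in NumPy, and the function name pca_2d is a hypothetical helper, not part of DGL or PyTorch.

```python
import numpy as np

def pca_2d(embeddings):
    """Project the rows of `embeddings` onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Stand-in data (an assumption): two groups of 16-dimensional "user" embeddings.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 16))
group_b = rng.normal(loc=1.0, scale=0.1, size=(20, 16))
user_emb = np.vstack([group_a, group_b])

coords = pca_2d(user_emb)
print(coords.shape)

# If the groups are well separated, their means differ along the first component.
print(coords[:20, 0].mean(), coords[20:, 0].mean())
```

The two columns of coords can be placed in a pandas DataFrame and passed to pandas.plotting.scatter_matrix, or drawn as an ordinary scatter plot coloured by group. To compare users and items in one plot, it is probably better to stack both embedding matrices before calling pca_2d so that they share the same axes.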