Merging edge type strategy

Hi, I am working with a biological knowledge graph. In my graph, there are many edge types between some node types, for example: drug_binds_gene, drug_inhibits_gene, drug_activates_gene, etc.

One strategy I am considering is to use a single drug-gene edge type with a categorical (multi-hot) edge feature such as [0, 0, 1, 0, 0, 1, 0], where each position indicates whether the drug binds, inhibits, or activates the gene.
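To make the idea concrete, here is a minimal sketch in plain Python of how typed edges could be collapsed into one edge per (drug, gene) pair with a multi-hot feature. The relation vocabulary, the node names, and the helper `merge_edge_types` are all made up for illustration; a real graph would have more relations and would store the result as an edge feature tensor.

```python
from collections import defaultdict

# Hypothetical relation vocabulary; its order fixes the feature layout.
RELATIONS = ["binds", "inhibits", "activates"]
REL_INDEX = {name: i for i, name in enumerate(RELATIONS)}


def merge_edge_types(triples):
    """Collapse (drug, relation, gene) triples into one edge per
    (drug, gene) pair, with a multi-hot vector marking which relations hold."""
    features = defaultdict(lambda: [0] * len(RELATIONS))
    for drug, relation, gene in triples:
        features[(drug, gene)][REL_INDEX[relation]] = 1
    edges = sorted(features)  # deterministic edge order
    return edges, [features[edge] for edge in edges]


# Placeholder data, not real biology.
triples = [
    ("drug_a", "binds", "gene_x"),
    ("drug_a", "inhibits", "gene_x"),
    ("drug_b", "activates", "gene_y"),
]
edges, edge_feat = merge_edge_types(triples)
for edge, feat in zip(edges, edge_feat):
    print(edge, feat)
```

Note that a pair with several relations ends up as a single edge with several bits set, so the model would need to consume the edge feature in its message or attention function for the relation information to have any effect.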

Based on your experience, would that make sense/work? Thanks!!

I am planning to use the Heterogeneous Graph Transformer (HGT) model. I currently have over 70 edge types, and this strategy would reduce that to 30.

Pros:
Faster training and a smaller model;
Better generalization / less overfitting?

Cons:
Adding new edge types to a pretrained model can be cumbersome;
Merged edges might be less specialized, which could decrease accuracy;

That largely depends on whether you are going to learn separate parameters for each edge type. If you have too many edge types, you do risk overfitting. Based on my experience, I would recommend grouping similar edge types together. I'm not sure how easy that is for a drug knowledge graph, though, since I'm not an expert in that domain (e.g., are binding, inhibiting, and activating different enough that it's worthwhile to allocate dedicated parameters? I'm not sure).
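As a rough illustration of how edge-type-specific parameters scale, the sketch below counts them under the assumption that each edge type owns, per attention head, one attention matrix and one message matrix of size d_k x d_k plus one scalar prior, as in the HGT paper's formulation. Actual library implementations may lay out or share these parameters differently, and the hidden size and head count here are hypothetical.

```python
def relation_params_per_layer(num_edge_types, hidden_dim, num_heads):
    """Estimate edge-type-specific parameters in one layer.

    Assumes (not taken from any specific library) that every edge type has,
    per head, two head_dim x head_dim matrices and one scalar prior.
    """
    if hidden_dim % num_heads != 0:
        raise ValueError("hidden_dim must be divisible by num_heads")
    head_dim = hidden_dim // num_heads
    per_edge_type = num_heads * (2 * head_dim * head_dim + 1)
    return num_edge_types * per_edge_type


# Hypothetical model size; only the edge type counts come from the question.
before = relation_params_per_layer(70, hidden_dim=256, num_heads=8)
after = relation_params_per_layer(30, hidden_dim=256, num_heads=8)
print(f"70 edge types: {before:,} relation parameters per layer")
print(f"30 edge types: {after:,} relation parameters per layer")
```

Under these assumptions the count grows linearly with the number of edge types, so rare edge types get the same number of dedicated parameters as common ones, which is where the overfitting risk comes from.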
