Hi, I am working with a biological knowledge graph. In my graph, some pairs of node types are connected by many different edge types, for example: drug_binds_gene, drug_inhibits_gene, drug_activates_gene, etc.
I was wondering whether a possible strategy would be to have a single drug-gene edge type with a multi-hot edge feature, such as [0, 0, 1, 0, 0, 1, 0], indicating whether the drug binds, inhibits, or activates the gene (more than one can apply to the same edge).
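To make the idea concrete, here is a minimal sketch of the merge in plain Python. The relation names, the node IDs, and the helper `merge_edge_types` are all made up for illustration; it only shows how several typed edge lists could collapse into one edge list plus a multi-hot feature per edge.

```python
from collections import defaultdict

# Hypothetical relation vocabulary for the merged drug-gene edge type.
RELATIONS = ["binds", "inhibits", "activates"]
REL_INDEX = {name: i for i, name in enumerate(RELATIONS)}


def merge_edge_types(typed_edges):
    """Collapse {relation: [(drug, gene), ...]} into a single edge list
    with one multi-hot feature vector per (drug, gene) pair."""
    features = defaultdict(lambda: [0] * len(RELATIONS))
    for relation, pairs in typed_edges.items():
        for pair in pairs:
            # Set the bit for this relation on the shared edge.
            features[pair][REL_INDEX[relation]] = 1
    edges = sorted(features)
    return edges, [features[edge] for edge in edges]


# Toy data (illustrative only): d1 both binds and inhibits g1.
typed_edges = {
    "binds": [("d1", "g1"), ("d2", "g1")],
    "inhibits": [("d1", "g1")],
    "activates": [("d2", "g2")],
}

edges, edge_attr = merge_edge_types(typed_edges)
for edge, attr in zip(edges, edge_attr):
    print(edge, attr)
```

One thing I would need to check: in the original HGT formulation, attention is parameterized by edge type rather than by edge attributes, so the model may need some modification before it can actually use such a feature vector.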
Based on your experience, would that make sense/work? Thanks!!
I am planning to use the Heterogeneous Graph Transformer (HGT) model. At the moment I have over 70 edge types, and with this strategy I would reduce that to 30.
Pros:
Faster training and a smaller model, since there are fewer edge-type-specific parameters;
Better generalization / less overfitting?
Cons:
Adding new edge types to a pretrained model can be cumbersome;
Merged edges might be less specialized, which could decrease accuracy;
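For the "smaller" point, a rough back-of-the-envelope estimate, assuming the edge-type-specific parameters follow the original HGT formulation (per head, one attention matrix, one message matrix, and one prior scalar for each edge type). The hidden size, head count, and layer count below are hypothetical, and a given implementation may count differently.

```python
def relation_params_per_layer(hidden_dim, num_heads):
    """Edge-type-specific parameters for ONE edge type in ONE layer,
    assuming per-head attention and message matrices plus a prior scalar."""
    head_dim = hidden_dim // num_heads
    attention = num_heads * head_dim * head_dim  # per-head attention matrix
    message = num_heads * head_dim * head_dim    # per-head message matrix
    prior = num_heads                            # per-head prior scalar
    return attention + message + prior


def total_relation_params(num_edge_types, hidden_dim, num_heads, num_layers):
    """Total edge-type-specific parameters across all layers."""
    per_type = relation_params_per_layer(hidden_dim, num_heads)
    return num_edge_types * per_type * num_layers


# Hypothetical model size: 256 hidden units, 8 heads, 2 layers.
before = total_relation_params(70, hidden_dim=256, num_heads=8, num_layers=2)
after = total_relation_params(30, hidden_dim=256, num_heads=8, num_layers=2)
print(f"70 edge types: {before:,} relation parameters")
print(f"30 edge types: {after:,} relation parameters")
print(f"reduction: {1 - after / before:.0%}")
```

This only counts the parameters tied to edge types. The node-type-specific projections are unchanged by the merge, so the saving for the whole model would be smaller than this ratio suggests.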