I had a question about the optimizer and emb_optimizer in entity_classify_mp.py, which has been answered by Soji Adeshina. He explained it very clearly, so I am sharing his answer here. I think it will be helpful for those who want to learn from this code how to do node classification.
My Question: “in the entity_classify_mp.py, there are an optimizer (ln 301) and an embed_optimizer (ln 306). Since optimizer includes the parameters in the embed_layer, why do we still need an embed_optimizer here? What does this embed_optimizer do? What does dgl_sparse mean?”
His Answer: “There are two optimizers because the embed_layer is the sparse implementation of embedding. In that case the main optimizer will not have the node embeddings/parameters, because we need to use the sparse implementation of Adam to update the sparse node embeddings/parameters. dgl_sparse is a flag indicating whether to use the DGL version of sparse embedding or the default PyTorch version; the DGL version may be faster. If embed_layer were using dense embeddings then you’re right that we wouldn’t need another embed_optimizer.”
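To illustrate the two-optimizer split in plain PyTorch (this is a minimal sketch, not the actual code in entity_classify_mp.py): an `nn.Embedding` created with `sparse=True` produces sparse gradients, which regular `Adam` cannot consume, so the embedding table gets its own `SparseAdam` while the dense parameters go to the main optimizer.

```python
import torch
import torch.nn as nn

# Minimal sketch: a sparse embedding table (standing in for featureless
# node embeddings) plus a dense layer (standing in for the rest of the model).
emb = nn.Embedding(10, 4, sparse=True)   # gradients come back as sparse tensors
proj = nn.Linear(4, 4)

# The main optimizer only sees the dense parameters; the sparse
# embedding needs the sparse implementation of Adam.
optimizer = torch.optim.Adam(proj.parameters(), lr=0.01)
emb_optimizer = torch.optim.SparseAdam(list(emb.parameters()), lr=0.01)

idx = torch.tensor([1, 3])               # a mini-batch of node ids
loss = proj(emb(idx)).sum()

optimizer.zero_grad()
emb_optimizer.zero_grad()
loss.backward()
# Only the rows touched by this batch carry gradient, stored sparsely.
optimizer.step()
emb_optimizer.step()
```

Putting `emb.parameters()` into the main `Adam` instead would raise an error at `step()` time, which is exactly why the script keeps two optimizers.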
“The embed layer is doing two things there. For nodes that have features, there are parameters used to project from the original feature dim to the hidden dim. That part is always dense and is updated by the main optimizer. The second thing is for nodes that don’t have features: it embeds them directly into the hidden dim. That’s the part that is sparse and is updated by the sparse optimizer.”
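The split described above can be sketched as a small module (the class and attribute names here are illustrative, not the actual RelGraphEmbed implementation): node types with input features go through a dense projection, while featureless node types draw directly from a sparse embedding table.

```python
import torch
import torch.nn as nn

class EmbedLayer(nn.Module):
    """Sketch of an embed layer that handles both featured and
    featureless nodes, as described in the quoted answer."""

    def __init__(self, num_nodes, feat_dim, hidden_dim, has_features):
        super().__init__()
        self.has_features = has_features
        if has_features:
            # Dense projection feat_dim -> hidden_dim;
            # updated by the main optimizer.
            self.proj = nn.Linear(feat_dim, hidden_dim)
        else:
            # Direct per-node embedding into hidden_dim;
            # sparse, updated by the sparse optimizer.
            self.node_embeds = nn.Embedding(num_nodes, hidden_dim, sparse=True)

    def forward(self, node_ids, feats=None):
        if self.has_features:
            return self.proj(feats)
        return self.node_embeds(node_ids)
```

In the real script this logic is applied per node type of the heterogeneous graph, so one model can mix both cases at once.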
“So embed_layer.embeds is actually the projection parameter, and embed_layer.node_embeds or embed_layer.dgl_emb are the node embeddings.”