Prediction of chemical properties

rookiecoder-chen · November 11, 2019, 3:03pm

Hello, I have just started learning deep learning and DGL. I hope to use your regression model to predict molecular properties. How can I process my data and turn it into graph inputs for DGL?

mufeili · November 11, 2019, 4:24pm

We have several utilities for converting SMILES/RDKit molecule instances into DGLGraphs, see mol_to_bigraph, smiles_to_bigraph, mol_to_complete_graph and smiles_to_complete_graph.
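To show what the two families of utilities produce, here is a dependency-free sketch of the edge lists involved. It assumes the molecule has already been reduced to a list of bonds given as atom-index pairs (for example from RDKit's GetBeginAtomIdx()/GetEndAtomIdx() on each bond). The helper names are hypothetical and are not DGL APIs. Conceptually, the *_bigraph utilities add each chemical bond as two directed edges, while the *_complete_graph utilities connect every pair of atoms whether or not they are bonded.

```python
def bigraph_edges(bonds):
    """Hypothetical helper: turn each undirected bond into u->v and v->u."""
    src, dst = [], []
    for u, v in bonds:
        src.extend([u, v])
        dst.extend([v, u])
    return src, dst

def complete_graph_edges(num_atoms, add_self_loop=False):
    """Hypothetical helper: connect every ordered pair of atoms, bonded or not."""
    pairs = [(i, j) for i in range(num_atoms) for j in range(num_atoms)
             if add_self_loop or i != j]
    src = [i for i, _ in pairs]
    dst = [j for _, j in pairs]
    return src, dst

# Heavy atoms of ethanol (SMILES "CCO"): C0-C1-O2, bonds (0, 1) and (1, 2)
src, dst = bigraph_edges([(0, 1), (1, 2)])
print(src, dst)      # [0, 1, 1, 2] [1, 0, 2, 1]
csrc, cdst = complete_graph_edges(3)
print(len(csrc))     # 6
```

In practice you would call smiles_to_bigraph("CCO") directly; in newer releases these utilities live in the separate DGL-LifeSci package (dgllife.utils).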

rookiecoder-chen · November 12, 2019, 12:19am

Excuse me, can you tell me the general process of chemical property prediction with DGL? How do I use the methods in model zoo? What format should I process my own dataset into?

mufeili · November 12, 2019, 2:47am

An overview of chemical property prediction is as follows:

Prepare data
- Convert molecules into graphs
- Initialize node/edge features, mainly with atom typing and bond typing
- You may take a look at the Tox21 example.
Prediction pipeline
- Message passing to update node (atom) representations
- Compute molecule level representations out of node (atom) and edge (bond) representations, which may be considered as a learned fingerprint.
- Use a feedforward neural network that takes the molecule representations (vectors) as input and outputs the prediction.
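The atom-typing step under "Prepare data" can be sketched without any dependencies: each atom property becomes a one-hot vector, and the pieces are concatenated into one feature vector per atom. This is the idea behind DGL's ConcatFeaturizer, but the code below is only an illustration: the atom records are hypothetical dicts standing in for RDKit atoms, and the allowable sets are shortened on purpose (DGL's own atom_type_one_hot and related functions have their own defaults).

```python
def one_hot(value, allowable, allow_unknown=True):
    """One-hot encode value; optionally append a final 'unknown' slot."""
    vec = [float(value == choice) for choice in allowable]
    if allow_unknown:
        vec.append(float(value not in allowable))
    return vec

def atom_type(atom):
    # Shortened, hypothetical element list for illustration
    return one_hot(atom["symbol"], ["C", "N", "O"])

def atom_degree(atom):
    return one_hot(atom["degree"], [0, 1, 2, 3, 4])

def concat_featurizer(funcs):
    """Build one featurizer whose output concatenates the outputs of funcs."""
    def featurize(atom):
        feats = []
        for f in funcs:
            feats.extend(f(atom))
        return feats
    return featurize

# Hypothetical atom records for ethanol's heavy atoms (C0-C1-O2)
atoms = [{"symbol": "C", "degree": 1},
         {"symbol": "C", "degree": 2},
         {"symbol": "O", "degree": 1}]
featurize = concat_featurizer([atom_type, atom_degree])
node_feats = [featurize(a) for a in atoms]
print(len(node_feats[0]))  # 10 = 4 type slots + 6 degree slots
```

Bond features (for example a one-hot bond type) can be built the same way, one vector per bond.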
The complete pipelines can be found at https://github.com/dmlc/dgl/tree/master/examples/pytorch/model_zoo/chem/property_prediction. While we’ve implemented some models, you may still need to modify them a bit.
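To make the three prediction steps concrete, here is a toy, dependency-free forward pass with hand-picked weights: one round of sum-aggregation message passing, a sum readout, and a linear output head. It is a simplified illustration under these assumptions, not the architecture of any model in the DGL model zoo, which learn their weights and use richer update functions on DGL graphs with PyTorch tensors.

```python
def relu(x):
    return [max(0.0, v) for v in x]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def message_passing(h, src, dst, W_self, W_msg):
    """One round: h_v' = relu(W_self h_v + sum over edges u->v of W_msg h_u)."""
    agg = [[0.0] * len(W_msg) for _ in h]
    for u, v in zip(src, dst):
        agg[v] = add(agg[v], matvec(W_msg, h[u]))
    return [relu(add(matvec(W_self, h[v]), agg[v])) for v in range(len(h))]

def readout(h):
    """Sum-pool atom representations into one molecule vector."""
    return [sum(col) for col in zip(*h)]

def predict(g, w, b):
    """Linear output head (a one-layer feedforward network)."""
    return sum(wi * gi for wi, gi in zip(w, g)) + b

# Ethanol heavy atoms with 2-dim one-hot features [is_C, is_O]
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
src, dst = [0, 1, 1, 2], [1, 0, 2, 1]    # bidirected bonds C0-C1, C1-O2
I = [[1.0, 0.0], [0.0, 1.0]]             # identity weights, for readability
h1 = message_passing(h, src, dst, I, I)  # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
g = readout(h1)                          # [5.0, 2.0], a "learned fingerprint"
y = predict(g, w=[0.5, -1.0], b=0.1)
print(y)
```

A real model stacks several message-passing rounds and trains the weights by gradient descent against the property labels.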

If you are not familiar with DGL and GNN, you may want to check this tutorial first.

rookiecoder-chen · November 12, 2019, 10:42am

Hello, when I ran python expression.py -m MPNN -d Tox21, I got an error: ImportError: cannot import name 'ConcatFeaturizer' from 'dgl.data.chem'. What is the cause? I installed the CPU version of DGL.

mufeili · November 12, 2019, 12:23pm

- I assume you mean “regression.py” by “expression.py”.
- “Tox21” is a classification dataset rather than a regression one.
- Have you installed the latest version of DGL and cloned the latest repo? If not, you can try that and see whether the ImportError persists.
- Generally, MPNN is not a fast model, and training it on CPU can take a looooong time.

rookiecoder-chen · November 12, 2019, 1:26pm

Yes, my DGL version is 0.4, and I cloned the repo last night. I checked the code in the repo and found no ConcatFeaturizer in dgl.data.chem or dgl.data.chem.utils.

The error is reported at the first line of configure.py:

from dgl.data.chem import BaseAtomFeaturizer, CanonicalAtomFeaturizer, ConcatFeaturizer, \
    atom_type_one_hot, atom_degree_one_hot, atom_formal_charge, atom_num_radical_electrons, \
    atom_hybridization_one_hot, atom_total_num_H_one_hot, BaseBondFeaturizer

mufeili · November 12, 2019, 1:40pm

Our latest version is 0.4.1. Could you please try updating to that version? ConcatFeaturizer can be found here.

rookiecoder-chen · November 12, 2019, 1:59pm

It works. The program is now downloading the Alchemy dataset. Thank you very much. I will take a closer look at the details of the code.

rookiecoder-chen · November 13, 2019, 12:46pm

Hello, I have two small questions:

- After installing the GPU version of DGL, I found that python regression.py -m MGCN -d Alchemy ran very slowly. It did not seem to use the GPU, while the CPU was at 100% usage.
- I found that there are two files in the Alchemy dataset: dev_graphs.bin and dev_smiles.txt. Do I only need to prepare the SMILES file? Where are the 12 quantum mechanical properties stored in Alchemy? What is in dev_graphs.bin, and how can I view its contents?

mufeili · November 14, 2019, 7:51am

For future readers, this has been answered in a different thread.

monsterZeng · November 12, 2020, 9:14am

Sorry, I can’t find any information at your URL. Could you please re-send it? Thank you.

mufeili · November 14, 2020, 8:47am

What information are you looking for?

monsterZeng · November 19, 2020, 3:01am

Thanks for your reply; I have found this tutorial. I have a question. I have extracted the attention scores for a molecule from a GAT model, and I want to use them to color the bonds in a molecule drawing rather than in a generic network plot. How can I do that?

mufeili · November 19, 2020, 3:27am

RDKit has nice support for visualizing molecules, and you can use the attention weights to color the bonds. For an example, see the visualization section here.
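A minimal sketch of turning per-edge attention into per-bond colors, under two assumptions: the graph was built bond by bond as u->v followed by v->u (so bond i owns edges 2*i and 2*i + 1), and multi-head attention has already been averaged to one score per edge. Both directed scores are averaged per bond, min-max normalized, and mapped from white (low) to red (high). The helper names and the white-to-red scheme are illustrative choices, not DGL or RDKit APIs.

```python
def bond_attention(edge_attn, num_bonds):
    """Average the two directed-edge scores belonging to each bond.

    Assumes bond i owns edges 2*i (u->v) and 2*i + 1 (v->u).
    """
    return [(edge_attn[2 * i] + edge_attn[2 * i + 1]) / 2 for i in range(num_bonds)]

def to_colors(scores):
    """Min-max normalize scores, then map low -> white, high -> red (RGB in [0, 1])."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    colors = {}
    for i, s in enumerate(scores):
        t = (s - lo) / span
        colors[i] = (1.0, 1.0 - t, 1.0 - t)
    return colors

# Hypothetical attention scores for ethanol's 4 directed edges (2 bonds)
edge_attn = [0.2, 0.4, 0.9, 0.7]
scores = bond_attention(edge_attn, num_bonds=2)
bond_colors = to_colors(scores)
print(bond_colors)  # {0: (1.0, 1.0, 1.0), 1: (1.0, 0.0, 0.0)}
```

The keys are RDKit bond indices, so a dict like bond_colors should fit the highlightBonds / highlightBondColors arguments of RDKit's MolDraw2D drawing calls; check the RDKit documentation for the exact call in your version.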

monsterZeng · November 19, 2020, 5:59am

Thanks for your reply, I will try it.