Prediction of chemical properties

Hello, I have just started working with deep learning and DGL. I hope to use your regression model to predict molecular properties. How can I process my data into graph inputs for DGL?

We have several utilities for converting SMILES/RDKit molecule instances into DGLGraphs, see mol_to_bigraph, smiles_to_bigraph, mol_to_complete_graph and smiles_to_complete_graph.
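Conceptually, mol_to_bigraph turns every bond into two directed edges (one per direction), while mol_to_complete_graph connects every ordered pair of distinct atoms. The plain-Python sketch below only illustrates that idea. It is not DGL's implementation. It assumes the molecule has already been parsed into an atom count and a list of bonds (in practice RDKit does this from the SMILES string), and the function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Edges = Tuple[List[int], List[int]]


def bonds_to_bigraph_edges(bonds: Sequence[Tuple[int, int]]) -> Edges:
    """Hypothetical helper: each undirected bond (u, v) becomes edges u->v and v->u."""
    src: List[int] = []
    dst: List[int] = []
    for u, v in bonds:
        # Bond i ends up as directed edges 2*i and 2*i + 1.
        src.extend([u, v])
        dst.extend([v, u])
    return src, dst


def complete_graph_edges(num_atoms: int) -> Edges:
    """Hypothetical helper: connect every ordered pair of distinct atoms (no self loops)."""
    src: List[int] = []
    dst: List[int] = []
    for u, v in permutations(range(num_atoms), 2):
        src.append(u)
        dst.append(v)
    return src, dst


# Ethanol heavy atoms C(0)-C(1)-O(2), hydrogens omitted.
src, dst = bonds_to_bigraph_edges([(0, 1), (1, 2)])
print(src, dst)                 # [0, 1, 1, 2] [1, 0, 2, 1]
print(complete_graph_edges(3))  # ([0, 0, 1, 1, 2, 2], [1, 2, 0, 2, 0, 1])
```

The bond-based graph has 2 edges per bond, while the complete graph has n(n-1) edges for n atoms, so it grows quickly for larger molecules.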

Excuse me, could you describe the general workflow for chemical property prediction with DGL? How do I use the models in the model zoo, and what format should my own dataset be in?

An overview of chemical property prediction is as follows:

  • Prepare data
    • Convert molecules into graphs
    • Initialize node/edge features, mainly with atom typing and bond typing
    • See the Tox21 example for reference.
  • Prediction pipeline
    • Message passing to update node (atom) representations
    • Compute molecule-level representations from the node (atom) and edge (bond) representations; these can be viewed as a learned fingerprint.
    • Feed the molecule representations (vectors) into a feedforward neural network, which outputs the prediction.
  • The complete pipelines can be found at https://github.com/dmlc/dgl/tree/master/examples/pytorch/model_zoo/chem/property_prediction. While we’ve implemented some models, you may still need to modify them a bit.
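The steps above can be sketched end to end in plain Python. This is a toy forward pass under loud simplifications, not a model-zoo model: atoms are featurized only by a one-hot element type over a hypothetical three-element vocabulary, each message-passing step sums a node's own state with its neighbors' states and applies one shared weight matrix plus ReLU, the readout is a sum over atoms, and the "feedforward network" is a single linear layer. All weights are hand-picked rather than trained, and every function name is hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical atom-type vocabulary; real featurizers also encode degree, charge, hybridization, etc.
ATOM_TYPES = ["C", "N", "O"]


def one_hot(symbol: str, vocab: Sequence[str]) -> Vector:
    """Atom typing: 1.0 at the symbol's position, 0.0 elsewhere (all zeros if unknown)."""
    return [1.0 if symbol == v else 0.0 for v in vocab]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x: Vector) -> Vector:
    return [max(0.0, xi) for xi in x]


def message_passing(h: List[Vector], src: Sequence[int], dst: Sequence[int],
                    W: Matrix, steps: int) -> List[Vector]:
    """Update atom representations: h_v <- ReLU(W (h_v + sum of h_u over edges u->v))."""
    for _ in range(steps):
        agg = [list(x) for x in h]  # start from each node's own state
        for u, v in zip(src, dst):
            agg[v] = [a + b for a, b in zip(agg[v], h[u])]
        h = [relu(matvec(W, a)) for a in agg]
    return h


def sum_readout(h: List[Vector]) -> Vector:
    """Molecule-level representation ("learned fingerprint"): sum over atoms."""
    return [sum(column) for column in zip(*h)]


def linear_head(x: Vector, weights: Vector, bias: float) -> float:
    """Stand-in for the feedforward predictor: a single linear layer."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Ethanol heavy atoms C-C-O; each bond stored in both directions.
atoms = ["C", "C", "O"]
src, dst = [0, 1, 1, 2], [1, 0, 2, 1]
h0 = [one_hot(a, ATOM_TYPES) for a in atoms]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
h1 = message_passing(h0, src, dst, identity, steps=1)
fingerprint = sum_readout(h1)
prediction = linear_head(fingerprint, [0.5, 0.25, 1.0], bias=0.0)
print(h1)           # [[2.0, 0.0, 0.0], [2.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
print(fingerprint)  # [5.0, 0.0, 2.0]
print(prediction)   # 4.5
```

In a trained model, W and the head weights are learned by gradient descent, and the predictor is typically a small multi-layer perceptron rather than one linear layer.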

If you are not familiar with DGL and GNNs, you may want to check this tutorial first.


Hello, when I ran python expression.py -m MPNN -d Tox21, I got an error (ImportError: cannot import name ‘ConcatFeaturizer’ from ‘dgl.data.chem’). What's the reason? I installed the CPU version of DGL.

  1. I assume you mean “regression.py” by “expression.py”.
  2. “Tox21” is a classification task rather than a regression task.
  3. Have you installed the latest version of DGL and cloned the latest repo? If not, try that and see whether the ImportError persists.
  4. Generally, MPNN is not a fast model and training it on CPU can take a looooong time.

Yes, my DGL version is 0.4, and I cloned the repo last night. I checked the code and found no ConcatFeaturizer in dgl.data.chem or dgl.data.chem.utils.

The error is reported in the first line of configure.py:

    from dgl.data.chem import BaseAtomFeaturizer, CanonicalAtomFeaturizer, ConcatFeaturizer, \
        atom_type_one_hot, atom_degree_one_hot, atom_formal_charge, atom_num_radical_electrons, \
        atom_hybridization_one_hot, atom_total_num_H_one_hot, BaseBondFeaturizer

Our latest version is 0.4.1; could you try updating to that version? “ConcatFeaturizer” can be found here.
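In this thread, moving from 0.4 to 0.4.1 is what made the ConcatFeaturizer import work, so it helps to confirm which version Python actually picks up by comparing dgl.__version__ against "0.4.1". The helper below is a hypothetical, stdlib-only sketch: it compares only the leading digits of each dot-separated component, so a suffix such as "post1" is ignored.

```python
from typing import List, Tuple


def numeric_version(version: str) -> Tuple[int, ...]:
    """Keep the leading digits of each dot-separated part, e.g. "0.4.1" -> (0, 4, 1)."""
    parts: List[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def version_at_least(installed: str, required: str) -> bool:
    """Compare versions after padding the shorter one with zeros ("0.4" == "0.4.0")."""
    a, b = numeric_version(installed), numeric_version(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


print(version_at_least("0.4", "0.4.1"))    # False
print(version_at_least("0.4.1", "0.4.1"))  # True
```

With DGL installed, call version_at_least(dgl.__version__, "0.4.1"); if it returns False, upgrade the same DGL package you originally installed.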

It works. The program is downloading the Alchemy dataset. Thank you very much. I will take a closer look at the details of the code.

Hello, I have two small questions:

  1. After installing the GPU version of DGL, I found that python regression.py -m MGCN -d Alchemy ran very slowly. It did not seem to use the GPU at all, while CPU usage stayed at 100%.

  2. I found two files in the Alchemy dataset: dev_graphs.bin and dev_smiles.txt. Do I only need to prepare the SMILES file? Where are the 12 quantum mechanical properties in Alchemy stored? What is in dev_graphs.bin, and how can I view its contents?

For future readers, this has been answered in a different thread.

Sorry, I can't find the information at your URL. Could you please re-send it? Thank you.

What information are you looking for?

Thanks for your reply; I have found the tutorial. I have another question. I have extracted the attention scores for a molecule from a GAT model, and I want to color the molecule's bonds by attention score instead of drawing a plain network graph. How can I do this?

RDKit has good support for visualizing molecules, and you can use the attention weights to color the bonds. For an example, see the visualization section here.
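Before drawing, the per-edge attention weights have to be turned into one color per bond. The sketch below rests on assumptions not stated in this thread: the attention has already been reduced to one scalar per edge (for example, averaged over heads); the graph came from mol_to_bigraph with bond i stored as directed edges 2*i and 2*i + 1 and any self loops appended after the bond edges (check this ordering against the DGL version you use); and a white-to-red color scale is wanted. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[float, float, float]


def bond_scores(edge_attention: Sequence[float], num_bonds: int) -> List[float]:
    """Average the two directed-edge weights of each bond.

    Assumes bond i is stored as edges 2*i and 2*i + 1; any later edges
    (e.g. self loops) are ignored.
    """
    return [(edge_attention[2 * i] + edge_attention[2 * i + 1]) / 2.0
            for i in range(num_bonds)]


def scores_to_bond_colors(scores: Sequence[float]) -> Dict[int, RGB]:
    """Min-max normalize scores and map them from white (lowest) to red (highest)."""
    low, high = min(scores), max(scores)
    span = high - low
    colors: Dict[int, RGB] = {}
    for bond_idx, score in enumerate(scores):
        t = (score - low) / span if span > 0 else 0.0
        colors[bond_idx] = (1.0, 1.0 - t, 1.0 - t)
    return colors


# Hypothetical attention for a 3-bond molecule, two directed edges per bond.
attention = [0.25, 0.25, 0.75, 0.75, 0.5, 0.5]
scores = bond_scores(attention, num_bonds=3)
colors = scores_to_bond_colors(scores)
print(scores)  # [0.25, 0.75, 0.5]
print(colors)  # {0: (1.0, 1.0, 1.0), 1: (1.0, 0.0, 0.0), 2: (1.0, 0.5, 0.5)}
```

The returned dict maps bond indices to RGB tuples with components in [0, 1], which is the form RDKit's rdMolDraw2D drawing calls accept for their highlightBondColors argument (passed together with highlightBonds=list(colors)); check the RDKit documentation for the exact call in your version.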

Thanks for your reply, I will try it.