Poor performance of DGL's MPNN.py compared to other benchmarks

Hello, I’m trying to train a custom MPNN to predict aqueous solubility on the ESOL (Delaney) dataset. Existing benchmarks on this dataset (https://github.com/swansonk14/chemprop, search for “results”) suggest a test set RMSE of 0.6 should be achievable. When I train DGL’s MPNN.py, the results are significantly worse – depending on the hyperparameters, I typically get a test set RMSE of 0.9–1.2, which is worse than a regression forest trained on RDKit features.

Has anyone here had success with MPNN.py in achieving good results?

Hi, thank you for the report. I will take a look at this.

Thank you!

For reference:
DeepChem has an example script for the same dataset here: https://github.com/deepchem/deepchem/blob/master/examples/delaney/delaney_graph_conv.py

With ChemProp, you can run
python train.py --data_path data/delaney.csv --dataset_type regression --save_dir /path/to/directory

Neither requires any hyperparameter tuning to get results that, while not state-of-the-art, do beat models using conventional RDKit features.

I’ve implemented an MPNN for the ESOL dataset here. If you want to use it, you can install the branch from source and run python regression.py -d ESOL -m MPNN -s RMSE. It currently reaches a test RMSE of around 0.8, better than the example released by DeepChem. Basically I just included more features, as in chemprop. I haven’t done a careful tuning, so it can probably be improved further.
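For anyone wondering what “more features as in chemprop” means: chemprop-style atom featurization one-hot encodes atom properties (atomic number, degree, formal charge, hybridization, aromaticity, etc.) and concatenates the blocks. Here is a minimal pure-Python sketch of that pattern; the feature lists below are illustrative, not chemprop’s exact ones.

```python
def one_hot(value, choices):
    """One-hot encode `value` over `choices`, with a trailing
    'unknown' slot for values outside the list."""
    vec = [0] * (len(choices) + 1)
    idx = choices.index(value) if value in choices else len(choices)
    vec[idx] = 1
    return vec

# Illustrative feature sets; chemprop's actual lists are richer.
ATOMIC_NUMS = [1, 6, 7, 8, 9, 16, 17]   # H, C, N, O, F, S, Cl
DEGREES = [0, 1, 2, 3, 4]

def atom_features(atomic_num, degree, is_aromatic):
    """Concatenate one-hot blocks plus a binary aromatic flag."""
    return (one_hot(atomic_num, ATOMIC_NUMS)
            + one_hot(degree, DEGREES)
            + [1 if is_aromatic else 0])

# An aromatic carbon with two heavy-atom neighbours:
feat = atom_features(6, 2, True)   # length 8 + 6 + 1 = 15
```

In a real featurizer the raw values would come from an RDKit atom object (e.g. `atom.GetAtomicNum()`), and analogous one-hot blocks are built for bonds.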

The chemprop code can generally reach 0.7. Strictly speaking, it implements a different model, reported in “Analyzing Learned Molecular Representations for Property Prediction”. Their code also uses additional tricks such as feature normalization and learning rate decay.
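To make those two tricks concrete, here is a generic sketch of each: z-scoring the regression targets (training on scaled values, un-scaling predictions before computing RMSE) and exponential learning-rate decay. This is the textbook form of both, not chemprop’s exact recipe (chemprop uses a warmup-then-decay schedule).

```python
import statistics

def normalize_targets(ys):
    """Z-score targets; returns scaled values plus the (mean, std)
    needed to un-scale model predictions later."""
    mean, std = statistics.mean(ys), statistics.pstdev(ys)
    return [(y - mean) / std for y in ys], mean, std

def exp_decay_lr(lr0, gamma, epoch):
    """Exponential decay: lr = lr0 * gamma ** epoch, with 0 < gamma < 1."""
    return lr0 * gamma ** epoch
```

With PyTorch, the decay part is usually handled by `torch.optim.lr_scheduler.ExponentialLR` rather than computed by hand.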

I will let you know if there is an update.