Question about Alchemy dataset

Hello, I have two small questions:

  1. After installing the GPU build of DGL, I found that `python regression.py -m MGCN -d Alchemy` ran very slowly and did not seem to use the GPU at all, while the CPU was at 100% usage.

  2. I found that there are two files in the Alchemy dataset: dev_graphs.bin and dev_smiles.txt. Do I only need to prepare the SMILES file? Where are the 12 quantum mechanical properties of Alchemy? What is stored in dev_graphs.bin, and how can I view its contents?

  1. Did you observe low GPU usage or no GPU usage at all? If the GPU is not used at all, make sure you have installed the GPU-enabled builds of both PyTorch and DGL; a minimal check is sketched below. GPU usage tends to be low for GNN-based models since the computation is relatively light. You can increase GPU usage by setting a larger batch size in the configuration. I suspect the high CPU usage comes from the graph (molecule) batching operation, for which we in fact always construct new graphs.
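
A quick way to verify both installations is sketched below. This is only a minimal check assuming a recent DGL API (the graph-construction call may differ slightly in older versions):

```python
import torch
import dgl

# Check that the GPU-enabled PyTorch build is installed and a GPU is visible.
print(torch.cuda.is_available())  # should print True

# Check the DGL build by moving a tiny graph to the GPU; this raises an
# error if DGL was installed without CUDA support.
g = dgl.graph(([0, 1], [1, 2]))       # toy graph with edges 0->1 and 1->2
g = g.to(torch.device('cuda'))
print(g.device)                       # should print cuda:0
```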

  2. Since the data processing step can take a long time, we have preprocessed the Alchemy dataset by converting the molecules into DGLGraphs and preparing the node/edge features. If you want to check the original files/labels, you can set from_raw=True in the dataset initialization (see the sketch below), in which case we will automatically download the original data files and process the dataset from scratch.
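
For illustration, here is a sketch of initializing the dataset from the raw files. The import path, the class name TencentAlchemyDataset, the mode='dev' argument, and the item format are assumptions that may differ across versions, so adjust them to the code you are running:

```python
# Sketch only: import path, class name, and item format are assumptions.
from dgl.data.chem import TencentAlchemyDataset

# from_raw=True downloads the original data files and rebuilds the DGLGraphs,
# node/edge features, and labels from scratch instead of using dev_graphs.bin.
dataset = TencentAlchemyDataset(mode='dev', from_raw=True)

# Each item is expected to hold the SMILES string, the DGLGraph, and a label
# tensor with the 12 quantum mechanical properties.
smiles, g, label = dataset[0]
print(smiles)
print(g)
print(label)
```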

    The dev_graphs.bin file stores the preprocessed DGLGraphs for training. You can load them with load_graphs as in L212; a small example is sketched below.
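
For example, assuming a local copy of the file in the current directory:

```python
from dgl.data.utils import load_graphs

# load_graphs returns the list of stored DGLGraphs and a dict of tensors
# saved alongside them (e.g. labels).
graphs, label_dict = load_graphs('dev_graphs.bin')
print(len(graphs))        # number of preprocessed molecules
print(graphs[0])          # one DGLGraph with its node/edge features
print(label_dict.keys())  # names of the tensors stored with the graphs
```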