Testing free energy of hydration features via Jazzy for solubility prediction.
This is the first of the “lets-test-this” posts; the Google Colab notebook is here.
TLDR: Jazzy features do not have a positive impact on solubility prediction.
Intro: hydrogen bonds drive all kinds of interactions and are no doubt important for the drug design process. AstraZeneca recently published a nice package, Jazzy, that predicts hydrogen-bond strengths and free energies of hydration of small molecules. It looks like the main application of Jazzy is in local settings (protein-ligand interaction strength estimation, SAR studies), but my interest was to test it on compound solubility prediction.
Goal: test whether the free energy of hydration (without free parameters) from Jazzy has an impact on the prediction of compound solubility.
Setup:
- Data was taken from SolCuration, which is derived from the Meng et al. paper (AqSol) plus additional curated datasets. For duplicated measurements, the median was taken, and compounds with SD > 0.5 were removed (~1500 dropped).
- Three types of descriptors were used: Morgan fingerprints, 2D descriptors from RDKit, and Jazzy features
- 5-fold random cross-validation splits (no repeats)
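The curation and splitting steps above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the notebook's code: the function names `aggregate_measurements` and `random_kfold`, the toy records, the use of the sample standard deviation, and the choice to keep compounds measured only once are all my assumptions.

```python
import random
import statistics
from collections import defaultdict


def aggregate_measurements(records, sd_cutoff=0.5):
    """Collapse duplicate (compound_id, logS) records to one median value.

    Assumption: compounds whose replicates have a sample SD above
    sd_cutoff are dropped; compounds measured once are kept as-is.
    """
    grouped = defaultdict(list)
    for compound_id, log_s in records:
        grouped[compound_id].append(log_s)

    curated = {}
    for compound_id, values in grouped.items():
        # stdev needs at least two points; a single measurement has no spread.
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        if spread <= sd_cutoff:
            curated[compound_id] = statistics.median(values)
    return curated


def random_kfold(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs for one shuffled k-fold pass."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_splits folds.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# Toy records with made-up values, keyed by SMILES for illustration only.
records = [
    ("CCO", -0.10), ("CCO", 0.10), ("CCO", 0.00),  # consistent replicates
    ("c1ccccc1", -1.0), ("c1ccccc1", -3.0),        # conflicting replicates
    ("CC(=O)O", 0.5),                              # single measurement
]
curated = aggregate_measurements(records)
splits = list(random_kfold(n_samples=10, n_splits=5, seed=0))
```

In the toy data the conflicting benzene replicates are dropped, while the ethanol replicates collapse to their median. Each compound lands in exactly one test fold, which is what "no repeats" amounts to for a single pass.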
Results:
Although the authors showed some positive correlation between compound activities and Jazzy features, the results for “global” solubility prediction (logS) are not very promising: Morgan fingerprints alone, or the combination of Morgan + 2D descriptors, perform better than Jazzy features alone in a default CatBoost regression setup (no hyperparameter search).
| Descriptors | R2 | Mean absolute percentage error | RMSE |
|---|---|---|---|
| Jazzy | 0.483166 | 0.697701 | 1.209996 |
| Morgan | 0.647810 | 0.636827 | 1.005246 |
| combo_Morgan_Jazzy | 0.709274 | 0.522402 | 0.912780 |
| combo_2D_Jazzy | 0.755690 | 0.422069 | 0.832237 |
| 2D | 0.756753 | 0.416461 | 0.830284 |
| combo_2D_Jazzy_Morgan | 0.770472 | 0.431028 | 0.806384 |
| combo_Morgan_2D | 0.772232 | 0.412204 | 0.803286 |
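For reference, here is how the three metrics reported above are commonly defined. This is a plain-Python sketch under my own assumptions, not the notebook's evaluation code: `regression_metrics` is a hypothetical helper, the toy values are made up, and the small `eps` guard in the percentage error is a convention I chose to avoid division by zero.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-12):
    """Return R2, mean absolute percentage error (as a fraction), and RMSE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Relative error per sample; eps guards against true values of exactly 0.
    mape = sum(abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return {"r2": r2, "mape": mape, "rmse": rmse}


# Toy logS values (made up) to show the call.
y_true = [-2.0, -4.0, -1.0, -3.0]
y_pred = [-2.5, -3.5, -1.0, -3.0]
metrics = regression_metrics(y_true, y_pred)
```

One caveat when reading the percentage-error column: logS values can be close to zero, and for those compounds even a small absolute error gives a large relative error, so this metric can be dominated by a few compounds. R2 and RMSE do not have that problem.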
Jazzy features might be far more impactful in a GNN-like architecture, but for a tabular data representation, 2D/Morgan features give better results.
Links: