Testing free energy of hydration features via Jazzy for solubility prediction.

March 21, 2023

This is the first of the “let's-test-this” posts: the Google Colab notebook is here.

TL;DR: Jazzy features do not have a positive impact on solubility prediction.

Intro: hydrogen bonds drive all kinds of interactions and are no doubt important for the drug design process. AstraZeneca recently published a nice package, Jazzy, that predicts hydrogen-bond strengths and free energies of hydration of small molecules. Its main applications seem to be local settings, such as protein-ligand interaction strength estimation and SAR studies, but my interest was to test it for compound solubility prediction.

Goal: test whether the free energy of hydration from Jazzy (without free parameters) improves compound solubility prediction.

Setup:

Data was taken from SolCuration, which is derived from the Meng et al. paper (AqSol) plus additional curated datasets. For duplicated measurements, the median was taken, and compounds with SD > 0.5 were removed (~1500 dropped).
Three types of descriptors were used: Morgan fingerprints, 2D descriptors from RDKit, and Jazzy features.
5-fold random cross-validation splits (no repeats).
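The duplicate handling from the first setup item can be sketched with the standard library alone. This is an illustrative sketch, not the notebook's code: the compound keys and toy values are hypothetical, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the post does not say which SD was used or how duplicates were matched.

```python
import statistics
from collections import defaultdict


def curate_measurements(records, max_sd=0.5):
    """Collapse replicate logS measurements into one value per compound.

    records: iterable of (compound_key, logS) pairs; the key could be a
    canonical SMILES (an assumption, not stated in the post).
    Returns dict compound_key -> median logS, skipping compounds whose
    replicates disagree by more than max_sd (standard deviation).
    """
    grouped = defaultdict(list)
    for key, log_s in records:
        grouped[key].append(log_s)

    curated = {}
    for key, values in grouped.items():
        # Assumption: sample SD; a single measurement counts as SD 0.
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        if sd > max_sd:
            continue  # replicates too inconsistent: drop the compound
        curated[key] = statistics.median(values)
    return curated
```

For example, a compound measured at -5.0 and -3.5 has an SD of about 1.06 and is dropped, while one measured at -1.6, -1.7 and -1.65 is kept with its median, -1.65.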

Results:

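Although the authors showed some positive correlation between compound activities and Jazzy features, the results for “global” solubility prediction (logS) are not very promising. In a default CatBoost regression setup (no hyperparameter search), Jazzy features alone perform worst, Morgan fingerprints alone already do better, and the combination of Morgan + 2D descriptors performs best; adding Jazzy to the 2D descriptors, with or without Morgan, does not improve on them.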

Descriptors	R²	Mean absolute percentage error	RMSE (logS units)
Jazzy	0.483166	0.697701	1.209996
Morgan	0.647810	0.636827	1.005246
combo_Morgan_Jazzy	0.709274	0.522402	0.912780
combo_2D_Jazzy	0.755690	0.422069	0.832237
2D	0.756753	0.416461	0.830284
combo_2D_Jazzy_Morgan	0.770472	0.431028	0.806384
combo_Morgan_2D	0.772232	0.412204	0.803286
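The three metric columns can be computed from observed and predicted logS as in the sketch below. It assumes the metrics were computed on pooled out-of-fold predictions (the post does not say whether they were pooled or averaged per fold) and that MAPE is reported as a fraction rather than a percentage, as scikit-learn's `mean_absolute_percentage_error` does; the MAPE values in the table (0.41 to 0.70) are consistent with that reading.

```python
import math
import sys


def regression_metrics(y_true, y_pred):
    """Return R2, MAPE (as a fraction, not a percentage) and RMSE."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Relative error per compound; the denominator is clipped at machine
    # epsilon like scikit-learn, but logS values near 0 still inflate MAPE.
    mape = sum(abs(t - p) / max(abs(t), sys.float_info.epsilon)
               for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)  # same units as logS
    return {"R2": r2, "MAPE": mape, "RMSE": rmse}
```

For instance, `regression_metrics([-2.0, -4.0], [-1.0, -5.0])` gives R² 0.0, MAPE 0.375 and RMSE 1.0. Because MAPE divides by the observed logS, it is less informative than RMSE for compounds with logS close to zero.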

Jazzy features might be much more impactful in a GNN-like architecture, but for a tabular data representation, 2D/Morgan features give better results.
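The tabular comparison (each descriptor set, alone or concatenated, fed to an untuned regressor under a single 5-fold random cross-validation) can be sketched in plain Python. This is an illustrative reconstruction, not the notebook's code: the helper names, the shuffling seed, and treating the "combo" sets as simple column-wise concatenation are assumptions, and `MeanRegressor` is a hypothetical placeholder for the default CatBoost model used in the post.

```python
import random
from itertools import chain, combinations


def kfold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs for one shuffled K-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random split; the seed is an assumption
    start = 0
    for fold in range(n_splits):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // n_splits + (1 if fold < n_samples % n_splits else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def descriptor_sets(blocks):
    """Build every single and combined descriptor set from named feature blocks.

    blocks: dict name -> list of per-compound feature vectors (same compound order).
    Combined sets are column-wise concatenations named 'combo_A_B'.
    """
    names = list(blocks)
    n_compounds = len(blocks[names[0]])
    sets = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            key = combo[0] if r == 1 else "combo_" + "_".join(combo)
            sets[key] = [list(chain.from_iterable(blocks[name][i] for name in combo))
                         for i in range(n_compounds)]
    return sets


def cross_validate(X, y, make_model, n_splits=5, seed=0):
    """Return out-of-fold predictions, fitting a fresh untuned model per fold."""
    preds = [None] * len(y)
    for train_idx, test_idx in kfold_indices(len(y), n_splits, seed):
        model = make_model()  # default settings, no hyperparameter search
        model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_preds = model.predict([X[i] for i in test_idx])
        for i, p in zip(test_idx, fold_preds):
            preds[i] = p
    return preds


class MeanRegressor:
    """Hypothetical placeholder model: always predicts the training-set mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)
```

With three blocks (Morgan, 2D, Jazzy) this yields the seven descriptor sets of the table. In practice `make_model` would return an untuned regressor, for example `lambda: CatBoostRegressor(verbose=0)` from the `catboost` package, and the feature blocks would come from RDKit and Jazzy featurization. The order of parts in the generated combo names follows the dict order, so it will not necessarily match the table's names (e.g. `combo_2D_Jazzy_Morgan`).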

Links:

High-Dimensional Pharmacology
