Testing free energy of hydration features via Jazzy for solubility prediction.

 This is the first of the “let's-test-this” posts; the Google Colab notebook is here.

TLDR: Jazzy features do not have a positive impact on solubility prediction.

Intro: hydrogen bonds drive all kinds of interactions and are no doubt important in the drug design process. AstraZeneca recently published a nice package, Jazzy, that predicts hydrogen-bond strengths and free energies of hydration of small molecules. Its main application appears to be in local settings (protein-ligand strength estimation, SAR studies), but my interest was to test it on compound solubility prediction.

Goal: test whether the free energy of hydration (without free parameters) from Jazzy has an impact on the prediction of compound solubility.

Setup:

  • Data were taken from SolCuration, which is derived from the Meng et al paper (AqSol) plus additional curated datasets. For duplicated measurements the median was taken, and compounds with SD > 0.5 were removed (~1500 dropped).
  • Three types of descriptors were used: Morgan fingerprints, 2D descriptors from RDKit, and Jazzy features
  • 5-fold random cross-validation splits (no repeats)
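The curation and splitting steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's code: the function names, the toy records, the use of the sample standard deviation, and the choice to keep single measurements (treated as SD = 0) are all assumptions made for the example.

```python
import random
import statistics
from collections import defaultdict

SD_CUTOFF = 0.5  # threshold quoted in the post (log units assumed)


def aggregate_replicates(records, sd_cutoff=SD_CUTOFF):
    """Collapse repeated (smiles, logS) measurements to one value per compound.

    Assumptions: sample standard deviation; a single measurement counts as SD = 0.
    """
    by_compound = defaultdict(list)
    for smiles, log_s in records:
        by_compound[smiles].append(log_s)

    kept = {}
    for smiles, values in by_compound.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        if sd > sd_cutoff:
            continue  # replicates disagree too much: drop the compound
        kept[smiles] = statistics.median(values)
    return kept


def random_kfold(n_items, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split, no repeats."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx


# Toy records with made-up values (not taken from SolCuration)
records = [
    ("CCO", 1.10), ("CCO", 1.00), ("CCO", 1.05),  # consistent replicates, kept
    ("c1ccccc1", -1.6), ("c1ccccc1", -3.0),       # SD of about 0.99, dropped
    ("CC(=O)O", 1.2),                             # single measurement, kept
]
dataset = aggregate_replicates(records)
folds = list(random_kfold(n_items=10, n_splits=5, seed=0))
```

Fixing the shuffle seed means every descriptor set is evaluated on the same folds, so differences in the metrics come from the features and not from the split.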

Results:

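Although the authors showed some positive correlation between compound activities and Jazzy features, the results for “global” solubility prediction (logS) are not very promising: Jazzy features alone perform worst, and Morgan alone or the combination of Morgan + 2D descriptors performs better in a default CatBoost regression setup (no hyperparameter search). Adding Jazzy to Morgan does lift R2 (0.65 to 0.71), but adding it to 2D descriptors or to Morgan + 2D brings no improvement.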


Descriptors              R2          Mean absolute percentage error    RMSE
Jazzy                    0.483166    0.697701                          1.209996
Morgan                   0.647810    0.636827                          1.005246
combo_Morgan_Jazzy       0.709274    0.522402                          0.912780
combo_2D_Jazzy           0.755690    0.422069                          0.832237
2D                       0.756753    0.416461                          0.830284
combo_2D_Jazzy_Morgan    0.770472    0.431028                          0.806384
combo_Morgan_2D          0.772232    0.412204                          0.803286
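For reference, the three reported metrics can be computed as in the sketch below. It uses the standard textbook definitions and assumes the mean absolute percentage error is expressed as a fraction, not a percent, which is what the magnitudes in the table suggest. The function name and the toy values are hypothetical, and how the notebook aggregates across folds is not stated in the post.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAPE (as a fraction) and RMSE for paired true/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Undefined when a true value is exactly 0, and very large when it is near 0
    mape = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    return {"R2": r2, "MAPE": mape, "RMSE": rmse}


# Toy logS values (made up for illustration)
y_true = [-2.0, -4.0, -1.0, -3.0]
y_pred = [-2.5, -3.5, -1.0, -3.0]
metrics = regression_metrics(y_true, y_pred)
```

Because logS values can sit close to zero, the percentage error is dominated by those compounds, so R2 and RMSE are probably the more reliable columns when comparing descriptor sets.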

Jazzy features might be far more impactful in a GNN-like architecture, but for a tabular data representation 2D/Morgan features give better results.
