Can we "talk" with the data? A small test of pandas_ai on human clearance data

This is a test of the pandas_ai package by Gabriele Venturi. PandasAI is a layer between pandas data frames and LLMs that simplifies interaction with the data and makes it more conversational. I ran a quick test on AstraZeneca clearance data from ChEMBL to see whether pandas_ai eases data manipulation.

The Google Colab notebook is here.


**TLDR**: the proposed solution could be more optimal, but it works! There are some issues with linking abstract concepts, but current LLMs do not have a defined knowledge graph, so this is no surprise. More packages that ease interaction with private data will appear soon.


The algorithm behind the package produces real code for data frame filtering and manipulation. Because of this, it will fail if the data types of the columns are not correctly defined, for example when a numeric column such as LogP has been read in as text.
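A minimal sketch of fixing column types before handing the frame to any code generator. The toy data, the `AlogP` column name, the "None" placeholder strings and the `coerce_numeric` helper are all assumptions for illustration; they are not taken from the notebook.

```python
import pandas as pd

# Toy frame standing in for a ChEMBL export (made-up IDs and values).
# Numeric columns arrive as text, with "None" marking missing entries.
raw = pd.DataFrame({
    "Molecule ChEMBL ID": ["ID-1", "ID-2", "ID-3"],
    "AlogP": ["3.2", "None", "5.1"],          # assumed LogP column name
    "#RO5 Violations": ["0", "1", "None"],
})

def coerce_numeric(df, columns):
    """Return a copy with the given columns converted to numbers.

    Entries that cannot be parsed become NaN instead of raising an error.
    """
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out

clean = coerce_numeric(raw, ["AlogP", "#RO5 Violations"])
print(clean.dtypes)
```

After this step, sorting and comparisons on `AlogP` and `#RO5 Violations` behave numerically rather than alphabetically.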



Simple request: Provide names for 5 compounds with the highest LogP.

Answer: looks excellent; it properly linked "names" to 'Molecule Name' and "IDs" to 'Molecule ChEMBL ID'.
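For comparison, the same request written by hand in pandas might look like the sketch below. This is not the code that pandas_ai generated; the toy data and the `AlogP` column name are assumptions.

```python
import pandas as pd

# Made-up rows; only the column names 'Molecule Name' and
# 'Molecule ChEMBL ID' come from the post, 'AlogP' is assumed.
df = pd.DataFrame({
    "Molecule ChEMBL ID": ["ID-1", "ID-2", "ID-3", "ID-4", "ID-5", "ID-6", "ID-7"],
    "Molecule Name": ["alpha", None, "gamma", "delta", None, "zeta", "eta"],
    "AlogP": [2.1, 5.6, 4.3, 1.0, 6.2, 3.9, 4.8],
})

# Five rows with the largest LogP, highest first.
top5 = df.nlargest(5, "AlogP")[["Molecule ChEMBL ID", "Molecule Name", "AlogP"]]
print(top5)
```

In this toy frame two of the five hits have no name, so an ID column is the more reliable handle.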



A bit more complex request: Provide IDs for 5 compounds with the highest LogP that does not violate Lipinski rule. (I asked for IDs because not all compounds have names.)

Answer: correct; it connected 'Lipinski rule' with '#RO5 Violations'.
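A hand-written pandas equivalent of this request, as a sketch only: "does not violate the Lipinski rule" is read here as zero entries in '#RO5 Violations'. The data, the `AlogP` column name and the `top_logp_ids` helper are illustrative assumptions.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "Molecule ChEMBL ID": ["ID-1", "ID-2", "ID-3", "ID-4", "ID-5", "ID-6", "ID-7", "ID-8"],
    "AlogP": [2.1, 5.6, 4.3, 1.0, 6.2, 3.9, 4.8, 3.0],
    "#RO5 Violations": [0, 1, 0, 0, 2, 0, 0, 0],
})

def top_logp_ids(frame, n=5):
    """IDs of the n compounds with the highest LogP among rule-of-five compliant rows."""
    compliant = frame[frame["#RO5 Violations"] == 0]
    return compliant.nlargest(n, "AlogP")["Molecule ChEMBL ID"].tolist()

ids = top_logp_ids(df)
print(ids)
```

The two compounds with the highest LogP overall (ID-5 and ID-2) drop out because they carry violations, which is the behavior the prompt asks for.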


Puzzling request: Provide IDs for compounds that do not violate the Lipinski rule and have the top 5 largest LogP but activity was measured for humans only

Answer: None. I wanted to see whether the algorithm could "connect" "human" with 'Homo Sapiens' and the assay with 'Assay Organism'; it could not.
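One way to bridge this gap by hand is to map the everyday word to the value stored in the column before filtering. The sketch below assumes that "measured for humans only" means keeping rows whose 'Assay Organism' is Homo sapiens; the alias table, the toy data, the `AlogP` column name and the `top_logp_ids_for` helper are hypothetical.

```python
import pandas as pd

# Hypothetical alias table: common word -> species name used in the data.
ORGANISM_ALIASES = {
    "human": "Homo sapiens",
    "humans": "Homo sapiens",
    "rat": "Rattus norvegicus",
}

# Made-up rows for illustration.
df = pd.DataFrame({
    "Molecule ChEMBL ID": ["ID-1", "ID-2", "ID-3", "ID-4", "ID-5", "ID-6", "ID-7", "ID-8"],
    "AlogP": [2.1, 5.6, 4.3, 1.0, 6.2, 3.9, 4.8, 3.0],
    "#RO5 Violations": [0, 1, 0, 0, 2, 0, 0, 0],
    "Assay Organism": [
        "Homo sapiens", "Homo sapiens", "Rattus norvegicus", "Homo sapiens",
        "Homo sapiens", "Homo sapiens", "Rattus norvegicus", "Homo sapiens",
    ],
})

def top_logp_ids_for(frame, organism, n=5):
    """IDs of up to n compliant compounds with the highest LogP for one organism."""
    # Translate the everyday word if known, otherwise use the input as given.
    target = ORGANISM_ALIASES.get(organism.lower(), organism)
    # Case-insensitive match, so 'Homo Sapiens' and 'Homo sapiens' both work.
    same_organism = frame["Assay Organism"].str.lower() == target.lower()
    compliant = frame["#RO5 Violations"] == 0
    hits = frame[same_organism & compliant]
    return hits.nlargest(n, "AlogP")["Molecule ChEMBL ID"].tolist()

human_ids = top_logp_ids_for(df, "human")
print(human_ids)
```

Rewording the prompt to name the column and value explicitly (for example, 'Assay Organism' equal to 'Homo sapiens') is the conversational counterpart of this mapping, though whether pandas_ai then succeeds was not tested here.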









