Evaluation beyond goodness of fit: Quantifying biophysical alignment of AI models for kinase-centric drug discovery

Figure 1: Graphical abstract. Using PLIP, we derive protein–ligand interaction (PLI) fingerprints for Kinodata3D, a structure-based dataset for kinase inhibition potency prediction. We infer the statistical association between interaction fingerprint (IFP) bits and inhibition potency (upper row). In conjunction with an explainable artificial intelligence (XAI) method, this association serves as a resource for evaluating the PLI alignment of message-passing neural networks (NNs) for inhibition potency prediction at the residue level (lower row).

DOI

10.1016/j.ailsci.2026.100169

Abstract

Accurately modeling interactions between small molecules and proteins using machine learning (ML) remains a central challenge in modern drug discovery. In particular, achieving and reliably assessing generalization in ML-based models for inhibition potency prediction has proven difficult. Focusing on the protein kinase domain, we investigate both the predictive performance and the biophysical plausibility of graph neural networks (GNNs) trained to predict inhibition potency, with and without access to docked protein–ligand complexes. To this end, we introduce a model-agnostic explainable AI framework that interprets model predictions by attributing them to specific residues within the adenosine triphosphate (ATP)-binding pocket of kinases. On the basis of discrete interaction fingerprints (IFPs), we employ robust statistical methods and feature selection to derive reference interaction profiles that serve as ground-truth explanations. Our analysis identifies 131 distinct residues across 29 unique kinases that play key roles in linking IFP and inhibition potency data. These reference profiles enable a complementary benchmark, spanning nearly 10,000 docked complexes, that quantitatively assesses model quality in terms of both predictive accuracy and biophysical alignment. The benchmark quantifies the extent to which a model’s predictive mechanism aligns with meaningful biophysical mechanisms, specifically whether the model recognizes protein regions actively involved in ligand binding. In a pilot study, we find that models incorporating 3D structural features exhibit moderate biophysical alignment, whereas baseline models without 3D information show no strong alignment. This underscores the structural nature of molecular binding and the importance of structural information for training GNN models.
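To make the notion of residue-level alignment concrete, the following is a minimal sketch of one possible alignment score, not the metric used in this work, which the abstract does not specify. It assumes an XAI method has already assigned one attribution value per pocket residue and that a reference interaction profile is available as a set of residue labels. The function name `top_k_alignment`, the residue labels, and all numeric values are hypothetical and purely illustrative.

```python
from typing import Dict, Set


def top_k_alignment(attributions: Dict[str, float], reference: Set[str], k: int) -> float:
    """Fraction of the k most strongly attributed residues that are reference residues.

    This is an illustrative precision-at-k score, assumed here for demonstration only.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank residues by absolute attribution, largest first.
    # Ties are broken by residue label so the ranking is deterministic.
    ranked = sorted(attributions, key=lambda res: (-abs(attributions[res]), res))
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(res in reference for res in top) / len(top)


# Hypothetical per-residue attributions for a single docked complex.
attributions = {
    "res_A": 0.42,
    "res_B": -0.35,
    "res_C": 0.10,
    "res_D": 0.05,
    "res_E": 0.01,
}
# Hypothetical reference interaction profile for the same kinase.
reference = {"res_A", "res_C"}

score = top_k_alignment(attributions, reference, k=2)
print(score)
```

A score close to 1 would indicate that the residues the model relies on most coincide with the reference profile. Averaging such a score over many complexes is one simple way to turn per-complex explanations into a benchmark-level number.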
