Talks and presentations

September 01, 2025

A Million-scale Dataset and Generalizable Foundation Model for Nanomaterial-Protein Interactions, ChinaNano 2025, Beijing International Convention Center, Beijing, China

Unlocking the potential of nanomaterials in medicine and environmental science hinges on understanding their interactions with proteins, a complex decision space where AI is poised to make a transformative impact. However, progress has been hindered by limited datasets and the restricted generalizability of existing models. Here, we introduce NanoPro-3M, the largest nanomaterial-protein interaction dataset to date, comprising over 3.2 million samples and 37,000 unique proteins. Leveraging this dataset, we present NanoProFormer, a foundation model that predicts nanomaterial-protein affinities through multimodal representation learning. It demonstrates strong generalization, handling missing features as well as unseen nanomaterials or proteins. We show that multimodal modeling significantly outperforms single-modality approaches and identifies key determinants of corona formation. Furthermore, we demonstrate the model's applicability to a range of downstream tasks through zero-shot inference and fine-tuning. Together, this work establishes a solid foundation for high-performance and generalized prediction of nanomaterial-protein interaction endpoints, reducing experimental reliance and accelerating various in vitro applications.

August 12, 2025

Protein corona foundation model and its application in disease diagnosis, Academic exchange activity, Yuhang Campus, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China

AI holds enormous potential for understanding the interactions between nanomaterials and proteins. This talk discussed our work on protein corona datasets and foundation models, and envisioned their broad application prospects in disease diagnosis. AI will enable high-throughput screening and efficient design of nanoenrichment methods for protein biomarkers.

April 18, 2025

Protein Corona Dataset and Foundation Model—Explainability and Knowledge Consistency in AI4Science, AI·Proteomics·Medicine 2025 Spring Mini-Symposium, Yunqi Campus, Westlake University, Hangzhou, China

Upon entering biological environments, nanomaterials rapidly adsorb ambient proteins, forming a “protein corona,” which profoundly influences their recognition, distribution, metabolism, and ultimate biological fate within organisms. This talk will introduce our overall research strategy, which leverages AI methodologies to investigate the interactions among nanomaterials, biological systems, and the environment. We will highlight recent advances in the construction of protein corona datasets and the development of foundation models. Furthermore, the talk will briefly outline our work on model interpretability and knowledge consistency within the AI for Science (AI4Science) framework and share our reflections on the potential of AI researchers in scientific discovery.