Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
portfolio
publications
In silico nanosafety assessment tools and their ecosystem-level integration prospect
Published in Nanoscale, 2021
In this review, the advances and challenges of in silico nanosafety assessment tools are carefully discussed. Furthermore, integrating these tools at the ecosystem level may provide more comprehensive and reliable nanosafety assessment by establishing a site-specific interactive system among engineered nanomaterials (ENMs), the abiotic environment, and biological communities.
Recommended citation: Hengjie Yu, Dan Luo, Limin Dai, Fang Cheng. In silico nanosafety assessment tools and their ecosystem-level integration prospect. Nanoscale, 2021, 13(19), 8722-8739.
Download Paper
Preparation and characterization of cross-linked starch nanocrystals and self-reinforced starch-based nanocomposite films
Published in International Journal of Biological Macromolecules, 2021
In this study, we prepared starch-based nanocomposite films reinforced by cross-linked starch nanocrystals (CSNCs). The strategy and method reported here are effective, inexpensive, biocompatible, and easily applicable in the food field, providing a novel nanomaterial-based approach to food preservation and packaging.
Recommended citation: Limin Dai, Hengjie Yu, Jun Zhang, Fang Cheng. Preparation and characterization of cross-linked starch nanocrystals and self-reinforced starch-based nanocomposite films. International Journal of Biological Macromolecules, 2021, 181, 868-876.
Download Paper
Predicting and investigating cytotoxicity of nanoparticles by translucent machine learning
Published in Chemosphere, 2021
This work presented an approach for uncovering causal structure in nanotoxicity datasets using mutually validated, model-agnostic interpretation methods.
Recommended citation: Hengjie Yu, Zhilin Zhao, Fang Cheng. Predicting and investigating cytotoxicity of nanoparticles by translucent machine learning. Chemosphere, 2021, 276, 130164.
Download Paper
An analysis of factors affecting agricultural tractors’ reliability using random survival forests based on warranty data
Published in IEEE Access, 2022
In this study, random survival forests (RSF), a machine learning method for survival analysis that provides various interpretation tools, were applied to reliability modeling based on warranty data from an agricultural machinery manufacturing company in China.
Recommended citation: Zhilin Zhao, Hengjie Yu, Fang Cheng. An analysis of factors affecting agricultural tractors’ reliability using random survival forests based on warranty data. IEEE Access, 2022, 10, 50183-50194.
Download Paper
Integrating machine learning interpretation methods for investigating nanoparticle uptake during seed priming and its biological effects
Published in Nanoscale, 2022
In this work, post hoc interpretation and model-based interpretation of machine learning were integrated in two ways to understand the mechanism of nanoparticle uptake during seed priming and its biological effects on seed germination.
Recommended citation: Hengjie Yu, Zhilin Zhao, Da Liu, Fang Cheng. Integrating machine learning interpretation methods for investigating nanoparticle uptake during seed priming and its biological effects. Nanoscale, 2022, 14(41), 15305-15315.
Download Paper
Single-kernel classification of deoxynivalenol and zearalenone contaminated maize based on visible light imaging under ultraviolet light excitation combined with polarized light imaging
Published in Food Control, 2022
In this study, a single-kernel classification method based on image features under polarized light and UV light excitation was developed to effectively classify maize kernels contaminated with deoxynivalenol (DON) and zearalenone (ZEN).
Recommended citation: Maozhen Qu, Shijie Tian, Hengjie Yu, Da Liu, Chao Zhang, Yingchao He, Fang Cheng. Single-kernel classification of deoxynivalenol and zearalenone contaminated maize based on visible light imaging under ultraviolet light excitation combined with polarized light imaging. Food Control, 2023, 144, 109354.
Download Paper
Interpretable machine learning for investigating complex nanomaterial–plant–soil interactions
Published in Environmental Science: Nano, 2022
This work integrated the establishment, performance analysis, post hoc interpretation, and interpretation validation of a light gradient boosting machine (LightGBM) model to investigate the root uptake of metal-oxide nanoparticles (MONPs) in the soil environment.
Recommended citation: Hengjie Yu, Zhilin Zhao, Dan Luo, Fang Cheng*. Interpretable machine learning for investigating complex nanomaterial–plant–soil interactions. Environmental Science: Nano, 2022, 9(11), 4305-4316.
Download Paper
Interpretable machine learning-accelerated seed treatment using nanomaterials for environmental stress alleviation
Published in Nanoscale, 2023
This work presented an interpretable structure–activity relationship (ISAR) approach based on interpretable machine learning for predicting and understanding the stress mitigation effects of seed nanopriming.
Recommended citation: Hengjie Yu, Dan Luo, Sam F. Y. Li, Maozhen Qu, Da Liu, Yingchao He, Fang Cheng. Interpretable machine learning-accelerated seed treatment using nanomaterials for environmental stress alleviation. Nanoscale, 2023, 15(32), 13437-13449.
Download Paper
Averaging strategy for interpretable machine learning on small datasets to understand element uptake after seed nanotreatment
Published in Environmental Science & Technology, 2023
This work presented an averaging strategy for interpretable machine learning to understand the uptake and translocation of nanoparticles in crop seedlings after seed nanotreatment.
Recommended citation: Hengjie Yu, Shiyu Tang, Sam F. Y. Li, Fang Cheng. Averaging strategy for interpretable machine learning on small datasets to understand element uptake after seed nanotreatment. Environmental Science & Technology, 2023, 57(34), 12760-12770.
Download Paper
Ce-UiO-66-F4-based composites decorated with green carbon dots for universal adsorption of organic pollutants containing hydrogen bond donors and its application exploration
Published in Chemical Engineering Journal, 2024
In this study, Tea-L-CDs@Ce-UiO-66-F4 composites with strong hydrogen bonding were prepared by modifying green carbon dots (CDs) onto the Ce-UiO-66-F4 metal-organic framework (MOF) to remove various dyes. Real-time water sample tests and system tests showed that the adsorbent has potential for engineering applications.
Recommended citation: Maozhen Qu, Hengjie Yu, Yingchao He, Weidong Xu, Da Liu, Fang Cheng. Ce-UiO-66-F4-based composites decorated with green carbon dots for universal adsorption of organic pollutants containing hydrogen bond donors and its application exploration. Chemical Engineering Journal, 2024, 486, 150266.
Download Paper
Optimizing benefit-risk trade-off in nano-agrochemicals through explainable machine learning: Beyond concentration
Published in Environmental Science: Nano, 2024
This work presented an explainable machine learning-driven multi-objective optimization approach to maximize the performance and minimize undesirable implications of seed nanopriming.
Recommended citation: Hengjie Yu, Shiyu Tang, Eslam M. Hamed, Sam F. Y. Li, Yaochu Jin, Fang Cheng. Optimizing benefit-risk trade-off in nano-agrochemicals through explainable machine learning: Beyond concentration. Environmental Science: Nano, 2024, 11, 3374-3389.
Download Paper
Unlocking the Potential of AI Researchers in Scientific Discovery: What Is Missing?
Published in arXiv, 2025
We project that the share of AI for Science (AI4Science) in total publications in Nature Index journals will rise from 3.57% in 2024 to approximately 25% by 2050. This work proposes structured and actionable workflows, alongside key strategies, to position AI researchers at the forefront of scientific discovery.
Recommended citation: Hengjie Yu, Yaochu Jin. Unlocking the Potential of AI Researchers in Scientific Discovery: What Is Missing? 2025, arXiv:2503.05822.
Download Paper
A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions
Published in arXiv, 2025
Here, we propose NanoPro-3M, the largest nanomaterial-protein interaction dataset to date, comprising over 3.2 million samples and 37,000 unique proteins. Leveraging this dataset, we present NanoProFormer, a foundation model that predicts nanomaterial-protein affinities through multimodal representation learning. It demonstrates strong generalization and can handle missing features as well as unseen nanomaterials or proteins.
Recommended citation: Hengjie Yu, Kenneth A. Dawson, Haiyun Yang, Shuya Liu, Yan Yan, Yaochu Jin. A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions. 2025, arXiv:2507.14245.
Download Paper
Empowering scientific discovery with explainable small domain-specific and large language models
Published in Artificial Intelligence Review, 2025
This review offers a forward-looking integration of explainable AI (XAI)-based research paradigms, encompassing small domain-specific models, large language models (LLMs), and agent-based large-small model collaboration.
Recommended citation: Hengjie Yu, Yizhi Wang, Tao Cheng, Yan Yan, Kenneth A. Dawson, Sam F. Y. Li, Yefeng Zheng, Yaochu Jin. Empowering scientific discovery with explainable small domain-specific and large language models. Artificial Intelligence Review, 2025, accepted.
Download Paper
talks
Published:
Upon entering biological environments, nanomaterials rapidly adsorb ambient proteins, forming a “protein corona,” which profoundly influences their recognition, distribution, metabolism, and ultimate biological fate within organisms. This report will introduce our overall research strategy, leveraging AI methodologies to investigate the interactions among nanomaterials, biological systems, and the environment. We will highlight recent advancements in the construction of protein corona datasets and the development of foundational models. Furthermore, the report will briefly outline our work on model interpretability and knowledge consistency within the AI for Science (AI4Science) framework and share our reflections on the potential of AI researchers in scientific discovery.
Published:
AI holds enormous potential for understanding the interactions between nanomaterials and proteins. This talk discussed our work on protein corona datasets and the associated foundation models, and envisioned their broad application prospects in disease diagnosis. AI will enable high-throughput screening and efficient design of nanoenrichment methods for protein biomarkers.
Published:
Unlocking the potential of nanomaterials in medicine and environmental science hinges on understanding their interactions with proteins, a complex decision space where AI is poised to make a transformative impact. However, progress has been hindered by limited datasets and the restricted generalizability of existing models. Here, we propose NanoPro-3M, the largest nanomaterial-protein interaction dataset to date, comprising over 3.2 million samples and 37,000 unique proteins. Leveraging this dataset, we present NanoProFormer, a foundation model that predicts nanomaterial-protein affinities through multimodal representation learning. It demonstrates strong generalization and can handle missing features as well as unseen nanomaterials or proteins. We show that multimodal modeling significantly outperforms single-modality approaches and identifies key determinants of corona formation. Furthermore, we demonstrate its applicability to a range of downstream tasks through zero-shot inference and fine-tuning. Together, this work establishes a solid foundation for high-performance and generalized prediction of nanomaterial-protein interaction endpoints, reducing experimental reliance and accelerating various in vitro applications.
