Publications

You can also find my articles on my Google Scholar profile.

2025

Empowering scientific discovery with explainable small domain-specific and large language models

Published in Artificial Intelligence Review, 2025

This review offers a forward-looking integration of explainable AI (XAI)-based research paradigms, encompassing small domain-specific models, large language models (LLMs), and agent-based large-small model collaboration.

Recommended citation: Hengjie Yu, Yizhi Wang, Tao Cheng, Yan Yan, Kenneth A. Dawson, Sam F. Y. Li, Yefeng Zheng, Yaochu Jin. Empowering scientific discovery with explainable small domain-specific and large language models. 2025, Artificial Intelligence Review, Accepted.
Download Paper

A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions

Published in arXiv, 2025

Here, we propose NanoPro-3M, the largest nanomaterial-protein interaction dataset to date, comprising over 3.2 million samples and 37,000 unique proteins. Leveraging this dataset, we present NanoProFormer, a foundation model that predicts nanomaterial-protein affinities through multimodal representation learning and demonstrates strong generalization, including to inputs with missing features and to unseen nanomaterials or proteins.

Recommended citation: Hengjie Yu, Kenneth A. Dawson, Haiyun Yang, Shuya Liu, Yan Yan, Yaochu Jin. A million-scale dataset and generalizable foundation model for nanomaterial-protein interactions. 2025, arXiv:2507.14245
Download Paper

Unlocking the Potential of AI Researchers in Scientific Discovery: What Is Missing?

Published in arXiv, 2025

We project that AI4Science’s share of total publications in Nature Index journals will rise from 3.57% in 2024 to approximately 25% by 2050. This work proposes structured and actionable workflows, alongside key strategies to position AI researchers at the forefront of scientific discovery.

Recommended citation: Hengjie Yu, Yaochu Jin. Unlocking the Potential of AI Researchers in Scientific Discovery: What Is Missing? 2025, arXiv:2503.05822
Download Paper

2024

Optimizing benefit-risk trade-off in nano-agrochemicals through explainable machine learning: Beyond concentration

Published in Environmental Science: Nano, 2024

This work presented an explainable machine learning-driven multi-objective optimization approach to maximize the performance and minimize undesirable implications of seed nanopriming.

Recommended citation: Hengjie Yu, Shiyu Tang, Eslam M. Hamed, Sam F. Y. Li, Yaochu Jin, Fang Cheng. Optimizing benefit-risk trade-off in nano-agrochemicals through explainable machine learning: Beyond concentration. Environmental Science: Nano, 2024, 11, 3374-3389.
Download Paper

Ce-UiO-66-F4-based composites decorated with green carbon dots for universal adsorption of organic pollutants containing hydrogen bond donors and its application exploration

Published in Chemical Engineering Journal, 2024

In this study, Tea-L-CDs@Ce-UiO-66-F4 composites with strong hydrogen bonding were prepared by modifying green CDs onto Ce-UiO-66-F4 MOF to remove various dyes. Real-time water sample tests and system tests showed that the adsorbent has potential for engineering application.

Recommended citation: Maozhen Qu, Hengjie Yu, Yingchao He, Weidong Xu, Da Liu, Fang Cheng. Ce-UiO-66-F4-based composites decorated with green carbon dots for universal adsorption of organic pollutants containing hydrogen bond donors and its application exploration. Chemical Engineering Journal, 2024, 486, 150266.
Download Paper

2023

Averaging strategy for interpretable machine learning on small datasets to understand element uptake after seed nanotreatment

Published in Environmental Science & Technology, 2023

This work presented the averaging strategy for interpretable machine learning to understand the uptake and translocation of nanoparticles in crop seedlings after seed nanotreatment.

Recommended citation: Hengjie Yu, Shiyu Tang, Sam F. Y. Li, Fang Cheng. Averaging strategy for interpretable machine learning on small datasets to understand element uptake after seed nanotreatment. Environmental Science & Technology, 2023, 57(34), 12760-12770.
Download Paper

Interpretable machine learning-accelerated seed treatment using nanomaterials for environmental stress alleviation

Published in Nanoscale, 2023

This work presented an interpretable structure–activity relationship (ISAR) approach based on interpretable machine learning for predicting and understanding stress mitigation effects of seed nanopriming.

Recommended citation: Hengjie Yu, Dan Luo, Sam F. Y. Li, Maozhen Qu, Da Liu, Yingchao He, Fang Cheng. Interpretable machine learning-accelerated seed treatment using nanomaterials for environmental stress alleviation. Nanoscale, 2023, 15(32), 13437-13449.
Download Paper

2022

Interpretable machine learning for investigating complex nanomaterial–plant–soil interactions

Published in Environmental Science: Nano, 2022

This work integrated the establishment, performance analysis, post hoc interpretation, and interpretation validation of a light gradient boosting machine (LightGBM) model to investigate the root uptake of metal-oxide nanoparticles (MONPs) in the soil environment.

Recommended citation: Hengjie Yu, Zhilin Zhao, Dan Luo, Fang Cheng. Interpretable machine learning for investigating complex nanomaterial–plant–soil interactions. Environmental Science: Nano, 2022, 9(11), 4305-4316.
Download Paper

Single-kernel classification of deoxynivalenol and zearalenone contaminated maize based on visible light imaging under ultraviolet light excitation combined with polarized light imaging

Published in Food Control, 2022

In this study, a single-kernel classification method based on image features under polarized light and UV light excitation was developed to achieve the effective classification of DON and ZEN contaminated maize.

Recommended citation: Maozhen Qu, Shijie Tian, Hengjie Yu, Da Liu, Chao Zhang, Yingchao He, Fang Cheng. Single-kernel classification of deoxynivalenol and zearalenone contaminated maize based on visible light imaging under ultraviolet light excitation combined with polarized light imaging. Food Control, 2023, 144, 109354.
Download Paper

Integrating machine learning interpretation methods for investigating nanoparticle uptake during seed priming and its biological effects

Published in Nanoscale, 2022

In this work, post hoc interpretation and model-based interpretation of machine learning were integrated in two ways to understand the mechanism of nanoparticle uptake during seed priming and its biological effects on seed germination.

Recommended citation: Hengjie Yu, Zhilin Zhao, Da Liu, Fang Cheng. Integrating machine learning interpretation methods for investigating nanoparticle uptake during seed priming and its biological effects. Nanoscale, 2022, 14(41), 15305-15315.
Download Paper

An analysis of factors affecting agricultural tractors’ reliability using random survival forests based on warranty data

Published in IEEE Access, 2022

This study applied random survival forests (RSF), a machine learning method for survival analysis that provides various interpretation tools, to reliability modeling based on warranty data from an agricultural machinery manufacturing company in China.

Recommended citation: Zhilin Zhao, Hengjie Yu, Fang Cheng. An analysis of factors affecting agricultural tractors’ reliability using random survival forests based on warranty data. IEEE Access, 2022, 10, 50183-50194.
Download Paper

2021

Preparation and characterization of cross-linked starch nanocrystals and self-reinforced starch-based nanocomposite films

Published in International Journal of Biological Macromolecules, 2021

In this study, we prepared a starch-based nanocomposite film reinforced by cross-linked starch nanocrystals (CSNCs). The strategy and method reported here are effective, inexpensive, biocompatible, and easily applicable in the food field, thus providing a novel nanomaterial approach for food preservation and packaging.

Recommended citation: Limin Dai, Hengjie Yu, Jun Zhang, Fang Cheng. Preparation and characterization of cross-linked starch nanocrystals and self-reinforced starch-based nanocomposite films. International Journal of Biological Macromolecules, 2021, 181, 868-876.
Download Paper

In silico nanosafety assessment tools and their ecosystem-level integration prospect

Published in Nanoscale, 2021

In this review, the advances and challenges of in silico nanosafety assessment tools are carefully discussed. Furthermore, their integration at the ecosystem level may provide a more comprehensive and reliable nanosafety assessment by establishing a site-specific interactive system among engineered nanomaterials (ENMs), the abiotic environment, and biological communities.

Recommended citation: Hengjie Yu, Dan Luo, Limin Dai, Fang Cheng. In silico nanosafety assessment tools and their ecosystem-level integration prospect. Nanoscale, 2021, 13(19), 8722-8739.
Download Paper