Yaxin Du
PhD Student
Shanghai Jiao Tong University
I am a Ph.D. student at Shanghai Jiao Tong University, advised by Prof. Siheng Chen. My research interests include LLM agents, coding LLMs, and federated learning. Recently, I have been working on agentic systems, coding-oriented language models, and related benchmarks and resources for large model research.
I will attend ICLR 2026 and would be happy to connect about research opportunities and collaborations.
News
- Apr 2026 Attending ICLR 2026 and presenting InfoMosaic-Bench; happy to connect in person.
- Apr 2026 Two papers accepted to ACL 2026: MCP-Flow and VLMGuard-R1.
- Apr 2026 Released the InCoder paper series. [Paper]
- Feb 2026 Released G²-Reader.
- Jan 2026 Joined iQuest Lab, Ubiquant, as a Research Intern, working on foundation model training and coding LLM mid-training.
- Jan 2026 InfoMosaic-Bench was accepted to ICLR 2026.
- Dec 2025 Presented SWE-Dev at the NeurIPS 2025 DL4C Workshop.
- Sep 2025 Joined TikTok AI Innovation Center as a Research Intern, mentored by Qian Liu.
Publications
Selected Publications
- ICLR 2026 InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
  arXiv preprint arXiv:2510.02271 [Paper]
- arXiv 2026 G²-Reader: Dual Evolving Graphs for Multimodal Document QA [Paper]
- arXiv 2025 SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development
  NeurIPS 2025 DL4C Workshop [Paper] [Code]
- ACL 2025 Findings FedDQC: Data Quality Control in Federated Instruction-tuning of Large Language Models [Paper]
- ICLR 2024 Workshop Enhancing Data Quality in Federated Fine-tuning of Large Language Models [Paper]
- ICLR 2025 Self-evolving Multi-agent Collaboration Networks for Software Development
- arXiv 2025 MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems [Paper]
- KDD 2024 OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning [Paper]
- NeurIPS 2024 FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models [Paper]
- ICLR 2024 Fake It Till Make It: Federated Learning with Consensus-Oriented Generation [Paper]
Other Publications
- arXiv 2025 VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization [Paper]
- arXiv 2025 BrowseMaster: Towards Scalable Web Browsing via Tool-Augmented Programmatic Agent Pair [Paper]
- ACL 2026 MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools [Paper]
- arXiv 2025 Federated Instruction Tuning of LLMs with Domain Coverage Augmentation
- arXiv 2025 Selecting Auxiliary Data via Neural Tangent Kernels for Low-Resource Domains [Paper]
- IET RSN 2024 Radar-based Human Activity Recognition Using Denoising Techniques to Enhance Classification Accuracy
- arXiv 2026 InCoder-32B: Code Foundation Model for Industrial Scenarios [Paper]
- arXiv 2026 InCoder-32B-Thinking: Industrial Code World Model for Thinking [Paper]
- RadarConf 2022 A ViT Approach for Short-range Behaviour Recognition Using Radar Signals
- CIE Radar 2021 Radar-based Human Activity Classification with Cyclostationarity
Research Experience
- Jan 2026 - Present Research Intern, iQuest Lab, Ubiquant
  Working on foundation model training, with a recent focus on mid-training for coding LLMs.
- Sep 2025 - Jan 2026 Research Intern, TikTok AI Innovation Center
  Mentored by Qian Liu; worked on LLM agents, coding-related language model research, and related systems problems.