Zhi-ning Liu
你好 / Hello / 안녕하세요 / こんにちは / Bonjour / Hola (and more)!
Ph.D. Candidate · University of Illinois Urbana-Champaign (UIUC)
Advised by Prof. Hanghang Tong at IDEA-ISAIL Lab
I do research and build open-source systems for Data-centric Trustworthy AI, focusing on reliability, faithfulness, and ethics in LLMs, VLMs, and traditional ML systems under real-world challenges such as noisy context, distribution shift, and class imbalance. I have published 30+ papers at top venues with 1,000+ citations, and I build and maintain research-driven open-source software and projects with 4k+ GitHub stars and 150k+ downloads.
My research sits at the intersection of data quality, model reliability, and trustworthy decision-making.
LLM/VLM Reasoning: Agentic Reasoning, Context Grounding, LLM Mechanistic Interpretability
Trustworthy AI: Data Detoxification, Bias Attribution and Interpretation, FairML, AI Ethics and Alignment
Data Engineering: Data Curation for Learning/Reasoning, Dynamic Data Optimization, Rare-case Generalization
Relational & Temporal Modeling: Automated Model Fusion, Meta Time-series Analysis, Robust Graph Mining
News
May 2026
🏆 Award: 2026 C.W. Gear Outstanding Graduate Student (1 in UIUC CS Dept.).
May 2026
🎉 ICML'26: One paper accepted to ICML 2026.
Apr 2026
💼 Joining Amazon: I will join Amazon as an Applied Scientist starting June 2026 : )
Apr 2026
🎉 ACL'26: 3 papers (2 Main, 1 Findings) accepted to ACL 2026.
Jan 2026
🇧🇷 ICLR'26: 4 papers accepted to ICLR 2026. See you in Brazil!
Oct 2025
👀 VLM Perception: VLMs can see the image, but still may not use it. [PDF]
May 2025
🏆 Award: Honored to receive the 2025 C.L. and Jane Liu Award! (2 in UIUC CS Dept.) (school website)
May 2025
💼 Intern@Amazon: Back to the Bay Area again.
Jan 2025
🎉 ICLR'25: One paper on test-time adaptation for graph structural shift accepted. [PDF]
May 2024
💼 Intern@Amazon: Starting my Applied Scientist Internship in the Bay Area.
Mar 2024
⚖️ FAccT'24: Group Fairness via Group Consensus, with Eunice Chan. [PDF]
May 2023
🏥 KDD'23: Web-based Long-term Spine Treatment Outcome Forecasting, with Hangting Ye. [PDF]
May 2023
💼 Intern@Amazon: Starting my Applied Scientist Internship in Seattle.
Mar 2022
🎓 Starting Ph.D.@UIUC: I will join Prof. Hanghang Tong's group at UIUC in Fall 2022.
Apr 2020
📦 Open-source: Awesome-Imbalanced-Learning, a curated list of imbalanced learning resources.
Jul 2019
🎓 Graduation@JilinU: Received my B.Sc. from Tang Aoqing Honors Program in Science, Jilin University.
Sep 2018
💼 Intern@Microsoft: Starting my internship at Microsoft Research Asia. Supervisors: Dr. Jiang Bian and Dr. Wei Cao.
Education

University of Illinois Urbana-Champaign
Ph.D. in Computer Science · 2022 - 2026 (expected)
Department of Computer Science · Advisor: Prof. Hanghang Tong

Jilin University
M.Eng. in Computer Science · 2019 - 2022
School of Artificial Intelligence · Advisor: Prof. Yi Chang
B.Sc. in Computer Science · 2015 - 2019
Tang Aoqing Honors Program · Computer Science
Experience

Amazon
Applied Scientist II · Palo Alto, CA · starting June 2026
Amazon Ads · LLM for recommendation
Applied Scientist Intern · Palo Alto, CA · May - Dec 2025
Amazon Ads · Vision language model reasoning reliability -> ICLR 2026, ICML 2026, ACL 2026
Applied Scientist Intern · Palo Alto, CA · May - Aug 2024
Amazon Rufus · RAG-based language model reasoning -> ACL 2025
Applied Scientist Intern · Seattle, WA · May - Aug 2023
Amazon Search · Multi-task learning for partially ordered entity ranking
Microsoft Research
Research Intern · Beijing · Aug 2018 - June 2019
Machine Learning Group · Extreme class-imbalanced learning -> ICDE 2020, NeurIPS 2020
Publications
Publications, preprints, and submissions, sorted by year. See the full list on Google Scholar.
-
Do VLMs Have a Moral Backbone? A Study on the Fragile Morality of Vision-Language Models
ACL Findings 2026
Key Insight: VLM moral judgments are fragile when visual evidence and textual cues create subtle ethical tension.
-
MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models
ICML 2026
Key Insight: Structured visual moral scenarios reveal alignment failures hidden by coarse benchmark scores.
-
Agentic Reasoning for Large Language Models
arXiv preprint, 2026
Key Insight: A unified taxonomy connects planning, tool use, memory, reflection, and self-improvement in agentic reasoning.
-
Mixture of Sequence: Theme-aware Mixture-of-Experts for Long-Sequence Recommendation
WebConf 2026 Oral
Key Insight: Theme-aware experts let recommender models specialize over long, shifting user behavior sequences.
-
Seeing but Not Believing: Probing the Disconnect Between Visual Attention and Answer Correctness in VLMs
ICLR 2026
Key Insight: Correct visual attention does not guarantee correct visual-language reasoning.
-
Continual Low-Rank Adapters for LLM-based Generative Recommender Systems
ICLR 2026
Key Insight: Continually evolving LoRA adapters preserve generative recommendation quality as user data shifts.
-
Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
ICLR 2026
Key Insight: Time-series and paired text can be aligned into unified temporal narratives for multimodal reasoning.
-
TRQA: Time Series Reasoning Question And Answering Benchmark
In submission, 2026
Key Insight: Time-series QA should test compositional reasoning over temporal patterns, not just forecasting accuracy.
-
PLANETALIGN: A Comprehensive Python Library for Benchmarking Network Alignment
ICLR 2026
Key Insight: A unified Python library makes network alignment algorithms easier to compare, reproduce, and extend.
-
Flow Matching Meets Biology and Life Science: A Survey
npj Artificial Intelligence 2026
Key Insight: Flow matching provides a flexible generative lens for molecular, cellular, and biological modeling tasks.
-
ReMix: Reinforcement Routing for Mixtures of LoRAs in LLM Finetuning
arXiv preprint, 2026
Key Insight: Reinforcement routing learns how to select and combine LoRA experts during LLM finetuning.
-
Inference Scaling of LLM Ensembling: Bridging Token Spaces with Token Translation
In submission, 2026
Key Insight: Token translation bridges heterogeneous vocabularies so LLM ensembles can scale at inference time.
-
WAPITI: A Watermark for Finetuned Open-Source LLMs
In submission, 2026
Key Insight: Finetuned open-source LLMs can retain detectable ownership signals without sacrificing utility.
-
AdaFuse: Adaptive Ensemble Decoding for Large Language Models
ACL 2026 Main
Key Insight: Adaptive ensemble decoding fuses multiple LLM outputs during generation for stronger test-time reasoning.
-
Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents
ACL 2026 Main
Key Insight: Multimodal agents still struggle to store, retrieve, and use long-term conversational memory.
-
CLIMB: Class-imbalanced Learning Benchmark on Tabular Data
NeurIPS 2025
Key Insight: A standardized benchmark exposes when class-imbalanced tabular methods actually generalize.
-
Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting
ICML 2025
Key Insight: Sample-level adaptive fusion lets heterogeneous forecasters complement each other instead of competing in silos.
-
SelfElicit: Your Language Model Secretly Knows Where the Relevant Evidence is
ACL 2025 Main
Key Insight: LLMs can self-elicit where relevant evidence lies, reducing reliance on external retrieval heuristics.
-
ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method
CIKM 2025
Key Insight: Climate modeling benefits from benchmarks that jointly test time-series, image, and generative signals.
-
Not All Voices Are Rewarded Equally: Probing and Repairing Reward Models across Human Diversity
EMNLP Findings 2025
Key Insight: Reward models can encode demographic preference gaps, and targeted repair can reduce those disparities.
-
LLM-RecG: A Semantic Bias-Aware Framework for Zero-Shot Sequential Recommendation
RecSys 2025
Key Insight: Semantic group information helps LLM recommenders recognize and reduce bias in zero-shot settings.
-
Matcha: Mitigating Graph Structure Shifts with Test-Time Adaptation
ICLR 2025
Key Insight: Test-time adaptation can mitigate graph structure shifts without retraining on the target graph.
-
THeGCN: Temporal Heterophilic Graph Convolutional Network
In submission, 2026
Key Insight: Temporal heterophily requires graph convolutions that model changing cross-class neighborhood patterns.
-
BackTime: Backdoor Attacks on Multivariate Time Series Forecasting
NeurIPS 2024 Spotlight
Key Insight: Time-series forecasters can be compromised by backdoor triggers embedded in multivariate temporal patterns.
-
AIM: Attributing, Interpreting, Mitigating Data-encoded Unfairness
KDD 2024
Key Insight: Data-encoded unfairness can be attributed, interpreted, and mitigated before it becomes model behavior.
-
Class-Imbalanced Graph Learning without Class Rebalancing
ICML 2024
Key Insight: Bias-aware graph learning can handle class imbalance without naive resampling or class reweighting.
-
Group Fairness via Group Consensus
FAccT 2024
Key Insight: Group consensus offers a practical fairness signal when protected groups disagree in complex ways.
-
Graph Mixup on Approximate Gromov-Wasserstein Geodesics
ICML 2024
Key Insight: Gromov-Wasserstein geodesics create topology-aware graph mixup paths for stronger augmentation.
-
Ensuring User-side Fairness in Dynamic Recommender Systems
WWW 2024
Key Insight: Dynamic recommenders need fairness constraints that evolve with users, items, and exposure patterns.
-
Hierarchical Multi-Marginal Optimal Transport for Network Alignment
AAAI 2024
Key Insight: Hierarchical multi-marginal optimal transport aligns multiple networks through shared structural signals.
-
Taming Over-Smoothing Representation on Heterophilic Graphs
Information Sciences, 2023
Key Insight: Heterophilic graphs require representation smoothing to be controlled rather than blindly increased.
-
Web-based Long-term Spine Treatment Outcome Forecasting
KDD 2023
Key Insight: Web-based modeling can support long-term spine treatment outcome forecasting from real clinical data.
-
UADB: Unsupervised Anomaly Detection Booster
ICDE 2023
Key Insight: A boosting wrapper can improve unsupervised anomaly detectors without relying on anomaly labels.
-
A Survey of Explainable Graph Neural Networks for Cyber Malware Analysis
IEEE BigData 2022
Key Insight: Explainable GNN techniques can make cyber malware analysis more transparent and actionable.
-
MESA: Boost Ensemble Imbalanced Learning with Meta-sampler
NeurIPS 2020
Key Insight: A meta-sampler can learn how to construct better ensemble training sets for imbalanced data.
-
Self-paced Ensemble for Highly Imbalanced Massive Data Classification
ICDE 2020
Key Insight: Self-paced sampling lets ensembles learn from massive imbalanced data from easy cases to harder ones.
Selected Awards & Honors
C.L. and Jane Liu Award (2 in UIUC CS) · UIUC, 2025
Top 10 Honorary Graduates (university's highest honor) · Jilin University, 2022
National Scholarship (top 0.2% nationally) · Ministry of Education of China, 2020
National Scholarship (top 0.2% nationally) · Ministry of Education of China, 2019
Fun Facts
🌿 My Name
In Chinese, "Zhi Ning" has a gently feminine feeling: Zhi (芷) means fragrant herb, and Ning (宁) means peace and tranquility. Because of this name, some friends once expected to meet a cute girl before seeing me in person, and were mildly disappointed when reality arrived.
🎮 Games
I enjoy nearly every kind of video game, although I am not necessarily good at them: shooters, strategy, 4X, role-playing games, roguelikes, racing games, and more. Some of my favorites include Battlefield, Civilization, Stellaris, GTA, The Witcher, DiRT, Homeworld, Metro, BioShock, and Borderlands.
🎨 Making Things
Making things look nice and organized makes me happy. That's why I put care into my paper figures, tables, and website. I also have a sleek desktop setup. In another life, I might have been a designer or a professional organizer. Unfortunately, the former may soon be replaced by AI, while the latter probably still has some time.