About Zhiheng

Hi, I’m Zhiheng. Welcome to my personal website! I’m currently a second-year Master’s student at the University of Waterloo, supervised by Professor Wenhu Chen at the TIGER Lab. My research focuses on AI for Software Engineering, with particular emphasis on agentic post-training, benchmarks, and causal reasoning.

I did my undergraduate degree in Computer Science at the University of Hong Kong, where I was active in algorithm competitions: I qualified for the ICPC World Finals and won two regional gold medals. I’ve had the privilege of working with research groups at Berkeley, ETH Zürich, and the University of Michigan.

Currently, I’m focused on post-training for large models using RL-based methods. I’m a core contributor to the open-source framework VerlTool. I’ve also worked as a Research Scientist Intern at MiniMax on software engineering tasks.

Research Areas

Agentic Post-Training

I’m deeply involved in the full post-training pipeline for software engineering agents. As a core contributor to VerlTool, I develop environment interaction modules and post-training setups for SWE tasks. MiniMax-M2, which I worked on, achieved 69% Pass@1 on SWE-Verified and ranked #2 on MultiSWE and TerminalBench. I’ve designed large-scale SWE data synthesis pipelines that generated over 36K verifiable tasks from 5K+ sandbox environments.

I’m also working on BrowserAgent, which tackles information-seeking tasks by interacting directly with the browser environment. It moves beyond traditional tool-based approaches to enable more natural web navigation and information extraction.

Benchmarks & Evaluation

I believe that as models get stronger, the definition of tasks becomes increasingly important. My benchmark work spans three approaches:

  • Synthesis: Converting existing data (PixelWorld converts textual reasoning tasks into images; Corr2Cause generates causal reasoning problems)
  • Human-in-the-loop: Developing repo-level QA benchmarks with crowdsourced annotation and validation
  • Structural data: Building benchmarks from web pages and GitHub repositories

I’ve contributed to StructEval for structured output evaluation and VideoScore for video generation assessment.

Causal Reasoning & Knowledge Methods

My work explores lightweight ways to enhance LLM capabilities without retraining. At Berkeley, I developed FactTrack for time-aware world state tracking in story outlines; it decomposes complex narratives into atomic facts to detect contradictions.

In Psychologically-Inspired Causal Prompts, I investigated how large language models understand causal relations, exploring the different psychological processes underlying sentiment classification. The Corr2Cause dataset tests the pure causal inference skills of LLMs.

Current Focus

I’m particularly interested in AI for Software Engineering because it combines structural data that’s easy to synthesize, real-world relevance with immediate impact, and strong economic value. My research explores decomposing SWE tasks into skill-specific components: debugging, performance optimization, refactoring, test generation, repository-level QA, and security.

For detailed future research directions, see my Research Statements page. My complete background is in my CV.

Publications

Psychologically-Inspired Causal Prompts

Published in 2023

This paper proposes a prompting method that embeds causal direction and analyzes the performance gap of LLMs.

Recommended citation: Lyu, Z., Jin, Z., Mattern, J., Mihalcea, R., Sachan, M., & Schölkopf, B. (2023). Psychologically-Inspired Causal Prompts. arXiv preprint arXiv:2305.01764. https://arxiv.org/pdf/2305.01764

Logical Fallacy Detection

Published in 2022

This paper introduces a dataset for logical fallacy detection and a baseline model.

Recommended citation: Jin, Z., Lalwani, A., Vaidhya, T., Shen, X., Ding, Y., Lyu, Z., Sachan, M., Mihalcea, R., & Schölkopf, B. (2022). Logical Fallacy Detection. https://arxiv.org/abs/2202.13758

Contact

I’m currently seeking industry opportunities in AI for Software Engineering. If you know of relevant positions or can offer a recommendation, I would greatly appreciate hearing from you.

Feel free to reach out at z63lyu@uwaterloo.ca for research collaboration, open source projects, job opportunities, or mentorship.