Publications
This page is out of date; you can find my articles on my Google Scholar profile.
Published in 2023
This research introduces Corr2Cause, the first benchmark dataset for testing the pure causal inference skills of large language models (LLMs).
Recommended citation: Jin, Z., Liu, J., Lyu, Z., et al. (2023). Can Large Language Models Infer Causation from Correlation? arXiv preprint arXiv:2306.05836. https://arxiv.org/abs/2306.05836
Published in 2023
Our paper conducts a post-hoc analysis to check whether large language models can be used to distinguish cause from effect.
Recommended citation: Jin, Z., Lalwani, A., Vaidhya, T., Shen, X., Ding, Y., Lyu, Z., Sachan, M., Mihalcea, R., & Schölkopf, B. (2022). Logical Fallacy Detection. https://openreview.net/forum?id=ucHh-ytUkOH
Published in 2023
This paper proposes prompts that embed the causal direction of the task and analyzes the resulting performance gap of LLMs.
Recommended citation: Lyu, Z., Jin, Z., Mattern, J., Mihalcea, R., Sachan, M., & Schölkopf, B. (2023). Psychologically-Inspired Causal Prompts. arXiv preprint arXiv:2305.01764. https://arxiv.org/pdf/2305.01764
Published in 2022
This paper presents a dataset for logical fallacy detection and a baseline model for the task.
Recommended citation: Jin, Z., Lalwani, A., Vaidhya, T., Shen, X., Ding, Y., Lyu, Z., Sachan, M., Mihalcea, R., & Schölkopf, B. (2022). Logical Fallacy Detection. https://arxiv.org/abs/2202.13758