Simulating Users with State Alignment Beats Response Imitation

We build user simulators that accurately reflect real users by generating natural-language latent states aligned with ground-truth responses.

HumanLM Framework Overview
Fig 1. HumanLM generates responses by first producing latent states (stance, emotion, communication style) that align with ground-truth user behavior, then synthesizing responses from these aligned states.

Abstract

Large Language Models are increasingly used to simulate how specific users respond to a given context, enabling more user-centric applications. However, existing user simulators mostly imitate surface-level patterns and language styles, and so fail to reflect the underlying states of real users. To address this limitation, we propose HumanLM, a novel training framework that builds user simulators that accurately reflect real users. Our key insight is to use reinforcement learning to generate natural-language latent states that align with the ground-truth responses.
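
To make the two-stage idea concrete, here is a minimal Python sketch. It is illustrative only, not the HumanLM implementation. The names `LatentState`, `generate_latent_state`, `synthesize_response`, and `alignment_reward` are hypothetical, and the text generator is any function that maps a prompt to text. The reward uses token-overlap (Jaccard) similarity purely as a stand-in for whatever alignment signal the framework actually uses, and it assumes a reference state has already been derived from the ground-truth response.

```python
from dataclasses import dataclass, fields
from typing import Callable

# Hypothetical type: any function that maps a prompt string to generated text.
TextGenerator = Callable[[str], str]


@dataclass
class LatentState:
    """Natural-language latent states named in the figure caption."""
    stance: str
    emotion: str
    communication_style: str


def generate_latent_state(context: str, generate: TextGenerator) -> LatentState:
    """Stage 1: describe each latent dimension in natural language."""
    descriptions = {}
    for f in fields(LatentState):
        label = f.name.replace("_", " ")
        descriptions[f.name] = generate(
            f"Context: {context}\nDescribe the user's {label}:"
        )
    return LatentState(**descriptions)


def synthesize_response(context: str, state: LatentState,
                        generate: TextGenerator) -> str:
    """Stage 2: write a response conditioned on the latent states."""
    prompt = (
        f"Context: {context}\n"
        f"Stance: {state.stance}\n"
        f"Emotion: {state.emotion}\n"
        f"Communication style: {state.communication_style}\n"
        "Write the user's response:"
    )
    return generate(prompt)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; a stand-in similarity, not the paper's metric."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def alignment_reward(predicted: LatentState, reference: LatentState) -> float:
    """Assumed scalar reward: mean per-dimension similarity to a reference state."""
    scores = [
        jaccard(getattr(predicted, f.name), getattr(reference, f.name))
        for f in fields(LatentState)
    ]
    return sum(scores) / len(scores)
```

In a reinforcement-learning setup, a scalar such as `alignment_reward` would score each sampled latent state so that the policy is pushed toward states consistent with what the real user wrote. How the reference states are obtained and which similarity is used are assumptions of this sketch.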

Citation
@article{wu2026humanlm,
  title={HUMANLM: Simulating Users with State Alignment Beats Response Imitation},
  url={https://humanlm.stanford.edu/},
  author={Wu, Shirley and Choi, Evelyn and Khatua, Arpandeep and
          Wang, Zhanghan and He-Yueya, Joy and Weerasooriya, Tharindu Cyril and
          Wei, Wei and Yang, Diyi and Leskovec, Jure and Zou, James},
  year={2026}
}