|
Jiamin He
I am a Ph.D. student in Computing Science at the University of Alberta, supervised by Martha White. Currently, I'm interning at Google DeepMind in London, working with Diana Borsa and Hado van Hasselt on foundational reinforcement learning algorithms and their applications in science.
Previously, I received my M.Sc. (Thesis) degree in Computing Science from the University of Alberta under the supervision of Rupam Mahmood. Before that, I worked with Chongjie Zhang as a research intern.
My research interests lie in reinforcement learning, with a focus on policy optimization, off-policy learning, and representation learning.
Email / Google Scholar / GitHub / Blog
|
|
|
Distributions as Actions: A Unified Framework for Diverse Action Spaces.
Jiamin He, A. Rupam Mahmood, Martha White.
Preliminary version at the Finding the Frame Workshop at RLC, 2025.
ICLR, 2026.
Paper and code coming soon.
|
|
Investigating the Utility of Mirror Descent in Off-policy Actor-Critic.
Samuel Neumann, Jiamin He, Adam White, Martha White.
RLC, 2025.
paper | code
|
|
Deep Policy Gradient Methods Without Batch Updates, Target Networks, or Replay Buffers.
Gautham Vasan, Mohamed Elsayed, Alireza Azimi*, Jiamin He*, Fahim Shahriar, Colin Bellinger, Martha White, A. Rupam Mahmood.
NeurIPS, 2024.
paper | code
|
|
Loosely Consistent Emphatic Temporal-Difference Learning.
Jiamin He, Fengdi Che, Wan Yi, A. Rupam Mahmood.
Preliminary version in the average-reward setting at the Deep RL Workshop at NeurIPS, 2022.
UAI, 2023.
paper | code
|
|
Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration.
Lulu Zheng*, Jiarui Chen*, Jianhao Wang, Jiamin He, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao, Chongjie Zhang.
NeurIPS, 2021.
paper | code
|
|
Revisiting Mixture Policies in Entropy-Regularized Actor-Critic.
Jiamin He, Samuel Neumann, Jincheng Mei, Adam White, Martha White.
Aligning Reinforcement Learning Experimentalists and Theorists Workshop at NeurIPS, 2025.
Extended version under review, 2026.
paper
|
|
Improving Reward-Based Hindsight Credit Assignment.
Aditya A. Ramesh, Jiamin He, Jürgen Schmidhuber, Martha White.
European Workshop on Reinforcement Learning, 2025.
paper
|
|
Consistent Emphatic Temporal-Difference Learning.
Jiamin He.
M.Sc. Thesis, University of Alberta, 2023.
details
|
|