
Hi there! I’m Gokul, a PhD candidate in the Robotics Institute at Carnegie Mellon University working on interactive learning from implicit feedback (e.g. imitation learning / RLHF).
My work seeks to answer two questions: why the feedback loops inherent in decision making often cause the standard machine learning playbook to produce suboptimal results when learning from implicit feedback, and how to surpass the limitations of traditional machine learning techniques by designing novel algorithms that scale to complex problems in robotics and language modeling.
I work with Drew Bagnell and Steven Wu. I completed my B.S. / M.S. at UC Berkeley, where I worked with Anca Dragan on Learning with Humans in the Loop. I’ve spent summers working on ML @ SpaceX, Autonomous Vehicles @ NVIDIA, Motion Planning @ Aurora, and Research @ Microsoft and @ Google.
In my free time, I like to consume baked goods, go to concerts / art museums, and run / lift. I’m a huge fan of birds (especially lovebirds), books (especially those by Murakami), and bands (especially Radiohead).
Events & News
🌟 March 2025 🌟 - A particularly exciting new preprint is out on the real value of RL in fine-tuning / RLHF. I also gave a talk at Cornell on the paper that might be of interest.
January 2025 - Three papers accepted to ICLR’25! One on using score matching for inverse RL, one on REFUEL: a deceptively simple multi-turn RLHF algorithm, and one on what the right reset distribution is for speeding up inverse RL.
🌟 November 2024 🌟 - Drew, Steven, and I are co-teaching a course on the algorithmic foundations of interactive learning. If you’d like to understand the fundamental principles behind imitation (e.g. for robots) and RLHF (e.g. for LLMs), this is the course for you!
September 2024 - Three papers accepted to NeurIPS’24! One on what is fundamentally different about multi-agent imitation learning, one on REBEL: a scalable and theoretically elegant RLHF algorithm, and one on the differences between online and offline preference fine-tuning algorithms.
June 2024 - Four papers accepted to ICML’24! One on SPO: an RLHF algorithm for reconciling diverse preferences, one on a fundamental framework for designing efficient inverse RL algorithms, one on evolving reward shaping terms for RL / Inverse RL, and one on transfer learning.
Research Highlights
All Roads Lead to Likelihood

We explore how the value of RL in fine-tuning / RLHF seems to be fundamentally derived from generation-verification gaps. [Paper] [Talk]
SPO: Self-Play Preference Optimization

We derive a new fundamental algorithm for RLHF that robustly handles the complex, intransitive preferences that often result from aggregating a diversity of views. [Website] [Paper]
Inverse RL without RL

We derive exponentially faster algorithms for inverse RL by proving that local search, rather than global RL, is "all you need" for imitation. [Website] [Paper]
Imitating a Privileged Expert

Using techniques from causal inference, we formalize the value of interaction (e.g. DAgger) when imitating an expert with access to privileged information (e.g. simulator state). [Website] [Blog]