Events & News

July 2024 - Four workshop papers at ICML’24! One on what is fundamentally different about multi-agent imitation learning, one on REBEL: a scalable and theoretically elegant RLHF algorithm, one on the differences between online and offline preference fine-tuning algorithms, and one on efficient inverse RL without compounding errors.

June 2024 - Four papers accepted to ICML’24! One on SPO: an RLHF algorithm for reconciling diverse preferences, one on a fundamental framework for designing efficient inverse RL algorithms, one on evolving reward shaping terms for RL / Inverse RL, and one on transfer learning.


Research Highlights

SPO: Self-Play Preference Optimization

We derive a new fundamental algorithm for RLHF that robustly handles complex intransitive preferences while avoiding reward modeling and adversarial training. [Paper]
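To give a flavor of the self-play idea, here is a minimal, hypothetical sketch (not the paper's code): a preference "bandit" with an intransitive preference matrix, where a no-regret learner plays against samples from its own policy and its time-averaged policy approaches the minimax winner. All names and hyperparameters are illustrative.

```python
# Sketch of self-play on preferences: P[i, j] = prob. that arm i beats arm j.
import numpy as np

rng = np.random.default_rng(0)

# Rock-paper-scissors preferences: intransitive, so no scalar reward model
# can explain them, yet a minimax winner (here: uniform) still exists.
P = np.array([[0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])
K = len(P)

policy = np.ones(K) / K
avg_policy = np.zeros(K)
T, lr = 5000, 0.1

for _ in range(T):
    # Self-play: the "opponent" is a sample from the current policy.
    opponent = rng.choice(K, p=policy)
    # Each arm's payoff is its win probability against the opponent;
    # only preference comparisons are used, no reward model is fit.
    wins = P[:, opponent]
    # Multiplicative-weights update: a no-regret learner playing itself
    # has a time-averaged policy converging to the minimax winner.
    policy = policy * np.exp(lr * wins)
    policy /= policy.sum()
    avg_policy += policy / T

print(np.round(avg_policy, 2))  # approximately [0.33, 0.33, 0.33]
```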

Hybrid Inverse RL

We derive a new flavor of inverse RL that uses expert demonstrations to speed up policy search without requiring the ability to reset the learner to arbitrary states. [Paper]
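A toy sketch of the data-mixing idea (all names hypothetical): the inner RL update trains on a mixture of fresh on-policy transitions and expert demonstrations, so the expert data guides policy search without requiring resets to expert states.

```python
import random

def hybrid_batch(learner_buffer, expert_buffer, batch_size=32, expert_frac=0.5):
    """Minibatch mixing expert and learner transitions for the RL subroutine."""
    n_expert = min(int(batch_size * expert_frac), len(expert_buffer))
    batch = random.sample(expert_buffer, n_expert)
    batch += random.sample(learner_buffer, batch_size - n_expert)
    random.shuffle(batch)
    return batch

# Placeholder transitions; in practice these are (state, action, next_state).
expert_buffer = [("s_exp", "a_exp", "s_exp'")] * 50
learner_buffer = [("s", "a", "s'")] * 500
print(len(hybrid_batch(learner_buffer, expert_buffer)))  # 32
```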

Inverse RL w/o RL

We derive exponentially faster algorithms for inverse RL by resetting the learner to states from the expert demonstrations within the RL subroutine. Our work was published at ICML 2023. [Website][Paper]
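A schematic sketch of the reset trick (names hypothetical): inner-loop RL episodes begin from states sampled along the expert demonstrations rather than from the environment's initial state distribution, sidestepping global exploration. The `set_state` call assumes a resettable simulator.

```python
import random

class ToyEnv:
    """Stand-in for a simulator that supports resets to arbitrary states."""
    def __init__(self):
        self.state = 0

    def set_state(self, s):
        self.state = s

def expert_reset(env, expert_states):
    """Begin an inner-RL episode from a random expert-visited state."""
    s = random.choice(expert_states)
    env.set_state(s)  # assumes a resettable simulator / generative model
    return s

env = ToyEnv()
expert_states = [3, 7, 11, 42]  # states visited along expert demonstrations
print(expert_reset(env, expert_states))
```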

Imitation w/ Unobserved Contexts

We characterize the conditions under which it is possible to imitate an expert who has access to privileged information, and describe algorithms for doing so. Our work was published at NeurIPS 2022. [Website][Blog]

Causal Imitation Learning under TCN

We use instrumental variable regression to derive imitation learning algorithms that are robust against temporally correlated noise both in theory and practice. Oral at ICML 2022. [Website][Paper]
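A synthetic sketch of the instrumental-variable tool (two-stage least squares) that this line of work builds on; the setup and numbers are illustrative, not the paper's experiments. A confounder corrupts both the observed state and the recorded expert action, so naive regression is biased; an instrument (e.g., an earlier observation) correlates with the state but not with the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
u = rng.normal(size=n)                # temporally correlated noise (confounder)
z = rng.normal(size=n)                # instrument: affects x, independent of u
x = z + u + 0.1 * rng.normal(size=n)  # observed state, confounded by u
a = 2.0 * x + u                       # expert action; true policy gain is 2.0

# Naive least squares is biased because x and a share the confounder u.
naive = (x @ a) / (x @ x)

# Stage 1: project x onto z.  Stage 2: regress a on the projection.
x_hat = z * ((z @ x) / (z @ z))
two_sls = (x_hat @ a) / (x_hat @ x_hat)

print(f"naive: {naive:.2f}  2SLS: {two_sls:.2f}  truth: 2.00")
# naive is biased upward (~2.50); 2SLS recovers the true coefficient.
```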

Of Moments and Matching

We construct a taxonomy for imitation learning algorithms, derive bounds for each class, construct novel reduction-based algorithmic templates that achieve these bounds, and implement simple, elegant realizations with competitive empirical performance. Published at ICML 2021. [Website][Blog]
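An illustrative sketch (not the paper's code) of the moment-matching view underlying the taxonomy: imitation is cast as driving down the worst-case gap, over a class of moment functions f, between expert and learner expectations of f(s, a). Which moments are matched, and with what data, is what distinguishes the different families of imitation algorithms.

```python
import numpy as np

def moment_gap(expert_sa, learner_sa, moment_fns):
    """max_f |E_expert[f(s,a)] - E_learner[f(s,a)]| over the moment class."""
    gaps = []
    for f in moment_fns:
        e = np.mean([f(s, a) for s, a in expert_sa])
        l = np.mean([f(s, a) for s, a in learner_sa])
        gaps.append(abs(e - l))
    return max(gaps)

# Toy (state, action) samples and a tiny class of moment functions.
expert = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
learner = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
moments = [lambda s, a: a, lambda s, a: s * a, lambda s, a: s]
print(moment_gap(expert, learner, moments))  # 0.666...
```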