Gokul Swamy

Hi there! I’m Gokul, a final-year PhD candidate in the Robotics Institute at Carnegie Mellon University, working on efficient algorithms for interactive learning (e.g., imitation / RL / RLHF).

My research focuses on developing the novel algorithmic paradigms required to build robustly aligned agents that gracefully handle situations unseen in their training data. In other words, rather than giving agents the “fish”, I aim to teach them to fish. I fuse ideas from RL and game theory to develop principled and scalable algorithms for domains like robotic manipulation and language modeling.

I work with Drew Bagnell and Steven Wu. I completed my B.S. / M.S. at UC Berkeley, where I worked with Anca Dragan on Learning with Humans in the Loop. I’ve spent summers working on ML @ SpaceX, Perception @ NVIDIA, Motion Planning @ Aurora, World Models @ Microsoft and LLMs @ Google.

🌟 I am currently on the job market! 🌟

Events & News

November 2025 - I’m incredibly grateful to be named a Rising Star in Data Science and Robotics and to receive the inaugural CMU RI Outstanding Graduate Teaching Assistant Award!

June 2025 - Two new papers out on learning to search: SAILOR, which outperforms diffusion policies trained on 10x as many demos on multi-stage visual manipulation tasks (Spotlight @ NeurIPS ’25), and FOREWARN, which allows real robots to avoid semantic failures via VLM verifiers (RSS ’25, Outstanding Paper at ICML ’25 Workshop).

March 2025 - New preprint out, one I’m particularly excited about, on the real value of RL in fine-tuning / RLHF. The talk I gave at Cornell on the paper might also be of interest.

November 2024 - Drew, Steven, and I are co-teaching a course on the algorithmic foundations of interactive learning. If you’d like to understand the fundamental principles behind imitation (e.g. for robots) and RLHF (e.g. for LLMs), this is the course for you!

Research Highlights

A Smooth Sea Never Made a Skilled 𝚂𝙰𝙸𝙻𝙾𝚁: Robust Imitation via Learning to Search

All Roads Lead to Likelihood

SPO: Self-Play Preference Optimization

Inverse RL without RL

Of Moments and Matching