Brief posts I've helped write.

Heuristics Considered Harmful: RL With Random Rewards Should Not Make LLMs Reason
RLHF 101: A Technical Tutorial on Reinforcement Learning from Human Feedback
Causal Confounds in Sequential Decision Making
A Unifying, Game-Theoretic Framework for Imitation Learning