What would the Expert $do(\cdot)$?:
Causal Imitation Learning
Oral at NeurIPS'21 Safe and Robust Control of Uncertain Systems,
Offline RL, and Causal Sequential Decisions Workshops


Gokul Swamy¹
Sanjiban Choudhury²
Drew Bagnell¹,²
Steven Wu³


¹ Robotics Institute, CMU
² Aurora Innovation
³ ISR, CMU




Teaser figure.


We focus on imitation learning in the presence of temporally correlated perturbations (exogenous noise, (a)) or when the learner lacks access to the full state (endogenous noise, (b)). We formalize both settings in a graphical model (c), which lets us apply a technique known as instrumental variable regression to find a policy that is not corrupted by the spurious correlations the confounder introduces.


Abstract

We develop algorithms for imitation learning from policy data that was corrupted by unobserved confounders. Sources of such confounding include (a) persistent perturbations to actions or (b) the expert responding to a part of the state that the learner does not have access to. When a confounder affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch onto, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the classical instrumental variable regression (IVR) technique, enabling us to recover the causally correct underlying policy without requiring access to an interactive expert. In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We discuss, from the perspective of performance, the types of confounding under which it is better to use an IVR-based technique instead of behavioral cloning and vice versa. We find both of our algorithms compare favorably to behavioral cloning on a simulated rocket landing task.
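To make the failure mode concrete, here is a minimal sketch of classical two-stage least squares in a scalar linear system. It is not DoubIL or ResiduIL, and every name, constant, and modeling choice in it (the dynamics, the moving-average noise, and the use of the previous state as the instrument) is an assumption made for illustration. The expert's action is a linear function of the state plus noise that persists for two timesteps, so the noise leaks into the next state and biases a plain regression of actions on states.

```python
import numpy as np


def simulate(theta=-0.5, n=50_000, seed=0):
    """Roll out a hypothetical scalar system with temporally correlated action noise."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + 1)
    # Moving-average noise: u_t and u_{t-1} share one eps term, so they are correlated.
    u = eps[1:] + eps[:-1]
    s = np.zeros(n)
    a = np.zeros(n)
    for t in range(n):
        a[t] = theta * s[t] + u[t]          # expert policy plus confounding noise
        if t + 1 < n:
            s[t + 1] = 0.9 * s[t] + a[t]    # assumed linear dynamics
    return s, a


def ols_slope(x, y):
    """Least-squares slope of y on x (no intercept; the data are zero-mean)."""
    return float(x @ y / (x @ x))


def iv_slope(z, x, y):
    """Two-stage least squares with a single instrument z."""
    x_hat = ols_slope(z, x) * z   # stage 1: predict the state from the instrument
    return ols_slope(x_hat, y)    # stage 2: regress the action on the predicted state


theta_true = -0.5
s, a = simulate(theta=theta_true)

# Behavioral cloning analogue: regress a_t directly on s_t.
theta_bc = ols_slope(s[1:], a[1:])

# IVR analogue: use s_{t-1} as the instrument for s_t.
theta_iv = iv_slope(s[:-1], s[1:], a[1:])

print(f"true={theta_true:.3f}  regression={theta_bc:.3f}  iv={theta_iv:.3f}")
```

In this toy setup the previous state is a valid instrument because it influences the current state through the dynamics but is independent of the current action noise. Noise that persists for more than two timesteps would break that independence and call for an instrument further in the past.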






Paper


What would the Expert $do(\cdot)$?: Causal Imitation Learning

Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

@misc{swamy2021causal,
    title = {What would the Expert do()?: Causal Imitation Learning},
    author = {Gokul Swamy and Sanjiban Choudhury and J. Andrew Bagnell and Zhiwei Steven Wu},
    year = {2021},
    url={https://gokul.dev/causil/}
}


Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project, and adapted to be mobile responsive by Jason Zhang. The code we built on can be found here.