Biography
I am a staff research scientist at Google DeepMind, where I focus on AI safety and alignment. I lead the Amplified Oversight team, which tackles the core problem of enabling accurate human supervision of superhuman AI. I was previously a postdoc in the Oxford Applied and Theoretical Machine Learning (OATML) group at the University of Oxford, working under Yarin Gal. Before that, I was a Research Assistant under Owain Evans at the Future of Humanity Institute, University of Oxford, and a Visiting Researcher at the Montreal Institute for Learning Algorithms (MILA) under Yoshua Bengio. I also worked as a Data Scientist at ASI Data Science (now Faculty). I completed my PhD in Theoretical Physics at Queen Mary University of London in 2017, working on string theory and cosmology. Before my PhD, I studied Mathematics at the University of Cambridge.
Research Interests
- AI Safety and Alignment
- Scalable (Amplified) Oversight
- AI Debate, Critiques, AutoRaters and Judges
- Large Language Model Fine-tuning
Amplified Oversight
A major hurdle in developing AI safely is that, as these systems become more capable than the people overseeing them, it becomes hard to tell whether their actions are helpful or harmful. The field of scalable oversight (also known as amplified oversight) aims to solve this. The goal is to supervise an AI's outputs as effectively as a person could if they understood every reason behind the AI's choices and had unlimited time to decide.
The main strategy is to use AI systems themselves to help us evaluate their outputs. For instance, in Debate, two AIs are pitted against each other: each is tasked with finding mistakes in the other's responses and explaining them to a human judge. The hope is that if one AI makes a subtle error that a person would miss on their own, the other AI will point it out, allowing the judge to correctly penalize the flawed output and leading to safer, more reliable AI.
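To give a feel for the control flow of a debate protocol, here is a minimal, purely illustrative sketch. It is not any particular system's implementation: the debater and judge functions are hypothetical stand-ins for model (or human) calls, kept as placeholders so the skeleton runs on its own.

```python
# Illustrative sketch of a two-player debate protocol.
# All function names are hypothetical; real debaters and judges would be
# LLM calls or human evaluators rather than the placeholders below.

def debater_argue(question: str, stance: str, transcript: list[str]) -> str:
    """Stand-in for a debater model: produce the next argument for `stance`."""
    return f"[{stance}] argument about '{question}' (turn {len(transcript) + 1})"

def judge(question: str, transcript: list[str]) -> str:
    """Stand-in for the judge: read the transcript and pick a winning stance."""
    # A real judge weighs the arguments; this placeholder simply awards
    # the win to whichever side spoke last, to keep the sketch runnable.
    return transcript[-1].split("]")[0].lstrip("[")

def run_debate(question: str, num_turns: int = 4) -> str:
    """Alternate arguments between two stances, then ask the judge to decide."""
    transcript: list[str] = []
    stances = ["pro", "con"]
    for turn in range(num_turns):
        transcript.append(debater_argue(question, stances[turn % 2], transcript))
    return judge(question, transcript)

if __name__ == "__main__":
    print(run_debate("Does the cited source actually support the claim?"))
```

The point of the structure is that each debater's incentive is to expose flaws in the other's arguments, so the judge only needs to adjudicate the specific disputed points rather than verify the whole answer unaided.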