Yoshua Bengio, a Turing Award-winning pioneer in deep learning, has joined the U.K. government-backed initiative Safeguarded AI as its scientific director. Safeguarded AI aims to develop an advanced AI system capable of monitoring and mitigating risks posed by other AI agents, essentially functioning as a "gatekeeper" for AI safety.
The gatekeeper system is envisioned to monitor and understand interactions between autonomous AI agents, ensuring that they operate within predefined safety parameters. According to a thesis from Safeguarded AI, this system would not only reduce the risks associated with cutting-edge AI but also enable the safe deployment of AI in safety-critical and business-critical applications.
Safeguarded AI has secured a significant $75 million investment from the U.K. government's Advanced Research and Invention Agency, providing financial backing for the next four years. The addition of Bengio, one of the most prominent figures in AI, is a major boost for the project.
Bengio will provide scientific and strategic guidance to program director David "davidad" Dalrymple and the broader Safeguarded AI team. Dalrymple highlighted the synergy between his and Bengio’s technical ideas, noting that their research and development plans are increasingly aligned.
Bengio, who won the 2018 Turing Award alongside Geoffrey Hinton and Yann LeCun for their groundbreaking work on deep neural networks, has become one of the AI community's most prominent voices of caution on AI safety. He has recently advocated for global AI regulation that focuses on limiting compute power rather than restricting software.
Bengio, along with Dalrymple and others, has co-authored a paper proposing the development of "Guaranteed Safe AI" (GS AI). This approach aims to create rigorous, high-confidence safety guarantees for AI systems, particularly those with high autonomy or those used in safety-critical contexts.
The GS AI framework requires three essential components, illustrated in a rough sketch after the list below:
A world model: A mathematical description of how the AI system interacts with the outside world, accounting for uncertainties.
Safety specifications: A mathematical description of acceptable effects that the AI system can produce.
A verifier: A mechanism to provide auditable proof that the AI system meets the safety specifications relative to the world model.
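To make the relationship between these components concrete, here is a minimal, hypothetical sketch in Python. It is only a loose illustration of the structure described above, not an implementation from Safeguarded AI or the GS AI paper; all names (WorldModel, SafetySpec, Verifier, ProofCertificate, gatekeeper) are assumptions introduced for illustration.

```python
# Hypothetical sketch of the three GS AI components described above.
# None of these names come from Safeguarded AI or the GS AI paper.

from dataclasses import dataclass
from typing import Callable, Protocol


class WorldModel(Protocol):
    """Mathematical description of how the AI system interacts with the outside world."""

    def predict_effects(self, action: str) -> dict[str, float]:
        """Return the predicted effects of an action, including uncertainty estimates."""
        ...


@dataclass
class SafetySpec:
    """Acceptable effects the AI system may produce, expressed as a predicate."""
    is_acceptable: Callable[[dict[str, float]], bool]


@dataclass
class ProofCertificate:
    """Auditable evidence that a proposed action satisfies the safety specification."""
    action: str
    satisfied: bool
    details: str


class Verifier:
    """Checks proposed actions against the spec, relative to the world model."""

    def __init__(self, model: WorldModel, spec: SafetySpec):
        self.model = model
        self.spec = spec

    def verify(self, action: str) -> ProofCertificate:
        effects = self.model.predict_effects(action)
        ok = self.spec.is_acceptable(effects)
        return ProofCertificate(action, ok, f"predicted effects: {effects}")


def gatekeeper(verifier: Verifier, action: str) -> bool:
    """Allow only actions that come with a passing certificate."""
    return verifier.verify(action).satisfied
```

In an actual GS AI system, the verifier would be expected to produce formal, auditable proofs rather than a simple predicate check, and the world model would carry explicit representations of uncertainty; the sketch only shows how the three pieces fit together.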
Bengio and his colleagues believe that the GS AI approach could offer feasible safety guarantees while maintaining competitiveness with other AI systems that lack such assurances. They view the various approaches within the GS AI framework as complementary efforts that together form a robust portfolio for ensuring AI safety.
Safeguarded AI has also launched a funding call for organizations interested in applying the gatekeeper system to domain-specific applications such as energy network optimization, clinical trials, and telecommunications networks. The goal is to explore how this tool can safeguard products in these critical areas, ensuring that AI technologies can be deployed safely and effectively.