British autonomous vehicle startup Wayve has developed LINGO-2, an advanced AI system that integrates language understanding with visual data to improve autonomous driving capabilities while explaining its decisions in real time.
LINGO-2 is a vision-language-action model (VLAM) that combines vision and language capabilities. It processes images from the vehicle's cameras, integrates them with road rules such as speed limits, and provides a continuous commentary on its driving decisions.
The model can explain actions like slowing down for pedestrians or executing an overtaking maneuver. It can also respond to commands such as "pull over" or "turn right," as well as predict and answer questions about its driving decisions.
Wayve conducted the first VLAM test on a public road in central London. LINGO-2 successfully navigated a route, changed lanes, adjusted speed to traffic conditions, safely passed a bus, and stopped at red lights.
LINGO-2 combines Wayve's vision model, which encodes camera images into tokens, with an auto-regressive language model. The language model predicts a driving trajectory and produces commentary text; the trajectory guides the car's controller in executing the next driving action.
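To make the pipeline concrete, here is a minimal Python sketch of that flow: camera frames are encoded into discrete tokens, and a decoding step jointly produces a trajectory for the controller and a commentary string. Every name, the tokenization scheme, and the decision rule are illustrative assumptions for this sketch, not Wayve's actual models or API.

```python
# Toy sketch of a vision-language-action (VLAM) pipeline in the spirit of
# LINGO-2. All names and logic here are illustrative stand-ins: real
# systems use learned encoders and an auto-regressive transformer.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DrivingOutput:
    trajectory: List[Tuple[int, int]]  # (time step, position) waypoints for the controller
    commentary: str                    # natural-language explanation of the action


def encode_camera_frame(frame: List[List[int]]) -> List[int]:
    # Stand-in vision encoder: flatten the image and bucket pixel
    # intensities into a small discrete token vocabulary.
    return [pixel // 64 for row in frame for pixel in row]


def decode_action(vision_tokens: List[int], speed_limit: int) -> DrivingOutput:
    # Stand-in for the auto-regressive language model, which in LINGO-2
    # jointly predicts the next trajectory and a commentary string.
    slow_scene = sum(vision_tokens) / len(vision_tokens) > 1.5  # toy "pedestrian ahead" cue
    target_speed = 10 if slow_scene else speed_limit
    trajectory = [(step, step * target_speed) for step in range(1, 4)]
    commentary = ("Slowing down for a pedestrian ahead."
                  if slow_scene else "Maintaining speed; road is clear.")
    return DrivingOutput(trajectory, commentary)


# Usage: one camera frame in, one driving action plus its explanation out.
frame = [[200, 180], [150, 190]]  # toy 2x2 grayscale image
tokens = encode_camera_frame(frame)
out = decode_action(tokens, speed_limit=30)
print(out.commentary)
```

The key design point the sketch illustrates is that the trajectory and the commentary come from the same decoding step, so the explanation is tied to the action actually taken rather than generated after the fact.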
Wayve sees the combination of visual and language capabilities as opening new possibilities for accelerated learning through natural language. The company believes natural language interfaces could help users better understand and trust autonomous systems. Following tests in its virtual simulator platform, Ghost Gym, and real-world experiments, Wayve plans further research into the safety of controlling a car's behavior through language.
Founded in 2017, Wayve has garnered support from notable backers including Microsoft, online supermarket Ocado, Virgin Group founder Sir Richard Branson, and World Wide Web inventor Sir Tim Berners-Lee.