Research
Towards more multimodal, robust and general AI systems
Next week marks the start of the 37th annual Conference on Neural Information Processing Systems (NeurIPS), the world’s largest artificial intelligence (AI) conference. NeurIPS 2023 will be held December 10-16 in New Orleans, USA.
Teams from across Google DeepMind are presenting more than 180 papers at the main conference and workshops.
We will present demonstrations of our cutting-edge AI models for global weather forecasting, materials discovery and watermarking AI-generated content. There will also be a chance to hear from the team behind Gemini, our largest and most capable AI model.
Here’s a look at some of the highlights from our research:
Multimodality: language, video, action
UniSim is a universal simulator of real-world interactions.
Today’s AI models can create paintings, compose music and write stories. But as skilled as these models are in one medium, most struggle to transfer those skills to another. We delve into how generative abilities could help learning across modalities. In a spotlight presentation, we show that diffusion models can be used for image classification without requiring additional training. Diffusion models like Imagen classify images in a more human-like way than other models, relying on shapes rather than textures. Furthermore, we show how predicting captions from images can improve computer vision learning. Our approach outperformed current methods on vision-and-language tasks and showed greater scalability.
More multimodal models could give way to more useful digital and robot assistants to help people in their daily lives. In a spotlight poster, we create agents that can interact with the digital world the way humans do: via screenshots and keyboard and mouse actions. Separately, we show that by leveraging video generation, including subtitles and closed captions, models can transfer knowledge by predicting video plans for real robot actions.
One of the next milestones could be generating realistic experiences in response to actions performed by humans, robots and other interactive agents. We will present a demo of UniSim, our universal simulator of real-world interactions. This type of technology could have applications in industries ranging from video games and movies to training agents for the real world.
Creating safe and understandable artificial intelligence
An artist’s illustration of artificial intelligence (AI). This image depicts AI security research. It was created by artist Khyati Trehan as part of the Visualizing AI project initiated by Google DeepMind.
Large language models can generate impressive responses, but are prone to “hallucinations”: text that looks correct but is fabricated. Our researchers ask whether a method for finding where a fact is stored in a model (localization) can also be used to edit that fact. Surprisingly, they found that localizing a fact and editing that location does not edit the fact, underscoring the complexity of understanding and controlling the information stored in LLMs. With Tracr, we propose a new way to evaluate interpretability methods by compiling human-readable programs into transformer models. We have open-sourced a version of Tracr to serve as a ground truth for evaluating interpretability methods.
When developing and deploying large models, privacy must be built in at every step. For training, our teams study how to measure whether language models memorize data, in order to protect private and sensitive material. At the same time, researchers show how to evaluate privacy-preserving training with a technique efficient enough for real-world use. In another oral presentation, our scientists investigate the limitations of training through ‘student’ and ‘teacher’ models, which have different levels of access and vulnerability to attack.
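Measuring memorization can be pictured in miniature with an extraction-style test: an example counts as memorized if the model reproduces its suffix verbatim from a prefix. The sketch below is a toy illustration under that simplified definition, not the method from the paper; `toy_generate` and `TRAINING` are hypothetical stand-ins.

```python
# Simplified sketch of an extraction-based memorization test (assumption:
# an example is "memorized" if the model greedily reproduces its suffix
# from a prefix; the actual research definition is more involved).

def memorization_rate(generate, examples, prefix_len=20):
    """`generate(prompt, n)` returns n characters; examples are strings."""
    memorized = 0
    for text in examples:
        prefix, suffix = text[:prefix_len], text[prefix_len:]
        if generate(prefix, len(suffix)) == suffix:
            memorized += 1
    return memorized / len(examples)

# Toy stand-in "model" that has memorized only the first training string.
TRAINING = ["the quick brown fox jumps over the lazy dog",
            "pack my box with five dozen liquor jugs"]

def toy_generate(prompt, n):
    for text in TRAINING[:1]:
        if text.startswith(prompt):
            return text[len(prompt):len(prompt) + n]
    return "?" * n

print(memorization_rate(toy_generate, TRAINING))  # 0.5
```

A real evaluation would sample many prefixes per example and use a trained model rather than a lookup, but the accounting is the same.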
Emerging skills
An artist’s illustration of artificial intelligence (AI). This image imagines Artificial General Intelligence (AGI). It was created by Novoto Studio as part of the Visualizing AI project started by Google DeepMind.
As large models become more capable, our research pushes the boundaries of new capabilities to develop more general AI systems.
While language models are used for general tasks, they can lack the exploration and contextual understanding needed to solve more complex problems. We present Tree of Thoughts, a new framework for language-model inference that helps models explore and reason over a wide range of possible solutions. By organizing reasoning and planning as a tree instead of the commonly used flat chains of thought, we show that a language model can solve complex tasks such as the Game of 24 with much greater accuracy.
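The tree-structured search can be sketched generically: propose several candidate next "thoughts" from each state, score them, and keep only the most promising per level. In the paper, `propose` and `score` are language-model calls; here they are toy arithmetic functions (reach a target number by repeated steps), so this is an illustration of the search shape, not the paper's implementation.

```python
# Minimal sketch of a Tree of Thoughts-style beam search.
# Assumption: `propose` and `score` stand in for LLM calls.

def propose(state):
    """Generate candidate next 'thoughts' (here: arithmetic steps)."""
    value, steps = state
    out = []
    for n in (1, 2, 3):
        out.append((value + n, steps + [f"+{n}"]))
        out.append((value * n, steps + [f"*{n}"]))
    return out

def score(state, target):
    """Heuristic value of a partial solution (closer to target is better)."""
    return -abs(state[0] - target)

def tree_of_thoughts(start, target, beam=3, depth=4):
    """Breadth-first search keeping the `beam` best states per level."""
    frontier = [(start, [])]
    for _ in range(depth):
        candidates = [c for state in frontier for c in propose(state)]
        for value, steps in candidates:
            if value == target:
                return steps
        frontier = sorted(candidates, key=lambda s: score(s, target),
                          reverse=True)[:beam]
    return None

print(tree_of_thoughts(1, 10))
```

A flat chain of thought corresponds to `beam=1` with a single proposal per step; widening the tree is what lets the model recover from unpromising partial solutions.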
To help people solve problems and find what they’re looking for, AI models must efficiently process billions of unique values. With Feature Multiplexing, a single representation space is used across many different features, allowing large embedding models (LEMs) to scale to products serving billions of users.
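The core idea of sharing one representation space can be sketched as a single embedding table that serves every categorical feature, with (feature, value) pairs hashed into shared slots. This is a hypothetical layout for illustration; the paper's actual multiplexing scheme may differ.

```python
# Minimal sketch of a multiplexed embedding space (hypothetical layout):
# many categorical features share one table instead of one table each.
import hashlib
import random

class SharedEmbedding:
    def __init__(self, table_size, dim, seed=0):
        rng = random.Random(seed)
        # One table serves every feature.
        self.table = [[rng.gauss(0, 1) for _ in range(dim)]
                      for _ in range(table_size)]

    def _slot(self, feature, value):
        # Hash (feature, value) so different features land in
        # (mostly) different slots of the shared space.
        key = f"{feature}:{value}".encode()
        return int(hashlib.md5(key).hexdigest(), 16) % len(self.table)

    def lookup(self, feature, value):
        return self.table[self._slot(feature, value)]

emb = SharedEmbedding(table_size=1000, dim=8)
v = emb.lookup("user_id", 42)
```

Because memory no longer grows with the number of features, the same table can back billions of distinct values, at the cost of occasional hash collisions.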
Finally, with DoReMi we show how using AI to automate the mixing of training data types can significantly speed up language-model training and improve performance on new and unseen tasks.
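The reweighting idea behind DoReMi can be sketched as an exponentiated-gradient update: domains where the model lags a reference get upweighted in the training mixture. The sketch below uses fixed toy per-domain excess losses in place of the proxy-vs-reference losses computed during training, so it shows the update rule only, not the full method.

```python
# Minimal sketch of DoReMi-style domain reweighting (assumption: toy,
# fixed excess losses stand in for proxy-vs-reference model losses).
import math

def doremi_weights(excess_losses, steps=10, lr=0.5):
    """Exponentiated-gradient update: upweight domains with large
    excess loss, then renormalize to a probability distribution."""
    domains = list(excess_losses)
    w = {d: 1.0 / len(domains) for d in domains}
    for _ in range(steps):
        for d in domains:
            w[d] *= math.exp(lr * excess_losses[d])
        total = sum(w.values())
        for d in domains:
            w[d] /= total
    return w

w = doremi_weights({"web": 0.2, "code": 0.8, "books": 0.1})
```

In the full method the excess losses are re-estimated as a small proxy model trains, and the resulting weights are then used to mix data for the large model.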
Fostering a global AI community
We are proud to sponsor NeurIPS and to support the affinity workshops LatinX in AI, Queer in AI, and Women in ML, helping to foster research collaborations and grow a diverse AI and machine learning community. This year, NeurIPS features a creative track including our Visualizing AI project, which commissions artists to create more diverse and accessible representations of AI.
If you’re attending NeurIPS, visit our booth to learn more about our cutting-edge research and meet our teams hosting workshops and presenting throughout the conference.