Research
Towards more multimodal, robust and general AI systems
Next week marks the start of the 37th annual Conference on Neural Information Processing Systems (NeurIPS), the world’s largest artificial intelligence (AI) conference. NeurIPS 2023 will be held December 10-16 in New Orleans, USA.
Teams from across Google DeepMind are presenting more than 180 papers at the main conference and workshops.
We will present demonstrations of our cutting-edge AI models for global weather forecasting, materials discovery and watermarking AI-generated content. There will also be a chance to hear from the team behind Gemini, our largest and most capable AI model.
Here’s a look at some of the highlights from our research:
Multimodality: language, video, action
UniSim is a universal simulator of real-world interactions.
Today’s AI models can create paintings, compose music and write stories. But as skilled as these models are in one medium, most struggle to transfer those skills to another. We delve into how generative abilities could help learning across modalities. In a spotlight presentation, we show that diffusion models can be used for image classification without requiring additional training. Diffusion models like Imagen classify images in a more human-like way than other models, relying on shapes rather than textures. Furthermore, we show how predicting captions from images can improve computer vision learning. Our approach outperformed current methods on vision-and-language tasks and showed greater scalability.
More multimodal models could give way to more useful digital and robot assistants to help people in their daily lives. In a spotlight poster, we create agents that can interact with the digital world the way humans do: via screenshots and keyboard and mouse actions. Separately, we show that by leveraging video generation, including subtitles and closed captions, models can transfer knowledge by predicting video plans for real robot actions.
One of the next milestones could be generating realistic experiences in response to actions performed by humans, robots and other interactive agents. We will present a demo of UniSim, our universal simulator of real-world interactions. This type of technology could have applications in industries ranging from video games and movies to training agents for the real world.
Creating safe and understandable artificial intelligence
An artist’s illustration of artificial intelligence (AI). This image depicts AI security research. It was created by artist Khyati Trehan as part of the Visualizing AI project initiated by Google DeepMind.
Large language models can generate impressive responses, but are prone to “hallucinations”: text that looks correct but is fabricated. Our researchers ask whether a method for finding where a fact is stored in a model (localization) can also be used to edit that fact. Surprisingly, they found that localizing a fact and editing that location does not edit the fact, underscoring the complexity of understanding and controlling the information stored in LLMs. With Tracr, we propose a new way to evaluate interpretability methods by compiling human-readable programs into transformer models. We have open-sourced a version of Tracr to serve as a ground truth for evaluating interpretability methods.
When developing and deploying large models, privacy must be built in at every step. For training, our teams study how to measure whether language models memorize data, in order to protect private and sensitive material. At the same time, researchers show how to evaluate privacy-preserving training with a technique efficient enough for real-world use. In another oral presentation, our scientists investigate the limitations of training through ‘student’ and ‘teacher’ models, which have different levels of access and vulnerability to attack.
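Measuring memorization can be pictured in miniature with an extraction-style test: an example counts as memorized if the model reproduces its suffix verbatim from a prefix. The sketch below is a toy illustration under that simplified definition, not the method from the paper; `toy_generate` and `TRAINING` are hypothetical stand-ins.

```python
# Simplified sketch of an extraction-based memorization test (assumption:
# an example is "memorized" if the model greedily reproduces its suffix
# from a prefix; the actual research definition is more involved).

def memorization_rate(generate, examples, prefix_len=20):
    """`generate(prompt, n)` returns n characters; examples are strings."""
    memorized = 0
    for text in examples:
        prefix, suffix = text[:prefix_len], text[prefix_len:]
        if generate(prefix, len(suffix)) == suffix:
            memorized += 1
    return memorized / len(examples)

# Toy stand-in "model" that has memorized only the first training string.
TRAINING = ["the quick brown fox jumps over the lazy dog",
            "pack my box with five dozen liquor jugs"]

def toy_generate(prompt, n):
    for text in TRAINING[:1]:
        if text.startswith(prompt):
            return text[len(prompt):len(prompt) + n]
    return "?" * n

print(memorization_rate(toy_generate, TRAINING))  # 0.5
```

A real evaluation would sample many prefixes per example and use a trained model rather than a lookup, but the accounting is the same.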
Emerging skills
An artist’s illustration of artificial intelligence (AI). This image imagines Artificial General Intelligence (AGI). It was created by Novoto Studio as part of the Visualizing AI project started by Google DeepMind.
As large models become more capable, our research pushes the boundaries of new capabilities to develop more general AI systems.
While language models are used for general tasks, they can lack the exploration and contextual understanding needed to solve more complex problems. We present Tree of Thoughts, a new framework for language-model inference that helps models explore and reason over a wide range of possible solutions. By organizing reasoning and planning as a tree instead of the commonly used flat chains of thought, we show that a language model can solve complex tasks such as the Game of 24 with much greater accuracy.
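The tree-structured search can be sketched generically: propose several candidate next "thoughts" from each state, score them, and keep only the most promising per level. In the paper, `propose` and `score` are language-model calls; here they are toy arithmetic functions (reach a target number by repeated steps), so this is an illustration of the search shape, not the paper's implementation.

```python
# Minimal sketch of a Tree of Thoughts-style beam search.
# Assumption: `propose` and `score` stand in for LLM calls.

def propose(state):
    """Generate candidate next 'thoughts' (here: arithmetic steps)."""
    value, steps = state
    out = []
    for n in (1, 2, 3):
        out.append((value + n, steps + [f"+{n}"]))
        out.append((value * n, steps + [f"*{n}"]))
    return out

def score(state, target):
    """Heuristic value of a partial solution (closer to target is better)."""
    return -abs(state[0] - target)

def tree_of_thoughts(start, target, beam=3, depth=4):
    """Breadth-first search keeping the `beam` best states per level."""
    frontier = [(start, [])]
    for _ in range(depth):
        candidates = [c for state in frontier for c in propose(state)]
        for value, steps in candidates:
            if value == target:
                return steps
        frontier = sorted(candidates, key=lambda s: score(s, target),
                          reverse=True)[:beam]
    return None

print(tree_of_thoughts(1, 10))
```

A flat chain of thought corresponds to `beam=1` with a single proposal per step; widening the tree is what lets the model recover from unpromising partial solutions.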
To help people solve problems and find what they’re looking for, AI models must efficiently process billions of unique values. With Feature Multiplexing, a single representation space is used across many different features, allowing large embedding models (LEMs) to scale to products serving billions of users.
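The core idea of sharing one representation space can be sketched as a single embedding table that serves every categorical feature, with (feature, value) pairs hashed into shared slots. This is a hypothetical layout for illustration; the paper's actual multiplexing scheme may differ.

```python
# Minimal sketch of a multiplexed embedding space (hypothetical layout):
# many categorical features share one table instead of one table each.
import hashlib
import random

class SharedEmbedding:
    def __init__(self, table_size, dim, seed=0):
        rng = random.Random(seed)
        # One table serves every feature.
        self.table = [[rng.gauss(0, 1) for _ in range(dim)]
                      for _ in range(table_size)]

    def _slot(self, feature, value):
        # Hash (feature, value) so different features land in
        # (mostly) different slots of the shared space.
        key = f"{feature}:{value}".encode()
        return int(hashlib.md5(key).hexdigest(), 16) % len(self.table)

    def lookup(self, feature, value):
        return self.table[self._slot(feature, value)]

emb = SharedEmbedding(table_size=1000, dim=8)
v = emb.lookup("user_id", 42)
```

Because memory no longer grows with the number of features, the same table can back billions of distinct values, at the cost of occasional hash collisions.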
Finally, with DoReMi we show how using AI to automate the mixing of training data types can significantly speed up language-model training and improve performance on new and unseen tasks.
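The reweighting idea behind DoReMi can be sketched as an exponentiated-gradient update: domains where the model lags a reference get upweighted in the training mixture. The sketch below uses fixed toy per-domain excess losses in place of the proxy-vs-reference losses computed during training, so it shows the update rule only, not the full method.

```python
# Minimal sketch of DoReMi-style domain reweighting (assumption: toy,
# fixed excess losses stand in for proxy-vs-reference model losses).
import math

def doremi_weights(excess_losses, steps=10, lr=0.5):
    """Exponentiated-gradient update: upweight domains with large
    excess loss, then renormalize to a probability distribution."""
    domains = list(excess_losses)
    w = {d: 1.0 / len(domains) for d in domains}
    for _ in range(steps):
        for d in domains:
            w[d] *= math.exp(lr * excess_losses[d])
        total = sum(w.values())
        for d in domains:
            w[d] /= total
    return w

w = doremi_weights({"web": 0.2, "code": 0.8, "books": 0.1})
```

In the full method the excess losses are re-estimated as a small proxy model trains, and the resulting weights are then used to mix data for the large model.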
Fostering a global AI community
We are proud to sponsor NeurIPS and to support the affinity workshops LatinX in AI, Queer in AI, and Women in ML, helping to foster research collaborations and grow a diverse AI and machine learning community. This year, NeurIPS features a creative track including our Visualizing AI project, which commissions artists to create more diverse and accessible representations of AI.
If you’re attending NeurIPS, visit our booth to learn more about our cutting-edge research and meet our teams hosting workshops and presenting throughout the conference.