Research
Second-person and top-down views of a BYOL-Explore agent solving the Throw-Across level of DM-HARD-8, while pure RL and other baseline exploration methods fail to make progress on Throw-Across.
Curiosity-driven exploration is the active process of seeking new information to enhance the agent's understanding of its environment. Suppose the agent has learned a model of the world that can predict future events given the history of past events. The curiosity-driven agent can then use the prediction mismatch of the world model as an intrinsic reward, directing its exploration policy toward seeking new information. The agent in turn uses this new information to improve the world model itself so that it makes better predictions. This iterative process can allow the agent to eventually explore all the novelties in the world and use this information to build an accurate world model.
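The loop above can be sketched numerically. This is a minimal illustration, not the agent's actual architecture: the "world model" here is a fixed linear map standing in for a trained network, and the dimensions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for a learned world model: a fixed linear map
# that "predicts" the next observation from the current one.
W = rng.normal(size=(8, 8)) * 0.1

def world_model(obs):
    return W @ obs

def intrinsic_reward(obs, next_obs):
    """Curiosity signal: the model's prediction error on what actually
    happened next. High error = surprising = worth exploring."""
    pred = world_model(obs)
    return float(np.sum((pred - next_obs) ** 2))

obs = rng.normal(size=8)
familiar_next = world_model(obs)   # a transition the model predicts perfectly
novel_next = rng.normal(size=8)    # an unmodeled, novel transition

# A poorly predicted transition yields a larger intrinsic reward,
# steering the exploration policy toward new information.
print(intrinsic_reward(obs, familiar_next))  # -> 0.0
print(intrinsic_reward(obs, novel_next))     # strictly positive
```

Training the world model on the newly gathered transitions then shrinks the reward in visited regions, so the agent keeps moving to wherever its predictions are still poor.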
Inspired by the successes of bootstrap your own latent (BYOL) – which has been applied in computer vision, graph representation learning, and representation learning in RL – we propose BYOL-Explore: a conceptually simple yet general curiosity-driven agent for solving hard-exploration tasks. BYOL-Explore learns a representation of the world by predicting its own future representation. It then uses the representation-level prediction error as an intrinsic reward to train a curiosity-driven policy. Therefore, BYOL-Explore learns a world representation, the dynamics of the world, and a curiosity-driven exploration policy all together, simply by optimizing the prediction error at the representation level.
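The core mechanism can be sketched as follows. This is a simplified sketch under stated assumptions, not the published architecture: the encoders and predictor are linear maps rather than deep networks, and the action-conditioned recurrent predictor of BYOL-Explore is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
D_OBS, D_REP = 8, 4  # illustrative dimensions

# Illustrative linear "networks"; in BYOL-Explore these are deep nets.
online_enc = rng.normal(size=(D_REP, D_OBS)) * 0.1
target_enc = online_enc.copy()   # target starts as a copy of the online net
predictor  = rng.normal(size=(D_REP, D_REP)) * 0.1

def l2_normalize(x):
    return x / (np.linalg.norm(x) + 1e-8)

def byol_explore_reward(obs, next_obs):
    """Representation-level prediction error: the online network predicts
    the target network's embedding of the next observation. This single
    quantity serves both as the training loss for the online network and
    as the intrinsic reward for the exploration policy."""
    pred = l2_normalize(predictor @ (online_enc @ obs))
    target = l2_normalize(target_enc @ next_obs)  # treated as a fixed target
    return float(np.sum((pred - target) ** 2))

def update_target(tau=0.99):
    """The target network trails the online network via an exponential
    moving average, providing a slowly changing, stable prediction target
    (the 'bootstrap your own latent' idea)."""
    global target_enc
    target_enc = tau * target_enc + (1 - tau) * online_enc
```

Because both the loss and the intrinsic reward are this one prediction error, minimizing it improves the representation and the dynamics model while maximizing it (through the policy) drives exploration.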
Comparison between BYOL-Explore, Random Network Distillation (RND), Intrinsic Curiosity Module (ICM), and pure RL (no intrinsic reward) on DM-HARD-8, in terms of mean capped human normalized score (CHNS).
Despite the simplicity of its design, when applied to the DM-HARD-8 suite of challenging, visually complex, 3-D hard-exploration tasks, BYOL-Explore surpasses standard curiosity-driven exploration methods such as Random Network Distillation (RND) and the Intrinsic Curiosity Module (ICM) in terms of mean capped human normalized score (CHNS), measured across all tasks. Remarkably, BYOL-Explore achieves this performance using a single network trained simultaneously on all tasks, whereas prior work was limited to the single-task setting and could only make meaningful progress on these tasks when provided with expert human demonstrations.
As further evidence of its generality, BYOL-Explore achieves superhuman performance on the ten hardest-exploration Atari games, while having a simpler design than other competitive agents such as Agent57 and Go-Explore.
Comparison between BYOL-Explore, Random Network Distillation (RND), Intrinsic Curiosity Module (ICM), and pure RL (no intrinsic reward) on the ten hardest-exploration Atari games, in terms of mean capped human normalized score (CHNS).
Moving forward, we aim to generalize BYOL-Explore to highly stochastic environments by learning a probabilistic world model that can generate trajectories of future events. This would allow the agent to model the stochasticity of the environment, avoid stochastic traps, and plan its exploration.