In our recent paper, we investigate how populations of deep reinforcement learning (deep RL) agents can learn microeconomic behaviors such as production, consumption, and trade of goods. We find that artificial agents learn to make economically rational decisions about production, consumption, and prices, and to react appropriately to changes in supply and demand. The population converges on local prices that reflect nearby resource abundance, and some agents learn to move goods between these regions to “buy low and sell high.” This work advances the broader multi-agent learning research agenda by introducing new social challenges for agents to learn to solve.
To the extent that the goal of multi-agent learning research is eventually to produce agents that operate across the full range and complexity of human social intelligence, the set of domains examined so far has been woefully incomplete. Critical areas where human intelligence excels, and where people spend significant time and energy, are still missing. Finance is one such area. Our goal in this work is to create transactional and negotiation-based environments for researchers in multi-agent reinforcement learning.
Economics uses agent-based models to simulate how economies behave. These agent-based models are often built on economic assumptions about how agents should act. In this paper, we present a multi-agent simulated world where agents can learn economic behaviors from scratch, in ways familiar to any Microeconomics 101 student: decisions about production, consumption, and prices. But our agents must also make other choices that flow from a more physically embodied way of thinking: they have to navigate a natural environment, find trees to harvest fruit from, and find partners to trade with. Recent advances in deep RL make it possible to create agents that learn these behaviors on their own, without a programmer coding in domain knowledge.
Our environment, called Fruit Market, is a multiplayer environment where agents produce and consume two types of fruit: apples and bananas. Each agent can produce one type of fruit but prefers to consume the other, so if agents learn to trade goods, both parties are better off.
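The gains from trade in this setup can be sketched with a toy reward model. The reward values below are illustrative assumptions for exposition, not the paper's actual parameters:

```python
# Hypothetical toy model of gains from trade in a Fruit Market-style setup.
# The reward magnitudes here are assumptions, not the paper's parameters.

def reward(agent_prefers: str, fruit_eaten: str) -> float:
    """Toy consumption reward: the preferred fruit is worth more."""
    return 2.0 if fruit_eaten == agent_prefers else 1.0

# An apple producer who prefers bananas, and a banana producer who
# prefers apples, each harvest one unit of their own fruit.
apple_producer_no_trade = reward("banana", "apple")   # eats own apple
banana_producer_no_trade = reward("apple", "banana")  # eats own banana

# If they swap one apple for one banana, each eats the preferred fruit.
apple_producer_trade = reward("banana", "banana")
banana_producer_trade = reward("apple", "apple")

# Trade leaves both parties strictly better off.
assert apple_producer_trade > apple_producer_no_trade
assert banana_producer_trade > banana_producer_no_trade
```

Under any reward scheme where the preferred fruit yields strictly more reward, a one-for-one swap is a Pareto improvement, which is the incentive the environment is designed to create.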
In our experiments, we demonstrate that current deep RL agents can learn to trade, and that their responses to changes in supply and demand align with what microeconomic theory predicts. We then build on this to present scenarios that would be very difficult to solve with analytical models, but which are simple for our deep RL agents. For example, in environments where each type of fruit grows in a different region, we observe the emergence of different price ranges reflecting local fruit abundance, and the subsequent learning of arbitrage by some agents, who begin to specialize in transporting fruit between the two regions.
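The arbitrage behavior described above amounts to a simple round-trip calculation. The prices below are hypothetical illustrations, not values measured in the experiments:

```python
# Hypothetical local prices in two regions of a Fruit Market-style map.
# Apples are cheap where they grow and expensive elsewhere, and vice
# versa for bananas. All numbers are illustrative assumptions.

apple_region_prices = {"apple": 1.0, "banana": 3.0}
banana_region_prices = {"apple": 3.0, "banana": 1.0}

def round_trip_profit(qty: float) -> float:
    """Profit from buying apples cheap and selling them dear, then
    carrying bananas back on the return leg."""
    buy_apples = qty * apple_region_prices["apple"]     # cost in apple region
    sell_apples = qty * banana_region_prices["apple"]   # revenue in banana region
    buy_bananas = qty * banana_region_prices["banana"]  # cost in banana region
    sell_bananas = qty * apple_region_prices["banana"]  # revenue back home
    return (sell_apples - buy_apples) + (sell_bananas - buy_bananas)

print(round_trip_profit(1.0))  # 4.0 under these assumed prices
```

As long as the regional price gap exceeds the cost of travel, an agent that shuttles goods between regions earns this spread, which is exactly the "buy low, sell high" incentive that some agents discover.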
The field of agent-based computational economics uses similar simulations for economic research. In this work, we also demonstrate that state-of-the-art deep RL techniques can flexibly learn to act in these environments from their own experience, without any embedded economic knowledge. This highlights the recent progress of the reinforcement learning community in multi-agent RL and deep RL, and demonstrates the potential of multi-agent techniques as tools to advance simulation-based economic research.
Like the path to artificial general intelligence (AGI), research to enhance multi-agent learning should encompass all critical domains of social intelligence. So far, however, it has not incorporated traditional economic phenomena such as trade, bargaining, specialization, consumption, and production. This paper fills that gap and provides a platform for further research. To aid future work in this area, the Fruit Market environment will be included in the next release of the Crucible suite of environments.