In our recent paper we explore how multi-agent deep reinforcement learning can serve as a model of complex social interactions, such as social rule formation. This new class of models could provide a path to creating richer, more detailed simulations of the world.
People are a highly social species. Relative to other mammals, we benefit more from cooperation, but we are also more dependent on it and face greater cooperation challenges. Today, humanity faces numerous cooperation challenges, including preventing conflict over resources, ensuring access to clean air and drinking water, eradicating extreme poverty, and combating climate change. Many of the cooperation problems we face are difficult to solve because they involve complex webs of social and biophysical interactions called social-ecological systems. However, people can collectively learn to overcome the cooperation challenges we face. We achieve this through an ever-evolving culture, including rules and institutions that organize our interactions with the environment and with each other.
However, rules and institutions sometimes fail to solve the challenges of cooperation. For example, individuals may overexploit resources such as forests and fisheries, causing their collapse. In such cases, policymakers can write laws to change institutional rules or design new interventions in the hope of bringing about positive change. But policy interventions do not always work as intended. This is because real-world social-ecological systems are significantly more complicated than the models we typically use to try to predict the outcomes of candidate policies.
Models based on game theory are often applied to the study of cultural evolution. In most of these models, the basic interactions agents have with each other are expressed in a “payoff matrix.” In a game with two players and two actions, A and B, a payoff matrix sets the value of the four possible outcomes: (1) we both choose A, (2) we both choose B, (3) I choose A while you choose B, and (4) I choose B while you choose A. The most famous example is the ‘Prisoner’s Dilemma’, in which the actions are interpreted as ‘cooperate’ and ‘defect’. Rational agents acting in their own myopic self-interest are doomed to defect in the Prisoner’s Dilemma, even though the better outcome of mutual cooperation is available.
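The logic of the payoff matrix can be made concrete with a few lines of code. The numbers below are the conventional illustrative Prisoner’s Dilemma payoffs, not values taken from any particular study, and `best_response` is a hypothetical helper standing in for a myopically rational player:

```python
# Payoff matrix for a two-player, two-action game.
# payoffs[(my_action, your_action)] gives (my_payoff, your_payoff).
C, D = "cooperate", "defect"
payoffs = {
    (C, C): (3, 3),  # (1) mutual cooperation
    (C, D): (0, 5),  # (3) I cooperate while you defect
    (D, C): (5, 0),  # (4) I defect while you cooperate
    (D, D): (1, 1),  # (2) mutual defection
}

def best_response(opponent_action):
    """A myopically rational player picks whichever action maximizes
    its own payoff, given what the opponent does."""
    return max([C, D], key=lambda a: payoffs[(a, opponent_action)][0])

# Whatever the other player does, defecting pays more for me...
assert best_response(C) == D
assert best_response(D) == D
# ...so two myopically rational players both defect, even though
# mutual cooperation (3, 3) beats mutual defection (1, 1) for both.
```

Note that the entire strategic situation is contained in the four entries of `payoffs`: this is exactly the sense in which the modeler must specify, up front, how individual actions combine into incentives.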
Game theory models have been widely applied. Researchers in many fields have used them to study a wide range of phenomena, including economies and the evolution of human civilization. However, game theory is not a neutral tool; rather, it is a deeply opinionated modeling language. It imposes a strict requirement that everything must eventually cash out in terms of the payoff matrix (or an equivalent representation). This means that the modeler must know, or be willing to assume, everything about how the effects of individual actions combine to create incentives. This is sometimes appropriate, and the game-theoretic approach has had many notable successes, such as modeling the behavior of oligopolistic firms and the international relations of the Cold War era. However, the main weakness of game theory as a modeling language is exposed in situations where the modeler does not fully understand how individuals’ choices combine to produce payoffs. Unfortunately, this tends to be the case with social-ecological systems, because their social and ecological parts interact in complex ways that we do not fully understand.
The work we present here is part of a research program that aims to create an alternative modeling framework, distinct from game theory, for the study of social-ecological systems. Our approach can formally be considered a variety of agent-based modeling. However, its hallmark is the incorporation of algorithmic elements from artificial intelligence, specifically multi-agent deep reinforcement learning.
The basic idea of this approach is that each model consists of two interrelated parts: (1) a rich, dynamic model of the environment and (2) a model of individual decision making.
The first takes the form of a researcher-designed simulator: an interactive program that takes in the current environmental state and agents’ actions, and outputs the next environmental state along with each agent’s observation and instantaneous reward. The second is a model of individual decision making: a learning agent that improves through a form of trial and error. An agent interacts with an environment by receiving observations and taking actions. Each agent chooses its actions according to its behavior policy, a mapping from observations to actions. Agents learn by changing their policy to improve it along any desired dimension, usually to obtain more reward. The policy is stored in a neural network. Agents learn “from the ground up”, from their own experience, how the world works and what they can do to earn more reward. They achieve this by adjusting the weights of their network so that the pixels they receive as observations are gradually converted into competent actions. Multiple learning agents can inhabit the same environment; in this case, the agents are interdependent because their actions affect one another.
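The two-part structure described above can be sketched as a simple loop. Everything in this sketch is illustrative: `ToyEnvironment` is an arbitrary stand-in for a researcher-designed simulator, and `Agent` uses a tiny tabular value estimate in place of a deep neural network, since the point here is only the interface between simulator and learner:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

class ToyEnvironment:
    """Stand-in simulator: maps (state, joint actions) to the next state,
    per-agent observations, and per-agent instantaneous rewards."""
    def __init__(self, n_agents):
        self.n_agents = n_agents
        self.state = 0

    def reset(self):
        self.state = 0
        return [self.state] * self.n_agents  # every agent observes the state

    def step(self, actions):
        self.state += sum(actions)  # joint actions drive the dynamics
        observations = [self.state] * self.n_agents
        rewards = [1.0 if a == 1 else 0.0 for a in actions]  # placeholder reward rule
        return observations, rewards

class Agent:
    """Trial-and-error learner: a policy maps observations to actions and is
    nudged toward actions that earned more reward (a crude stand-in for
    updating neural-network weights)."""
    def __init__(self):
        self.value = {0: 0.0, 1: 0.0}  # estimated reward per action

    def act(self, observation):
        # This toy policy ignores the observation; a real agent conditions on it.
        if random.random() < 0.1:  # occasional exploration
            return random.choice([0, 1])
        return max(self.value, key=self.value.get)  # otherwise act greedily

    def learn(self, action, reward):
        self.value[action] += 0.1 * (reward - self.value[action])

# Multiple interdependent learners share one environment.
env = ToyEnvironment(n_agents=2)
agents = [Agent(), Agent()]
obs = env.reset()
for _ in range(200):
    actions = [agent.act(o) for agent, o in zip(agents, obs)]
    obs, rewards = env.step(actions)
    for agent, a, r in zip(agents, actions, rewards):
        agent.learn(a, r)
```

After a couple of hundred steps of this loop, both agents have shifted their policies toward the rewarded action, without anyone having written down a payoff matrix: the incentives emerge from the simulator’s dynamics.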
Like other agent-based modeling approaches, multi-agent deep reinforcement learning makes it easy to specify models that cross levels of analysis in ways that would be difficult to handle with game theory. For example, actions may be much closer to low-level motor primitives (e.g., “walk forward,” “turn right”) than to the high-level strategic decisions of game theory (e.g., “cooperate”). This is an important feature for capturing situations where agents need practice to learn how to implement their strategic choices effectively. For example, in one study, agents learned to work together by taking turns cleaning a river. This solution was possible only because the environment had spatial and temporal dimensions that gave agents great freedom in how to structure their behavior toward one another. Interestingly, while the environment allowed many different solutions (such as territoriality), the agents converged on the same solution as human players.
In our latest study, we applied this kind of model to an open question in cultural evolution research: how to explain the existence of apparently arbitrary social norms whose violation carries no immediate material consequence beyond socially imposed sanctions. For example, in some societies men are expected to wear trousers rather than skirts; in many there are words or gestures that should not be used in polite company; and in most there are rules about how one wears one’s hair or what one wears on one’s head. We call these social rules “silly rules.” Importantly, in our setting, both the enforcement of and compliance with social norms must be learned. Having a social environment that includes a silly rule gives agents more opportunities to learn about rule enforcement in general. This additional practice then allows them to enforce the important rules more effectively. Overall, the silly rule can be beneficial to the population – a surprising result. This result is only possible because our simulation focuses on learning: rule enforcement and compliance are complex skills that require training to develop.
Part of why we find this result on silly rules so compelling is that it demonstrates the utility of multi-agent deep reinforcement learning for modeling cultural evolution. Culture contributes to the success or failure of policy interventions in social-ecological systems. For example, strengthening social norms around recycling is part of the solution to some environmental problems. Following this trajectory, richer simulations could lead to a deeper understanding of how to design interventions for social-ecological systems. If the simulations become realistic enough, it may even be possible to test the impact of an intervention, such as a tax code designed to promote both productivity and fairness.
This approach provides researchers with tools to specify detailed models of phenomena of interest. Of course, like all research methodologies, it should be expected to have its own strengths and weaknesses. We hope to discover more about when this style of modeling can be fruitfully applied in the future. While there are no panaceas for modeling, we believe there are compelling reasons to look to multi-agent deep reinforcement learning when building models of social phenomena, especially when they involve learning.