Meta-control of social learning strategies
Social learning, which works by copying the behaviors of others, has evolved in nature to reduce the cost of learning. However, it raises a fundamental problem: identifying which individuals have reliable knowledge. Social learning strategies emerged as heuristics for selecting which individuals to copy.
Here, we designed a binary decision-making task with low and high uncertainty, illustrated in Figure 1, to analyze individual learning and two social learning strategies, referred to as success-based and conformist. The former copies the behavior of the most successful individual, whereas the latter copies the behavior of the majority.
Figure 1: Individual, success-based and conformist social learning on the binary decision-making task with low and high uncertainty. In this task, there are two actions whose rewards are drawn from Gaussian distributions with specified means and standard deviations. A population of individuals repeatedly chooses between these actions and aims to learn which one provides the higher average reward. At some point during learning, the environment changes: the parameters of the reward distributions are altered. Uncertainty in the environment arises from the overlap between the reward distributions.
Individuals can adopt one of three learning strategies: individual learning, success-based social learning, or conformist social learning. Individual learning is modeled as a reinforcement learning agent (using an epsilon-greedy algorithm) that explores the options and learns from the rewards its choices return. Consequently, it incurs a learning cost. Social learning, on the other hand, avoids this cost because it is based on copying.
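To make the setup concrete, here is a minimal sketch of the task and of an epsilon-greedy individual learner. The class names, the incremental value update, and all parameter values (means, standard deviations, `epsilon`, `step_size`) are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


class TwoArmedTask:
    """Binary task: each action pays a reward drawn from its own Gaussian."""

    def __init__(self, means, stds, seed=0):
        self.means = list(means)
        self.stds = list(stds)
        self.rng = random.Random(seed)

    def reward(self, action):
        return self.rng.gauss(self.means[action], self.stds[action])

    def change(self, means, stds):
        # Environment change: alter the parameters of the reward distributions.
        self.means = list(means)
        self.stds = list(stds)


class EpsilonGreedyLearner:
    """Individual learner: explores with probability epsilon, else exploits."""

    def __init__(self, n_actions=2, epsilon=0.1, step_size=0.1, seed=0):
        self.q = [0.0] * n_actions  # running estimate of each action's reward
        self.epsilon = epsilon
        self.step_size = step_size  # assumed constant, so old rewards fade
        self.rng = random.Random(seed)

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))  # explore
        return max(range(len(self.q)), key=lambda a: self.q[a])  # exploit

    def update(self, action, reward):
        self.q[action] += self.step_size * (reward - self.q[action])


def run(task, learner, steps):
    """Let the learner interact with the task; return the chosen actions."""
    choices = []
    for _ in range(steps):
        action = learner.act()
        learner.update(action, task.reward(action))
        choices.append(action)
    return choices
```

In this sketch, widening `stds` relative to the gap between `means` increases the overlap between the two reward distributions, which corresponds to the high-uncertainty condition. The exploratory choices the learner keeps making are one way to picture the learning cost of individual learning.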
We show (see Figure 1b) that while the success-based strategy fully exploits the benign low-uncertainty environment, it fails in environments with high uncertainty. The reason is that, under high uncertainty, success-based copying is unreliable: the individual with the highest reward may be performing the action with the lower average reward. The conformist strategy, on the other hand, can effectively mitigate this adverse effect.
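The failure mode of success-based copying can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's experiment: the population is fixed at 60 individuals performing the better action and 40 performing the worse one, each individual receives a single reward sample, and the means and standard deviations are illustrative.

```python
import random
from collections import Counter


def success_based(actions, rewards):
    """Copy the action of the individual with the highest observed reward."""
    best = max(range(len(actions)), key=lambda i: rewards[i])
    return actions[best]


def conformist(actions):
    """Copy the action performed by the majority."""
    return Counter(actions).most_common(1)[0][0]


def copy_accuracy(std, n_better=60, n_worse=40, trials=2000, seed=0):
    """Fraction of trials in which each strategy copies the better action."""
    rng = random.Random(seed)
    means = (0.0, 1.0)  # assumed: action 1 is better on average
    actions = [1] * n_better + [0] * n_worse
    hits_success = 0
    hits_conformist = 0
    for _ in range(trials):
        rewards = [rng.gauss(means[a], std) for a in actions]
        hits_success += success_based(actions, rewards) == 1
        hits_conformist += conformist(actions) == 1
    return hits_success / trials, hits_conformist / trials
```

Calling `copy_accuracy(0.1)` corresponds to low overlap between the reward distributions and `copy_accuracy(5.0)` to high overlap. With high overlap, the single highest reward is often a lucky draw from the worse action, so the success-based copier is misled, while the conformist copier only depends on which action the majority performs.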
Meta-control is based on the premise that different learning strategies in the brain, such as Pavlovian, model-based and model-free learning, have different sensitivities to uncertainty about the perceived reward associations of actions. Inspired by this capability of the brain and motivated by our analysis of individual, success-based and conformist strategies, we hypothesized that meta-control of individual and social learning strategies can provide effective and sample-efficient learning in environments with varying frequencies of change and levels of uncertainty.
To test this hypothesis, we propose meta-social learning, which arbitrates between individual, success-based and conformist strategies to provide optimal learning. We modeled meta-social learning using rules based on our analysis, as well as various other approaches including reinforcement learning, genetic algorithms and neuroevolution. We then tested these models in various environments with changing reward distributions. The uncertainty of the environment also varies with the configuration of the reward distributions.
Figure 2: On the left, performance versus exploration costs of 13 versions of the meta-social learning implementations using various approaches (see  for more details). On the right, the models are ranked based on their average performance (the ranks of the models that did not differ significantly are linked with vertical bars).
As shown in Figure 2, the best performing meta-social learning models (e.g. SL-EC-Conf-Unc) use environment-change and uncertainty mechanisms to switch between individual and social learning strategies. They use individual learning for exploration when an environment change is identified, then switch to a social learning strategy to help the learned behavior spread through the population and to minimize exploration cost. The choice of social learning strategy (i.e. success-based vs conformist) depends on the estimated environment uncertainty: when the estimated uncertainty is low, the success-based strategy is used; otherwise, the conformist strategy is used. We also used machine learning approaches to learn how to switch between these strategies depending on the environment, and observed that the models converged on similar decision-making behaviors.
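The switching rule described above can be sketched as a small arbitration class. This is a hedged illustration, not the SL-EC-Conf-Unc implementation: the class name, the fixed exploration window `explore_steps`, and the `uncertainty_threshold` are hypothetical, and how an environment change is detected or how uncertainty is estimated is left to the caller.

```python
from enum import Enum


class Strategy(Enum):
    INDIVIDUAL = "individual"
    SUCCESS_BASED = "success-based"
    CONFORMIST = "conformist"


class RuleBasedMetaLearner:
    """Arbitrates between individual and social learning strategies."""

    def __init__(self, explore_steps=50, uncertainty_threshold=0.5):
        self.explore_steps = explore_steps  # assumed fixed exploration window
        self.uncertainty_threshold = uncertainty_threshold  # assumed cut-off
        self.steps_left = explore_steps  # start by exploring individually

    def select(self, change_detected, estimated_uncertainty):
        # A detected environment change restarts individual exploration.
        if change_detected:
            self.steps_left = self.explore_steps
        if self.steps_left > 0:
            self.steps_left -= 1
            return Strategy.INDIVIDUAL
        # After exploring, copy others; the kind of copying depends on
        # the estimated uncertainty of the environment.
        if estimated_uncertainty < self.uncertainty_threshold:
            return Strategy.SUCCESS_BASED
        return Strategy.CONFORMIST
```

A caller would invoke `select` once per time step, passing its own change signal and uncertainty estimate, and then act according to the returned strategy.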
Figure 3: The evolution of meta-social learning models. The points of environment change are marked with orange arrows.
In addition, we analyzed the evolution of the meta-social learners. At the beginning of the evolutionary process, we randomly initialize a population of meta-social learners. At each generation, meta-social learners are selected to be copied to the next generation based on the rewards they collect. As shown in Figure 3, the meta-social learning models that incorporate environment uncertainty come to dominate the population in frequency. Furthermore, the meta-social learning models that use the conformist strategy showed higher performance than the others in environments with high uncertainty. This indicates that the conformist social learning strategy can be a good heuristic for social learning in environments with high uncertainty.
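The evolutionary loop can be sketched as follows. The selection scheme shown here (sampling parents with probability proportional to shifted reward) is an assumption, since the text only states that selection is based on collected rewards. The two learner types and their mean rewards are hypothetical stand-ins for rewards that would actually be earned by performing the task.

```python
import random
from collections import Counter

# Hypothetical learner types and mean rewards, for illustration only.
MEAN_REWARD = {"uses_uncertainty": 1.0, "ignores_uncertainty": 0.5}


def collect_reward(kind, rng):
    """Stand-in for the reward a learner of this type collects on the task."""
    return rng.gauss(MEAN_REWARD[kind], 0.1)


def next_generation(population, rewards, rng):
    """Copy learners to the next generation in proportion to their reward."""
    lowest = min(rewards)
    # Shift rewards so that every sampling weight is positive.
    weights = [r - lowest + 1e-9 for r in rewards]
    return rng.choices(population, weights=weights, k=len(population))


def evolve(pop_size=100, generations=30, seed=0):
    """Return the frequency of each learner type after every generation."""
    rng = random.Random(seed)
    kinds = list(MEAN_REWARD)
    population = [rng.choice(kinds) for _ in range(pop_size)]  # random init
    history = []
    for _ in range(generations):
        rewards = [collect_reward(kind, rng) for kind in population]
        population = next_generation(population, rewards, rng)
        counts = Counter(population)
        history.append({kind: counts[kind] / pop_size for kind in kinds})
    return history
```

Plotting the frequencies in `history` over generations gives a picture of the same kind as Figure 3: the type that collects more reward increases in frequency until it dominates the population.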
The results imply that meta-social learning helps agents resolve environmental uncertainty with minimal exploration cost by exploiting others’ learning. We believe that the results can have a significant impact on multi-agent learning systems by providing effective and sample-efficient learning in changing and uncertain environments. Furthermore, this analysis may help form new hypotheses for investigating social decision-making processes in the brain.
Yaman A, Bredeche N, Çaylak O, Leibo JZ, Lee SW. Meta-control of social learning strategies. 2021 (arXiv preprint )
Kendal RL, Boogert NJ, Rendell L, Laland KN, Webster M, Jones PL. Social learning strategies: Bridge-building between fields. Trends in Cognitive Sciences, 22(7):651–665, 2018.
Lee SW, Shimojo S, O’Doherty JP. Neural computations underlying arbitration between model-based and model-free learning. Neuron, 81(3):687–699, 2014.