Abstract
Learning to communicate in adaptive multi-agent populations introduces instability challenges at both the individual and the population level. To develop an effective communication system, a population must converge on a shared and sufficiently stable vocabulary. We explore the factors that affect the symmetry and effectiveness of the communication protocols developed by deep reinforcement learning agents playing a coordination game. Specifically, we examine the effects of bottom-driven supervision, agent population size, and self-play (“inner speech”) on the properties of the resulting communication systems. To analyse these protocols and draw appropriate conclusions, we developed a set of information-theoretic metrics, addressing an aspect that has remained underdeveloped in the field. We found that all the manipulated factors strongly affect the decentralized learning outcomes of the adaptive agents. Populations with more than two agents, or those trained with self-play, converge on more shared and symmetric communication protocols than two-agent groups without self-play. Bottom-driven supervising feedback, in turn, improves the learning outcomes of all groups, helping agents in larger populations, or those using self-play, to coordinate and converge on maximally homogeneous and symmetric communication systems. We discuss the implications of our results for future work on modeling language evolution with multi-agent reinforcement learning.