ABSTRACT
The performance in multi-agent reinforcement learning (MARL) scenarios has usually been analysed in homogeneous teams with a few choices for the sociality regime (selfish, egalitarian, or altruistic). In this paper we analyse both homogeneous and heterogeneous teams across a range of sociality regimes in the predator-prey game, using a novel normalisation of the weights so that the sum of all rewards is independent of the sociality regime. We find that the selfish regime is advantageous for both predator and prey teams, and for both homogeneous and heterogeneous teams. In particular, rewards are about 100% higher for the predator team when switching from the egalitarian to the selfish regime, and more than 400% higher when switching from the altruistic regime. For the prey, the increases are around 40% and 100% respectively. The results are similar for the homogeneous and heterogeneous settings. The takeaway message is that any study of homogeneous and heterogeneous cooperative-competitive multi-agent reinforcement learning teams should also take the sociality regimes into account before drawing conclusions about the preference for any algorithm.
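The paper does not spell out its weight normalisation in this excerpt, but the idea of a sociality regime with regime-independent total reward can be sketched as follows. Here `w_self` is a hypothetical blending weight (1 for selfish, 1/n for egalitarian, 0 for altruistic), and the remaining weight is split evenly over teammates so that the team's summed reward is unchanged by the regime; this is an illustrative sketch, not the authors' exact scheme.

```python
import numpy as np

def socialise_rewards(rewards, w_self):
    """Blend each agent's own reward with its teammates' rewards.

    w_self = 1.0   -> selfish (each agent keeps its own reward)
    w_self = 1/n   -> egalitarian (every agent gets the team mean)
    w_self = 0.0   -> altruistic (each agent gets only others' rewards)

    The leftover weight (1 - w_self) is divided evenly over the n - 1
    teammates, so sum(shared) == sum(rewards) for every choice of w_self.
    """
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    # Weight each agent assigns to every other teammate's reward.
    w_other = (1.0 - w_self) / (n - 1)
    # Own reward plus the weighted sum of the teammates' rewards.
    shared = w_self * rewards + w_other * (rewards.sum() - rewards)
    return shared

# Example: three agents, halfway between selfish and altruistic.
print(socialise_rewards([1.0, 2.0, 3.0], w_self=0.5))  # [1.75 2.   2.25]
```

Because sum(shared) = w_self * S + w_other * (n - 1) * S = S, where S is the team total, comparisons across regimes are not confounded by differing total reward.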
Acknowledgement
We thank the anonymous reviewers for their comments.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Reproducibility and results data availability
Here we give further details about reproducibility and the result data.
The hardware and software configuration on which we ran the environment is as follows:
Processor: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50 GHz
Operating system: Ubuntu 18.04
CPU cores: 40
Memory: 125 GB
Available memory: 109 GB
In the -predators -prey game there are 100 matches; in a -predators -prey game there are 400 matches. The sample numbers here are and , giving experiments in total across the three sociality regimes. With this hardware, a -predators -prey game takes around 50 minutes, and a -predators -prey game around 70 minutes. We ran to processes simultaneously at a time. If things go well, these experiments take around three weeks with this hardware (Note 1), which corresponds to roughly 100 kWh in total.
In compliance with the recommendations of the Science paper by Burnell et al. (2023), we include all results at the instance level in the appendix; further results can be found at: https://github.com/EvaluationResearch/SocialityMultiagent.
Supplementary data
Supplemental data for this article can be accessed online at https://doi.org/10.1080/0952813X.2024.2361408
Notes
1. GPUs did not speed up computations with TensorFlow in this environment, so we used only CPUs in the end.