🔧 Briefly, why is MARL so difficult?
🪨 So far, MARL's success stories have come predominantly from simulation-only environments.
- The sim-to-real gap is particularly painful for MARL, as collecting the right type of data in situ remains difficult [1].
- Labelled data is expensive: for example, collecting underwater sonar data is limited by overdrifting, noise, and the presence of shadows.
🪨 Deciding how and when the agents should communicate is a major issue.
- However, sharing knowledge through parameter sharing and transfer learning is important for overcoming the scalability issues in MARL [2]. Approaches include:
- the mean-field approach
- curriculum learning
- DTDE (decentralized training with decentralized execution) with deep Q-networks (DQNs)
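
To make the scalability point concrete, here is a minimal sketch of the mean-field idea: each agent conditions on the average action of its neighbours rather than on every neighbour individually, so the input to its Q-function keeps the same size however many agents are present. The function names (`mean_action`, `q_input`) and the feature layout are illustrative assumptions, not taken from [2].

```python
from collections import Counter


def mean_action(neighbor_actions, n_actions):
    """Empirical distribution over the neighbours' discrete actions.

    The result always has length n_actions, however many neighbours there are.
    """
    if not neighbor_actions:
        return [0.0] * n_actions
    counts = Counter(neighbor_actions)
    total = len(neighbor_actions)
    return [counts.get(a, 0) / total for a in range(n_actions)]


def q_input(own_obs, own_action, neighbor_actions, n_actions):
    """Hypothetical feature vector for a mean-field Q-function:
    own observation + one-hot own action + mean neighbour action."""
    one_hot = [1.0 if a == own_action else 0.0 for a in range(n_actions)]
    return list(own_obs) + one_hot + mean_action(neighbor_actions, n_actions)


# Two neighbours versus one hundred neighbours: same input size.
few = q_input([0.5, -1.0], own_action=1, neighbor_actions=[0, 2], n_actions=3)
many = q_input([0.5, -1.0], own_action=1, neighbor_actions=[0, 2, 2, 1] * 25, n_actions=3)
print(len(few), len(many))
```

The price of this compression is that the agent can no longer tell which neighbour did what, which is an acceptable approximation only when neighbours are roughly interchangeable.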
🪨 Generalization is difficult across scenarios and environments.
- In gaming, a pretrained model that does well with its known set of agents often fails when a human player interacts with it.
🪨 Communication issues, such as bandwidth & latency, need to be addressed early on when designing the architecture.
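
One way to surface these constraints early is to train against a channel model that already has them. The sketch below is a toy example under stated assumptions: latency is a fixed number of time steps, bandwidth is a cap on messages per step, and excess messages are simply dropped. The `DelayedChannel` class is hypothetical and not part of any MARL library.

```python
from collections import deque


class DelayedChannel:
    """Toy inter-agent channel with fixed latency and a per-step message budget."""

    def __init__(self, latency_steps, max_msgs_per_step):
        self.latency_steps = latency_steps
        self.max_msgs_per_step = max_msgs_per_step
        self.step_count = 0
        self.sent_this_step = 0
        self.dropped = 0
        self.in_flight = deque()  # entries are (deliver_at_step, message)

    def send(self, message):
        """Queue a message; return False if the bandwidth budget is used up."""
        if self.sent_this_step >= self.max_msgs_per_step:
            self.dropped += 1
            return False
        self.in_flight.append((self.step_count + self.latency_steps, message))
        self.sent_this_step += 1
        return True

    def step(self):
        """Advance one time step and return the messages that arrive now."""
        self.step_count += 1
        self.sent_this_step = 0
        delivered = []
        while self.in_flight and self.in_flight[0][0] <= self.step_count:
            delivered.append(self.in_flight.popleft()[1])
        return delivered


channel = DelayedChannel(latency_steps=2, max_msgs_per_step=2)
results = [channel.send(m) for m in ("a", "b", "c")]  # third exceeds the budget
first = channel.step()   # nothing has arrived yet
second = channel.step()  # "a" and "b" arrive after two steps
print(results, first, second, channel.dropped)
```

Agents trained with such a wrapper see stale and missing messages from the start, instead of meeting them for the first time at deployment.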
🪨 Data heterogeneity needs to be understood and quantified, and the proper algorithms developed to tackle it [3].
References
[1] A. Labiosa and J. P. Hanna, "Multi-robot collaboration through reinforcement learning and abstract simulation," arXiv:2503.05092, 2025.
[2] Z. Ning and L. Xie, "A survey on multi-agent reinforcement learning and its application," J. Autom. Intell., vol. 3, no. 2, pp. 73–91, 2024, doi: 10.1016/j.jai.2024.02.003.
[3] T. Hu, Z. Pu, Y. Wang, T. Qiu, M. Chen, and X. Yu, "Heterogeneity in multi-agent reinforcement learning," in Proc. Int. Conf. Auton. Agents Multiagent Syst. (AAMAS), 2026, arXiv:2512.22941.