⚙️ Semi-Asynchronous Centralised Federated MARL
Literature Review
A key architectural consideration for HFMARL is the feasibility of asynchronous federated aggregation in multi-agent reinforcement learning.
Metelo et al. [1] demonstrate this feasibility with FAuNO (Federated Asynchronous Network Orchestrator), a buffered, semi-asynchronous, centralised federated MARL framework built on Independent Proximal Policy Optimisation (IPPO) with Generalised Advantage Estimation (GAE). Two design choices from FAuNO are particularly relevant to the present work.
First, FAuNO federates only the critic network while keeping actor networks local to each agent.
- This separation mitigates policy divergence across agents operating in heterogeneous conditions, while still enabling shared value estimation.
- This principle is directly applicable to HFMARL's intra-cluster aggregation, where follower agents running IPPO benefit from a shared critic without sacrificing behavioural specialisation.
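The actor/critic split can be sketched as follows. This is a minimal illustration under stated assumptions, not FAuNO's implementation. Parameters are flattened to plain Python lists, and `AgentParams` and `federate_critics` are hypothetical names. A uniform average stands in for FAuNO's step-weighted buffered aggregation. The point is only that the server reads and overwrites critics while every actor stays local.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentParams:
    """Hypothetical per-agent parameters, flattened to lists of floats."""
    actor: List[float]   # local only: never sent to the server
    critic: List[float]  # federated: averaged across agents


def federate_critics(agents: Dict[str, AgentParams]) -> List[float]:
    """Average the critics of all agents and write the result back.

    Uses a uniform average for simplicity; actors are left untouched.
    """
    n = len(agents)
    dim = len(next(iter(agents.values())).critic)
    shared = [sum(a.critic[k] for a in agents.values()) / n for k in range(dim)]
    for a in agents.values():
        a.critic = list(shared)  # each agent receives its own copy
    return shared
```

After a call, every agent holds the same critic, while two agents with different actors keep their distinct behaviour.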
Second, FAuNO adopts a buffered asynchronous aggregation strategy adapted from FedBuff [2].
- Faster agents contribute updates more frequently than slower ones.
- Gradients are buffered at the aggregation server. A newer update from an agent replaces that agent's older buffered update, and the agent's weight in the aggregation increases accordingly.
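- The global critic parameters are updated via a weighted average:
$$\theta_c \leftarrow \sum_{i \in \mathcal{B}} w_i \, \theta_c^{(i)}$$
where $\mathcal{B}$ is the set of buffered updates, $\theta_c^{(i)}$ is the critic contribution buffered from agent $i$, and the weights $w_i$ are calculated based on the number of update steps each agent has performed, satisfying $\sum_{i \in \mathcal{B}} w_i = 1$. This ensures that agents with more recent local experience exert greater influence on the global model.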
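A minimal sketch of the buffer-and-replace mechanism and the weighted average follows. It rests on several assumptions, and FAuNO's exact rules may differ. Each buffered update is a full critic parameter vector rather than a gradient. An agent's weight $w_i$ is proportional to the local update steps it has reported since the last aggregation, normalised so the weights sum to 1. Aggregation fires once a hypothetical `buffer_size` number of submissions has arrived, so a fast agent submitting repeatedly can trigger it. `CriticBuffer` and its methods are illustrative names.

```python
from typing import Dict, List, Tuple


class CriticBuffer:
    """Hypothetical FedBuff-style buffer for critic updates (illustrative only)."""

    def __init__(self, buffer_size: int) -> None:
        self.buffer_size = buffer_size  # submissions needed before aggregating
        self.received = 0  # submissions since the last aggregation
        # agent_id -> (latest critic parameters, accumulated local update steps)
        self.buffer: Dict[str, Tuple[List[float], int]] = {}

    def submit(self, agent_id: str, params: List[float], steps: int) -> None:
        """Buffer an update; a newer update replaces the agent's older one.

        The agent's step count accumulates, so frequent contributors gain weight.
        """
        _, prev_steps = self.buffer.get(agent_id, ([], 0))
        self.buffer[agent_id] = (list(params), prev_steps + steps)
        self.received += 1

    def ready(self) -> bool:
        return self.received >= self.buffer_size

    def aggregate(self) -> List[float]:
        """Return sum_i w_i * theta_i with w_i = steps_i / total_steps."""
        total_steps = sum(steps for _, steps in self.buffer.values())
        dim = len(next(iter(self.buffer.values()))[0])
        global_params = [0.0] * dim
        for params, steps in self.buffer.values():
            w = steps / total_steps  # weights sum to 1 across the buffer
            for k in range(dim):
                global_params[k] += w * params[k]
        # Start a fresh buffering round (step counts reset here).
        self.buffer.clear()
        self.received = 0
        return global_params
```

For example, if agent `a` submits twice (one step each) and agent `b` once, only `a`'s latest parameters are kept. `a` then carries weight 2/3 and `b` carries weight 1/3 in the aggregate.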
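For communication latency estimation across agents, the Shannon-Hartley theorem provides a standard model. It gives the channel capacity $B \log_2(1 + \mathrm{SNR})$, and the transmission latency follows by dividing the payload size by this rate. The latency for transmitting $D$ bits from node $i$ to node $j$ is:
$$t_{ij} = \frac{D}{B \log_2\!\left(1 + \dfrac{P_i \, g_{ij}}{\sigma^2}\right)}$$
where $B$ is the channel bandwidth, $P_i$ is the source node's transmission power, $g_{ij}$ is the channel gain, and $\sigma^2$ is the noise power. This model is domain-agnostic and can be instantiated for any communication channel by substituting the appropriate physical parameters.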
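The latency formula translates directly into code. The function name and the choice of SI units are illustrative. Because the Shannon-Hartley capacity is an upper bound on the achievable rate, the result is a lower bound on the real transmission time. It ignores coding overhead, retransmissions, and protocol headers.

```python
import math


def transmission_latency(bits: float, bandwidth_hz: float, tx_power_w: float,
                         channel_gain: float, noise_power_w: float) -> float:
    """Time (s) to send `bits` at the Shannon-Hartley capacity of the link."""
    snr = tx_power_w * channel_gain / noise_power_w  # linear (not dB) SNR
    capacity_bps = bandwidth_hz * math.log2(1.0 + snr)
    return bits / capacity_bps
```

As a worked example, 1 Mbit sent over a 1 MHz channel at a linear SNR of 3 gives a capacity of $10^6 \log_2 4 = 2$ Mbit/s, so the latency is 0.5 s.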
A limitation of FAuNO for the present work is that it assumes a centralised aggregation server, which introduces a single point of failure and requires all agents to be reachable from a central node.
References
[1] F. Metelo, A. Oliveira, S. Racković, P. Á. Costa, and C. Soares, "FAuNO: Semi-asynchronous federated reinforcement learning framework for task offloading in edge systems," arXiv:2506.02668, 2025
[2] J. Nguyen, K. Malik, H. Zhan, A. Yousefpour, M. Rabbat, M. Malek, and D. Huba, "Federated learning with buffered asynchronous aggregation," in Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), 2022, arXiv:2106.06639