🦾 The O13 Energy Fleet Coordination Platform - HFMARL for VPPs Technical White Paper
The global electricity system is undergoing a structural transformation. The virtual power plant market is valued at $6.28 billion in 2025 and is projected to reach $45.67 billion by 2035, growing at 22.61% annually [1]. By aggregating thousands of small-scale distributed energy resources (DERs), Virtual Power Plants (VPPs) build a power profile large enough to trade in energy markets that individual assets could not access on their own. VPPs then control the generation and consumption of the DERs to balance the supply and demand of renewable energy [2].
Yet the software coordinating these assets has not kept pace with their deployment. One VPP may aggregate tens of thousands of assets across multiple grid regions, and existing solvers are not designed for that scale, so they face severe real-time computational bottlenecks [2]. A second issue is that asset data belongs strictly to individual owners, i.e. to the homeowners, businesses and fleet operators. Because the GDPR explicitly covers smart meter data as personal data, with dedicated guidance from the EU Smart Grid Task Force Expert Group 2 [3], centralising raw consumption data across owners is legally prohibited in the EU and commercially unacceptable elsewhere. Lastly, a typical grid-scale battery loses 20–30% of its capacity in its first decade. In merchant markets, this liability can cost tens of millions of dollars per project [4]. According to ACCURE Battery Intelligence, state-of-charge (SoC) inaccuracies alone cost battery energy storage system (BESS) operators more than $1 million per GWh annually.
The result of all these challenges is a widening gap between what VPP operators promise to the grid and what their fleets can physically deliver.
- Multi-agent reinforcement learning (MARL) has emerged as the leading approach to VPP coordination, with recent results demonstrating 18.73% cost reduction and 22.46% profit increase over model predictive control and Stackelberg game baselines [5].
- Safety-enhanced MARL for EV VPPs has achieved 45% reduction in voltage violations and 10% operational cost reduction [6].
However, existing MARL approaches for VPPs are centralised, flat, synchronous or physics-unaware. To our knowledge, as of May 2026 no published framework combines hierarchical federation, asynchronous aggregation, physics-informed policies and heterogeneous algorithm support for VPP coordination.
We therefore propose the O13 Energy Fleet Coordination Platform, a five-layer architecture that implements actively researched machine learning (ML) concepts with proven results.
Introduction to O13 Energy Fleet Coordination Platform
O13 Energy Fleet Coordination Platform is a coordination platform built on Hierarchical Federated Multi-Agent Reinforcement Learning (HFMARL) with physics-informed neural network (PINN) integration.
The platform treats VPP coordination as what it fundamentally is: a multi-agent problem where dozens of regional aggregators must learn, in real time, how to dispatch thousands of diverse assets to collectively deliver market commitments. Rather than replacing existing VPP infrastructure, the platform provides the intelligence layer that sits between market bidding and physical dispatch: it learns optimal coordination strategies through federated reinforcement learning, enforces physical constraints through embedded electrochemical and thermal models, estimates hidden asset health through inverse physics inference and preserves data privacy.
Figure 1: The O13 Energy Fleet Coordination Platform - General Overview
What follows is a technical, layer-by-layer breakdown of the infrastructure, which aims to make VPPs as efficient as possible, as per Figure 2.
Layer 5 translates market conditions and grid operator signals into structured dispatch targets for the hierarchical coordinator.
- Following L2M2 [7], the large language model (LLM) in O13 operates as a zero-shot high-level planner that decomposes complex, multi-horizon market participation into actionable sub-goals.
- In [7], LLM-guided MARL achieves superior performance while requiring less than 20% of the training samples needed by baseline methods.
In general, the LLM decides:
- which market products to bid into, at what volumes and prices,
- how to group assets into clusters by type, region or grid connection point,
- how to define the reward structure and adapt it to market conditions, and
- which RL algorithm class to select per asset type, based on action space characteristics.
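The last decision in the list above can be illustrated with a small lookup. This is a minimal sketch, not the platform's selection logic: the `ActionSpace` categories, the `ALGORITHM_CLASS` mapping and the algorithm families named in it are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ActionSpace(Enum):
    DISCRETE = "discrete"      # e.g. on/off or enable/disable
    CONTINUOUS = "continuous"  # e.g. a power setpoint in kW
    HYBRID = "hybrid"          # both kinds of control on one asset


# Hypothetical mapping: the algorithm families are illustrative choices,
# not ones prescribed by the white paper.
ALGORITHM_CLASS = {
    ActionSpace.DISCRETE: "value-based (e.g. DQN)",
    ActionSpace.CONTINUOUS: "actor-critic (e.g. SAC)",
    ActionSpace.HYBRID: "parameterised-action actor-critic",
}


@dataclass(frozen=True)
class AssetType:
    name: str
    action_space: ActionSpace


def select_algorithm_class(asset: AssetType) -> str:
    """Return the RL algorithm class assumed suitable for this asset type."""
    return ALGORITHM_CLASS[asset.action_space]


if __name__ == "__main__":
    fleet = [
        AssetType("home_battery", ActionSpace.CONTINUOUS),
        AssetType("ev_charger", ActionSpace.HYBRID),  # kW rate plus V2G toggle
    ]
    for asset in fleet:
        print(asset.name, "->", select_algorithm_class(asset))
```

A static table like this is the simplest possible policy; in the architecture described here the choice would come from the LLM planner rather than a fixed dictionary.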
Layer 4 manages the VPP's cluster topology, assigns assets to aggregator agents, defines the decentralised partially observable Markov decision process (Dec-POMDP) formulation and shapes the reward structure based on market signals and grid constraints.
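The coordinator instantiates the FMARL decision-making environment [8]:
$$\mathcal{M} = \langle \mathcal{N}, \{\mathcal{S}_i\}_{i \in \mathcal{N}}, \{\mathcal{A}_i\}_{i \in \mathcal{N}}, \{R_i\}_{i \in \mathcal{N}}, \gamma \rangle$$
where
- $\mathcal{N}$ is the set of aggregator agents,
- $\mathcal{S}_i$ is the local state, i.e. asset fleet status, local grid conditions, battery health distribution,
- $\mathcal{A}_i$ is the dispatch action space, i.e. setpoint allocation across fleet,
- $R_i$ is the reward combining individual cost and collective service delivery, and
- $\gamma$ is the discount factor.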
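Each aggregator agent optimises the hybrid objective:
$$J_i = (1 - \lambda)\, J_i^{\text{ind}} + \lambda\, J^{\text{coll}}$$
where
- $J_i^{\text{ind}}$ is the individual aggregator's cost & revenue objective, and
- $J^{\text{coll}}$ captures the VPP's collective grid service delivery obligation.
The weight $\lambda \in [0, 1]$ is set by Layer 5 based on market conditions:
- during high-price periods, $\lambda$ decreases, shifting weight toward individual revenue, and
- during grid emergency events, $\lambda$ increases, shifting weight toward collective delivery reliability.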
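The weighting between the individual and collective terms can be sketched as a convex combination. The function names, the convention that the weight `lam` multiplies the collective term, the price threshold and the three weight values are assumptions made for this example, not values from the platform.

```python
def hybrid_objective(individual: float, collective: float, lam: float) -> float:
    """Convex combination of an individual and a collective objective.

    Assumed convention: lam in [0, 1] is the weight on the collective term.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * individual + lam * collective


def choose_weight(price: float, grid_emergency: bool,
                  high_price: float = 150.0) -> float:
    """Hypothetical Layer 5 rule for setting the weight from market signals."""
    if grid_emergency:
        return 0.9  # prioritise collective delivery reliability
    if price >= high_price:
        return 0.2  # prioritise individual revenue
    return 0.5      # balanced default


if __name__ == "__main__":
    lam = choose_weight(price=200.0, grid_emergency=False)
    print(lam, hybrid_objective(individual=10.0, collective=2.0, lam=lam))
```

A grid emergency takes precedence over a high price in this rule, so collective delivery is protected even when individual revenue would be attractive.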
The design in Figure 2 is partially supported by FRESCO [9], which demonstrated that hierarchical RL with federated learning enables cooperative energy optimisation while protecting participant privacy.
Figure 2: The O13 Energy Fleet Coordination Platform - Technical Overview
Layer 3 is the core learning and aggregation engine. Aggregator agents learn coordinated dispatch policies while preserving asset owner privacy.
Intra-VPP Aggregation (Centralised)
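Regional aggregator agents submit asynchronous model updates to the market agent. The market agent aggregates using FedBuff-style buffered asynchronous aggregation [10]:
$$\theta^{t+1} = \theta^{t} + \eta_g \sum_{k \in \mathcal{B}} w_k \Delta_k$$
where
- $\theta^{t}$ is the market agent's model at server step $t$, $\eta_g$ is the server learning rate and $\Delta_k$ is a buffered model update,
- $\mathcal{B}$ is the set of buffered updates, and
- weights $w_k$ are based on each aggregator's update frequency, satisfying $\sum_{k \in \mathcal{B}} w_k = 1$.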
Faster aggregators contribute more frequently than slower ones, without either blocking the other.
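The buffering behaviour can be sketched in a few lines. This is an illustrative simplification in the spirit of FedBuff [10], not its reference implementation: the class name `BufferedAggregator`, the flat list-of-floats model and the uniform per-update weights are assumptions for the example.

```python
from typing import List, Sequence, Tuple


class BufferedAggregator:
    """Server that applies one step each time `buffer_size` updates arrive."""

    def __init__(self, params: Sequence[float], buffer_size: int,
                 server_lr: float = 1.0) -> None:
        if buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        self.params: List[float] = list(params)
        self.buffer_size = buffer_size
        self.server_lr = server_lr
        self._buffer: List[Tuple[str, List[float]]] = []

    def submit(self, agent_id: str, delta: Sequence[float]) -> bool:
        """Buffer one update; return True if it triggered a server step."""
        if len(delta) != len(self.params):
            raise ValueError("update has the wrong dimension")
        self._buffer.append((agent_id, list(delta)))
        if len(self._buffer) < self.buffer_size:
            return False
        self._apply()
        return True

    def _apply(self) -> None:
        # Uniform weight per buffered update, so the weights sum to 1.
        # An agent that appears twice in the buffer carries twice the weight.
        weight = 1.0 / len(self._buffer)
        for _, delta in self._buffer:
            for j, d in enumerate(delta):
                self.params[j] += self.server_lr * weight * d
        self._buffer.clear()


if __name__ == "__main__":
    server = BufferedAggregator([0.0, 0.0], buffer_size=2)
    print(server.submit("fast", [1.0, 0.0]))  # buffered only
    print(server.submit("fast", [1.0, 2.0]))  # triggers a server step
    print(server.params)
```

Because the server never waits for a particular agent, a slow aggregator only delays its own contribution, not the step itself.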
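Staleness is managed via AFedPG's delay-adaptive lookahead [11]. At the $k$-th iteration with delay $\delta_k$:
$$\tilde{\theta}_k = \theta_k + \frac{1 - \alpha_{k-\delta_k}}{\alpha_{k-\delta_k}} \left(\theta_k - \theta_{k-1}\right)$$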
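AFedPG achieves $\mathcal{O}(\epsilon^{-2.5}/N)$ sample complexity with linear speedup in the number of agents $N$, and improves time complexity from $\mathcal{O}(t_{\max}/N)$ (synchronous) to $\mathcal{O}\big((\sum_{i=1}^{N} 1/t_i)^{-1}\big)$ (asynchronous), where $t_i$ is the time agent $i$ needs per iteration and $t_{\max}$ is the largest such time [11].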
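To give a feel for the time-complexity gap, the sketch below evaluates the two expressions reported for AFedPG [11], $t_{\max}/N$ for synchronous and $(\sum_i 1/t_i)^{-1}$ for asynchronous aggregation, reading them as the average wall-clock time per collected update. The per-agent iteration times are made-up numbers and the function names are hypothetical.

```python
from typing import Sequence


def sync_time_per_update(times: Sequence[float]) -> float:
    """Each round waits for the slowest agent and collects N updates."""
    return max(times) / len(times)


def async_time_per_update(times: Sequence[float]) -> float:
    """Agent i delivers 1/t_i updates per unit time; invert the total rate."""
    return 1.0 / sum(1.0 / t for t in times)


if __name__ == "__main__":
    # Hypothetical fleet: three fast aggregators and one straggler.
    iteration_times = [1.0, 1.0, 1.0, 4.0]
    print("synchronous :", sync_time_per_update(iteration_times))
    print("asynchronous:", async_time_per_update(iteration_times))
```

With one straggler four times slower than the rest, the asynchronous figure is 4/13 of a time unit per update against 1 for the synchronous scheme; when all agents are equally fast the two expressions coincide.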
Inter-VPP Aggregation (Decentralised)
When multiple VPP operators share learned market strategies without revealing portfolio composition, the market agents exchange parameters peer-to-peer using adaptive mixing from RSM-MASAC [12].
Improvement is guaranteed when the maximum-entropy RL (MERL) advantage is positive.
The mixing metric is bounded via the Fisher Information Matrix, which measures the local curvature of policy space [12].
Layer 2 provides physics-aware intelligence at the aggregator level.
It has three main functions:
- forward prediction: what will happen if the fleet is dispatched this way,
- inverse estimation: what is the true health of the owner's assets, and
- safety enforcement: preventing physically damaging commands.
Forward PINN: Battery Electrochemistry, Thermal Dynamics & Power Flow Constraints
The state-of-charge dynamics of lithium-ion batteries follow well-known electrochemical models.
These physics constraints prevent the RL agent from learning dispatch policies that:
- discharge batteries below safe voltage thresholds,
- charge at rates that would cause lithium plating,
- cycle batteries in patterns that accelerate capacity fade, or
- assume instantaneous charging and discharging.
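As a rough illustration of safety enforcement, the sketch below projects a requested setpoint onto a safe range before it reaches the asset. It is a hand-written stand-in, not a PINN: it approximates the voltage thresholds with an SoC window, approximates the plating limit with a fixed power cap and uses a simple energy-balance SoC update. All names, limits and the efficiency value are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatteryLimits:
    capacity_kwh: float
    max_power_kw: float       # symmetric cap, a stand-in for a plating limit
    soc_min: float = 0.1      # stand-in for the lower voltage threshold
    soc_max: float = 0.9
    efficiency: float = 0.95  # assumed one-way efficiency


def next_soc(soc: float, power_kw: float, dt_h: float,
             lim: BatteryLimits) -> float:
    """Energy-balance SoC update; power_kw > 0 charges, < 0 discharges."""
    if power_kw >= 0:
        stored_kwh = power_kw * dt_h * lim.efficiency
    else:
        stored_kwh = power_kw * dt_h / lim.efficiency
    return soc + stored_kwh / lim.capacity_kwh


def safe_setpoint(soc: float, requested_kw: float, dt_h: float,
                  lim: BatteryLimits) -> float:
    """Clamp a requested setpoint so the next SoC stays inside the window."""
    if dt_h <= 0:
        raise ValueError("dt_h must be positive")
    # 1. Rate limit.
    p = max(-lim.max_power_kw, min(lim.max_power_kw, requested_kw))
    # 2. Largest charge / discharge power that keeps SoC within bounds.
    charge_cap = (lim.soc_max - soc) * lim.capacity_kwh / (dt_h * lim.efficiency)
    discharge_cap = (soc - lim.soc_min) * lim.capacity_kwh * lim.efficiency / dt_h
    return max(-max(discharge_cap, 0.0), min(max(charge_cap, 0.0), p))


if __name__ == "__main__":
    limits = BatteryLimits(capacity_kwh=10.0, max_power_kw=5.0)
    p = safe_setpoint(soc=0.5, requested_kw=8.0, dt_h=1.0, lim=limits)
    print(p, next_soc(0.5, p, 1.0, limits))
```

A learned physics model would replace the fixed caps with state-dependent ones, but the projection step, clamping the RL action before execution, would keep the same shape.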
Inverse PINN for Battery Health Estimation
Here, the platform follows [13], which uses a parameterised PINN over the key aging-related parameter space. In [13], this model accurately predicts internal battery variables and identifies internal parameters in approximately 30 seconds, a 47× speedup over finite volume methods, while improving state-of-health (SoH) estimation accuracy by at least 60.61% compared to models without parameter incorporation.
When one aggregator's inverse PINN discovers that a specific battery chemistry degrades faster under a particular charge pattern, this knowledge propagates through federation to other aggregators managing the same chemistry without sharing raw operational data. The federated inverse PINN thus creates a distributed, privacy-preserving battery health monitoring system across the entire VPP fleet.
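The idea of inverse estimation, recovering a hidden parameter from observed behaviour, can be shown with a far simpler technique than the PINN in [13]. The sketch below fits usable capacity by least squares from pairs of SoC change and measured energy, then reports SoH as a ratio to nominal capacity. The function names and sample data are hypothetical, and losses are ignored.

```python
from typing import Sequence


def estimate_capacity_kwh(delta_soc: Sequence[float],
                          delta_energy_kwh: Sequence[float]) -> float:
    """Least-squares fit of C in the model delta_energy = C * delta_soc."""
    if len(delta_soc) != len(delta_energy_kwh):
        raise ValueError("inputs must have the same length")
    numerator = sum(s * e for s, e in zip(delta_soc, delta_energy_kwh))
    denominator = sum(s * s for s in delta_soc)
    if denominator == 0.0:
        raise ValueError("need at least one non-zero SoC change")
    return numerator / denominator


def state_of_health(capacity_kwh: float, nominal_kwh: float) -> float:
    """SoH as the ratio of estimated to nominal capacity."""
    return capacity_kwh / nominal_kwh


if __name__ == "__main__":
    # Made-up partial charge and discharge segments of one battery.
    d_soc = [0.1, 0.2, -0.1]
    d_energy = [0.8, 1.6, -0.8]
    capacity = estimate_capacity_kwh(d_soc, d_energy)
    print(capacity, state_of_health(capacity, nominal_kwh=10.0))
```

In a federated setting only the fitted parameter, or a model update derived from it, would leave the aggregator; the raw segments stay local.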
Layer 1 encompasses simple controllers executing setpoints from Layer 2.
Each asset type exposes a standard interface:
| Asset Type | Control Interface | State Report |
|---|---|---|
| Home battery | Charge/discharge setpoint (kW) | SoC, voltage, current, temperature |
| Commercial BESS | Power setpoint + reactive power | SoC, cell voltages, temperatures, SoH |
| EV charger | Charge rate (kW), V2G enable/disable | Connected/not, SoC, departure estimate |
| Heat pump | On/off/modulate, setpoint temperature | Indoor temp, outdoor temp, power draw |
| Solar inverter | Curtailment %, reactive power setpoint | Generation (kW), irradiance |
Table 1: Asset Types in Layer 1 of the O13 Architecture
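One way to read the "standard interface" in Table 1 is as a two-method contract: accept a setpoint, report state. The sketch below expresses that with a structural `Protocol` and one concrete asset. The method names, dictionary keys and the `HomeBattery` class are assumptions for illustration, not the platform's actual API.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class Asset(Protocol):
    """Hypothetical Layer 1 contract shared by every asset type."""

    def apply_setpoint(self, setpoint: Dict[str, float]) -> None: ...

    def report_state(self) -> Dict[str, float]: ...


@dataclass
class HomeBattery:
    soc: float
    voltage_v: float
    current_a: float
    temperature_c: float
    power_kw: float = 0.0  # last commanded charge (+) / discharge (-) setpoint

    def apply_setpoint(self, setpoint: Dict[str, float]) -> None:
        self.power_kw = setpoint["power_kw"]

    def report_state(self) -> Dict[str, float]:
        # Mirrors the "State Report" column for the home battery row.
        return {
            "soc": self.soc,
            "voltage_v": self.voltage_v,
            "current_a": self.current_a,
            "temperature_c": self.temperature_c,
        }


def dispatch(asset: Asset, setpoint: Dict[str, float]) -> Dict[str, float]:
    """Send a setpoint from Layer 2 and read back the asset's state."""
    asset.apply_setpoint(setpoint)
    return asset.report_state()


if __name__ == "__main__":
    battery = HomeBattery(soc=0.5, voltage_v=52.0, current_a=0.0,
                          temperature_c=25.0)
    print(dispatch(battery, {"power_kw": -2.0}))
```

Because `Protocol` is structural, an EV charger or heat pump class only needs the same two methods to be dispatched by the same Layer 2 code.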
The coordination of distributed energy resources at scale is no longer an academic exercise. It is an operational requirement that the current generation of VPP software cannot meet. Rule-based dispatch and centralised optimisation were designed for a world of dozens of controllable generators, not tens of thousands of heterogeneous, privately-owned, degradation-sensitive assets spread across distribution networks with varying communication quality. The O13 HFMARL platform addresses these failures simultaneously through a five-layer architecture that combines adaptive multi-agent coordination, privacy-preserving federation, physics-informed control and inverse diagnostics. As the global VPP market grows, the operators who coordinate their fleets with adaptive, physics-aware, privacy-preserving intelligence will capture disproportionate value.
References
[1] L. Narayan, "Virtual Power Plant (VPP) market size to hit USD 45.67 billion by 2035," Precedence Research, Apr. 16, 2026. https://www.precedenceresearch.com/virtual-power-plant-market
[2] J. Li, C. Wang, and Y. Liu, "AI-driven virtual power plants: A comprehensive review," Energies, vol. 19, no. 4, art. 1084, 2026, doi: 10.3390/en19041084
[3] "Expert Group 2 - Regulatory recommendations for privacy, data protection and cyber-security in the smart grid environment - working group," Energy, Dec. 12, 2016. https://energy.ec.europa.eu/publications/expert-group-2-regulatory-recommendations-privacy-data-protection-and-cyber-security-smart-grid_en
[4] A. Bonner, "'Unfettered optimism' on US BESS degradation hits wall of operational reality," Energy-Storage.News, Apr. 15, 2026. https://www.energy-storage.news/unfettered-optimism-on-us-bess-degradation-hits-wall-of-operational-reality/
[5] J.-D. Yao, W.-B. Hao, Z.-G. Meng, B. Xie, J.-H. Chen, and J.-Q. Wei, "Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks," J. Electron. Sci. Technol., 2024, doi: 10.1016/j.jnlest.2024.100290
[6] C. Huang, J. Fan, W. Wang, and H. Wang, "Safe decentralized operation of EV virtual power plant with limited network visibility via multi-agent reinforcement learning," in Proc. IEEE Power Energy Soc. Gen. Meeting, 2026, arXiv:2604.03278
[7] M. Geng, S. Pateria, B. Subagdja, L. Li, X. Zhao, and A.-H. Tan, "L2M2: A hierarchical framework integrating large language model and multi-agent reinforcement learning," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), 2025
[8] Y. Jing, B. Guo, N. Li, R. Xu, and Z. Yu, "Federated multi-agent reinforcement learning: A comprehensive survey of methods, applications and challenges," Expert Syst. Appl., vol. 293, 2025, doi: 10.1016/j.eswa.2025.128729
[9] N. M. Cuadrado, R. A. Guillén, and M. Takáč, "FRESCO: Federated reinforcement energy system for cooperative optimization," in Proc. Int. Conf. Learn. Representations (ICLR), Tiny Papers Track, 2024, arXiv:2403.18444
[10] J. Nguyen, K. Malik, H. Zhan, A. Yousefpour, M. Rabbat, M. Malek, and D. Huba, "Federated learning with buffered asynchronous aggregation," in Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), 2022, arXiv:2106.06639
[11] G. Lan, D.-J. Han, A. Hashemi, V. Aggarwal, and C. G. Brinton, "Asynchronous federated reinforcement learning with policy gradient updates: Algorithm design and convergence analysis," in Proc. Int. Conf. Learn. Representations (ICLR), 2025, arXiv:2404.08003
[12] X. Yu, R. Li, C. Liang, and Z. Zhao, "Communication-efficient soft actor-critic policy collaboration via regulated segment mixture," arXiv:2312.10123, 2024
[13] X. Gu, X. Huan, Y. Ren, W. Zhou, W. Jiang, and Z. Song, "Real-time physics-aware battery health monitoring from partial charging profiles via physics-informed neural networks," eTransportation, vol. 28, art. 100555, 2026, doi: 10.1016/j.etran.2026.100555