What is a physics-informed neural network (PINN)?

A physics-informed neural network is a neural network whose loss function includes the residual of the governing differential equations (ODEs or PDEs), so the model learns from both sensor data and the known physics of the system. Because the physics residual constrains the model between measurements, PINNs can generalize from sparse data where purely data-driven models tend to overfit.
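As a minimal sketch of the loss described above, assume a scalar decay ODE $du/dt = -k u$ as the governing equation and a hand-written two-parameter model in place of a neural network. The composite loss adds a data-misfit term at sensor points to a physics-residual term at collocation points. A real PINN would compute $du/dt$ with automatic differentiation. The central difference here only keeps the example dependency-free, and all names (`pinn_loss`, `decay_model`, `w_phys`) are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple

Model = Callable[[Sequence[float], float], float]


def pinn_loss(
    model: Model,
    params: Sequence[float],
    data: Sequence[Tuple[float, float]],
    collocation: Sequence[float],
    k: float,
    w_phys: float = 1.0,
    h: float = 1e-4,
) -> float:
    """Data MSE plus weighted physics-residual MSE for du/dt + k*u = 0."""
    # Data term: mismatch against sparse sensor measurements (t, u_measured).
    data_loss = sum((model(params, t) - u) ** 2 for t, u in data) / len(data)

    # Physics term: ODE residual at collocation points, which need no labels.
    residuals = []
    for t in collocation:
        dudt = (model(params, t + h) - model(params, t - h)) / (2 * h)
        residuals.append(dudt + k * model(params, t))
    phys_loss = sum(r * r for r in residuals) / len(residuals)

    return data_loss + w_phys * phys_loss


def decay_model(p: Sequence[float], t: float) -> float:
    # Stand-in for a network: u(t) = p0 * exp(-p1 * t).
    return p[0] * math.exp(-p[1] * t)


k_true = 0.5
sensors = [(t, math.exp(-k_true * t)) for t in (0.0, 1.0)]  # only two data points
colloc = [0.25 * i for i in range(1, 17)]                     # physics checked on [0.25, 4]

good = pinn_loss(decay_model, (1.0, 0.5), sensors, colloc, k_true)
bad = pinn_loss(decay_model, (1.0, 0.2), sensors, colloc, k_true)
```

With only two sensor points, the physics term still penalizes the wrong decay rate across the whole collocation range. This is the mechanism behind the sparse-data generalization described above.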

What problems does Astraea Intelligence solve?

Astraea closes the sim-to-real gap in robotics. Simulators use approximated contact, linearized friction, and simplified dynamics, so policies trained in simulation often fail on real hardware. Astraea builds physics-informed dynamics models from your governing equations and sparse sensor data, ready to drop into MPC, RL, or state-estimation pipelines.

Is Astraea open source?

Yes. The Astraea Core PINN engine and equation library are open source under the Apache 2.0 license. The Astraea Pro tier — automated architecture search, managed GPU training, and production deployment integrations — is commercial.

Which equations does Astraea support?

Astraea ships validated templates for Fossen 6-DOF underwater dynamics, Cosserat rods, Navier–Stokes, advection–diffusion, heat equation, linear elasticity, and reaction–diffusion. It also supports arbitrary custom ODE/PDE systems.

Which deployment targets are supported?

Astraea exports to CasADi for MPC, Drake, ROS 2, ONNX, and TorchScript. Trained surrogates run at 100+ Hz on GPU.


🦾 The O13 Energy Fleet Coordination Platform - HFMARL for VPPs Technical White Paper

The global electricity system is undergoing a structural transformation. The virtual power plant market is valued at $6.28 billion in 2025 and is projected to reach $45.67 billion by 2035, growing at 22.61% annually [1]. By aggregating thousands of small-scale distributed energy resources (DERs), virtual power plants (VPPs) build a power profile large enough to trade in energy markets, which the individual assets could not do on their own. VPPs then control the generation and consumption of these DERs to balance the supply and demand of renewable energy [2].

Yet the software coordinating these assets has not kept pace with their deployment. A single VPP may aggregate tens of thousands of assets across multiple grid regions, a scale that existing solvers do not handle well, which leads to severe real-time computational bottlenecks [2].

A second issue is data ownership: asset data belongs to individual owners, i.e. homeowners, businesses and fleet operators. Because the GDPR covers smart meter data as personal data, with dedicated guidance from the EU Smart Grid Task Force Expert Group 2 [3], centralising raw consumption data across owners is legally restricted in the EU and commercially unacceptable elsewhere.

Finally, a typical grid-scale battery loses 20–30% of its capacity in its first decade. In merchant markets, this degradation can cost tens of millions of dollars per project [4]. According to ACCURE Battery Intelligence, state-of-charge (SoC) inaccuracies alone cost battery energy storage system (BESS) operators more than $1 million per GWh annually.

The result of all these challenges is a widening gap between what VPP operators promise to the grid and what their fleets can physically deliver.

Multi-agent reinforcement learning (MARL) has emerged as a leading approach to VPP coordination. Recent work reports an 18.73% cost reduction and a 22.46% profit increase over model predictive control and Stackelberg game baselines [5], and safety-enhanced MARL for EV VPPs has achieved a 45% reduction in voltage violations and a 10% reduction in operational cost [6].

However, existing MARL approaches for VPPs are typically centralised, flat (non-hierarchical), synchronous, or physics-unaware. To our knowledge, as of May 2026 no published framework combines hierarchical federation, asynchronous aggregation, physics-informed policies and support for heterogeneous algorithms in VPP coordination.

We therefore propose the O13 Energy Fleet Coordination Platform, a five-layer architecture built on actively researched machine learning (ML) techniques with published empirical results.

Introduction to O13 Energy Fleet Coordination Platform

O13 Energy Fleet Coordination Platform is a coordination platform built on Hierarchical Federated Multi-Agent Reinforcement Learning (HFMARL) with physics-informed neural network (PINN) integration.

The platform treats VPP coordination as what it fundamentally is: a multi-agent problem in which dozens of regional aggregators must learn, in real time, how to dispatch thousands of diverse assets so that together they deliver the VPP's market commitments. Rather than replacing existing VPP infrastructure, the platform provides the intelligence layer between market bidding and physical dispatch. It learns coordination strategies through federated reinforcement learning, enforces physical constraints through embedded electrochemical and thermal models, estimates hidden asset health through inverse physics inference, and preserves asset owners' data privacy.

*Figure 1: The O13 Energy Fleet Coordination Platform - General Overview*

The following sections give a layer-by-layer technical breakdown of the architecture shown in Figure 2, starting from the market-facing Layer 5 and ending at the asset-level Layer 1.

Layer 5 translates market conditions and grid operator signals into structured dispatch targets for the hierarchical coordinator.

Following L2M2 [7], the large language model (LLM) in Layer 5 operates as a zero-shot high-level planner that decomposes complex, multi-horizon market participation into actionable sub-goals. In L2M2, LLM-guided MARL outperformed baseline methods while requiring fewer than 20% of their training samples [7].

In general, the LLM decides:

which market products to bid into, and at what volumes and prices;
how to group assets into clusters by type, region, or grid connection point;
how to define the reward structure and adapt it to market conditions; and
which RL algorithm class to use for each asset type, based on its action-space characteristics.
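The last decision above can be illustrated with a rule-based fallback that a deployment might use to check or replace the LLM's choice. This is a hypothetical sketch. The action-space categories are read off Table 1, and the mapping (continuous to soft actor-critic, discrete to DQN-style value learning, hybrid to a parameterised-action method such as P-DQN) is a common heuristic, not a rule stated by O13 or L2M2.

```python
from dataclasses import dataclass
from enum import Enum


class ActionKind(Enum):
    CONTINUOUS = "continuous"  # e.g. battery charge/discharge setpoint in kW
    DISCRETE = "discrete"      # e.g. on/off switching only
    HYBRID = "hybrid"          # e.g. EV charge rate (kW) plus a V2G enable flag


@dataclass(frozen=True)
class AssetActionSpace:
    asset_type: str
    n_continuous: int  # number of real-valued controls
    n_discrete: int    # number of categorical or boolean controls


def classify(space: AssetActionSpace) -> ActionKind:
    """Map the counts of continuous/discrete controls to an action-space kind."""
    if space.n_continuous > 0 and space.n_discrete > 0:
        return ActionKind.HYBRID
    if space.n_continuous > 0:
        return ActionKind.CONTINUOUS
    if space.n_discrete > 0:
        return ActionKind.DISCRETE
    raise ValueError(f"{space.asset_type} exposes no controls")


# Hypothetical default mapping from action-space kind to RL algorithm class.
DEFAULT_ALGORITHM = {
    ActionKind.CONTINUOUS: "SAC",
    ActionKind.DISCRETE: "DQN",
    ActionKind.HYBRID: "P-DQN",  # parameterised-action DQN
}


def select_algorithm(space: AssetActionSpace) -> str:
    return DEFAULT_ALGORITHM[classify(space)]


fleet = [
    AssetActionSpace("home_battery", n_continuous=1, n_discrete=0),
    AssetActionSpace("ev_charger", n_continuous=1, n_discrete=1),
    AssetActionSpace("heat_pump", n_continuous=1, n_discrete=1),
]
plan = {a.asset_type: select_algorithm(a) for a in fleet}
```

A deterministic fallback like this gives the planner's free-form output something concrete to be validated against before it reaches training.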

Layer 4 manages the VPP's cluster topology, assigns assets to aggregator agents, defines the decentralised partially observable Markov decision process (Dec-POMDP) formulation and shapes the reward structure based on market signals and grid constraints.
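As a minimal sketch of the asset-to-aggregator assignment, assume clusters are keyed by region and grid connection point (two of the grouping criteria Layer 5 proposes) and each cluster is handed to one aggregator agent. The topology can then be built with a simple group-by. The `Asset` fields and the `agg-…` identifier scheme are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Asset:
    asset_id: str
    asset_type: str        # e.g. "home_battery", "ev_charger"
    region: str            # grid region / market zone
    connection_point: str  # e.g. feeder or substation identifier


def build_clusters(assets: List[Asset]) -> Dict[str, List[str]]:
    """Group assets by (region, connection point); one aggregator per group."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for a in assets:
        groups[(a.region, a.connection_point)].append(a.asset_id)
    # Deterministic aggregator IDs and member order keep reassignment stable.
    return {
        f"agg-{region}-{cp}": sorted(ids)
        for (region, cp), ids in sorted(groups.items())
    }


assets = [
    Asset("b1", "home_battery", "north", "feeder-7"),
    Asset("ev3", "ev_charger", "north", "feeder-7"),
    Asset("hp2", "heat_pump", "south", "feeder-2"),
]
topology = build_clusters(assets)
```

Grouping by connection point keeps assets that share a physical grid constraint under the same agent, which simplifies enforcing that constraint locally.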

The coordinator instantiates the federated MARL (FMARL) decision-making environment from [8]:

\Gamma = \left(I, \{S_i\}_{i \in I}, \{A_i\}_{i \in I}, \{P_i\}_{i \in I}, \{R_i\}_{i \in I}, \gamma\right) \qquad (1)

where

$I$ is the set of aggregator agents,
$S_i$ is the local state of agent $i$, e.g. asset fleet status, local grid conditions and battery health distribution,
$A_i$ is the dispatch action space, e.g. setpoint allocation across the fleet,
$P_i$ is the state transition function of agent $i$,
$R_i$ is the reward combining individual cost and collective service delivery, and
$\gamma$ is the discount factor.

Each aggregator agent $i$ optimises the hybrid objective:

\theta_i^* = \arg \max_{\theta_i} \left( \lambda J_i(\theta_i) + (1 - \lambda) J_{\text{global}}(\theta_1, \ldots, \theta_n) \right) \qquad (2)

where

$J_i(\theta_i)$ is the individual aggregator's cost and revenue objective, and
$J_{\text{global}}$ captures the VPP's collective grid service delivery obligation.

The weight $\lambda \in [0, 1]$ is set by Layer 5 based on market conditions:

during high-price periods, $\lambda$ shifts toward individual revenue, and
during grid emergency events, $\lambda$ shifts toward collective delivery reliability.
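Equation (2) and the $\lambda$ rule above can be sketched numerically. This sketch assumes $J_i$ and $J_{\text{global}}$ are estimated as discounted returns $\sum_t \gamma^t r_t$ from sampled reward sequences, and that Layer 5 exposes $\lambda$ through a simple threshold rule. The thresholds, the default of 0.5 and the function names are hypothetical, chosen only to show the direction of the shift.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Monte Carlo estimate of J = sum_t gamma^t * r_t for one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def choose_lambda(price_per_mwh: float, grid_emergency: bool,
                  high_price: float = 200.0) -> float:
    """Hypothetical Layer 5 rule for lambda in Eq. (2); emergencies override price."""
    if grid_emergency:
        return 0.1   # weight shifts toward collective delivery reliability
    if price_per_mwh >= high_price:
        return 0.8   # weight shifts toward individual revenue
    return 0.5


def hybrid_objective(local_rewards: Sequence[float],
                     global_rewards: Sequence[float],
                     lam: float, gamma: float = 0.99) -> float:
    """lambda * J_i + (1 - lambda) * J_global, as in Eq. (2)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    j_i = discounted_return(local_rewards, gamma)
    j_global = discounted_return(global_rewards, gamma)
    return lam * j_i + (1.0 - lam) * j_global


lam = choose_lambda(price_per_mwh=250.0, grid_emergency=False)
score = hybrid_objective([1.0, 1.0], [0.0, 2.0], lam, gamma=0.5)
```

Here $J_i = 1 + 0.5 \cdot 1 = 1.5$ and $J_{\text{global}} = 0 + 0.5 \cdot 2 = 1.0$, so with $\lambda = 0.8$ the objective is $0.8 \cdot 1.5 + 0.2 \cdot 1.0 = 1.4$.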

The hierarchical design in Figure 2 is supported in part by FRESCO [9], which demonstrated that hierarchical RL combined with federated learning enables cooperative energy optimisation while protecting participant privacy.

*Figure 2: The O13 Energy Fleet Coordination Platform - Technical Overview*

Layer 3 is the core learning and aggregation engine. Aggregator agents learn coordinated dispatch policies while preserving asset owner privacy.

Intra-VPP Aggregation (Centralised)

Regional aggregator agents submit asynchronous model updates to the market agent, which aggregates them using FedBuff-style buffered asynchronous aggregation [10]:

\hat{w} = w + \sum_{k \in \bar{K}} p_k \Delta w_k \qquad (3)

where

$w$ is the current global model and $\Delta w_k$ is the $k$-th buffered model update (parameter delta),
$\bar{K}$ is the set of buffered updates, and
$p_k$ are weights based on each aggregator's update frequency, satisfying $\sum_{k \in \bar{K}} p_k = 1$.

Faster aggregators contribute more frequently than slower ones, without either blocking the other.
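A minimal sketch of the buffered update in Eq. (3) follows. It assumes model parameters are plain lists of floats, a buffer size of `K` updates, and weights proportional to each aggregator's observed update count, renormalised over the buffer so they sum to 1. The sketch omits any server-side step size or staleness scaling, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


class BufferedAggregator:
    """FedBuff-style asynchronous aggregation: apply once K updates are buffered."""

    def __init__(self, weights: Vector, buffer_size: int) -> None:
        self.w = list(weights)
        self.K = buffer_size
        self.buffer: List[Tuple[str, Vector]] = []  # (aggregator_id, delta)
        self.freq: Dict[str, int] = {}               # update counts per aggregator

    def submit(self, agg_id: str, delta: Vector) -> Optional[Vector]:
        """Called whenever an aggregator finishes local training; never waits for others."""
        self.freq[agg_id] = self.freq.get(agg_id, 0) + 1
        self.buffer.append((agg_id, delta))
        if len(self.buffer) < self.K:
            return None  # keep buffering; the global model is unchanged
        # p_k proportional to update frequency, normalised over the buffer.
        raw = [self.freq[a] for a, _ in self.buffer]
        total = sum(raw)
        p = [r / total for r in raw]
        for p_k, (_, delta_k) in zip(p, self.buffer):
            self.w = [wi + p_k * di for wi, di in zip(self.w, delta_k)]
        self.buffer.clear()
        return self.w


server = BufferedAggregator([0.0, 0.0], buffer_size=3)
r1 = server.submit("fast", [1.0, 0.0])
r2 = server.submit("fast", [1.0, 0.0])
new_w = server.submit("slow", [0.0, 3.0])
```

The fast aggregator lands two updates before the slow one arrives. Neither waits, and the weights come out as $p = (0.4, 0.4, 0.2)$, giving $\hat{w} = (0.8, 0.6)$.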

Staleness is managed via AFedPG's delay-adaptive lookahead [11]. At the $k$-th iteration, with delay $\delta_k$ and momentum weights $\alpha_k \in (0, 1]$:

\tilde{\theta}_k \leftarrow \theta_k + \frac{1 - \alpha_{k-\delta_k}}{\alpha_{k-\delta_k}}(\theta_k - \theta_{k-1}) \qquad (4)

AFedPG achieves $\mathcal{O}(\epsilon^{-2.5}/N)$ sample complexity per agent, a linear speedup in the number of agents $N$, and improves time complexity from the synchronous $\mathcal{O}(t_{\max}/N)$ to the asynchronous $\mathcal{O}((\sum_{i=1}^{N} 1/t_i)^{-1})$, where $t_i$ is the per-iteration computation time of agent $i$ and $t_{\max} = \max_i t_i$ [11].
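Eq. (4) and the time-complexity comparison can be made concrete with a short sketch. The lookahead extrapolates along the last parameter step, using the momentum weight $\alpha$ from the iteration at which the stale update was computed. The schedule `alpha` below is a placeholder assumption, not AFedPG's tuned schedule. The two timing functions evaluate the synchronous $t_{\max}/N$ and asynchronous $(\sum_i 1/t_i)^{-1}$ expressions for illustrative per-iteration times $t_i$.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def alpha(k: int) -> float:
    """Placeholder momentum schedule (assumption): 1 / sqrt(k + 1), capped at 1."""
    return min(1.0, (max(k, 0) + 1) ** -0.5)


def lookahead(theta_k: Vector, theta_prev: Vector, k: int, delay: int,
              alpha_fn: Callable[[int], float]) -> Vector:
    """Eq. (4): extrapolate using the momentum weight of iteration k - delay."""
    a = alpha_fn(k - delay)
    if not 0.0 < a <= 1.0:
        raise ValueError("momentum weight must lie in (0, 1]")
    scale = (1.0 - a) / a
    return [t + scale * (t - tp) for t, tp in zip(theta_k, theta_prev)]


def sync_time(times: Sequence[float]) -> float:
    """Synchronous time complexity t_max / N, limited by the slowest agent."""
    return max(times) / len(times)


def async_time(times: Sequence[float]) -> float:
    """Asynchronous time complexity (sum_i 1/t_i)^-1; agents run at their own rates."""
    return 1.0 / sum(1.0 / t for t in times)


theta_tilde = lookahead([1.0, 2.0], [0.5, 2.0], k=5, delay=2, alpha_fn=alpha)
times = [1.0, 1.0, 1.0, 10.0]  # three fast agents and one straggler
t_sync, t_async = sync_time(times), async_time(times)
```

With one straggler ten times slower than its peers, the asynchronous figure (about 0.32) stays close to that of the three fast agents alone, while the synchronous one (2.5) is dominated by the straggler.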

Inter-VPP Aggregation (Decentralised)

When multiple VPP operators share learned market strategies without revealing their portfolio composition, their market agents exchange parameters peer-to-peer using the adaptive mixing of RSM-MASAC [12]:

\pi_{\text{mix}}(a \mid s) = (1 - \beta)\pi(a \mid s) + \beta \tilde{\pi}(a \mid s) \qquad (5)

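Improvement is guaranteed when the maximum-entropy RL (MERL) advantage is positive:

A_{\pi}^{+}(\tilde{\pi}) := \mathbb{E}_{s \sim d^{\pi}, a \sim \tilde{\pi}}[A^{\pi}(s,a) + \alpha H(\tilde{\pi}(\cdot \mid s))] > 0 \qquad (6)

where $\tilde{\pi}$ is the peer policy, $d^{\pi}$ is the state visitation distribution under $\pi$, $A^{\pi}$ is the advantage function, $H$ is the policy entropy and $\alpha$ is the entropy temperature (unrelated to $\alpha_k$ in Eq. (4)).

The mixing metric $\zeta$ is bounded using the Fisher information matrix $F(\theta)$: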

0 < \zeta < \left[\frac{2A_{\pi}^{+}(\tilde{\pi})}{C(\tilde{\theta}-\theta)^{\top}F(\theta)(\tilde{\theta}-\theta)}\right]^{1/2} \qquad (7)

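where

$F(\theta)$ is the Fisher information matrix of the policy at $\theta$, which measures the local curvature of the policy space, $\tilde{\theta}$ and $\theta$ are the peer and local policy parameters, and $C$ is the constant of the policy-improvement bound in [12].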
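Eqs. (5) and (7) can be sketched for a discrete action set. The mixture in Eq. (5) is a convex combination of the local and peer action distributions. The bound in Eq. (7) is evaluated exactly as written, given an estimate of the MERL advantage $A_\pi^+(\tilde{\pi})$, the constant $C$, the parameter difference $\tilde{\theta} - \theta$ and $F(\theta)$. How RSM-MASAC estimates these quantities and segments the parameters is not modelled here, and all inputs below are made-up numbers.

```python
import math
from typing import List, Sequence


def mix_policies(pi: Sequence[float], pi_peer: Sequence[float],
                 beta: float) -> List[float]:
    """Eq. (5): (1 - beta) * pi(a|s) + beta * pi_peer(a|s) over discrete actions."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [(1.0 - beta) * p + beta * q for p, q in zip(pi, pi_peer)]


def zeta_upper_bound(adv_plus: float, c: float, d_theta: Sequence[float],
                     fisher: Sequence[Sequence[float]]) -> float:
    """Right-hand side of Eq. (7)."""
    if adv_plus <= 0.0:
        return 0.0  # Eq. (6) fails: the interval 0 < zeta < bound is empty
    n = len(d_theta)
    # Quadratic form (theta_peer - theta)^T F (theta_peer - theta).
    quad = sum(d_theta[i] * fisher[i][j] * d_theta[j]
               for i in range(n) for j in range(n))
    if quad <= 0.0:
        return math.inf  # identical parameters: Eq. (7) places no limit
    return math.sqrt(2.0 * adv_plus / (c * quad))


pi_mix = mix_policies([0.5, 0.5], [0.9, 0.1], beta=0.25)
bound = zeta_upper_bound(0.5, 1.0, [1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]])
```

Because the mixture is convex, `pi_mix` remains a valid distribution, here $(0.6, 0.4)$. The Fisher term shrinks the admissible step when the peer's parameters differ along high-curvature directions.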

Layer 2 provides physics-aware intelligence at the aggregator level.

It has three main functions:

forward prediction: what will happen if the fleet is dispatched this way;
inverse estimation: what is the true health of the owner's assets; and
safety enforcement: blocking commands that would physically damage assets.

Forward PINN: Battery Electrochemistry, Thermal Dynamics & Power Flow Constraints

The state-of-charge (SoC) dynamics of lithium-ion batteries are governed by well-established electrochemical models, which the forward PINN embeds as physics constraints.

These physics constraints prevent the RL agent from learning dispatch policies that:
- Discharge batteries below safe voltage thresholds,
- Charge at rates that would cause lithium plating,
- Cycle batteries in patterns that accelerate capacity fade, or
- Assume instantaneous charging and discharging.
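A minimal sketch of how such constraints can act as a safety filter on dispatch commands follows. It assumes a simple energy-counting SoC model with a one-way efficiency `eta` applied to both charging and discharging, fixed SoC bounds, and a fixed charge-power cap as a crude stand-in for plating-related rate limits. The forward PINN described here would replace this bookkeeping with learned electrochemical and thermal dynamics, and all parameter values and names are hypothetical. Sign convention: positive power discharges the battery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatteryLimits:
    capacity_kwh: float
    soc_min: float = 0.10            # lower SoC bound against deep discharge
    soc_max: float = 0.90            # upper SoC bound
    p_discharge_max_kw: float = 5.0
    p_charge_max_kw: float = 5.0     # crude proxy for a plating-safe charge rate
    eta: float = 0.95                # one-way efficiency, same for charge/discharge


def safe_setpoint(p_kw: float, soc: float, lim: BatteryLimits, dt_h: float) -> float:
    """Clip a requested setpoint (positive = discharge) so the next SoC stays in bounds."""
    if p_kw >= 0.0:
        # Energy drawn from the cells is p * dt / eta; keep SoC >= soc_min.
        p_energy = (soc - lim.soc_min) * lim.capacity_kwh * lim.eta / dt_h
        return max(0.0, min(p_kw, lim.p_discharge_max_kw, p_energy))
    # Energy stored in the cells is |p| * dt * eta; keep SoC <= soc_max.
    p_energy = (lim.soc_max - soc) * lim.capacity_kwh / (lim.eta * dt_h)
    return -max(0.0, min(-p_kw, lim.p_charge_max_kw, p_energy))


def next_soc(p_kw: float, soc: float, lim: BatteryLimits, dt_h: float) -> float:
    """Integrate power over a step of dt_h hours, so SoC changes gradually."""
    if p_kw >= 0.0:
        return soc - p_kw * dt_h / (lim.eta * lim.capacity_kwh)
    return soc - p_kw * dt_h * lim.eta / lim.capacity_kwh


lim = BatteryLimits(capacity_kwh=10.0)
p = safe_setpoint(5.0, soc=0.15, lim=lim, dt_h=0.25)  # request 5 kW near the floor
soc_after = next_soc(p, 0.15, lim, 0.25)
```

The 5 kW request is cut to about 1.9 kW, exactly the power that brings the battery to `soc_min` by the end of the 15-minute step.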

Inverse PINN for Battery Health Estimation

The inverse PINN builds on [13], in which a PINN parameterised over the key ageing-related parameters predicts internal battery variables and identifies internal parameters in approximately 30 seconds, a 47× speedup over finite-volume methods, while improving state-of-health (SoH) estimation accuracy by at least 60.61% compared with models that do not incorporate these parameters [13].
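The inverse idea can be illustrated without a PINN by its simplest special case: identifying one hidden parameter, the usable capacity $Q$, from partial charging segments. Assume each segment provides the charge throughput $\Delta Q_j$ (Ah, from current integration) and the SoC change $\Delta\mathrm{SoC}_j$ (e.g. from an open-circuit-voltage lookup). Coulomb counting then gives $\Delta Q_j \approx Q\,\Delta\mathrm{SoC}_j$, and the least-squares estimate is $\hat{Q} = \sum_j \Delta Q_j \Delta\mathrm{SoC}_j / \sum_j \Delta\mathrm{SoC}_j^2$. The approach in [13] instead fits ageing-related parameters of a physics model, and the function names and numbers below are hypothetical.

```python
from typing import Sequence, Tuple


def estimate_capacity_ah(segments: Sequence[Tuple[float, float]]) -> float:
    """Least-squares fit of dQ = Q * dSoC over partial charging segments.

    Each segment is (charge_throughput_ah, soc_change). Minimising
    sum_j (dQ_j - Q * dSoC_j)^2 gives Q = sum(dQ * dSoC) / sum(dSoC^2).
    """
    num = sum(dq * ds for dq, ds in segments)
    den = sum(ds * ds for _, ds in segments)
    if den == 0.0:
        raise ValueError("segments carry no SoC change; capacity is unidentifiable")
    return num / den


def state_of_health(capacity_ah: float, nominal_ah: float) -> float:
    return capacity_ah / nominal_ah


# Three partial charges of a nominally 100 Ah cell that has faded to about 80 Ah.
segments = [(8.1, 0.10), (15.9, 0.20), (24.0, 0.30)]
q_hat = estimate_capacity_ah(segments)
soh = state_of_health(q_hat, nominal_ah=100.0)
```

No segment covers a full cycle, yet together they pin down the hidden capacity, here about 79.9 Ah (SoH of roughly 0.80).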

When one aggregator's inverse PINN discovers that a specific battery chemistry degrades faster under a particular charge pattern, this knowledge propagates through federation to other aggregators managing the same chemistry without sharing raw operational data.

The federated inverse PINN creates a distributed, privacy-preserving battery health monitoring system across the entire VPP fleet.
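A minimal sketch of how such knowledge could propagate follows. It assumes each aggregator shares only the parameters of its local health model (for example, fitted degradation coefficients) together with its sample count, and the federation averages them separately per chemistry with FedAvg-style sample weighting. Raw operational data never leaves the aggregator. The parameter layout and names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class HealthModelUpdate:
    aggregator_id: str
    chemistry: str          # e.g. "LFP", "NMC"
    params: List[float]     # local inverse-model parameters, never raw data
    n_samples: int          # cycles or charge segments behind the fit


def federate_by_chemistry(updates: List[HealthModelUpdate]) -> Dict[str, List[float]]:
    """Sample-weighted average of health-model parameters within each chemistry."""
    grouped: Dict[str, List[HealthModelUpdate]] = defaultdict(list)
    for u in updates:
        grouped[u.chemistry].append(u)
    result: Dict[str, List[float]] = {}
    for chem, group in grouped.items():
        total = sum(u.n_samples for u in group)
        dim = len(group[0].params)
        result[chem] = [
            sum(u.params[i] * u.n_samples for u in group) / total
            for i in range(dim)
        ]
    return result


updates = [
    HealthModelUpdate("agg-north", "NMC", [0.020, 1.5], n_samples=300),
    HealthModelUpdate("agg-south", "NMC", [0.030, 1.1], n_samples=100),
    HealthModelUpdate("agg-east", "LFP", [0.010, 0.9], n_samples=50),
]
global_models = federate_by_chemistry(updates)
```

Keeping one averaged model per chemistry means a faster-fade finding from NMC packs updates other NMC aggregators without skewing the LFP model.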

Layer 1 encompasses simple controllers executing setpoints from Layer 2.

Each asset type exposes a standard interface:

| Asset type | Control interface | State report |
|---|---|---|
| Home battery | Charge/discharge setpoint (kW) | SoC, voltage, current, temperature |
| Commercial BESS | Active power setpoint (kW), reactive power setpoint (kvar) | SoC, cell voltages, temperatures, SoH |
| EV charger | Charge rate (kW), V2G enable/disable | Connection status, SoC, departure estimate |
| Heat pump | On/off/modulate, temperature setpoint | Indoor temperature, outdoor temperature, power draw (kW) |
| Solar inverter | Curtailment (%), reactive power setpoint (kvar) | Generation (kW), irradiance |

Table 1: Asset types in Layer 1 of the O13 architecture
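The standard interface in Table 1 can be sketched as a small structural protocol that every Layer 1 controller implements: accept a setpoint, report state. The method names, the `HomeBattery` class and its simulated state are hypothetical. Only the setpoint and the reported quantities come from Table 1, and a real controller would talk to the device over its native protocol rather than updating fields in memory.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


class AssetController(Protocol):
    """Minimal Layer 1 contract: execute a setpoint, report measured state."""

    def apply_setpoint(self, setpoint: Dict[str, float]) -> None: ...

    def report_state(self) -> Dict[str, float]: ...


@dataclass
class HomeBattery:
    """Hypothetical in-memory home battery (Table 1, first row)."""

    soc: float = 0.5
    voltage_v: float = 400.0
    temperature_c: float = 25.0
    power_kw: float = 0.0      # positive = discharge
    max_power_kw: float = 5.0

    def apply_setpoint(self, setpoint: Dict[str, float]) -> None:
        # Layer 1 only executes: clamp to the inverter rating, no optimisation here.
        p = setpoint["power_kw"]
        self.power_kw = max(-self.max_power_kw, min(self.max_power_kw, p))

    def report_state(self) -> Dict[str, float]:
        current_a = self.power_kw * 1000.0 / self.voltage_v
        return {
            "soc": self.soc,
            "voltage_v": self.voltage_v,
            "current_a": current_a,
            "temperature_c": self.temperature_c,
        }


def dispatch(controller: AssetController, power_kw: float) -> Dict[str, float]:
    """What Layer 2 sees: send a setpoint, read back the state report."""
    controller.apply_setpoint({"power_kw": power_kw})
    return controller.report_state()


state = dispatch(HomeBattery(), power_kw=7.0)
```

A structural protocol lets each asset type keep its own implementation while Layer 2 code depends only on the two methods.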

The coordination of distributed energy resources at scale is no longer an academic exercise. It is an operational requirement that the current generation of VPP software cannot meet. Rule-based dispatch and centralised optimisation were designed for a world of dozens of controllable generators, not tens of thousands of heterogeneous, privately owned, degradation-sensitive assets spread across distribution networks with varying communication quality. The O13 HFMARL platform addresses these failures simultaneously through a five-layer architecture that combines adaptive multi-agent coordination, privacy-preserving federation, physics-informed control and inverse diagnostics. As the global VPP market grows, the operators who coordinate their fleets with adaptive, physics-aware, privacy-preserving intelligence will capture disproportionate value.

References

[1] L. Narayan, "Virtual Power Plant (VPP) market size to hit USD 45.67 billion by 2035," Precedence Research, Apr. 16, 2026. https://www.precedenceresearch.com/virtual-power-plant-market

[2] J. Li, C. Wang, and Y. Liu, "AI-driven virtual power plants: A comprehensive review," Energies, vol. 19, no. 4, art. 1084, 2026, doi: 10.3390/en19041084

[3] European Commission, "Expert Group 2 – Regulatory recommendations for privacy, data protection and cyber-security in the smart grid environment – working group," Dec. 12, 2016. https://energy.ec.europa.eu/publications/expert-group-2-regulatory-recommendations-privacy-data-protection-and-cyber-security-smart-grid_en

[4] A. Bonner, "'Unfettered optimism' on US BESS degradation hits wall of operational reality," Energy-Storage.News, Apr. 15, 2026. https://www.energy-storage.news/unfettered-optimism-on-us-bess-degradation-hits-wall-of-operational-reality/

[5] J.-D. Yao, W.-B. Hao, Z.-G. Meng, B. Xie, J.-H. Chen, and J.-Q. Wei, "Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks," J. Electron. Sci. Technol., 2024, doi: 10.1016/j.jnlest.2024.100290

[6] C. Huang, J. Fan, W. Wang, and H. Wang, "Safe decentralized operation of EV virtual power plant with limited network visibility via multi-agent reinforcement learning," in Proc. IEEE Power Energy Soc. Gen. Meeting, 2026, arXiv:2604.03278

[7] M. Geng, S. Pateria, B. Subagdja, L. Li, X. Zhao, and A.-H. Tan, "L2M2: A hierarchical framework integrating large language model and multi-agent reinforcement learning," in Proc. Int. Joint Conf. Artif. Intell. (IJCAI), 2025

[8] Y. Jing, B. Guo, N. Li, R. Xu, and Z. Yu, "Federated multi-agent reinforcement learning: A comprehensive survey of methods, applications and challenges," Expert Syst. Appl., vol. 293, 2025, doi: 10.1016/j.eswa.2025.128729

[9] N. M. Cuadrado, R. A. Guillén, and M. Takáč, "FRESCO: Federated reinforcement energy system for cooperative optimization," in Proc. Int. Conf. Learn. Representations (ICLR), Tiny Papers Track, 2024, arXiv:2403.18444

[10] J. Nguyen, K. Malik, H. Zhan, A. Yousefpour, M. Rabbat, M. Malek, and D. Huba, "Federated learning with buffered asynchronous aggregation," in Proc. Int. Conf. Artif. Intell. Statist. (AISTATS), 2022, arXiv:2106.06639

[11] G. Lan, D.-J. Han, A. Hashemi, V. Aggarwal, and C. G. Brinton, "Asynchronous federated reinforcement learning with policy gradient updates: Algorithm design and convergence analysis," in Proc. Int. Conf. Learn. Representations (ICLR), 2025, arXiv:2404.08003

[12] X. Yu, R. Li, C. Liang, and Z. Zhao, "Communication-efficient soft actor-critic policy collaboration via regulated segment mixture," arXiv:2312.10123, 2024

[13] X. Gu, X. Huan, Y. Ren, W. Zhou, W. Jiang, and Z. Song, "Real-time physics-aware battery health monitoring from partial charging profiles via physics-informed neural networks," eTransportation, vol. 28, art. 100555, 2026, doi: 10.1016/j.etran.2026.100555
