What is a physics-informed neural network (PINN)?

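A physics-informed neural network is a neural network whose loss function includes the residual of the governing differential equations (ODEs or PDEs), so that the model learns from both sensor data and the known physics of the system. The physics residual can be evaluated at any input, not only where measurements exist, so it constrains the model between sparse data points. This lets PINNs generalize from sparse data in regimes where purely data-driven models tend to overfit.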
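
The composite loss can be illustrated with a deliberately tiny sketch. It assumes a toy ODE, du/dt = -k*u with k = 0.5. A hypothetical cubic polynomial (`model`) stands in for the neural network, and a central finite difference stands in for automatic differentiation. `physics_residual`, `pinn_loss`, and the weight `lam` are illustrative names, not part of Astraea's API.

```python
import math

K = 0.5  # assumed decay rate for the toy ODE du/dt = -K * u


def model(params, t):
    """Hypothetical surrogate: a cubic polynomial standing in for the neural network."""
    return sum(c * t**i for i, c in enumerate(params))


def physics_residual(params, t, h=1e-4):
    """Residual of du/dt + K*u = 0; a central difference stands in for autodiff."""
    dudt = (model(params, t + h) - model(params, t - h)) / (2 * h)
    return dudt + K * model(params, t)


def pinn_loss(params, data, collocation, lam=1.0):
    """Mean-squared data misfit plus lam times the mean-squared physics residual."""
    data_term = sum((model(params, t) - u) ** 2 for t, u in data) / len(data)
    phys_term = sum(physics_residual(params, t) ** 2 for t in collocation) / len(collocation)
    return data_term + lam * phys_term


# Two sparse sensor readings of the true solution u(t) = exp(-K t).
data = [(0.0, 1.0), (2.0, math.exp(-K * 2.0))]
# Collocation points: inputs where only the physics is enforced; no labels needed.
collocation = [0.2 * i for i in range(11)]

# A straight line through both readings fits the data exactly but ignores the ODE.
line = [1.0, (math.exp(-K * 2.0) - 1.0) / 2.0, 0.0, 0.0]
# A truncated Taylor series of exp(-K t) nearly satisfies the ODE between readings.
taylor = [1.0, -K, K**2 / 2, -K**3 / 6]

print(pinn_loss(line, data, collocation))
print(pinn_loss(taylor, data, collocation))
```

The straight line has zero data error but a large physics residual between the readings. As a result, its total loss is higher than that of the Taylor polynomial, which misses the second reading slightly but nearly satisfies the ODE everywhere. This is how the physics term regularizes a fit to sparse data.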

What problems does Astraea Intelligence solve?

Astraea closes the sim-to-real gap in robotics. Simulators use approximate contact models, linearized friction, and simplified dynamics, so policies trained in simulation often fail on real hardware. Astraea builds physics-informed dynamics models from your governing equations and sparse sensor data, ready to drop into MPC, RL, or state-estimation pipelines.

Is Astraea open source?

Yes. The Astraea Core PINN engine and equation library are open source under the Apache 2.0 license. The Astraea Pro tier — automated architecture search, managed GPU training, and production deployment integrations — is commercial.

Which equations does Astraea support?

Astraea ships validated templates for Fossen 6-DOF underwater dynamics, Cosserat rods, Navier–Stokes, advection–diffusion, heat equation, linear elasticity, and reaction–diffusion. It also supports arbitrary custom ODE/PDE systems.

Which deployment targets are supported?

Astraea exports to CasADi for MPC, Drake, ROS 2, ONNX, and TorchScript. Trained surrogates run at 100+ Hz on GPU.

Literature

🪛 RSM-MASAC — Adaptive Decentralised Federated MARL

Literature Review

Yu et al. [1] proposed RSM-MASAC, a decentralised federated multi-agent reinforcement learning (MARL) framework that extends soft actor-critic (SAC) to multi-agent settings with peer-to-peer communication. The framework was validated in mixed-autonomy traffic control scenarios. There it approached the converged performance of centralised federated MARL (FMARL) while eliminating the single point of failure inherent to centralised architectures.

In RSM-MASAC, $N$ agents each run a local SAC instance. Agents train independently and periodically exchange only their policy parameters $\theta$ with their neighbours. A communication round is initiated every $U$ policy updates. At that point, agent $i$ receives policy parameters from its neighbour set $\Omega_i$ and aggregates them using an adaptive function $f(\cdot)$. The communication phase is formulated as a constrained optimisation that minimises transmission cost subject to a cumulative-reward threshold. In this phase, each agent mixes its local parameters with an aggregated referential policy, and the mixing is regulated by a metric $\zeta$ (see [1] for the full formulation).
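
The round structure can be sketched as below, assuming parameters are flat lists of floats. `local_update` is a hypothetical placeholder for a SAC gradient step. $f(\cdot)$ is replaced by a plain element-wise mean purely as a stand-in. The sketch reproduces neither the adaptive aggregation nor the constrained communication optimisation of [1].

```python
from typing import Callable, Dict, List

Params = List[float]


def local_update(theta: Params, agent_id: int) -> Params:
    """Hypothetical placeholder for one local SAC update: a fixed, agent-specific nudge."""
    return [p + 0.01 * (agent_id + 1) for p in theta]


def mean_aggregate(own: Params, received: List[Params]) -> Params:
    """Stand-in for f(.): element-wise mean of the agent's own and received parameters."""
    group = [own] + received
    return [sum(values) / len(group) for values in zip(*group)]


def train(thetas: Dict[int, Params], neighbours: Dict[int, List[int]], U: int,
          total_updates: int,
          f: Callable[[Params, List[Params]], Params] = mean_aggregate):
    """Local training with a synchronous communication round every U updates."""
    rounds = 0
    for step in range(1, total_updates + 1):
        # Local phase: each agent updates its own policy independently.
        for i in thetas:
            thetas[i] = local_update(thetas[i], i)
        if step % U == 0:
            # Communication phase: snapshot first so every agent aggregates the same
            # round of parameters, regardless of the order agents are processed in.
            snapshot = {i: list(theta) for i, theta in thetas.items()}
            for i in thetas:
                thetas[i] = f(snapshot[i], [snapshot[j] for j in neighbours[i]])
            rounds += 1
    return thetas, rounds


# Usage: three fully connected agents, one communication round every U = 5 updates.
thetas = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
thetas, rounds = train(thetas, neighbours, U=5, total_updates=20)
print(rounds, thetas)
```

Taking a snapshot before aggregating makes each round synchronous: every agent mixes the same set of parameters. This matches the synchronous-round behaviour RSM-MASAC assumes.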

The central insight of [1] is that naive parameter averaging, the standard approach in decentralised federated learning:

$$\theta_{\text{new}}^{(i)} = \frac{1}{|\Omega_i| + 1} \left( \theta^{(i)} + \sum_{j \in \Omega_i} \theta^{(j)} \right) \qquad (1)$$

provides no guarantee of policy improvement, since not all neighbours necessarily have superior policies. RSM-MASAC replaces this with an adaptive mixing approach. The mixed policy distribution is defined as:

$$\pi_{\text{mix}}(a \mid s) = (1 - \beta)\pi(a \mid s) + \beta \tilde{\pi}(a \mid s) \qquad (2)$$

where agent $i$ retains $(1-\beta)$ weight on its own policy $\pi$ and borrows $\beta$ from a referential policy $\tilde{\pi}$ constructed from neighbours' parameters.
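
For a discrete action space, Eq. (2) can be checked numerically at a single state. SAC policies are usually continuous, so the two-action example and the helper names below are illustrative assumptions. The sketch shows two standard facts behind the terms discussed next. First, because $\mathbb{E}_{a \sim \pi}[A^{\pi}(s,a)] = 0$, the mixture's expected advantage is exactly $\beta$ times that of $\tilde{\pi}$. Second, because entropy is concave, the mixture's entropy is at least the $\beta$-weighted average of the two entropies; the gap is a weighted Jensen–Shannon divergence between $\pi$ and $\tilde{\pi}$.

```python
import math
from typing import List


def mix_policy(pi: List[float], pi_ref: List[float], beta: float) -> List[float]:
    """Eq. (2) at a single state: (1 - beta) * pi + beta * pi_ref over discrete actions."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [(1.0 - beta) * p + beta * q for p, q in zip(pi, pi_ref)]


def expected(policy: List[float], values: List[float]) -> float:
    """Expectation of per-action values under a discrete action distribution."""
    return sum(p * v for p, v in zip(policy, values))


def entropy(policy: List[float]) -> float:
    """Shannon entropy in nats; zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in policy if p > 0.0)


pi = [0.5, 0.5]      # agent's own policy at state s
pi_ref = [0.8, 0.2]  # referential policy at state s
adv = [1.0, -1.0]    # A^pi(s, .); its expectation under pi is zero
beta = 0.25
pi_mix = mix_policy(pi, pi_ref, beta)

# Since E_{a~pi}[A^pi(s, a)] = 0, the mixture's expected advantage is beta times pi_ref's.
print(expected(pi_mix, adv), beta * expected(pi_ref, adv))
# Entropy is concave: the mixture is at least as entropic as the weighted average,
# and the gap is a weighted Jensen-Shannon divergence between pi and pi_ref.
print(entropy(pi_mix) >= (1 - beta) * entropy(pi) + beta * entropy(pi_ref))
```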

To ensure that this mixing actually improves performance, [1] establishes two theoretical results under the maximum-entropy RL (MERL) framework. Theorem 1 lower-bounds the performance gain from mixing in terms of three quantities: the MERL advantage of the referential policy, a distribution-shift penalty quadratic in $\beta$, and an entropy bonus arising from the Jensen–Shannon divergence between the two policies. Improvement is guaranteed when the advantage and entropy terms outweigh the penalty. The key condition is that the MERL advantage of the referential policy must be positive:

$$A_{\pi}^{+}(\tilde{\pi}) := \mathbb{E}_{s \sim d^{\pi}, a \sim \tilde{\pi}}[A^{\pi}(s,a) + \alpha H(\tilde{\pi}(\cdot \mid s))] > 0 \qquad (3)$$

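Here $d^{\pi}$ is the (discounted) state-visitation distribution of $\pi$, $H$ denotes entropy, and $\alpha$ is the SAC temperature that weights the entropy term. The condition applies to the sum of the two terms. The referential policy can therefore satisfy it by selecting better actions, by being more exploratory (higher entropy), or by a combination of the two.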
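
Eq. (3) can be evaluated exactly on a small finite snapshot. The sketch assumes the state distribution $d^{\pi}$, the advantages $A^{\pi}(s,a)$ (in practice estimated by a critic), and $\tilde{\pi}$ are given as tables, with entropy measured in nats. The numbers and the helper `merl_advantage` are hypothetical. The example shows a referential policy that chooses worse actions than $\pi$ at one state. Its MERL advantage is nonetheless positive once the entropy term is weighted by $\alpha = 1$.

```python
import math
from typing import List


def entropy(dist: List[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def merl_advantage(d_pi: List[float], adv: List[List[float]],
                   pi_ref: List[List[float]], alpha: float) -> float:
    """Exact Eq. (3) for a finite snapshot:
    sum_s d(s) * (sum_a pi_ref(a|s) * A(s, a) + alpha * H(pi_ref(.|s)))."""
    total = 0.0
    for d_s, adv_s, ref_s in zip(d_pi, adv, pi_ref):
        # The entropy term does not depend on a, so its expectation over a ~ pi_ref is itself.
        total += d_s * (sum(p * a for p, a in zip(ref_s, adv_s)) + alpha * entropy(ref_s))
    return total


d_pi = [0.6, 0.4]                  # hypothetical state distribution under pi
adv = [[1.0, -1.0], [0.5, -0.5]]   # hypothetical A^pi(s, a) values
pi_ref = [[0.2, 0.8], [0.5, 0.5]]  # referential policy: worse actions at state 0

print(merl_advantage(d_pi, adv, pi_ref, alpha=0.0))  # negative: worse actions alone
print(merl_advantage(d_pi, adv, pi_ref, alpha=1.0))  # positive: entropy compensates
```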

Theorem 2 carries this result over to parameter space. Given a positive MERL advantage, an agent is guaranteed to improve by updating to the mixed parameters $\theta_{\text{mix}}$, provided the mixing metric satisfies $0 < \zeta < \left[ \frac{2A_{\pi}^{+}(\tilde{\pi})}{C\,(\tilde{\theta}-\theta)^{\top}F(\theta)(\tilde{\theta}-\theta)} \right]^{1/2}$. Here $\tilde{\theta}$ denotes the parameters of the referential policy, $C$ is a constant appearing in the bound (see [1]), and $F(\theta)$ is the Fisher Information Matrix:

$$F(\theta) = \mathbb{E}_{s \sim d^{\pi}, a \sim \pi_{\theta}} \left[ \frac{\partial \log \pi_{\theta}(a|s)}{\partial \theta} \left( \frac{\partial \log \pi_{\theta}(a|s)}{\partial \theta} \right)^{\top} \right] \qquad (4)$$

The Fisher Information Matrix is the local curvature (Hessian) of the KL divergence between $\pi_{\theta}$ and nearby policies, so it measures how sensitive the policy distribution is to parameter changes. This sensitivity is what converts the distribution-space improvement bound into a parameter-space constraint on $\zeta$.
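
Below is a minimal sketch of Eq. (4) and the Theorem 2 limit. It assumes a single state and a softmax policy whose parameters are its action logits. In that case the score of action $a$ is $e_a - \pi$, and Eq. (4) reduces to $\operatorname{diag}(\pi) - \pi\pi^{\top}$. The quadratic form $(\tilde{\theta}-\theta)^{\top}F(\theta)(\tilde{\theta}-\theta)$ then equals the variance of $\tilde{\theta}_a - \theta_a$ under $a \sim \pi$. The constant $C$ is passed in as an input because its value comes from [1]. The `4.0` below is arbitrary, as are the function names.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def fisher_softmax(theta: List[float]) -> Matrix:
    """Eq. (4) at one state for pi_theta = softmax(theta), as an exact expectation over a.
    The score of action a with respect to the logits is e_a - pi."""
    pi = softmax(theta)
    k = len(pi)
    F = [[0.0] * k for _ in range(k)]
    for a in range(k):
        score = [(1.0 if i == a else 0.0) - pi[i] for i in range(k)]
        for i in range(k):
            for j in range(k):
                F[i][j] += pi[a] * score[i] * score[j]
    return F


def quad_form(F: Matrix, v: List[float]) -> float:
    """Compute v^T F v."""
    n = len(v)
    return sum(v[i] * F[i][j] * v[j] for i in range(n) for j in range(n))


def zeta_upper_bound(adv_plus: float, C: float, theta: List[float],
                     theta_ref: List[float]) -> float:
    """Upper limit on zeta from Theorem 2; C is supplied by the caller."""
    if adv_plus <= 0.0:
        raise ValueError("Theorem 2 requires a positive MERL advantage")
    delta = [r - t for r, t in zip(theta_ref, theta)]
    q = quad_form(fisher_softmax(theta), delta)
    # F is positive semidefinite; q == 0 means the policy is insensitive to delta.
    return math.inf if q <= 0.0 else math.sqrt(2.0 * adv_plus / (C * q))


theta, theta_ref = [0.0, 0.0], [1.0, -1.0]
print(fisher_softmax(theta))  # equals diag(pi) - pi pi^T
print(zeta_upper_bound(0.5, 4.0, theta, theta_ref))
```

Adding the same constant to every logit leaves a softmax policy unchanged. In that case the quadratic form is zero and the bound places no limit on $\zeta$, so the sketch returns `math.inf`.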

A further practical contribution of [1] is that agents transmit parameter segments rather than full models, reducing communication overhead. The process repeats over multiple replicas, each time requesting different segments from different neighbours to ensure diversity in the aggregated referential policy.
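
The segment-request idea can be sketched as follows. Several details are hypothetical choices made for illustration: the rotation rule (segment $k$ of replica $r$ comes from the neighbour at position $(r + k) \bmod |\Omega_i|$), the contiguous near-equal split, and all names. The actual request and aggregation procedure is given in [1], and the sketch does not show how replicas are combined into $\tilde{\theta}$.

```python
from typing import Dict, List, Tuple


def split_segments(theta: List[float], num_segments: int) -> List[List[float]]:
    """Split a flat parameter vector into contiguous, near-equal segments."""
    n = len(theta)
    bounds = [k * n // num_segments for k in range(num_segments + 1)]
    return [theta[bounds[k]:bounds[k + 1]] for k in range(num_segments)]


def assemble_replicas(neighbour_params: Dict[int, List[float]], num_segments: int,
                      num_replicas: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Hypothetical request scheme: in replica r, segment k is fetched from the
    neighbour at position (r + k) mod |Omega_i|, so replicas use different
    neighbour/segment pairings."""
    ids = sorted(neighbour_params)
    segmented = {j: split_segments(neighbour_params[j], num_segments) for j in ids}
    replicas, sources = [], []
    for r in range(num_replicas):
        chosen = [ids[(r + k) % len(ids)] for k in range(num_segments)]
        # Only segment k of neighbour chosen[k] would need to be transmitted.
        replicas.append([x for k, j in enumerate(chosen) for x in segmented[j][k]])
        sources.append(chosen)
    return replicas, sources


neighbour_models = {1: [1.0] * 6, 2: [2.0] * 6, 3: [3.0] * 6}
replicas, sources = assemble_replicas(neighbour_models, num_segments=3, num_replicas=2)
print(sources)   # [[1, 2, 3], [2, 3, 1]]
print(replicas)  # [[1.0, 1.0, 2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 3.0, 3.0, 1.0, 1.0]]
```

In this sketch each replica costs one model's worth of parameters spread across several neighbours. The naive average in Eq. (1), by contrast, needs every neighbour's full model.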

However, RSM-MASAC assumes synchronous communication rounds: all agents pause local training simultaneously for the mixing phase. This assumption limits applicability to systems with reliable, low-latency communication.

References

[1] X. Yu, R. Li, C. Liang, and Z. Zhao, "Communication-efficient soft actor-critic policy collaboration via regulated segment mixture," arXiv:2312.10123, 2024.
