Cislunar Space Beginner's Guide

A2PPO (Attention-Augmented Proximal Policy Optimization)

Definition

A2PPO is a Deep Reinforcement Learning (DRL) framework for low-thrust trajectory optimization in cislunar space, proposed by Ul Haq, Dai, Du et al. in 2026. Its core innovation is a directional cross-attention mechanism integrated into the Actor-Critic architecture of the standard PPO (Proximal Policy Optimization) algorithm. The mechanism lets the policy network selectively attend to state features that the Critic network deems important for future value, which improves learning stability and sample efficiency in chaotic multi-body dynamical environments.

Algorithm Architecture

Core Components

The forward propagation pipeline of A2PPO proceeds as follows:

  1. Shared MLP Encoder: Encodes the raw state $s_t \in \mathbb{R}^{16}$ into a hidden vector $h_t \in \mathbb{R}^{128}$
  2. Role Projection: Projects $h_t$ into Actor- and Critic-specific role vectors via two independent linear projections $W_a, W_c \in \mathbb{R}^{128 \times 128}$
  3. Tokenization: Reshapes the role vectors into $M=4$ sub-tokens of dimension $d=32$ ($D = M \times d = 128$), with learned positional embeddings added
  4. Directional Cross-Attention: Actor tokens serve as Query, Critic tokens as Key and Value, performing feature fusion through multi-head cross-attention ($N_h=2$ heads)
  5. Fusion Output: After residual connections and per-token feed-forward networks (FFN), layer normalization is applied and the result is flattened to obtain the fused hidden vector $z_t \in \mathbb{R}^{128}$
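
The five steps above can be sketched in NumPy to make the tensor shapes concrete. This is only an illustration under stated assumptions, not the authors' implementation: the single-layer tanh encoder, the positional embeddings shared by both roles, the ReLU feed-forward block of width 2d, the exact placement of layer normalization, and all random weights are assumptions made here. Only the dimensions (16, 128, M=4, d=32, 2 heads) and the Query/Key/Value roles come from the description.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, D, M, d, N_HEADS = 16, 128, 4, 32, 2

def linear(n_in, n_out):
    """Random weight matrix and zero bias (illustrative initialisation)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

W_enc, b_enc = linear(STATE_DIM, D)     # step 1: shared encoder (one layer assumed)
W_a, _ = linear(D, D)                   # step 2: Actor role projection
W_c, _ = linear(D, D)                   # step 2: Critic role projection
pos = rng.normal(0.0, 0.02, (M, d))     # step 3: positional embeddings (shared, assumed)
W_q, _ = linear(d, d)
W_k, _ = linear(d, d)
W_v, _ = linear(d, d)
W_o, _ = linear(d, d)
W_f1, b_f1 = linear(d, 2 * d)           # step 5: per-token FFN (width assumed)
W_f2, b_f2 = linear(2 * d, d)

def split_heads(x):
    """(M, d) -> (N_HEADS, M, d // N_HEADS)."""
    return x.reshape(M, N_HEADS, d // N_HEADS).transpose(1, 0, 2)

def forward(s_t):
    h_t = np.tanh(s_t @ W_enc + b_enc)                  # (128,)
    a_tok = (h_t @ W_a).reshape(M, d) + pos             # Actor tokens, (4, 32)
    c_tok = (h_t @ W_c).reshape(M, d) + pos             # Critic tokens, (4, 32)
    q = split_heads(a_tok @ W_q)                        # Query from Actor
    k = split_heads(c_tok @ W_k)                        # Key from Critic
    v = split_heads(c_tok @ W_v)                        # Value from Critic
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d // N_HEADS)
    attn = softmax(scores)                              # (2, 4, 4)
    ctx = (attn @ v).transpose(1, 0, 2).reshape(M, d) @ W_o
    x = a_tok + ctx                                     # residual connection
    x = x + np.maximum(x @ W_f1 + b_f1, 0.0) @ W_f2 + b_f2
    z_t = layer_norm(x).reshape(D)                      # fused vector, (128,)
    return z_t, attn

z_t, attn = forward(rng.normal(size=STATE_DIM))
print(z_t.shape, attn.shape)  # (128,) (2, 4, 4)
```

Because the Query comes only from the Actor tokens and the Key and Value only from the Critic tokens, information flows in one direction, which is the asymmetry the design relies on.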

Key Design: Directionality

A2PPO adopts an asymmetric Critic → Actor directional cross-attention design: the policy representation is conditioned on the value function's assessment signals, while the Critic remains decoupled from Actor exploration noise. This design outperforms self-attention variants in ablation experiments, significantly improving training stability.

PPO Loss Function

A2PPO optimizes the following composite loss:

$$J(\theta, \psi) = -\mathcal{L}^{\mathrm{clip}}(\theta) + c_v \frac{1}{2} \mathbb{E}\left[ (V_\psi(z_t) - \hat{R}_t)^2 \right] - c_e \mathbb{E}\left[ \mathcal{H}(\pi_\theta(\cdot|z_t)) \right]$$

The three terms are: the clipped policy loss, value function error (weight $c_v$), and policy entropy regularization (weight $c_e$).
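
As a rough illustration of how the three terms combine, the sketch below evaluates the composite loss on a batch of precomputed quantities. The clipped surrogate is the standard PPO form, min(r·A, clip(r, 1−ε, 1+ε)·A). The clipping range and entropy coefficient default to the tuned values reported on this page, while the value weight `c_v=0.5` and the function name `a2ppo_loss` are assumptions made here.

```python
import numpy as np

def a2ppo_loss(logp_new, logp_old, adv, values, returns, entropy,
               clip_eps=0.249, c_v=0.5, c_e=0.01474):
    """Composite loss J = -L_clip + c_v * (1/2) E[(V - R)^2] - c_e * E[H]."""
    ratio = np.exp(logp_new - logp_old)                  # pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    l_clip = np.mean(np.minimum(ratio * adv, clipped * adv))
    value_err = 0.5 * np.mean((values - returns) ** 2)
    return -l_clip + c_v * value_err - c_e * np.mean(entropy)

# Toy batch: the policy is unchanged (ratio = 1), so L_clip is the mean advantage.
loss = a2ppo_loss(
    logp_new=np.array([-1.0, -2.0]), logp_old=np.array([-1.0, -2.0]),
    adv=np.array([1.0, 3.0]),
    values=np.array([1.0, 2.0]), returns=np.array([1.0, 4.0]),
    entropy=np.array([1.0, 1.0]),
)
print(loss)
```

Minimizing J therefore maximizes the clipped surrogate and the entropy while shrinking the value error.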

Training Strategy

Curriculum Learning

A2PPO employs a progressive curriculum learning strategy, gradually tightening success thresholds: initial stages use relaxed terminal position/velocity tolerances (e.g., $\Delta d = 5 \times 10^{-3}$), progressively tightening to $\Delta d = 1 \times 10^{-3}$ as training advances. This strategy avoids initial instability in the chaotic CR3BP dynamical environment.
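
One way such a tightening schedule could be organized is sketched below. The start and end tolerances are the ones quoted above; everything else (five geometrically spaced stages, promotion once the recent success rate reaches 80% over a 50-episode window, and the names `tolerance_schedule` and `Curriculum`) is a hypothetical choice for illustration, since the promotion rule is not described here.

```python
from collections import deque

def tolerance_schedule(start=5e-3, end=1e-3, n_stages=5):
    """Geometrically spaced tolerances from start down to end (assumed spacing)."""
    ratio = (end / start) ** (1.0 / (n_stages - 1))
    return [start * ratio ** i for i in range(n_stages)]

class Curriculum:
    """Tightens the success tolerance once recent episodes succeed often enough."""

    def __init__(self, tolerances, promote_at=0.8, window=50):
        self.tolerances = list(tolerances)
        self.promote_at = promote_at
        self.recent = deque(maxlen=window)
        self.stage = 0

    @property
    def tolerance(self):
        return self.tolerances[self.stage]

    def record(self, success):
        """Log one episode outcome and return the tolerance for the next episode."""
        self.recent.append(bool(success))
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) >= self.promote_at:
            if self.stage < len(self.tolerances) - 1:
                self.stage += 1
                self.recent.clear()   # start a fresh window at the new stage
        return self.tolerance

schedule = tolerance_schedule()
print([f"{tol:.2e}" for tol in schedule])
```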

Hyperparameter Tuning

A two-stage hyperparameter search (100 trials each) is conducted using the Optuna framework, with key parameters including learning rate ($1.315 \times 10^{-3}$), PPO clipping range (0.249), entropy coefficient (0.01474), and GAE-$\lambda$ (0.915).
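
To show where the GAE-λ value enters, here is a minimal sketch of Generalized Advantage Estimation over a single trajectory segment. The recursion is the standard one, δ_t = r_t + γV(s_{t+1}) − V(s_t) and A_t = δ_t + γλA_{t+1}. The discount factor `gamma=0.99` is an assumption (it is not stated above), and the sketch ignores episode terminations inside the segment.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.915):
    """Return (advantages, returns) for one non-terminating trajectory segment."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value                 # bootstrap value after the last step
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]   # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns

adv, ret = gae(rewards=[1.0, 1.0], values=[0.0, 0.0], last_value=0.0)
print(adv)
```

A smaller λ weights the one-step TD residual more heavily (lower variance, more bias), while λ close to 1 approaches the full Monte Carlo return.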

Performance Evaluation

Evaluation results across four cislunar low-thrust transfer scenarios:

| Scenario | Description | ToF (days) | Fuel (kg) | Direct collocation (ToF / fuel) |
|---|---|---|---|---|
| S1 | L₂ Halo → Halo | 4.95 | 2.08 | 4.99 days / 1.28 kg |
| S2 | L₂ Halo → NRHO | 8.38 | 5.00 | 7.26 days / 5.29 kg |
| S3 | NRHO → DRO | 7.60 | 5.10 | 7.63 days / 5.11 kg |
| S4 | Multi-rev Halo → Halo (very low thrust) | 33.6 | 0.97 | 33.12 days / 0.97 kg |

Without any initial guess, A2PPO autonomously learns trajectories highly consistent with the direct collocation baselines. In the multi-revolution transfer scenario (S4) it also significantly outperforms the SAC baseline, which requires 37.37 days and 1.06 kg.
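
To make the comparison concrete, the relative differences can be computed directly from the reported numbers. The short script below copies the values from the table above and from the SAC figures quoted in the text; the helper name `pct_diff` is introduced here for illustration. Positive values mean A2PPO used more time or fuel than the reference.

```python
# scenario: (A2PPO ToF, A2PPO fuel, direct collocation ToF, direct collocation fuel)
results = {
    "S1": (4.95, 2.08, 4.99, 1.28),
    "S2": (8.38, 5.00, 7.26, 5.29),
    "S3": (7.60, 5.10, 7.63, 5.11),
    "S4": (33.6, 0.97, 33.12, 0.97),
}
sac_s4 = (37.37, 1.06)  # SAC baseline in the multi-revolution scenario

def pct_diff(value, reference):
    """Signed relative difference of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference

for name, (tof, fuel, tof_dc, fuel_dc) in results.items():
    print(f"{name}: ToF {pct_diff(tof, tof_dc):+.1f}%, "
          f"fuel {pct_diff(fuel, fuel_dc):+.1f}% vs. direct collocation")

tof_s4, fuel_s4 = results["S4"][:2]
print(f"S4 vs. SAC: ToF {pct_diff(tof_s4, sac_s4[0]):+.1f}%, "
      f"fuel {pct_diff(fuel_s4, sac_s4[1]):+.1f}%")
```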

Robustness

  • Monte Carlo perturbation test: 100% success rate under 100 initial state perturbations ($\sigma = 10^{-3}$ NDU)
  • Thrust degradation tolerance: Completes missions under up to 32% deterministic thrust degradation without retraining
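
A Monte Carlo test of this kind could be organized roughly as follows. This is a sketch of the test harness only: `rollout_succeeds` stands in for a closed-loop rollout of the trained policy, the toy success check is a placeholder, and perturbing every state component with independent Gaussian noise is an assumption made here, since the perturbed components are not specified above.

```python
import random

def monte_carlo_success_rate(rollout_succeeds, nominal_state,
                             sigma=1e-3, n_trials=100, seed=0):
    """Fraction of perturbed initial states for which the rollout succeeds."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        # Independent zero-mean Gaussian noise on each component (assumed).
        perturbed = [x + rng.gauss(0.0, sigma) for x in nominal_state]
        successes += bool(rollout_succeeds(perturbed))
    return successes / n_trials

def toy_rollout(state, tol=1e-2):
    """Placeholder for a real policy rollout: succeeds if the state stays near zero."""
    return all(abs(x) < tol for x in state)

nominal = [0.0] * 6   # hypothetical 6-component position/velocity state
rate = monte_carlo_success_rate(toy_rollout, nominal)
print(f"success rate: {rate:.0%}")
```

In a real evaluation, `toy_rollout` would be replaced by propagating the CR3BP dynamics under the trained policy and checking the terminal tolerances.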

Relation to Related Concepts

  • Standard PPO: A2PPO adds a directional cross-attention module on top of standard PPO and significantly outperforms vanilla PPO in both training convergence speed and final reward
  • SAC (Soft Actor-Critic): Used as a comparison baseline; in multi-revolution transfer scenarios A2PPO achieves a shorter time of flight and lower fuel consumption
  • GTrXL: Another Transformer-enhanced RL method; A2PPO's cross-attention mechanism differs, focusing on Actor-Critic feature fusion
  • Generalized Advantage Estimation (GAE): A key component for advantage function estimation in A2PPO
  • Curriculum Learning: The progressive training strategy employed by A2PPO
  • Low-Thrust Transfer MDP: The problem formulation framework for A2PPO

References

  • Ul Haq I U, Dai H, Du C. Autonomous low-thrust trajectory optimization in cislunar space via attention-augmented reinforcement learning. Aerospace Science and Technology, 2026.
Last Updated: 4/29/26, 11:30 AM
Contributors: Hermes Agent, Cron Job
© 2026 Cislunar Space Beginner's Guide  |  湘ICP备2026006405号-1