Cislunar Space Beginner's Guide

Generalized Advantage Estimation (GAE)

Definition

Generalized Advantage Estimation (GAE) is a bias-variance balancing technique for estimating the advantage function in reinforcement learning, proposed by Schulman et al. in 2015. GAE provides low-variance but nearly unbiased advantage estimates for policy gradient algorithms (such as PPO and A2PPO) by computing exponentially weighted averages of multiple temporal difference (TD) residuals.

Background: Advantage Function and TD Residuals

In Actor-Critic reinforcement learning, the advantage function is defined as:

$$A^\pi(s_t, a_t) = Q^\pi(s_t, a_t) - V^\pi(s_t)$$

Direct computation requires knowledge of the true value function $V^\pi$, which in practice must be approximated. The simple one-step TD advantage estimate is:

$$A_t^{(1)} = \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$
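As a minimal sketch of the one-step estimate, the TD residuals for a recorded trajectory can be computed as below. The function name `td_residuals` and the toy numbers are illustrative assumptions, and the value estimates are assumed to come from some critic that is not shown here.

```python
def td_residuals(rewards, values, gamma=0.99):
    """One-step TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards`; its last entry is the
    value estimate of the state reached after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values needs len(rewards) + 1 entries")
    return [
        r + gamma * v_next - v
        for r, v, v_next in zip(rewards, values[:-1], values[1:])
    ]


# Toy trajectory with made-up numbers, purely illustrative.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 1.0, 1.5, 0.0]
deltas = td_residuals(rewards, values, gamma=0.9)
```

Each entry of `deltas` depends on a single reward and two value estimates, which is why the estimate is low in variance but inherits any error in $V$.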

However, one-step estimates have low variance but high bias (due to reliance on inaccurate value estimates). $n$-step returns can reduce bias but increase variance.

GAE Definition

GAE balances bias and variance by taking an exponentially weighted average of the $n$-step advantage estimates, which simplifies to a discounted sum of one-step TD residuals:

$$\hat{A}_t^{\text{GAE}(\lambda, \gamma)} = \sum_{k=0}^{\infty} (\gamma\lambda)^{k} \delta_{t+k}$$

where $\lambda \in [0,1]$ controls the bias-variance tradeoff:

  • $\lambda = 0$: Reduces to the one-step TD estimate $\delta_t$ (low variance, high bias)
  • $\lambda = 1$: Reduces to the discounted Monte Carlo return minus the value baseline $V(s_t)$ (low bias, high variance)
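Both limiting cases follow directly from the definition. The short derivation below uses only the symbols already defined, and assumes the infinite sum converges (so that $\gamma^{k} V(s_{t+k}) \to 0$ as $k \to \infty$):

$$
\begin{aligned}
\lambda = 0:\quad \hat{A}_t^{\text{GAE}(0, \gamma)} &= \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \\
\lambda = 1:\quad \hat{A}_t^{\text{GAE}(1, \gamma)} &= \sum_{k=0}^{\infty} \gamma^{k} \delta_{t+k}
= \sum_{k=0}^{\infty} \gamma^{k} \bigl( r_{t+k} + \gamma V(s_{t+k+1}) - V(s_{t+k}) \bigr) \\
&= \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} - V(s_t)
\end{aligned}
$$

In the $\lambda = 1$ case the value terms telescope: each $\gamma^{k+1} V(s_{t+k+1})$ cancels against the next term's $-\gamma^{k+1} V(s_{t+k+1})$, leaving only $-V(s_t)$.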

In practice, because rollouts have a finite horizon, the advantages are computed backward in time using the recursive form:

$$\hat{A}_t = \delta_t + \gamma\lambda(1-d_t)\hat{A}_{t+1}$$

where $d_t$ is the termination signal ($d_t = 1$ indicates episode termination at step $t$).
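The recursion can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the A2PPO authors' implementation: the function name `gae_advantages` and the toy rollout are hypothetical, and the bootstrap value inside $\delta_t$ is also masked by $(1 - d_t)$, a common convention that the formula above does not spell out.

```python
def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.915):
    """Backward-recursive GAE over one rollout.

    rewards[t] and dones[t] describe step t. `values` has len(rewards) + 1
    entries; the last one is the bootstrap value of the final next state.
    """
    advantages = [0.0] * len(rewards)
    next_advantage = 0.0  # A_hat beyond the end of the rollout
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - float(dones[t])
        # Assumption: no bootstrapping from V(s_{t+1}) after termination.
        delta = rewards[t] + gamma * not_done * values[t + 1] - values[t]
        # A_hat_t = delta_t + gamma * lambda * (1 - d_t) * A_hat_{t+1}
        next_advantage = delta + gamma * lam * not_done * next_advantage
        advantages[t] = next_advantage
    return advantages


# Toy rollout with made-up numbers; the episode ends at the last step.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 1.0, 1.5, 0.0]
dones = [False, False, True]
adv = gae_advantages(rewards, values, dones, gamma=0.9, lam=0.5)
```

Calling the same function with `lam=0.0` returns the one-step TD residuals, and `lam=1.0` returns the discounted return minus the value baseline, matching the two limiting cases of $\lambda$.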

Application in A2PPO

In the A2PPO algorithm, GAE is used for advantage estimation with the following hyperparameter settings:

| Parameter | Value | Meaning |
|---|---|---|
| $\gamma$ | 0.99 | Discount factor |
| $\lambda$ (GAE-$\lambda$) | 0.915 | GAE parameter |

In A2PPO's ablation experiments, the combination of GAE with the attention mechanism produces more stable policy gradient estimates, significantly outperforming vanilla PPO (final reward $1071.41 \pm 7.75$ vs $344.87 \pm 563.71$).

GAE's Variance Control Mechanism

GAE's variance control stems from its finite effective memory: the TD residual $k$ steps ahead is weighted by $(\gamma\lambda)^k$, so distant residuals decay exponentially. More importantly, GAE's variance is positively correlated with $\lambda$: increasing $\lambda$ reduces estimation bias but increases variance, because more reliance is placed on actual sampled returns and less on the learned value function. Decreasing $\lambda$ has the opposite effect.
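To make the decay concrete, the sketch below evaluates the weights $(\gamma\lambda)^k$ for the A2PPO settings listed in the table above ($\gamma = 0.99$, $\lambda = 0.915$). The helper name `gae_weights` is a hypothetical choice for this illustration.

```python
def gae_weights(gamma, lam, n_terms):
    """Weights (gamma * lam) ** k applied to delta_{t+k}, k = 0..n_terms-1."""
    decay = gamma * lam
    return [decay ** k for k in range(n_terms)]


gamma, lam = 0.99, 0.915             # hyperparameters from the A2PPO table
decay = gamma * lam                  # per-step decay factor, 0.90585
weights = gae_weights(gamma, lam, n_terms=50)

# Sum of the full geometric series: a rough "effective horizon" in steps.
total_weight = 1.0 / (1.0 - decay)

# Fraction of the total weight carried by the first 10 residuals.
share_first_10 = sum(weights[:10]) / total_weight

print(f"decay={decay:.5f}, total weight={total_weight:.2f}, "
      f"first-10 share={share_first_10:.3f}")
```

With these settings the total weight is about 10.6, so the estimate is dominated by roughly the next ten TD residuals. A larger $\lambda$ stretches this horizon and brings in more sampled rewards, which is the source of the added variance.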

Related Concepts

  • A2PPO (Attention-Augmented PPO): The application framework for GAE in cislunar trajectory optimization
  • Low-Thrust Transfer MDP: The RL problem formulation that GAE serves

References

  • Schulman J, Moritz P, Levine S, et al. High-dimensional continuous control using generalized advantage estimation[J]. arXiv:1506.02438, 2015.
  • Ul Haq I U, Dai H, Du C. Autonomous low-thrust trajectory optimization in cislunar space via attention-augmented reinforcement learning[J]. Aerospace Science and Technology, 2026.
Last Updated: 4/29/26, 11:30 AM
Contributors: Hermes Agent, Cron Job