Dynamic Programming (Dynamic Programming)
Editor Source: 胡敏, 肖金伟, 张天天, 陶雪峰 (2026) "面向中高轨小卫星批量部署的轨道转移飞行器任务规划"
Bellman R. Dynamic Programming[M]. Princeton University Press, 2010.
Website: https://cislunarspace.cn
Definition
Dynamic Programming (DP) is a mathematical method for solving multi-stage decision process optimization by decomposing complex problems into interrelated subproblems, solving bottom-up and gradually constructing optimal solutions.
The core characteristics of dynamic programming are:
- Optimal substructure: Global optimal solution contains optimal solutions of local subproblems
- No aftereffect: Current state choices only affect subsequent decisions, not past choices
- Overlapping subproblems: Subproblems are repeatedly calculated multiple times and can be stored for reuse
Bellman's Principle of Optimality
Principle Statement
Bellman's Principle of Optimality is the theoretical foundation of dynamic programming:
"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."
Mathematical Expression
For the State-Dependent Traveling Salesman Problem (SDTSP), Bellman's equation is:
Where:
- : Minimum cumulative cost for planning node
- : Set of visited orbits
- : Current orbit
- : Cost from orbit to orbit (dependent on state )
Physical Significance
The optimal path to reach the current state must contain an optimal path to some predecessor state , then transit from orbit to orbit .
Application in Orbit Deployment
Problem Modeling
胡敏等 (2026) modeled the OTV batch deployment mission as SDTSP and solved it using dynamic programming:
Planning node: Binary uniquely determines state
- : Subset of visited orbits
- : Current orbit of OTV
State transition: After deploying a satellite, state updates
Where is the number of deployed satellites
Value function: represents the minimum cumulative cost from the starting orbit, visiting all orbits in set , and ending at orbit
Initial Subproblem
The basis of recursion is the smallest subproblem visiting only the starting orbit:
Final Return
After completing all deployments, the cost of returning to the starting orbit must be calculated:
Algorithm Implementation
State Compression Technique
To improve computational efficiency, 胡敏等 (2026) used state compression:
- Orbital subset is mapped to binary integer bitmask
- Set inclusion test becomes efficient bitwise operation
- Algorithm complexity reduces from exponential to polynomial
Algorithm Flow
Input: Total number of targets N, State-dependent cost matrix C, Starting orbit i_start
1. Initialize J to infinity, J[i_start, i_start] = 0
2. Iterate through all possible orbital subsets m and current node u
3. Update cumulative cost J[S_new, v]
4. Backtrack to extract optimal deployment sequence Π
Output: Optimal cost J and deployment sequence Π
Complexity Analysis
| Metric | Complexity | Description |
|---|---|---|
| Time | Iterate through all state combinations | |
| Space | Store optimal cost array |
Performance Comparison
Research results (胡敏等, 2026) show that in medium-orbit batch deployment missions:
| Algorithm | N=8 Propellant (kg) | N=12 Propellant (kg) | Optimality |
|---|---|---|---|
| DP (This method) | 487.7 | 632.1 | Exact optimal |
| GA | 507.8 | 632.1 | Heuristic |
| Greedy | 502.2 | 669.1* | Locally optimal |
*Exceeds initial propellant capacity
Dynamic programming performs optimally in medium-scale missions with N≤12, with advantages:
- Global optimality guarantee: Ensures finding the optimal deployment sequence
- High computational efficiency: CPU time only requires milliseconds
- Good robustness: Not affected by initial solutions
Comparison with Other Algorithms
| Characteristic | Dynamic Programming | Genetic Algorithm | Greedy Algorithm |
|---|---|---|---|
| Optimality | Exact optimal | Probabilistic convergence | Locally optimal |
| Time complexity | Adjustable | ||
| Space complexity | Medium | ||
| Applicable scale | N < 15 | Any scale | N < 10 |
Related Concepts
- State-Dependent TSP
- Bellman's Principle of Optimality
- Batch Deployment
- Mass Discontinuity
- Q-law Control Law
References
- 胡敏, 肖金伟, 张天天, 陶雪峰. 面向中高轨小卫星批量部署的轨道转移飞行器任务规划[J]. 航天器工程, 2026, 25(3): 634-646.
- Bellman R. Dynamic Programming[M]. Princeton: Princeton University Press, 2010.
- Narayanaswamy S, Wu B, Ludivig P, et al. Low-thrust rendezvous trajectory generation for multi-target active space debris removal using the RQ-law[J]. Advances in Space Research, 2023, 71(10): 4276-4287.
