Abstract:
To address the multi-system parallelism of deep space probes and the many constraints that must be satisfied during their autonomous mission planning, this paper proposes a method for constructing a reinforcement learning model for autonomous task planning based on dynamic rewards. A deep space probe agent is established. Within the interactive environment, a policy network and a loss function integrating resource constraints, time constraints, and timing constraints are constructed, and a dynamic reward mechanism is proposed to improve the traditional policy gradient learning method. Simulation results show that the proposed method achieves autonomous task planning. Compared with the static-reward policy gradient algorithm, it significantly improves both the planning success rate and the planning efficiency. The method can also start planning from any state without changing the model structure, which improves the accuracy of the algorithm. It provides a new solution for autonomous mission planning and decision-making of deep space probes.