To address the precise control of the sampling manipulator in the Chang'E-5 lunar surface sampling mission, a path planning method based on deep reinforcement learning is proposed. A multi-constraint reward function is designed for the deep reinforcement learning algorithm, so that the planned motion path simultaneously satisfies three constraints: safety, speed, and reachability. This achieves precise control of the sampling manipulator. While ensuring task safety, the method substantially shortens the space-ground interaction time and makes the manipulator's control more stable. Experimental results show that the method achieves high accuracy and robustness and can serve as a reference for subsequent on-orbit sampling missions.
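The abstract does not give the reward function itself. As a rough illustration of how three constraints can be folded into one scalar reward, the sketch below assumes a collision penalty for safety, a constant per-step cost for speed, and distance-to-target progress shaping plus a goal bonus for reachability. The function `multi_constraint_reward`, the `RewardWeights` fields, and all numeric values are hypothetical and are not the paper's design.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the paper's actual values are not stated in the abstract.
    w_progress: float = 1.0          # reachability: reward per metre moved toward the target
    step_penalty: float = 0.01       # speed: constant cost per control step
    collision_penalty: float = 10.0  # safety: cost of a collision, ends the episode
    goal_bonus: float = 5.0          # reachability: bonus on reaching the target
    goal_tol: float = 0.01           # distance (m) at which the target counts as reached


def multi_constraint_reward(
    prev_pos: Sequence[float],
    pos: Sequence[float],
    target: Sequence[float],
    collided: bool,
    w: RewardWeights = RewardWeights(),
) -> Tuple[float, bool]:
    """Return (reward, done) for one end-effector transition (assumed formulation)."""
    # Safety: any collision is penalised and terminates the episode.
    if collided:
        return -w.collision_penalty, True
    d_prev = math.dist(prev_pos, target)
    d_now = math.dist(pos, target)
    # Reachability shaping (progress toward target) minus a per-step speed cost.
    reward = w.w_progress * (d_prev - d_now) - w.step_penalty
    # Reachability: terminal bonus once the end effector is within tolerance.
    if d_now <= w.goal_tol:
        return reward + w.goal_bonus, True
    return reward, False
```

For example, moving the end effector from 0.5 m to 0.4 m from the target without collision yields about 0.09 and `done=False`. A practical design might replace the binary collision term with a clearance-based penalty so the agent is discouraged from approaching obstacles, not only from touching them.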