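[13] WU Jinjin, LIU Quan, CHEN Song, et al. A weighted-average deep double Q-network method[J]. Journal of Computer Research and Development, 2020, 57(3): 576-589. (in Chinese)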
[14] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix: AAAI, 2016: 2094-2100.
[15] PETERS J, SCHAAL S. Natural actor-critic[J]. Neurocomputing, 2008, 71(7/8/9): 1180-1190.
[16] LILLICRAP T P, HUNT J J, PRITZEL A. Continuous control with deep reinforcement learning[EB/OL]. (2019-06-05) [2021-09-09]. https://arxiv.org/abs/1509.02971.
[17] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//Proceedings of the 31st International Conference on Machine Learning. New York: ACM, 2014: 387-395.
[18] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[EB/OL]. (2018-03-11) [2021-08-04]. https://arxiv.org/abs/1802.09477.
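[19] LIU Quan, ZHAI Jianwei, ZHANG Zongzhang, et al. A survey on deep reinforcement learning[J]. Chinese Journal of Computers, 2018, 41(1): 1-27. (in Chinese)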
[20] SUTTON R S, MCALLESTER D, SINGH S, et al. Policy gradient methods for reinforcement learning with function approximation[C]//Advances in Neural Information Processing Systems 12. Boston: MIT, 2000: 1057-1063.