[1]王丙琛,司怀伟,谭国真.基于深度强化学习的自动驾驶车控制算法研究[J].郑州大学学报(工学版),2020,41(04):41-45.[doi:10.13705/j.issn.1671-6833.2020.04.002]
 WANG Bingchen,SI Huaiwei,TAN Guozhen.Research on Autopilot Control Algorithms Based on Deep Reinforcement Learning[J].Journal of Zhengzhou University (Engineering Science),2020,41(04):41-45.[doi:10.13705/j.issn.1671-6833.2020.04.002]


《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

Volume:
41
Issue:
2020, No. 04
Pages:
41-45
Publication Date:
2020-08-12

文章信息/Info

Title:
Research on Autopilot Control Algorithms Based on Deep Reinforcement Learning
作者:
王丙琛, 司怀伟, 谭国真
大连理工大学计算机科学与技术学院
Author(s):
WANG Bingchen, SI Huaiwei, TAN Guozhen
School of Computer Science and Technology, Dalian University of Technology, Dalian 116000, China
关键词:
神经网络; 强化学习; 自动驾驶; DDPG算法; actor-critic网络; LSTM
Keywords:
neural network; reinforcement learning; autopilot; DDPG algorithm; actor-critic network; LSTM
DOI:
10.13705/j.issn.1671-6833.2020.04.002
文献标志码:
A
摘要:
Autonomous driving is an important area of artificial intelligence research. This paper proposes an autonomous driving policy learning algorithm based on deep reinforcement learning. A DDPG-based reinforcement learning algorithm is used for online training of the model, and real human driving data are used to pre-train the Actor network, which spares the agent from learning from scratch in the initial stage of reinforcement learning and speeds up model convergence. In addition, to help the agent make better decisions by learning to anticipate future conditions, an LSTM prediction mechanism is added to the Actor network, which enhances the stability and generalization ability of the model. Compared with the original DDPG algorithm, the proposed algorithm greatly shortens training time, accelerates convergence, and improves model stability.
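The expert pre-training step described in the abstract is, in standard terms, behavior cloning: before reinforcement learning starts, the Actor is fit to recorded human (state, action) pairs as a supervised regression problem. A minimal illustrative sketch follows; the synthetic data and the linear policy standing in for the paper's neural Actor are assumptions for illustration, not details from the paper.

```python
import numpy as np

# Behavior-cloning sketch: fit a policy to expert (state, action) pairs by
# supervised regression, so RL does not start from a random policy.
# Synthetic stand-ins for the paper's human driving data.
rng = np.random.default_rng(0)
state_dim, action_dim, n_samples = 4, 2, 500

states = rng.normal(size=(n_samples, state_dim))      # observed states
expert_w = rng.normal(size=(state_dim, action_dim))   # the "human driver"
actions = np.tanh(states @ expert_w)                  # demonstrated actions

# Linear least squares on the pre-tanh targets recovers the expert mapping;
# a neural Actor would be trained by gradient descent on the same MSE loss.
targets = np.arctanh(np.clip(actions, -0.999, 0.999))
w_hat, *_ = np.linalg.lstsq(states, targets, rcond=None)

mse = float(np.mean((np.tanh(states @ w_hat) - actions) ** 2))
print(f"imitation MSE: {mse:.6f}")   # small: the clone imitates the expert
```

The same warm start is what lets the agent skip the early random-exploration phase the abstract mentions.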
Abstract:
In order to improve the learning efficiency of reinforcement-learning-based autopilot control algorithms, this paper proposes an autopilot strategy learning algorithm that incorporates expert experience, DDPGwE (Deep Deterministic Policy Gradient with Expert). DDPGwE uses a DDPG-based reinforcement learning framework to train the model online, uses real human driving data to pre-train the Actor network, and adds an LSTM prediction mechanism to the Actor network to improve prediction of the future state of the autonomous vehicle. Experimental results on the simulation platform TORCS show that, compared with the original DDPG algorithm, the proposed algorithm greatly reduces training time, speeds up convergence, and improves the stability and generalization ability of the model.
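The DDPG framework the abstract builds on keeps slowly-moving target copies of the Actor and Critic, updated by the soft rule θ′ ← τθ + (1 − τ)θ′. The sketch below shows that standard DDPG component only; it is not code from this paper, and the toy weight shapes are arbitrary.

```python
import numpy as np

# Soft target-network update used by DDPG: target parameters slowly track
# the trained parameters,
#   theta_target <- tau * theta + (1 - tau) * theta_target
def soft_update(params, target_params, tau=0.001):
    """Move each target parameter a fraction tau toward its live twin."""
    return [tau * p + (1.0 - tau) * tp for p, tp in zip(params, target_params)]

rng = np.random.default_rng(1)
actor = [rng.normal(size=(3, 3)), rng.normal(size=3)]   # toy weight tensors
target = [np.zeros((3, 3)), np.zeros(3)]                # targets start at zero

for _ in range(10):
    target = soft_update(actor, target, tau=0.1)

# With the live weights frozen, n updates from zero give
# (1 - (1 - tau)^n) * actor: an exponential approach to the live weights.
print(np.allclose(target[0], (1 - 0.9 ** 10) * actor[0]))  # True
```

The small τ is what keeps the bootstrapped Critic targets slowly varying, which is the stability mechanism the DDPG-based training relies on.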



更新日期/Last Update: 2020-10-06