[1]黄万伟,郑向雨,张超钦,等.基于深度强化学习的智能路由技术研究[J].郑州大学学报(工学版),2023,44(01):44-51.[doi:10.13705/j.issn.1671-6833.2022.04.018]
 HUANG Wanwei,ZHENG Xiangyu,ZHANG Chaoqin,et al.Research on Intelligent Routing Technology Based on Deep Reinforcement Learning[J].Journal of Zhengzhou University (Engineering Science),2023,44(01):44-51.[doi:10.13705/j.issn.1671-6833.2022.04.018]

基于深度强化学习的智能路由技术研究

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

Volume:
44
Issue:
2023, No. 01
Pages:
44-51
Column:
Publication date:
2022-12-06

Article Info

Title:
Research on Intelligent Routing Technology Based on Deep Reinforcement Learning
作者:
黄万伟1 郑向雨1 张超钦2 王苏南3 张校辉4
1.郑州轻工业大学软件学院,河南郑州 450001; 2.郑州轻工业大学计算机与通信工程学院,河南郑州 450001; 3.深圳职业技术学院电子与通信工程学院,广东深圳 518055; 4.河南信安通信技术股份有限公司,河南郑州 450001

Author(s):
HUANG Wanwei1 ZHENG Xiangyu1 ZHANG Chaoqin2 WANG Sunan3 ZHANG Xiaohui4
1.College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China; 2.College of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China; 3.School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China; 4.Henan Xin′an Communication Technology Co., Ltd., Zhengzhou 450001, China
Keywords:
quality of experience; software defined network; deep reinforcement learning; routing algorithms; recurrent deterministic policy gradient
CLC number:
TP393
DOI:
10.13705/j.issn.1671-6833.2022.04.018
Document code:
A
摘要:
针对现有智能路由算法收敛速度慢、平均时延高、带宽利用率低等问题,提出了一种基于深度强化学习(DRL)的多路径智能路由算法RDPG-Route。该算法采用循环确定性策略梯度(RDPG)作为训练框架,引入长短期记忆网络(LSTM)作为神经网络,基于RDPG处理高维度问题的算法优势,以及LSTM循环核中记忆体的存储能力,将动态变化的网络状态输入神经网络进行训练。算法训练收敛后,将神经网络输出的动作值作为网络链路权重,基于多路径路由策略进行流量划分,以实现网络路由的智能动态调整。最后,将RDPG-Route路由算法分别与ECMP、DRL-TE和DRL-R-DDPG路由算法进行对比。结果表明,RDPG-Route具有较好的收敛性和有效性,相比于其他智能路由算法至少降低了7.2%的平均端到端时延,提高了6.5%的吞吐量,减少了8.9%的丢包率和6.3%的最大链路利用率。
Abstract:
To solve the problems of slow convergence, high average delay, and low bandwidth utilization in existing intelligent routing algorithms, a multi-path intelligent routing algorithm, RDPG-Route, based on deep reinforcement learning (DRL) was proposed in this study. The algorithm used the recurrent deterministic policy gradient (RDPG) as its training framework and introduced a long short-term memory (LSTM) network as the neural network. Leveraging the advantage of RDPG in handling high-dimensional problems and the storage capacity of the memory cells in the LSTM recurrent core, the dynamically changing network state was fed into the neural network for training. After training converged, the action values output by the neural network were used as network link weights, and traffic was split according to a multi-path routing strategy to achieve intelligent, dynamic adjustment of network routing. Finally, the RDPG-Route algorithm was compared with the ECMP, DRL-TE, and DRL-R-DDPG routing algorithms. The results indicated that RDPG-Route had good convergence and effectiveness: compared with the other intelligent routing algorithms, it reduced the average end-to-end delay by at least 7.2%, improved throughput by 6.5%, reduced the packet loss rate by 8.9%, and lowered the maximum link utilization by 6.3%.
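
To make the pipeline described in the abstract concrete, the following minimal Python sketch (assuming PyTorch) illustrates the two ingredients named there: an LSTM-based actor, in the spirit of RDPG, that maps a sequence of network-state observations to per-link weights, and a helper that turns those weights into split ratios over candidate paths. The names (LinkWeightActor, split_ratios), the observation dimensions, and the softmax-over-path-costs splitting rule are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the paper's code): LSTM actor -> link weights -> multi-path split ratios.
import torch
import torch.nn as nn


class LinkWeightActor(nn.Module):
    """LSTM actor: observation sequence -> one weight per link (hypothetical names/sizes)."""

    def __init__(self, obs_dim: int, num_links: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_links)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim). The LSTM hidden state carries past
        # network states, which is what RDPG relies on under partial observability.
        out, hidden = self.lstm(obs_seq, hidden)
        # Sigmoid keeps link weights in (0, 1); the paper's exact action
        # squashing is not specified here, so this is an assumption.
        weights = torch.sigmoid(self.head(out[:, -1]))
        return weights, hidden


def split_ratios(link_weights, candidate_paths):
    """Turn per-link weights into traffic split ratios over candidate paths.

    Each path is scored by the sum of its link weights (treated as costs), and a
    softmax over negative costs favors lighter paths; this is an illustrative
    splitting rule, not necessarily the one used by RDPG-Route.
    """
    costs = torch.stack([link_weights[list(p)].sum() for p in candidate_paths])
    return torch.softmax(-costs, dim=0)


if __name__ == "__main__":
    actor = LinkWeightActor(obs_dim=10, num_links=6)
    obs_seq = torch.randn(1, 5, 10)            # 5 time steps of network-state features
    weights, _ = actor(obs_seq)
    paths = [(0, 1, 2), (3, 4), (0, 5)]        # hypothetical candidate paths (link indices)
    print(split_ratios(weights.squeeze(0), paths))

The recurrent actor is the part that matches the motivation stated in the abstract: the LSTM memory lets the policy act on how the network state has been evolving rather than on a single snapshot, while the final softmax step is one simple way to realize the multi-path traffic division driven by the learned link weights.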


更新日期/Last Update: 2022-12-07