[1]陈梦婷,王兴刚,刘文予.基于密集深度插值的3D人体姿态估计方法[J].郑州大学学报(工学版),2021,42(03):26.[doi:10.13705/j.issn.1671-6833.2021.03.005]
 Chen Mengting, Wang Xinggang, Liu Wenyu. Dense Depth Interpolation for 3D Human Pose Estimation[J]. Journal of Zhengzhou University (Engineering Science), 2021, 42(03): 26. [doi:10.13705/j.issn.1671-6833.2021.03.005]

Dense Depth Interpolation for 3D Human Pose Estimation

《郑州大学学报(工学版)》[ISSN:1671-6833/CN:41-1339/T]

Volume: 42
Issue: 2021, No. 03
Pages: 26
Publication Date: 2021-05-10

Article Info

Title:
Dense Depth Interpolation for 3D Human Pose Estimation
Authors:
陈梦婷 王兴刚 刘文予
School of Electronic Information and Communications, Huazhong University of Science and Technology
Author(s):
Chen Mengting; Wang Xinggang; Liu Wenyu;
School of Electronic Information and Communications, Huazhong University of Science and Technology
Keywords:
3D vision; human pose estimation; dense depth interpolation; cross-domain generalization
DOI:
10.13705/j.issn.1671-6833.2021.03.005
Document Code:
A
Abstract (translated from Chinese):
3D human pose estimation has long been a highly challenging task in computer vision. Because annotation is difficult, usually only discrete keypoint data from limited scenes are available, which makes 3D prediction even harder. This study observes that although the human body is a very flexible structure, each individual limb can be regarded as a rigid body. This means that once the depths at the two ends of a limb are known, the depth along the entire limb can be estimated by dense interpolation. A method is therefore proposed that uses the dense depth interpolation feature map of each limb as intermediate supervision. This feature map provides a denser and more structured learning target for depth estimation, rather than directly regressing the depths of discrete keypoints. Experiments on the Human3.6M dataset show that, with only a simple network structure, the method achieves a mean per joint position error of 50.9 mm. Cross-domain experiments on the MPI-INF-3DHP dataset further demonstrate the model's strong generalization ability.
Abstract:
3D human pose estimation is a challenging task in computer vision. Due to the difficulty of annotation, only discrete keypoint data from limited scenes are available, which makes 3D prediction a big challenge. Although the human body is a flexible structure, a single limb can be viewed as a rigid body. Given the depths of the two end points, the depth of the whole limb can be estimated by dense interpolation. Therefore, this paper proposes a method that takes the dense depth interpolation feature map as intermediate supervision. It provides a denser and more structured target, instead of directly regressing discrete keypoint depths. The MPJPE (mean per joint position error) on Human3.6M reaches 50.9 mm with only a simple network structure. Cross-domain experiments on the MPI-INF-3DHP dataset further show the generalization ability of the proposed method.
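The rigid-limb interpolation idea from the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name, map size, and joint coordinates are hypothetical, and a simple line rasterization stands in for whatever dense rendering the paper actually uses. Given the pixel positions and depths of a limb's two end joints, depth is linearly interpolated along the limb to form a dense supervision map.

```python
import numpy as np

def dense_depth_map(p_a, p_b, z_a, z_b, shape, steps=None):
    """Rasterize one limb onto a depth map, linearly interpolating
    depth between the two end joints (rigid-limb assumption).

    p_a, p_b : (x, y) pixel coordinates of the limb's end joints
    z_a, z_b : depths at those joints
    shape    : (H, W) of the output map
    Returns the dense depth map and a mask of covered pixels.
    """
    h, w = shape
    depth = np.zeros((h, w), dtype=np.float32)
    mask = np.zeros((h, w), dtype=bool)
    if steps is None:
        # one sample per pixel of limb length
        steps = int(np.hypot(p_b[0] - p_a[0], p_b[1] - p_a[1])) + 1
    for t in np.linspace(0.0, 1.0, steps):
        x = int(np.rint(p_a[0] + t * (p_b[0] - p_a[0])))
        y = int(np.rint(p_a[1] + t * (p_b[1] - p_a[1])))
        if 0 <= x < w and 0 <= y < h:
            # depth varies linearly along the rigid limb
            depth[y, x] = (1.0 - t) * z_a + t * z_b
            mask[y, x] = True
    return depth, mask

# Hypothetical example: elbow at (2, 2) with depth 1.0,
# wrist at (10, 2) with depth 3.0, on a 16x16 map.
d, m = dense_depth_map((2, 2), (10, 2), 1.0, 3.0, (16, 16))
```

Stacking such maps for every limb would yield the dense, structured learning target the paper describes, in place of a loss on discrete keypoint depths alone.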

参考文献/References:

[1] YANG Z M, LI Z L, HU Y W, et al. A pedestrian pattern recognition and detection algorithm based on foreground extraction[J]. Journal of Zhengzhou University (Engineering Science), 2019, 40(5): 91-96.

[2] IONESCU C, PAPAVA D, OLARU V, et al. Human3.6M: large scale datasets and predictive methods for 3D human sensing in natural environments[J]. IEEE transactions on pattern analysis and machine intelligence, 2014, 36(7): 1325-1339.
[3] ANDRILUKA M, PISHCHULIN L, GEHLER P, et al. 2D human pose estimation: new benchmark and state of the art analysis[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2014:3686-3693.
[4] MEHTA D, RHODIN H, CASAS D, et al. Monocular 3D human pose estimation in the wild using improved CNN supervision[C]// 7th IEEE International Conference on 3D Vision, 3DV. Piscataway: IEEE, 2017:506-516.
[5] PISHCHULIN L, ANDRILUKA M, GEHLER P, et al. Poselet conditioned pictorial structures[C]// 26th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2013:588-595.
[6] YANG Y, RAMANAN D. Articulated pose estimation with flexible mixtures-of-parts[C] // Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2011: 1385-1392.
[7] FERRARI V, MARÍN-JIMÉNEZ M, ZISSERMAN A. 2D human pose estimation in TV shows[J]. Statistical and geometrical approaches to visual motion analysis, 2009, 5064: 128-147.
[8] TOSHEV A, SZEGEDY C. DeepPose: human pose estimation via deep neural networks[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2014:1653-1660.
[9] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778.
[10] NEWELL A, YANG K Y, DENG J. Stacked hourglass networks for human pose estimation[C] //European Conference on Computer Vision. Berlin: Springer, 2016:483-499.
[11] WEI S E, RAMAKRISHNA V, KANADE T, et al. Convolutional pose machines[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4724-4732.
[12] CARREIRA J, AGRAWAL P, FRAGKIADAKI K, et al. Human pose estimation with iterative error feedback[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 4733-4742.
[13] LEE H J, CHEN Z. Determination of 3D human body postures from a single view[J]. Computer vision, graphics, and image processing, 1985, 30(2): 148-168.
[14] GUPTA A, MARTINEZ J, LITTLE J J, et al. 3D pose from motion for cross-view action recognition via non-linear circulant temporal encoding[C] //Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE,2014: 2601-2608.
[15] ROGEZ G, RIHAN J, RAMALINGAM S, et al. Randomized trees for human pose detection[C] //2008 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2008:1-8.
[16] PAVLAKOS G,ZHOU X W,DERPANIS K G,et al. Coarse-to-fine volumetric prediction for single-image 3D human pose[C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE,2017:1263-1272.
[17] YANG W,OUYANG W L,WANG X L,et al.3D human pose estimation in the wild by adversarial learning[C]// Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018:5255-5264.
[18] ZHOU X W, ZHU M, PAVLAKOS G, et al. MonoCap: monocular human motion capture using a CNN coupled with a geometric prior[J]. IEEE transactions on pattern analysis and machine intelligence, 2019, 41(4): 901-914.
[19] TOME D,RUSSELL C,AGAPITO L.Lifting from the deep: convolutional 3D pose estimation from a single image[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway:IEEE, 2017:5689-5698.
[20] MARTINEZ J,HOSSAIN R,ROMERO J,et al. A simple yet effective baseline for 3D human pose estimation[C] //Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2640-2649.
[21] FANG H S,XU Y L,WANG W G,et al. Learning pose grammar to encode human body configuration for 3D pose estimation[EB/OL].(2017-10-17)[2020-10-30]. https://arxiv.org/abs/1710.06513.
[22] CHEN C H, RAMANAN D. 3D human pose estimation = 2D pose estimation + matching[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 5759-5767.
[23] ZHOU X, HUANG Q, SUN X, et al. Towards 3D human pose estimation in the wild: a weakly-supervised approach[C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017:398-407.
[24] WANG J, HUANG S L, WANG X C, et al. Not all parts are created equal: 3D pose estimation by modeling bi-directional dependencies of body parts[C]//Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE,2019: 7771-7780.
[25] LEE K,LEE I,LEE S.Propagating LSTM:3D pose estimation based on joint interdependency[C]// European Conference on Computer Vision-ECCV 2018. Berlin: Springer, 2018:123-141.

Last Update: 2021-06-24