[1] DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2009: 248-255.
[2] ZHOU B L, LAPEDRIZA A, XIAO J X, et al. Learning deep features for scene recognition using places database[J]. Advances in neural information processing systems, 2015, 1: 487-495.
[3] ZHOU B, LAPEDRIZA A, KHOSLA A, et al. Places: a 10 million image database for scene recognition[J]. IEEE transactions on pattern analysis and machine intelligence, 2018, 40(6): 1452-1464.
[4] HOFFMAN J, GUPTA S, DARRELL T. Learning with side information through modality hallucination[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 826-834.
[5] WANG A, CAI J, LU J, et al. Modality and component aware feature fusion for RGB-D scene classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 5995-6004.
[6] VAPNIK V, VASHIST A. A new learning paradigm: learning using privileged information[J]. Neural networks, 2009, 22(5/6): 544-557.
[7] XIONG Z T, YUAN Y, WANG Q. MSN: modality separation networks for RGB-D scene recognition[J]. Neurocomputing, 2020, 373: 81-89.
[8] DU D, WANG L, WANG H, et al. Translate-to-recognize networks for RGB-D scene recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 11836-11845.
[9] SHARMANSKA V, QUADRIANTO N, LAMPERT C H. Learning to rank using privileged information[C]//2013 IEEE International Conference on Computer Vision. New York: IEEE, 2013: 825-832.
[10] GARCIA N C, MORERIO P, MURINO V. Learning with privileged information via adversarial discriminative modality distillation[J]. IEEE transactions on pattern analysis and machine intelligence, 2020, 42(10): 2581-2593.
[11] WANG F, JIANG M, QIAN C, et al. Residual attention network for image classification[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2017: 3156-3164.
[12] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 7132-7141.
[13] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 770-778.
[14] GUPTA S, GIRSHICK R, ARBELÁEZ P, et al. Learning rich features from RGB-D images for object detection and segmentation[C]//European Conference on Computer Vision. Berlin: Springer, 2014: 345-360.
[15] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL]. (2014-12-22)[2020-05-15]. https://arxiv.org/abs/1412.6980.
[16] SONG S, LICHTENBERG S P, XIAO J. SUN RGB-D: a RGB-D scene understanding benchmark suite[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2015: 567-576.
[17] SILBERMAN N, HOIEM D, KOHLI P, et al. Indoor segmentation and support inference from RGBD images[C]//European Conference on Computer Vision. Berlin: Springer, 2012: 746-760.
[18] LIAO Y, KODAGODA S, WANG Y, et al. Understand scene categories by objects: A semantic regularized scene classifier using convolutional neural networks[C]//2016 IEEE International Conference on Robotics and Automation (ICRA). New York: IEEE, 2016: 2318-2325.
[19] SONG X, HERRANZ L, JIANG S Q. Depth CNNs for RGB-D scene recognition: learning from scratch better than transferring from RGB-CNNs[EB/OL]. (2018-01-21)[2020-05-15]. https://arxiv.org/abs/1801.06797.
[20] SONG X H, JIANG S Q, HERRANZ L. Combining models from multiple sources for RGB-D scene recognition[C]//International Joint Conference on Artificial Intelligence. Melbourne, Australia: IJCAI, 2017: 4523-4529.
[21] DU D, XU X, REN T, et al. Depth images could tell us more: enhancing depth discriminability for RGB-D scene recognition[C]//2018 IEEE International Conference on Multimedia and Expo (ICME). New York: IEEE, 2018: 1-6.
[22] LI Y B, ZHANG J G, CHENG Y H, et al. DF2Net: discriminative feature learning and fusion network for RGB-D indoor scene classification[C]//The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI). New Orleans: AAAI, 2018: 7041-7048.