FENG Haonan, HE Zhiyong, MA Liangli. Multimodal Hashtag Recommendation Based on Image and Text Attention Fusion[J]. Journal of Zhengzhou University (Engineering Science), 2022, 43(06): 30-35. [doi:10.13705/j.issn.1671-6833.2022.03.001]

Multimodal Hashtag Recommendation Based on Image and Text Attention Fusion

Journal of Zhengzhou University (Engineering Science) [ISSN: 1671-6833 / CN: 41-1339/T]

Volume: 43
Issue: 2022, No. 06
Pages: 30-35
Publication Date: 2022-09-02

Article Info

Title:
Multimodal Hashtag Recommendation Based on Image and Text Attention Fusion
Author(s):
FENG Haonan, HE Zhiyong, MA Liangli
School of Electronic Engineering, Naval University of Engineering, Wuhan 430000, China
Keywords:
co-attention mechanism; hashtag classification; hashtag generation; unified model; multimodal recommendation
CLC Number:
TP301.6; TP391.1
DOI:
10.13705/j.issn.1671-6833.2022.03.001
Document Code:
A
Abstract:
To alleviate information overload on social media platforms and help users quickly capture the information they need, this study investigated hashtag recommendation based on multimodal content. To bridge the heterogeneity between modalities, a co-attention mechanism was used to model and fuse the features of cross-modal content. To overcome the limitation that multi-label classification methods can only recommend hashtags from the label space of the dataset, a Seq2Seq framework was used to generate new hashtag sequences, and an aggregation strategy merged the recommendations of the classification method into the generated hashtag sequences, yielding a unified recommendation model that combines the two methods. Experiments on a large-scale dataset showed that the multimodal approach outperformed unimodal ones: the F1 score of the proposed unified recommendation model was 9.44 percentage points higher than that of the unimodal comparison model. Generating new hashtag sequences also outperformed traditional classification: the F1 score of the proposed hashtag sequence generation model was 3.41 percentage points higher than that of the comparison model COA. Finally, the F1 score of the proposed unified recommendation model UNIFIED-CO-ATT was 1.25 percentage points higher than that of the GEN-CO-ATT model, surpassing the other comparison models. By combining the characteristics of classification and generation methods, the proposed model recommends hashtags that are both accurate and novel.
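Illustrative note: the abstract describes modeling and fusing text and image features with a co-attention mechanism. The sketch below is a minimal, hypothetical PyTorch rendering of parallel co-attention, not the authors' code; the module names, the 256-unit dimension, and the max-pooled affinity scores are assumptions made for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionFusion(nn.Module):
    # Parallel co-attention: text and image attend to each other through a
    # shared affinity matrix, and the attended summaries are fused into one
    # multimodal vector.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.affinity = nn.Linear(dim, dim, bias=False)   # W in C = T W V^T
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text, image):
        # text:  (batch, n_words,   dim), e.g. recurrent states of the post
        # image: (batch, n_regions, dim), e.g. CNN region features
        affinity = torch.bmm(self.affinity(text), image.transpose(1, 2))  # (b, n, m)
        text_attn = F.softmax(affinity.max(dim=2).values, dim=1)    # weight per word
        image_attn = F.softmax(affinity.max(dim=1).values, dim=1)   # weight per region
        text_vec = torch.bmm(text_attn.unsqueeze(1), text).squeeze(1)
        image_vec = torch.bmm(image_attn.unsqueeze(1), image).squeeze(1)
        return torch.tanh(self.fuse(torch.cat([text_vec, image_vec], dim=-1)))

# The fused vector can feed either a multi-label classifier or a Seq2Seq decoder.
fusion = CoAttentionFusion(dim=256)
fused = fusion(torch.randn(4, 30, 256), torch.randn(4, 49, 256))   # shape: (4, 256)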

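Illustrative note: the abstract states that an aggregation strategy merges the classifier's recommendations into the generated hashtag sequence, but does not spell the strategy out here. The sketch below shows one hypothetical realization, rank-interleaving with de-duplication; the function name and the alternating merge rule are assumptions, not the paper's actual rule.

def aggregate(generated, classified, k=5):
    """Merge the classifier's hashtags into the generated sequence.

    generated:  hashtags decoded by the Seq2Seq model, best first; may
                contain novel hashtags outside the dataset's label space.
    classified: top hashtags from the multi-label classifier, best first;
                always inside the label space.
    """
    merged = []
    for gen_tag, cls_tag in zip(generated, classified):  # alternate the two sources
        for tag in (gen_tag, cls_tag):
            if tag not in merged:                        # keep first occurrence only
                merged.append(tag)
    return merged[:k]

print(aggregate(["#nba", "#dunk", "#slam"], ["#nba", "#basketball", "#sports"]))
# -> ['#nba', '#dunk', '#basketball', '#slam', '#sports']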

Last Update: 2022-10-03