Optimal Target-Oriented Knowledge Transportation For Aspect-Based Multimodal Sentiment Analysis

Authors

  • Linhao Zhang Chinese Academy of Sciences.
  • Li Jin Chinese Academy of Sciences.
  • Guangluan Xu Chinese Academy of Sciences.
  • Xiaoyu Li Chinese Academy of Sciences.
  • Xian Sun Chinese Academy of Sciences.
  • Zequn Zhang Chinese Academy of Sciences.
  • Yanan Zhang Sichuan University.
  • Qi Li Beijing Normal University.

DOI:

https://doi.org/10.9781/ijimai.2024.02.005

Keywords:

Aspect-Based Multimodal Sentiment Analysis, Optimal Transport, Social Media Opinion Mining
Supporting Agencies

The work is supported by the National Natural Science Foundation of China (62206267).

Abstract

Aspect-based multimodal sentiment analysis in social media scenarios aims to identify the sentiment polarity of each aspect term mentioned in a piece of multimodal user-generated content. Previous approaches to this interdisciplinary multimodal task mainly rely on coarse-grained fusion mechanisms at the data level or decision level, which suffer from three shortcomings: (1) they ignore the category knowledge, carried by the visual information, of the sentiment target mentioned in the text; (2) they cannot maintain target interaction during the unimodal encoding process, which results in representations that are indiscriminative with respect to different aspect terms; and (3) they suffer from the semantic gap between modalities. To tackle these challenging issues, we propose an optimal target-oriented knowledge transportation network (OtarNet) for this task. First, the visual category knowledge is explicitly transported through input-space translation and reformulation. Second, with the reformulated knowledge containing the target and category information, target sensitivity is maintained in the unimodal representations through a multistage target-oriented interaction mechanism. Finally, to eliminate the distributional modality gap by integrating complementary knowledge, the target-sensitive features of the modalities are implicitly transported via an optimal transport interaction module. Our model achieves state-of-the-art performance on three benchmark datasets, Twitter-15, Twitter-17, and Yelp, and an extensive ablation study demonstrates the superiority and effectiveness of OtarNet.
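The paper does not release an implementation here, but the "optimal transport interaction module" described above can be illustrated with a generic entropy-regularized optimal transport (Sinkhorn) sketch. This is not the authors' code; the feature dimensions, marginals, and the final fusion step are illustrative assumptions, showing only how a transport plan can couple text-token features with visual-patch features.

```python
import numpy as np

def sinkhorn_transport(cost, a, b, eps=0.05, n_iters=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations.

    cost: (n, m) cost matrix between textual and visual token features.
    a, b: marginal distributions over the two token sets.
    Returns a transport plan T whose rows sum to a and columns to b.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns toward marginal b
        u = a / (K @ v)              # scale rows toward marginal a
    return u[:, None] * K * v[None, :]

# Toy example: align 3 text-token features with 4 visual-patch features.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(3, 8))
img_feats = rng.normal(size=(4, 8))

# Pairwise squared-Euclidean cost between modalities, normalized to [0, 1].
cost = ((text_feats[:, None, :] - img_feats[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()

a = np.full(3, 1 / 3)   # uniform mass over text tokens
b = np.full(4, 1 / 4)   # uniform mass over visual patches
T = sinkhorn_transport(cost, a, b)

# T couples the two modalities; T @ img_feats "transports" visual
# features into the text-token space for downstream fusion.
aligned = T @ img_feats * len(a)    # rescale rows back to unit mass
```

In this sketch the transport plan plays the role that attention weights play in coarse-grained fusion, but with marginal constraints that force every text token and every visual patch to contribute, which is one common motivation for OT-based cross-modal alignment.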


References

F. Huang, X. Zhang, Z. Zhao, J. Xu, Z. Li, "Image-text sentiment analysis via deep multimodal attentive fusion," Knowledge-Based Systems, vol. 167, pp. 26–37, 2019.

N. Majumder, D. Hazarika, A. Gelbukh, E. Cambria, S. Poria, "Multimodal sentiment analysis using hierarchical fusion with context modeling," Knowledge-Based Systems, vol. 161, pp. 124–133, 2018.

N. Xu, W. Mao, G. Chen, “Multi-interactive memory network for aspect based multimodal sentiment analysis,” in Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 371–378, AAAI Press.

J. Yu, J. Jiang, "Adapting BERT for target-oriented multimodal sentiment classification," in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019, pp. 5408–5414.

M. E. Basiri, M. Abdar, M. A. Cifci, S. Nemati, U. R. Acharya, “A novel method for sentiment classification of drug reviews using fusion of deep and machine learning techniques,” Knowledge-Based Systems, vol. 198, p. 105949, 2020.

H. Xu, B. Liu, L. Shu, S. Y. Philip, "BERT post-training for review reading comprehension and aspect-based sentiment analysis," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019, pp. 2324–2335.

M. Li, L. Chen, J. Zhao, Q. Li, "Sentiment analysis of Chinese stock reviews based on BERT model," Applied Intelligence, vol. 51, no. 7, pp. 5016–5024, 2021.

V. Pérez-Rosas, R. Mihalcea, L.-P. Morency, "Utterance-level multimodal sentiment analysis," in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013, pp. 973–982.

V. P. Rosas, R. Mihalcea, L.-P. Morency, "Multimodal sentiment analysis of Spanish online videos," IEEE Intelligent Systems, vol. 28, no. 3, pp. 38–45, 2013.

F. Celli, B. Lepri, J.-I. Biel, D. Gatica-Perez, G. Riccardi, F. Pianesi, “The workshop on computational personality recognition 2014,” in Proceedings of the 22nd ACM international conference on Multimedia, 2014, pp. 1245–1246.

J. G. Ellis, B. Jou, S.-F. Chang, “Why we watch the news: a dataset for exploring sentiment in broadcast video news,” in Proceedings of the 16th international conference on multimodal interaction, 2014, pp. 104–111.

X. Yang, S. Feng, D. Wang, Y. Zhang, "Image-text multimodal emotion classification via multi-view attentional network," IEEE Transactions on Multimedia, vol. 23, pp. 4014–4026, 2021.

F. Chen, Z. Yuan, Y. Huang, "Multi-source data fusion for aspect-level sentiment classification," Knowledge-Based Systems, vol. 187, p. 104831, 2020.

W. An, F. Tian, P. Chen, Q. Zheng, “Aspect-based sentiment analysis with heterogeneous graph neural network,” IEEE Transactions on Computational Social Systems, 2022.

J. Wagner, P. Arora, S. Cortes, U. Barman, D. Bogdanova, J. Foster, L. Tounsi, "DCU: Aspect-based polarity classification for SemEval task 4," SemEval 2014, p. 223, 2014.

B. Pang, L. Lee, S. Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques,” in Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002, pp. 79–86.

M. S. Akhtar, D. Gupta, A. Ekbal, P. Bhattacharyya, "Feature selection and ensemble construction: A two-step method for aspect based sentiment analysis," Knowledge-Based Systems, vol. 125, pp. 116–135, 2017.

L. Jiang, M. Yu, M. Zhou, X. Liu, T. Zhao, "Target-dependent Twitter sentiment classification," in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011, pp. 151–160.

N. Liu, B. Shen, “Aspect-based sentiment analysis with gated alternate neural network,” Knowledge-Based Systems, vol. 188, p. 105010, 2020.

P. Chen, Z. Sun, L. Bing, W. Yang, “Recurrent attention network on memory for aspect sentiment analysis,” in Proceedings of the 2017 conference on empirical methods in natural language processing, 2017, pp. 452–461.

Y. Wang, M. Huang, X. Zhu, L. Zhao, "Attention-based LSTM for aspect-level sentiment classification," in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 606–615.

Q. Liu, H. Zhang, Y. Zeng, Z. Huang, Z. Wu, “Content attention model for aspect based sentiment analysis,” in Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1023–1032.

D. Tang, B. Qin, T. Liu, “Aspect level sentiment classification with deep memory network,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 214–224.

F. Fan, Y. Feng, D. Zhao, “Multi-grained attention network for aspect-level sentiment classification,” in Proceedings of the 2018 conference on empirical methods in natural language processing, 2018, pp. 3433–3442.

J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019, pp. 4171–4186.

C. Sun, L. Huang, X. Qiu, "Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019, pp. 380–385.

D. Borth, R. Ji, T. Chen, T. Breuel, S.-F. Chang, "Large-scale visual sentiment ontology and detectors using adjective noun pairs," in Proceedings of the 21st ACM International Conference on Multimedia, 2013, pp. 223–232.

Y. Zhang, L. Shang, X. Jia, “Sentiment analysis on microblogging by integrating text and image features,” in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2015, pp. 52–63, Springer.

M. Chen, S. Wang, P. P. Liang, T. Baltrušaitis, A. Zadeh, L.-P. Morency, “Multimodal sentiment analysis with word-level fusion and reinforcement learning,” in Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017, pp. 163–171.

H. Wang, A. Meghawat, L.-P. Morency, E. P. Xing, “Select-additive learning: Improving generalization in multimodal sentiment analysis,” in 2017 IEEE International Conference on Multimedia and Expo (ICME), 2017, pp. 949–954, IEEE.

Y. Yu, H. Lin, J. Meng, Z. Zhao, “Visual and textual sentiment analysis of a microblog using deep convolutional neural networks,” Algorithms, vol. 9, no. 2, p. 41, 2016.

A. Zadeh, P. P. Liang, S. Poria, P. Vij, E. Cambria, L. Morency, “Multi-attention recurrent network for human communication comprehension,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 5642–5649, AAAI Press.

F. Huang, K. Wei, J. Weng, Z. Li, “Attention-based modality-gated networks for image-text sentiment analysis,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 16, no. 3, pp. 1–19, 2020.

Z. Khan, Y. Fu, "Exploiting BERT for multimodal target sentiment classification through input space translation," in Proceedings of the 29th ACM International Conference on Multimedia, 2021, pp. 3034–3042.

H.-Y. Lin, H.-H. Tseng, X. Lu, Y. Tsao, "Unsupervised noise adaptive speech enhancement by discriminator-constrained optimal transport," Advances in Neural Information Processing Systems, vol. 34, pp. 19935–19946, 2021.

L. Chen, Z. Gan, Y. Cheng, L. Li, L. Carin, J. Liu, “Graph optimal transport for cross-domain alignment,” in International Conference on Machine Learning, 2020, pp. 1542–1553.

E. Grave, A. Joulin, Q. Berthet, "Unsupervised alignment of embeddings with Wasserstein Procrustes," in The 22nd International Conference on Artificial Intelligence and Statistics, vol. 89, 2019, pp. 1880–1890.

T. T. Nguyen, A. T. Luu, "Improving neural cross-lingual abstractive summarization via employing optimal transport distance for knowledge distillation," in Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022, pp. 11103–11111, AAAI Press.

J. Xu, H. Zhou, C. Gan, Z. Zheng, L. Li, “Vocabulary learning via optimal transport for neural machine translation,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021, pp. 7361–7373.

L. Chen, G. Wang, C. Tao, D. Shen, P. Cheng, X. Zhang, W. Wang, Y. Zhang, L. Carin, "Improving textual network embedding with global attention via optimal transport," in Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019, pp. 5193–5202.

S. Pramanick, A. Roy, V. M. Patel, “Multimodal learning using optimal transport for sarcasm and humor detection,” in IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 546–556.

Z. Ge, S. Liu, Z. Li, O. Yoshie, J. Sun, “OTA: optimal transport assignment for object detection,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 303-312.

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, “End-to-end object detection with transformers,” in 16th European Conference on Computer Vision, vol. 12346, 2020, pp. 213–229, Springer.

K. He, X. Zhang, S. Ren, J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

D. Hendrycks, K. Gimpel, “Bridging nonlinearities and stochastic regularizers with gaussian error linear units,” CoRR, vol. abs/1606.08415, 2016.

G. Mialon, D. Chen, A. d’Aspremont, J. Mairal, “A trainable optimal transport embedding for feature aggregation and its relationship to attention,” in ICLR 2021-The Ninth International Conference on Learning Representations, 2021.

X. Li, L. Bing, W. Zhang, W. Lam, “Exploiting BERT for end-to-end aspect-based sentiment analysis,” in Proceedings of the 5th Workshop on Noisy User-generated Text, 2019, pp. 34–41.

D. Lu, L. Neves, V. Carvalho, N. Zhang, H. Ji, "Visual attention model for name tagging in multimodal social media," in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 1990–1999.

Q. Zhang, J. Fu, X. Liu, X. Huang, "Adaptive co-attention network for named entity recognition in tweets," in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, vol. 32, 2018, AAAI Press.

D. Gu, J. Wang, S. Cai, C. Yang, Z. Song, H. Zhao, L. Xiao, H. Wang, “Targeted aspect-based multimodal sentiment analysis: An attention capsule extraction and multi-head fusion network,” IEEE Access, vol. 9, pp. 157329–157336, 2021.

D. Q. Nguyen, T. Vu, A. T. Nguyen, "BERTweet: A pre-trained language model for English tweets," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020, pp. 9–14.

L. M. S. Khoo, H. L. Chieu, “Meta auxiliary labels with constituent-based transformer for aspect-based sentiment analysis,” 2020.

Published

2025-08-29
How to Cite

Zhang, L., Jin, L., Xu, G., Li, X., Sun, X., Zhang, Z., … Li, Q. (2025). Optimal Target-Oriented Knowledge Transportation For Aspect-Based Multimodal Sentiment Analysis. International Journal of Interactive Multimedia and Artificial Intelligence, 9(4), 59–69. https://doi.org/10.9781/ijimai.2024.02.005