Using Large Language Models to Shape Social Robots’ Speech.
DOI: https://doi.org/10.9781/ijimai.2023.07.008

Keywords: Human-Robot Interaction, Large Language Models, Social Robots

Abstract
Social robots are making their way into our lives in scenarios where humans and robots need to communicate. In these scenarios, verbal communication is an essential element of human-robot interaction. However, in most cases, social robots' utterances are based on predefined texts, which can cause users to perceive the robots as repetitive and boring. Achieving natural and friendly communication is important for avoiding this outcome. To this end, we propose applying state-of-the-art natural language generation models to provide our social robots with more diverse speech. In particular, we have implemented and evaluated two mechanisms: a paraphrasing module that transforms the robot's utterances while keeping their original meaning, and a module that generates speech about a given topic and adapts its content to the robot's conversation partner. The results show that these models have great potential when applied to our social robots, but several limitations must be considered: the computational cost of the solutions presented, the latency that some of these models can introduce into the interaction, the reliance on proprietary models, and the lack of a subjective evaluation to complement the results of the tests conducted.
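One of the limitations the abstract highlights is the latency that generation models can introduce into the interaction. A minimal sketch of how such a paraphrasing module might be guarded in practice is shown below; the `paraphrase` function is a self-contained stub standing in for a real neural paraphraser (the paper's actual implementation is not reproduced here), and the timeout value is an illustrative assumption, not a figure from the paper.

```python
import concurrent.futures
import random


def paraphrase(utterance: str) -> str:
    # Stub for a neural paraphrasing model. A real module would query a
    # seq2seq model here; this placeholder keeps the sketch runnable.
    variants = [
        f"Let me put it another way: {utterance}",
        f"In other words, {utterance}",
    ]
    return random.choice(variants)


def say(utterance: str, timeout_s: float = 2.0) -> str:
    """Return a paraphrased utterance if the model answers within
    `timeout_s`; otherwise fall back to the predefined text so the
    robot's response is not delayed."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(paraphrase, utterance)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # Keep the interaction responsive: use the original utterance.
            return utterance
```

The fallback path mirrors the status quo the abstract describes (predefined texts), so a slow model degrades variety rather than responsiveness.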
References
L. Laranjo, A. G. Dunn, H. L. Tong, A. B. Kocaballi, J. Chen, R. Bashir, D. Surian, B. Gallego, F. Magrabi, A. Y. S. Lau, E. Coiera, “Conversational agents in healthcare: a systematic review,” Journal of the American Medical Informatics Association, vol. 25, pp. 1248–1258, Sept. 2018.
J. Cassell, J. Sullivan, E. Churchill, S. Prevost, Embodied Conversational Agents. MIT Press, 2000.
L. Clark, N. Pantidi, O. Cooney, P. Doyle, D. Garaialde, J. Edwards, B. Spillane, E. Gilmartin, C. Murad, C. Munteanu, V. Wade, B. R. Cowan, “What Makes a Good Conversation? Challenges in Designing Truly Conversational Agents,” in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI ’19, May 2019, pp. 1–12.
V. Klingspor, Y. Demiris, M. Kaiser, "Human-robot communication and machine learning," Applied Artificial Intelligence, vol. 11, Mar. 1999.
C. Clavel, Z. Callejas, “Sentiment Analysis: From Opinion Mining to Human-Agent Interaction,” IEEE Transactions on Affective Computing, vol. 7, pp. 74–93, Jan. 2016.
J. Woo, J. Botzheim, N. Kubota, "Conversation system for natural communication with robot partner," in 2014 10th France-Japan/8th Europe-Asia Congress on Mecatronics (MECATRONICS 2014-Tokyo), Nov. 2014.
A. Fujita, A. Kameda, A. Kawazoe, Y. Miyao, “Overview of Todai robot project and evaluation framework of its NLP-based problem solving,” in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), Reykjavik, Iceland, May 2014, pp. 2590–2597, European Language Resources Association (ELRA).
I. A. Hameed, “Using natural language processing (NLP) for designing socially intelligent robots,” in 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Sept. 2016.
T. Williams, “A consultant framework for natural language processing in integrated robot architectures,” IEEE Intelligent Informatics Bulletin, vol. 18, pp. 10–14, 2017.
W. Kahuttanaseth, A. Dressler, C. Netramai, "Commanding mobile robot movement based on natural language processing with RNN encoder-decoder," in 2018 5th International Conference on Business and Industrial Research (ICBIR), May 2018, pp. 161–166.
W. Budiharto, V. Andreas, A. A. S. Gunawan, “Deep learning-based question answering system for intelligent humanoid robot,” Journal of Big Data, vol. 7, p. 77, Dec. 2020, doi: 10.1186/s40537-020-00341-6.
M. Seo, A. Kembhavi, A. Farhadi, H. Hajishirzi, “Bidirectional Attention Flow for Machine Comprehension,” arXiv:1611.01603 [cs], June 2018. arXiv: 1611.01603.
S. Arroni, Y. Galán, X. Guzmán-Guzmán, E. R. Núñez-Valdez, A. Gómez, “Sentiment analysis and classification of hotel opinions in twitter with the transformer architecture,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 8, no. 1, pp. 53-63, 2023.
J. Zhou, T. Li, S. J. Fong, N. Dey, R. González-Crespo, “Exploring chatgpt’s potential for consultation, recommendations and report diagnosis: Gastric cancer and gastroscopy reports’ case,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 8, no. 2, pp. 7-13, 2023.
M. Rheu, J. Y. Shin, W. Peng, J. Huh-Yoo, "Systematic Review: Trust-Building Factors and Implications for Conversational Agent Design," International Journal of Human–Computer Interaction, vol. 37, pp. 81–96, Jan. 2021.
R. B. Miller, “Response time in man-computer conversational transactions,” in Proceedings of the December 9-11, 1968, fall joint computer conference, part I, 1968, pp. 267–277.
T. Shiwa, T. Kanda, M. Imai, H. Ishiguro, N. Hagita, “How quickly should communication robots respond?,” in 2008 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2008, pp. 153–160, IEEE.
R. R. Murphy, T. Nomura, A. Billard, J. L. Burke, "Human–Robot Interaction," IEEE Robotics Automation Magazine, vol. 17, pp. 85–89, June 2010, doi: 10.1109/MRA.2010.936953.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, “Attention is All you Need,” in Advances in Neural Information Processing Systems, vol. 30, 2017, Curran Associates, Inc.
I. Sutskever, O. Vinyals, Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” in Advances in Neural Information Processing Systems, vol. 27, 2014, Curran Associates, Inc.
“The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time.” [Online]. Available: https://jalammar.github.io/illustrated-transformer/
D. Rothman, Transformers for Natural Language Processing: Build innovative deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and more. Packt Publishing Ltd, Jan. 2021.
A. M. P. Brașoveanu, R. Andonie, "Visualizing Transformers for NLP: A Brief Survey," in 2020 24th International Conference Information Visualisation (IV), Sept. 2020, pp. 270–279.
K. Weiss, T. M. Khoshgoftaar, D. Wang, “A survey of transfer learning,” Journal of Big Data, vol. 3, p. 9, May 2016.
A. Radford, K. Narasimhan, T. Salimans, I. Sutskever, “Improving Language Understanding by Generative Pre-Training,” OpenAI, p. 12, 2018.
C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer," arXiv:1910.10683 [cs, stat], July 2020. arXiv: 1910.10683.
L. Floridi, M. Chiriatti, “GPT-3: Its Nature, Scope, Limits, and Consequences,” Minds and Machines, vol. 30, pp. 681–694, Dec. 2020, doi: 10.1007/s11023-020-09548-1.
C. Stevenson, I. Smal, M. Baas, R. Grasman, H. van der Maas, "Putting GPT-3's Creativity to the (Alternative Uses) Test," in International Conference on Innovative Computing and Cloud Computing, 2022.
T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, “Language Models are Few-Shot Learners,” ArXiv, May 2020.
J. Liu, D. Shen, Y. Zhang, B. Dolan, L. Carin, W. Chen, “What makes good in-context examples for gpt-3?,” arXiv preprint arXiv:2101.06804, 2021.
G. Poesia, O. Polozov, V. Le, A. Tiwari, G. Soares, C. Meek, S. Gulwani, “Synchromesh: Reliable code generation from pre-trained language models,” arXiv preprint arXiv:2201.11227, 2022.
T. Goyal, J. J. Li, G. Durrett, “News summarization and evaluation in the era of gpt-3,” arXiv preprint arXiv:2209.12356, 2022.
C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, et al., “Exploring the limits of transfer learning with a unified text-to-text transformer.,” The Journal of Machine Learning Research, vol. 21, no. 140, pp. 1–67, 2020.
T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, 2020, pp. 38–45.
L. Xue, N. Constant, A. Roberts, M. Kale, R. Al-Rfou, A. Siddhant, A. Barua, C. Raffel, "mT5: A massively multilingual pre-trained text-to-text transformer," in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, June 2021, pp. 483–498, Association for Computational Linguistics.
Y. Yang, Y. Zhang, C. Tar, J. Baldridge, "PAWS-X: A cross-lingual adversarial dataset for paraphrase identification," in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, Nov. 2019, pp. 3687–3692, Association for Computational Linguistics.
J. Zhang, Y. Zhao, M. Saleh, P. J. Liu, "Pegasus: Pre-training with extracted gap-sentences for abstractive summarization," in Proceedings of the 37th International Conference on Machine Learning, ICML'20, 2020, JMLR.org.
J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
J. Cañete, G. Chaperon, R. Fuentes, J.-H. Ho, H. Kang, J. Pérez, "Spanish pre-trained BERT model and evaluation data," PML4DC at ICLR, vol. 2020, pp. 1–10, 2020.
X. Zhao, B. F. Malle, “Spontaneous perspective taking toward robots: The unique impact of humanlike appearance,” Cognition, vol. 224, p. 105076, July 2022, doi: 10.1016/j.cognition.2022.105076.
M. A. Salichs, A. Castro, E. Salichs, E. Fernandez, M. Maroto, J. J. Gamboa, S. Marques, J. C. Castillo, F. Alonso, M. Malfaz, “Mini: A New Social Robot for the Elderly,” International Journal of Social Robotics, vol. 12, pp. 1231–1249, Dec. 2020, doi: 10.1007/s12369-020-00687-0.
K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 2002, pp. 311–318.
T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, Y. Artzi, “BERTScore: Evaluating Text Generation with BERT,” in Eighth International Conference on Learning Representations, Apr. 2020.