Improved Fine-Tuned Reinforcement Learning From Human Feedback Using Prompting Methods for News Summarization
DOI: https://doi.org/10.9781/ijimai.2025.02.001

Keywords: Abstractive Summarization, Extractive Summarization, Natural Language Processing, News Summarization, Prompt Engineering, Reinforcement Learning From Human Feedback (RLHF)

Abstract
ChatGPT is built on a generative pretrained transformer neural network, a model family under the broader umbrella of generative models. One major development following ChatGPT is the rise of prompt engineering: the practice of steering Large Language Models (LLMs) so that they produce the desired outputs based on the style and tone of the interactions carried out with them. Reinforcement learning from human feedback (RLHF) has been the principal technique for fine-tuning LLM-based models. This work proposes a human selection strategy, incorporated into the RLHF process, that mitigates the undesirable consequences of a poor choice of human reviewers for feedback. It also proposes H-Rouge, a new metric for humanized AI systems. The article provides a detailed evaluation of state-of-the-art summarization algorithms and prompt-based methods. The proposed human selection strategy for RLHF models employs multi-objective optimization, together with H-Rouge, to balance the various goals encountered during the process. The article is intended to help readers new to text summarization research get started with prompt engineering for summarization, and the outlined future work points them toward promising research directions.