A Trustworthy Automated Short-Answer Scoring System Using a New Dataset and Hybrid Transfer Learning Method.
DOI: https://doi.org/10.9781/ijimai.2024.02.003

Keywords: Automated Short Answer Scoring, Hybrid Transfer Learning, Student Answer Dataset, Trustworthy System

Abstract
To measure the quality of student learning, teachers must conduct evaluations, and one of the most efficient evaluation formats is the short-answer question. However, manual evaluation by teachers can be inconsistent because of large class sizes, time pressure, fatigue, and similar factors. Teachers therefore need a trustworthy system capable of evaluating student answers autonomously and accurately. Using hybrid transfer learning and a new student answer dataset, we aim to create a reliable automated short-answer scoring system called Hybrid Transfer Learning for Automated Short Answer Scoring (HTL-ASAS). HTL-ASAS combines multiple tokenizers from a pretrained model with the bidirectional encoder representations from transformers (BERT) model. Our evaluation of the trained model shows that HTL-ASAS achieves higher accuracy than the models used in previous studies; on a dataset of responses to questions from an introductory information technology course, its accuracy reaches 99.6%. With accuracy this close to one hundred percent, the developed model can serve as the foundation for a trustworthy ASAS system.
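As a concrete illustration of the hybrid idea sketched in the abstract, the snippet below pairs two pretrained tokenizer/encoder combinations and feeds their pooled outputs to a single scoring head. This is a minimal sketch, not the authors' implementation: the checkpoint names (bert-base-multilingual-cased, xlm-roberta-base), the mean-pooling and concatenation strategy, and the binary correct/incorrect label space are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class HybridScorer(nn.Module):
    """Scores a (reference answer, student answer) pair with two pretrained encoders."""

    def __init__(self,
                 checkpoints=("bert-base-multilingual-cased", "xlm-roberta-base"),
                 num_labels=2):
        super().__init__()
        # Each checkpoint brings its own tokenizer (WordPiece vs. SentencePiece),
        # which stands in for the "multiple tokenizers" ingredient of the hybrid approach.
        self.tokenizers = [AutoTokenizer.from_pretrained(c) for c in checkpoints]
        self.encoders = nn.ModuleList(AutoModel.from_pretrained(c) for c in checkpoints)
        hidden = sum(e.config.hidden_size for e in self.encoders)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, reference_answers, student_answers):
        pooled = []
        for tokenizer, encoder in zip(self.tokenizers, self.encoders):
            batch = tokenizer(reference_answers, student_answers,
                              padding=True, truncation=True, return_tensors="pt")
            output = encoder(**batch)
            # Mean-pool the token embeddings into one vector per answer pair.
            pooled.append(output.last_hidden_state.mean(dim=1))
        # Concatenate both encoders' views and map them to score classes.
        return self.classifier(torch.cat(pooled, dim=-1))


scorer = HybridScorer()
logits = scorer(["A compiler translates source code into machine code."],
                ["It converts a program into machine instructions."])
print(logits.shape)  # torch.Size([1, 2]): one correct/incorrect score per answer pair
```

Concatenating pooled sentence vectors keeps the two subword vocabularies independent of each other; a fine-tuning loop with a cross-entropy loss over teacher-assigned labels would sit on top of this module.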