Can Generative AI Solve Geometry Problems? Strengths and Weaknesses of LLMs for Geometric Reasoning in Spanish.

Verónica Parra; Patricia Sureda; Ana Corica; Silvia Schiaffino; Daniela Godoy

doi:10.9781/ijimai.2024.02.009

Authors

Verónica Parra, Universidad Nacional del Centro de la Provincia de Buenos Aires
Patricia Sureda, Universidad Nacional del Centro de la Provincia de Buenos Aires
Ana Corica, Universidad Nacional del Centro de la Provincia de Buenos Aires
Silvia Schiaffino, Universidad Nacional del Centro de la Provincia de Buenos Aires
Daniela Godoy, Universidad Nacional del Centro de la Provincia de Buenos Aires

Keywords:

Chatbot, Generative AI, Geometry, Large Language Models, Math Problem-Solving

Abstract

Generative Artificial Intelligence (AI) has emerged as a disruptive technology that is challenging traditional teaching and learning practices. Natural-language question answering fosters the use of chatbots, such as ChatGPT, Bard and others, that generate text based on pre-trained Large Language Models (LLMs). The performance of these models in certain areas, such as Math problem solving, is receiving growing attention, as it directly affects their potential use in educational settings. Most of these evaluations, however, concentrate on the construction and use of benchmarks comprising diverse Math problems in English. In this work, we discuss the capabilities of the most widely used LLMs within the subfield of Geometry, in view of the relevance of this subject in high-school curricula and the difficulties that even the most advanced multimodal LLMs exhibit in dealing with geometric notions. This work focuses on Spanish, which is additionally a less-resourced language. The answers of three major chatbots, based on different LLMs, were analyzed not only to determine their capacity to provide correct solutions, but also to categorize the errors found in the reasoning processes described. Understanding LLMs' strengths and weaknesses in a field like Geometry can be a first step towards the design of better-informed methodological proposals for including these technologies in classrooms, as well as the development of more powerful automatic assistance tools based on generative AI.

