Song, Hao, and Peter Flach. “Efficient and Robust Model Benchmarks With Item Response Theory and Adaptive Testing”. International Journal of Interactive Multimedia and Artificial Intelligence, vol. 6, no. 5, Mar. 2021, pp. 110-8, doi:10.9781/ijimai.2021.02.009.