An Extensive Analysis of Machine Learning Based Boosting Algorithms for Software Maintainability Prediction.
DOI: https://doi.org/10.9781/ijimai.2021.10.002

Keywords: Boosting Algorithms, Feature Selection, Machine Learning, Software, Software Maintainability Prediction

Abstract
Software maintainability is an indispensable factor in assessing the quality of a software product. It describes the ease with which maintenance activities can be performed to adapt the software to a modified environment. The availability and growing popularity of a wide range of Machine Learning (ML) algorithms for data analysis further motivates the prediction of maintainability. However, an extensive analysis and comparison of various ML based Boosting Algorithms (BAs) for Software Maintainability Prediction (SMP) has not yet been made. Therefore, the current study analyzes and compares five different BAs, i.e., AdaBoost, GBM, XGB, LightGBM, and CatBoost, for SMP using open-source datasets. The performance of the proposed prediction models has been evaluated using Root Mean Square Error (RMSE), Mean Magnitude of Relative Error (MMRE), Pred(0.25), Pred(0.30), and Pred(0.75) as prediction accuracy measures, followed by a non-parametric statistical test and a post hoc analysis to account for the differences in the performance of the various BAs. Based on the residual errors obtained, GBM was observed to be the best performer for RMSE, followed by LightGBM, whereas, in the case of MMRE, XGB performed the best for six out of the seven datasets, i.e., for 85.71% of the datasets, providing the minimum MMRE values, ranging from 0.90 to 3.82. Further, on applying the statistical test and performing the post hoc analysis, it was found that significant differences exist in the performance of the different BAs and that XGB and CatBoost outperformed all other BAs for MMRE. Lastly, a comparison of the BAs with four other ML algorithms has also been made to bring out the superiority of BAs over the other algorithms. This study would open new doors for software developers to carry out comparatively more precise predictions well in time and hence reduce overall maintenance costs.
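The abstract describes the evaluation pipeline only at a high level. The sketch below illustrates, in Python, how the five boosting regressors could be trained and scored with RMSE, MMRE, and Pred(q). The CSV file name, the CHANGE target column, the default hyperparameters, and the 10-fold cross-validation are assumptions made for illustration, not the study's actual experimental setup.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

def mmre(y_true, y_pred):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual.
    Assumes strictly positive target values."""
    return np.mean(np.abs(y_true - y_pred) / y_true)

def pred_q(y_true, y_pred, q):
    """Pred(q): fraction of observations whose relative error is <= q."""
    return np.mean(np.abs(y_true - y_pred) / y_true <= q)

# Hypothetical dataset: object-oriented metrics as features, CHANGE as the
# maintenance-effort target (file name and column are placeholders).
data = pd.read_csv("maintainability_dataset.csv")
X = MinMaxScaler().fit_transform(data.drop(columns=["CHANGE"]))
y = data["CHANGE"].values

models = {
    "AdaBoost": AdaBoostRegressor(random_state=42),
    "GBM": GradientBoostingRegressor(random_state=42),
    "XGB": XGBRegressor(random_state=42),
    "LightGBM": LGBMRegressor(random_state=42),
    "CatBoost": CatBoostRegressor(random_state=42, verbose=0),
}

cv = KFold(n_splits=10, shuffle=True, random_state=42)  # assumed 10-fold CV
for name, model in models.items():
    y_pred = cross_val_predict(model, X, y, cv=cv)
    rmse = np.sqrt(np.mean((y - y_pred) ** 2))
    print(f"{name}: RMSE={rmse:.3f}, MMRE={mmre(y, y_pred):.3f}, "
          f"Pred(0.25)={pred_q(y, y_pred, 0.25):.2f}, "
          f"Pred(0.30)={pred_q(y, y_pred, 0.30):.2f}, "
          f"Pred(0.75)={pred_q(y, y_pred, 0.75):.2f}")
```

The non-parametric test and post hoc analysis mentioned in the abstract could, for instance, be carried out by collecting per-dataset error values for each BA and passing them to scipy.stats.friedmanchisquare, followed by a Nemenyi-style pairwise comparison; the abstract itself does not name the exact procedures, so this is only one plausible realization.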