NSL-BP: A Meta Classifier Model Based Prediction of Amazon Product Reviews.

Pravin Kumar; Mohit Dayal; Manju Khari; Giuseppe Fenza; Mariacristina Gallo

doi:10.9781/ijimai.2020.10.001

Authors

Pravin Kumar, Indian Institute of Technology Dhanbad
Mohit Dayal, Ambedkar University Delhi
Manju Khari, Netaji Subhas University of Technology
Giuseppe Fenza, University of Salerno
Mariacristina Gallo, University of Salerno

Keywords:

Logistic Regression, Machine Learning, Naïve Bayes, Metamodel

Abstract

In machine learning, predicting a product's rating from a semantic analysis of consumer reviews is a relevant topic. Amazon is one of the most popular online retailers, with millions of customers purchasing and reviewing products. Many studies in the literature address the prediction of the rating associated with a given review. In this work, we introduce a novel approach that improves the accuracy of machine-learning rating prediction by processing the review text. Having trained models with several methods, we propose a combined model that predicts a product's rating from the content of a given review. First, using k-means and Latent Dirichlet Allocation (LDA), we cluster the products and topics so that similar products and reviews are grouped together, which makes their ratings easier to predict. We then train low, neutral, and high models based on the resulting product clusters and topics. Finally, we adopt a stacking ensemble that combines Naïve Bayes, Logistic Regression, and SVM in a two-level stack to predict the ratings. We call the newly introduced model the NSL model and compare its prediction performance with other state-of-the-art methods.
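
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch using scikit-learn. This is not the authors' implementation. The abstract does not say which classifier sits at which level of the stack, so the sketch assumes Naïve Bayes and a linear SVM as level-one learners with Logistic Regression as the level-two meta-classifier. The function names, the TF-IDF features, the low/neutral/high label names, and the toy reviews are all hypothetical and serve only to make the example runnable.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data invented for illustration only (not taken from the paper).
REVIEWS = [
    "terrible quality broke after one day",
    "awful product waste of money",
    "bad battery stopped working quickly",
    "poor build and terrible support",
    "average product does the job",
    "okay quality nothing special",
    "works fine but average battery",
    "decent build okay for the money",
    "excellent product love the quality",
    "great battery works perfectly",
    "amazing build highly recommend",
    "love it excellent value for money",
]
LABELS = ["low"] * 4 + ["neutral"] * 4 + ["high"] * 4


def cluster_and_topic(reviews, n_clusters=2, n_topics=2, seed=0):
    """Assign each review a k-means cluster id and a dominant LDA topic id."""
    tfidf = TfidfVectorizer().fit_transform(reviews)
    clusters = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(tfidf)
    # LDA models word counts, so it gets raw counts rather than TF-IDF weights.
    counts = CountVectorizer().fit_transform(reviews)
    doc_topics = LatentDirichletAllocation(
        n_components=n_topics, random_state=seed
    ).fit_transform(counts)
    return clusters, doc_topics.argmax(axis=1)


def build_stacked_rating_model(cv=2):
    """Two-level stack: NB and SVM at level one, Logistic Regression on top.

    The level assignment is an assumption made for this sketch.
    """
    level_one = [("nb", MultinomialNB()), ("svm", LinearSVC())]
    stack = StackingClassifier(
        estimators=level_one,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=cv,  # out-of-fold predictions feed the meta-classifier
    )
    return make_pipeline(TfidfVectorizer(), stack)


if __name__ == "__main__":
    clusters, topics = cluster_and_topic(REVIEWS)
    print("clusters:", clusters, "topics:", topics)
    model = build_stacked_rating_model().fit(REVIEWS, LABELS)
    print(model.predict(["excellent quality love it"]))
```

In this sketch the clustering step and the stacked classifier are kept separate. One way to connect them, in the spirit of the abstract, would be to fit one stacked model per cluster or topic group, but how the two stages are joined in the NSL model is described in the full paper rather than in the abstract.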
