Neighborhood Structure-Based Model for Multilingual Arbitrarily-Oriented Text Localization in Images/Videos.

H. T. Basavaraju; V. N. Manjunath Aradhya; D. S. Guru

doi:10.9781/ijimai.2021.05.003

Authors

H. T. Basavaraju JSS Science and Technology University
V. N. Manjunath Aradhya JSS Science and Technology University
D. S. Guru University of Mysore

Keywords:

Multilingual Text, Clustering, Computer Vision, Image Processing, Maxmin Cluster, Arbitrarily-Oriented

Abstract

Text in an image or video provides important clues and semantic information about the event it depicts. Text localization is an interesting and challenging research problem in image processing because of irregular alignment, varying brightness, degradation, and complex backgrounds. Multilingual text takes many different geometrical shapes, which makes it even harder to locate. This work presents an effective model for locating multilingual, arbitrarily oriented text. The proposed method uses a neighborhood structure model to locate the text region. First, the maxmin cluster is applied with a 3×3 sliding window to sharpen the text region. The neighborhood structure then creates a boundary for every component using the normal deviation calculated from the sharpened image. Finally, a double stroke structure model is employed to locate the text region accurately. The model is evaluated on five standard datasets (NUS, arbitrarily oriented text, Hua's, MRRC, and a real-time video dataset) using recall, precision, and f-measure as performance metrics.
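The three performance metrics named in the abstract have standard definitions, which can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `precision_recall_f` and the example counts are hypothetical, and how the paper counts a detected text block as true or false may differ from this simple tally.

```python
def precision_recall_f(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) from detection counts.

    Standard definitions; the counting protocol itself is an assumption here.
    """
    detected = true_positives + false_positives   # everything the detector reported
    actual = true_positives + false_negatives     # everything that should be found
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    denom = precision + recall
    # f-measure is the harmonic mean of precision and recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical counts for illustration only, not results from the paper.
p, r, f = precision_recall_f(true_positives=80, false_positives=20, false_negatives=20)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

The guards against zero denominators keep the function defined when a detector reports nothing or when an image contains no text.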
