PeopleNet: A Novel People Counting Framework for Head-Mounted Moving Camera Videos.
DOI:
https://doi.org/10.9781/ijimai.2023.04.002
Keywords:
Deep Learning, Density Map, Feature Extraction, Moving Camera Videos, Counting Individuals
Abstract
Traditional crowd counting techniques based on optical flow or feature matching have been superseded by deep learning (DL) models, largely because they lack automatic feature extraction and yield low-precision results. Most of these DL models, however, have been tested on surveillance-scene crowd datasets captured by stationary cameras. Counting people in videos shot with a head-mounted moving camera is far more challenging, mainly because the temporal information of the moving crowd is mixed with the induced camera motion. This study proposes a transfer learning-based PeopleNet model to tackle this problem. To this end, we make significant changes to the standard VGG16 model: we disable its top convolutional blocks and replace its standard fully connected layers with new fully connected and dense layers. The strong transfer learning capability of the VGG16 backbone enables PeopleNet to produce high-quality density maps, which in turn yield highly accurate crowd estimates. Since no public benchmark dataset exists for this task, the performance of the proposed model is evaluated on a self-generated image database prepared from moving-camera video clips. The proposed framework gives promising results across crowd categories such as dense, average, and sparse. To demonstrate its versatility, we perform self- and cross-evaluation with various crowd counting models and datasets, which underlines the importance of the PeopleNet model for the security and defense of society.
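To make the architecture description above concrete, the following is a minimal Keras sketch of a PeopleNet-style model, not the paper's exact implementation: a pre-trained VGG16 backbone with its top convolutional blocks cut off and a new head of dense layers regressing the people count. The truncation point (block3_pool), layer widths, input size, and the direct count regression are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_peoplenet(input_shape=(224, 224, 3)):
    # Load VGG16 pre-trained on ImageNet without its classifier head.
    backbone = VGG16(weights="imagenet", include_top=False,
                     input_shape=input_shape)

    # "Disable" the top convolutional blocks by truncating the network
    # at block3_pool (an assumed cut point) and freezing what remains,
    # so only the new head is trained.
    truncated = models.Model(inputs=backbone.input,
                             outputs=backbone.get_layer("block3_pool").output)
    truncated.trainable = False

    # Replace VGG16's standard fully connected layers with new dense
    # layers that regress a single people count per image.
    x = layers.GlobalAveragePooling2D()(truncated.output)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(128, activation="relu")(x)
    count = layers.Dense(1, activation="linear", name="count")(x)

    model = models.Model(inputs=truncated.input, outputs=count)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

model = build_peoplenet()
model.summary()

In the density-map variant the abstract describes, the new head would instead emit a two-dimensional density map whose spatial integral gives the final count.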