Real World Anomalous Scene Detection and Classification using Multilayer Deep Neural Networks.
DOI: https://doi.org/10.9781/ijimai.2021.10.010
Keywords: Volume Crime Classification, Crime Detection, Malicious Activity Detection, Deep Learning
Abstract
Surveillance videos capture malicious events in a locality, and various machine learning algorithms are employed to detect them. Deep-learning algorithms, the most prominent AI techniques, are data-hungry as well as computationally expensive, and they perform better when trained over a diverse and huge set of examples. These modern AI methods therefore rely on human intelligence to frame the problem in a way that reduces the ultimate effort in terms of computational cost. In this research work, a novel training methodology termed Bag of Focus (BoF) has been proposed. BoF is based on the concept of selecting motion-intensive blocks in a long video for training different deep neural networks (DNNs). The methodology reduced the computational overhead by 90% (a tenfold reduction) in comparison to training over full-length videos. It has been observed that networks trained using BoF are equally effective, in terms of performance, as the same networks trained over the full-length dataset. In this research work, firstly, a fine-grained annotated dataset including instance and activity information has been developed for real-world volume crimes. Secondly, a BoF-based methodology has been introduced for effective training of state-of-the-art 3D and 2D Convolutional Neural Networks (CNNs). Lastly, a comparison between the state-of-the-art networks has been presented for malicious event recognition in videos. It has been observed that the 2D CNN, even with fewer parameters, achieved a promising classification accuracy of 98.7% and an area under the curve (AUC) of 99.7%.
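The abstract describes BoF only as selecting motion-intensive blocks of a long video, so the Python sketch below illustrates one plausible reading of that idea rather than the authors' implementation: fixed-length blocks are scored by frame-differencing motion energy and only the highest-scoring fraction is kept. The function name `select_motion_blocks`, the block length of 16 frames, and the 10% keep ratio (chosen to match the reported ~90% data reduction) are all illustrative assumptions.

```python
# Minimal sketch of a Bag-of-Focus-style block selector (assumed, not the paper's code).
import cv2
import numpy as np

def select_motion_blocks(video_path, block_len=16, keep_ratio=0.1):
    """Return start-frame indices of the most motion-intensive blocks."""
    cap = cv2.VideoCapture(video_path)
    prev, scores, block = None, [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Per-frame motion energy: mean absolute difference to the previous frame.
        block.append(0.0 if prev is None else float(np.mean(cv2.absdiff(gray, prev))))
        prev = gray
        if len(block) == block_len:
            scores.append(sum(block) / block_len)  # average motion over the block
            block = []
    cap.release()
    # Keep only the top-scoring blocks; the rest of the video is discarded for training.
    order = np.argsort(scores)[::-1]
    n_keep = max(1, int(len(scores) * keep_ratio))
    return sorted(int(i) * block_len for i in order[:n_keep])
```

The selected start indices could then be used to crop short clips that feed the 2D or 3D CNNs, which is where the claimed reduction in training cost would come from under these assumptions.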