A Hybrid Multi-Person Fall Detection Scheme Based on Optimized YOLO and ST-GCN

Authors

  • Lei Liu, Huainan Normal University
  • Yeguo Sun, Huainan Normal University
  • Xianlei Ge, Huainan Normal University

DOI:

https://doi.org/10.9781/ijimai.2024.09.003

Keywords:

Computer vision, Elderly Protection, Fall Detection, Graph Convolution Network (GCN), Human Pose Estimation
Supporting Agencies

This study received support from the University Natural Science Foundation of Anhui Province (Grant nos. 2023AH051542 and 2022AH010085).

Abstract

Human falls are a serious health risk for elderly and disabled people living alone. Studies have shown that prompt assistance after a fall greatly reduces both the risk of death and the proportion of fallers requiring long-term treatment. As a real-time automatic fall detection solution, vision-based human fall detection technology has received extensive attention from researchers. In this paper, a hybrid model based on YOLO and ST-GCN is proposed for multi-person fall detection scenarios. The scheme uses the graph-convolution-based ST-GCN model to recognize the fall action and augments it with YOLO for accurate and fast detection of multiple person targets. Meanwhile, the model is accelerated through optimization methods to meet the demands of lightweight, real-time operation. Finally, we conducted performance tests on the designed prototype system using both publicly available single-person datasets and our own multi-person dataset. The experimental results show that, under favorable environmental conditions, our model achieves high detection accuracy compared with state-of-the-art schemes while significantly outperforming other models in inference speed. As a preliminary attempt, this hybrid model based on YOLO and ST-GCN therefore offers a new approach to multi-person fall detection for the elderly.
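The hybrid pipeline described above can be sketched schematically: a per-frame person detector (YOLO in the paper) yields per-person keypoints, which are buffered into spatio-temporal windows and passed to a skeleton-based action classifier (ST-GCN in the paper). The sketch below is illustrative only, not the authors' implementation; the detector and classifier are replaced by stand-in functions, and all names, the window length, and the toy fall rule are assumptions.

```python
# Schematic sketch of a hybrid multi-person fall-detection pipeline:
# a per-frame person detector feeds per-person keypoint windows into a
# skeleton-based action classifier. All names and thresholds are illustrative.
from collections import defaultdict, deque

WINDOW = 30  # frames of skeleton history kept per tracked person (assumed)

def detect_persons(frame):
    """Stand-in for a YOLO detector + tracker: yields (track_id, keypoints)."""
    return frame  # in this sketch, frames are already annotated

def classify_action(keypoint_window):
    """Stand-in for an ST-GCN classifier over a spatio-temporal window."""
    # Toy rule: flag a fall when the mean hip height (image coordinates,
    # larger y = lower in the frame) stays below a fixed threshold.
    heights = [kp["hip_y"] for kp in keypoint_window]
    return "fall" if sum(heights) / len(heights) > 0.8 else "normal"

def run_pipeline(frames):
    """Buffer per-person keypoints and classify once a full window exists."""
    history = defaultdict(lambda: deque(maxlen=WINDOW))
    alerts = []
    for t, frame in enumerate(frames):
        for track_id, keypoints in detect_persons(frame):
            history[track_id].append(keypoints)
            if len(history[track_id]) == WINDOW:
                if classify_action(history[track_id]) == "fall":
                    alerts.append((t, track_id))
    return alerts
```

Keeping one rolling window per track ID is what lets a single classifier handle multiple people at once, which is the core of the multi-person extension described in the abstract.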



Published

2025-03-01
How to Cite

Liu, L., Sun, Y., and Ge, X. (2025). A Hybrid Multi-Person Fall Detection Scheme Based on Optimized YOLO and ST-GCN. International Journal of Interactive Multimedia and Artificial Intelligence, 9(2), 26–38. https://doi.org/10.9781/ijimai.2024.09.003