Point Cloud Deep Learning Solution for Hand Gesture Recognition
DOI: https://doi.org/10.9781/ijimai.2023.01.001

Keywords: Artificial Neural Networks, Computer Vision, Hand Gesture, Point Cloud

Abstract
In the last couple of years, there has been an increasing need for Human-Computer Interaction (HCI) systems that can be controlled without touching the device, such as ATMs, self-service kiosks in airports, and terminals in public offices. Hand gestures offer a natural alternative for achieving control without touching the devices. This paper presents a solution that recognizes hand gestures by analyzing three-dimensional landmarks with deep learning. These landmarks are extracted from a single standard RGB camera by a model built with machine learning techniques, defining a hand skeleton of 21 landmarks distributed as follows: one on the wrist and four on each finger. This study proposes a deep neural network, trained on 9 gestures, that receives the 21 hand landmarks as input. One of the main contributions, which considerably improves performance, is a first layer that normalizes and transforms the landmarks. In our experimental analysis, we reach an accuracy of 99.87% in recognizing the 9 hand gestures.
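The abstract describes a first layer that normalizes and transforms the 21 three-dimensional landmarks before classification. The exact transform is not specified here; the following is a minimal sketch of one plausible normalization, assuming a translation that moves the wrist landmark to the origin and a scale that maps the farthest landmark to unit distance (both are assumptions for illustration, not the paper's confirmed method).

```python
import math

def normalize_landmarks(landmarks):
    """Normalize 21 (x, y, z) hand landmarks: translate so the wrist
    (landmark 0) sits at the origin, then scale so the farthest
    landmark lies at unit distance. This makes the representation
    invariant to hand position and size; the paper's actual transform
    may differ."""
    wx, wy, wz = landmarks[0]
    centered = [(x - wx, y - wy, z - wz) for (x, y, z) in landmarks]
    scale = max(math.sqrt(x * x + y * y + z * z) for (x, y, z) in centered)
    if scale == 0:
        return centered
    return [(x / scale, y / scale, z / scale) for (x, y, z) in centered]

# Toy example: 21 landmarks spaced along the x-axis
points = [(float(i), 0.0, 0.0) for i in range(21)]
norm = normalize_landmarks(points)
print(norm[0])   # wrist at the origin: (0.0, 0.0, 0.0)
print(norm[-1])  # farthest landmark at unit distance: (1.0, 0.0, 0.0)
```

A normalized vector of 63 values (21 landmarks × 3 coordinates) would then be fed to the classification network.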