Multi-Class Dental CBCT Segmentation in Data-Constrained Scenarios Through Transformers

Rafael C. Giménez-Aguilar; Sergio Paraíso-Medina; Miguel García-Remesal; Guillermo Jesús Pradíes Ramiro; Monica Bonfanti-Gris; Raúl Alonso-Calvo

doi:10.9781/ijimai.2025.03.003

Authors

Rafael C. Giménez-Aguilar Universidad Politécnica de Madrid
Sergio Paraíso-Medina Universidad Politécnica de Madrid
Miguel García-Remesal Universidad Politécnica de Madrid
Guillermo Jesús Pradíes Ramiro Universidad Complutense de Madrid
Monica Bonfanti-Gris Universidad Complutense de Madrid
Raúl Alonso-Calvo Universidad Politécnica de Madrid

Keywords:

Dental CBCT, Deep Learning, Instance Segmentation, Multiclass Segmentation, Transformer

Supporting Agencies

This research project has been funded by the Comunidad de Madrid through the call Research Grants for Young Investigators from Universidad Politécnica de Madrid, within the project NanoMoDL (grant number APOYO-JOVENES-21-MJK8FF-90-8I2C6U), and by the ITI Research Grant within the project “Bridging Centers, Enhancing Knowledge: A Research Protocol on Federated Learning for Dental Implant Classification and Pathology Identification in Periapical Radiographs” (grant number 1868-2024).

Abstract

Accurate segmentation of dental structures from cone-beam computed tomography (CBCT) images has become an active research field due to the widespread use of this technology in clinical practice. In recent years, contributions have shifted from traditional computer vision methods to deep learning-based approaches. However, most of these works rely solely on convolutional neural networks (CNNs), whereas the state of the art in image segmentation is moving towards attention-based architectures. Furthermore, contributions on dental CBCT predominantly present methods focused on a single object category, mainly teeth. In this article, we tackle the segmentation of multiple oral structures with query-based segmentation transformers, which have not previously been applied to this task. The proposed method achieves results similar to the state of the art, especially on tooth segmentation, while using a considerably smaller training dataset than prior contributions.
