Dataset and Baselines for IID and OOD Image Classification Considering Data Quality and Evolving Environments.
DOI: https://doi.org/10.9781/ijimai.2023.01.007

Keywords: Active Learning, Data Quality, Efficient Dataset, Evolving Environments, Generalization

Abstract
Artificial intelligence is currently developing rapidly, and deep learning is being applied in a wide range of fields. Data are a key part of deep learning: their efficiency and stability directly affect model performance, and they have therefore received considerable attention. To make datasets efficient, many active learning methods have been proposed that reduce a dataset of independent and identically distributed (IID) samples while preserving excellent performance; to make datasets more stable, models must be able to handle out-of-distribution (OOD) samples so that generalization performance improves. However, both the design of current active learning methods and the practice of adding OOD samples lack guidance: it is unclear which samples should be selected, and which OOD samples should be added, to best improve generalization. In this paper, we propose a dataset containing a variety of elements, called a dataset with Complete Sample Elements (CSE), in which samples carry labels such as rotation angle and distance in addition to the usual classification labels. These labels help analyze the distribution characteristics of each element of an efficient dataset, thereby inspiring new active learning methods. We also construct a corresponding OOD test set, which not only measures the generalization performance of a model but also helps explore metrics between OOD samples and the existing dataset, guiding the selection of OOD samples so that generalization can be improved efficiently. We explore the distribution characteristics of efficient datasets with respect to the angle element and confirm that an efficient dataset tends to contain samples with different appearances. At the same time, experiments demonstrate the positive influence of adding OOD samples on the generalization performance of a dataset.
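To make the CSE idea concrete, the following is a minimal sketch of how a sample with complete element labels might be represented and inspected. The names CSESample, rotation_angle, distance, and angle_histogram are illustrative assumptions for this sketch, not the dataset's published schema.

```python
# Minimal sketch (illustrative field names, not the authors' released format)
# of a CSE-style sample: each image carries element labels such as rotation
# angle and shooting distance in addition to its ordinary class label.
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CSESample:
    image_path: str        # path to the image file
    class_label: str       # ordinary classification label
    rotation_angle: float  # element label: object rotation in degrees
    distance: float        # element label: camera-to-object distance


def angle_histogram(samples: Iterable[CSESample], bin_width: float = 30.0) -> Dict[int, int]:
    """Count samples per rotation-angle bin, e.g. to inspect how an
    actively selected (efficient) subset is distributed over the angle element."""
    counts: Dict[int, int] = {}
    for s in samples:
        bin_start = int(s.rotation_angle // bin_width) * int(bin_width)
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return counts
```

A usage pattern consistent with the analysis in the abstract would be to compute such a histogram for a subset chosen by an active learning method and compare it with the histogram of the full dataset, revealing whether the efficient subset favors samples with different appearances (here, different angles).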