Empirical evaluation of the impact of class overlap on software defect prediction
Software defect prediction (SDP) uses learning models to detect defective modules in a project, and the performance of these models depends on the quality of the training data. Previous research has mainly focused on two data-quality problems: class imbalance and feature redundancy. However, training data often contain instances that belong to different classes but have similar feature values; this class overlap also degrades the quality of the training data. Our goal is to investigate the impact of class overlap on software defect prediction. We also propose an improved K-Means clustering cleaning approach (IKMCCA) that addresses both the class overlap and the class imbalance problems. Specifically, we check whether the K-Means clustering cleaning approach (KMCCA), neighborhood cleaning learning (NCL), or IKMCCA can improve defect detection performance in two settings: (i) within-project defect prediction (WPDP) and (ii) cross-project defect prediction (CPDP). To obtain an objective estimate of the effect of class overlap, we carry out our investigation on 28 open-source projects and compare the performance of state-of-the-art learning models in both settings when using IKMCCA, KMCCA, or NCL versus no data cleaning. The experimental results show that learning models obtain significantly better performance in terms of balance, recall, and AUC for both WPDP and CPDP when the overlapping instances are removed. Moreover, it is better to address class overlap and class imbalance together.
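To make the idea of clustering-based cleaning concrete, the following is a minimal sketch, not the authors' KMCCA or IKMCCA, whose exact rules the abstract does not give. It assumes a simple majority rule: instances are clustered on their feature values, and any instance whose class differs from the majority class of its cluster is treated as overlapping and dropped. The function names (`farthest_first`, `kmeans`, `clean_overlap`), the farthest-first initialization, and the toy data are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def clean_overlap(points, classes, k=2):
    """Return indices of instances whose class matches the majority class
    of their cluster (hypothetical majority rule, not the paper's rule)."""
    labels = kmeans(points, k)
    majority = {}
    for c in set(labels):
        counts = Counter(y for y, l in zip(classes, labels) if l == c)
        majority[c] = counts.most_common(1)[0][0]
    return [i for i, (y, l) in enumerate(zip(classes, labels)) if y == majority[l]]

# Toy data: two feature-space groups, each containing one instance of the
# opposite class (0 = non-defective, 1 = defective).
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (1.0, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.0)]
classes = [0, 0, 0, 0, 1, 1, 1, 1, 0]

kept = clean_overlap(points, classes, k=2)
print(kept)  # indices 4 and 8 are dropped as overlapping
```

A pure majority rule like this one can delete every defective instance from clusters dominated by non-defective modules, which would worsen class imbalance. That is one reason a practical approach would need to treat overlap and imbalance together, as the abstract argues.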
Wed 13 Nov (displayed time zone: Tijuana, Baja California)
16:00 - 17:40 | Prediction (Research Papers / Journal First Presentations) at Cortez 1 | Chair(s): Xin Xia, Monash University
16:00 | 20m | Talk | Predicting Licenses for Changed Source Code | Research Papers | Xiaoyu Liu (Department of Computer Science and Engineering, Southern Methodist University); Liguo Huang (Dept. of Computer Science, Southern Methodist University, Dallas, TX, 75205); Jidong Ge (State Key Laboratory for Novel Software and Technology, Nanjing University); Vincent Ng (Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688)
16:20 | 20m | Talk | Empirical evaluation of the impact of class overlap on software defect prediction | Research Papers | Lina Gong (China University of Mining and Technology); Shujuan Jiang (China University of Mining and Technology); Rongcun Wang (China University of Mining and Technology); Li Jiang (China University of Mining and Technology)
16:40 | 20m | Talk | Combining Program Analysis and Statistical Language Model for Code Statement Completion | Research Papers | Son Nguyen (The University of Texas at Dallas); Tien N. Nguyen (University of Texas at Dallas); Yi Li (New Jersey Institute of Technology, USA); Shaohua Wang (New Jersey Institute of Technology, USA)
17:00 | 20m | Talk | Balancing the trade-off between accuracy and interpretability in software defect prediction | Journal First Presentations | Toshiki Mori (Corporate Software Engineering & Technology Center, Toshiba Corporation); Naoshi Uchihira (School of Knowledge Science, Japan Advanced Institute of Science and Technology (JAIST))
17:20 | 20m | Talk | Fine-grained just-in-time defect prediction | Journal First Presentations | Luca Pascarella (Delft University of Technology); Fabio Palomba (Department of Informatics, University of Zurich); Alberto Bacchelli (University of Zurich)