
A Code Naturalness Based Defect Prediction Method at Slice Level

EasyChair Preprint no. 3918, version 2

Versions: 1, 2
24 pages. Date: November 20, 2020


Software defect prediction is an active research topic in software quality assurance. It helps developers find potential defects and make better use of resources. Designing more discriminative metrics for prediction systems, while balancing performance and interpretability, remains an open research direction. To address this challenge, a code naturalness feature based defect predictor (CNDePor) is proposed. The method improves the language model by measuring code sequences bidirectionally and by weighting samples using quality information, thereby increasing the defect discrimination of the cross-entropy (CE) metrics obtained from the model. To address the shortcomings of coarse-grained defect prediction (e.g., difficulty in focusing on defective areas and the high cost of code review), a new fine-grained defect prediction problem, statement-oriented slice-level defect prediction, is studied. Four metrics are designed for this problem, and the effectiveness of these metrics and of CNDePor is verified on two types of security defect datasets. The experimental results show that: CE-type metrics are learnable, in that they contain relevant knowledge learned from the corpus by the language model; the improved CE metrics are significantly better than the original metrics and traditional size metrics; and CNDePor has significant advantages over traditional defect prediction methods and an existing method based on code naturalness, while offering comparable performance and stronger interpretability than a state-of-the-art method based on deep learning.
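To make the notion of a CE-type naturalness metric concrete, the following is a minimal sketch, not the CNDePor implementation. It assumes a toy bigram language model with add-one smoothing, and it assumes that the bidirectional measurement can be approximated by averaging the cross-entropy of a forward model and of a model trained on reversed token sequences. The function names (`train_bigram`, `cross_entropy`, `bidirectional_cross_entropy`) and the tiny corpus are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams over token sequences padded with <s> and </s>."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for tokens in corpus:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def cross_entropy(tokens, model):
    """Average negative log2 probability per token (lower = more natural)."""
    context_counts, bigram_counts, vocab_size = model
    padded = ["<s>"] + list(tokens) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size + 1)
        total -= math.log2(p)
    return total / (len(padded) - 1)


def bidirectional_cross_entropy(tokens, fwd_model, bwd_model):
    """Assumed combination: mean of forward and backward cross-entropy."""
    tokens = list(tokens)
    return 0.5 * (cross_entropy(tokens, fwd_model)
                  + cross_entropy(tokens[::-1], bwd_model))


# Hypothetical toy corpus of tokenized statements.
corpus = [
    ["int", "i", "=", "0", ";"],
    ["int", "j", "=", "0", ";"],
    ["i", "=", "i", "+", "1", ";"],
]
fwd_model = train_bigram(corpus)
bwd_model = train_bigram([seq[::-1] for seq in corpus])

natural = ["int", "i", "=", "0", ";"]
unnatural = [";", "0", "=", "i", "int"]
print(bidirectional_cross_entropy(natural, fwd_model, bwd_model))
print(bidirectional_cross_entropy(unnatural, fwd_model, bwd_model))
```

In this sketch, a statement resembling the training corpus receives a lower score than a scrambled one, which is the intuition behind using cross-entropy as a defect-related feature. The sample weighting by quality information mentioned in the abstract is not modeled here.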

Keyphrases: code naturalness, cross-entropy, deep learning, defect prediction, language model, machine learning, slice granularity, slice level, software defect prediction, software fault prediction, software quality assurance

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:3918,
  author = {Xian Zhang and Ke-Rong Ben and Jie Zeng},
  title = {A Code Naturalness Based Defect Prediction Method at Slice Level},
  howpublished = {EasyChair Preprint no. 3918},
  year = {EasyChair, 2020}}