An empirical study on the effect of testing on code quality using topic models: A case study on software development systems

Authors: Tse-Hsun Chen, Stephen W. Thomas, Hadi Hemmati, Meiyappan Nagappan, Ahmed E. Hassan

Venue: IEEE Transactions on Reliability, Vol. 66, No. 3, pp. 806-824, 2017

Year: 2017

Abstract: Previous research in defect prediction has proposed approaches to determine which files require additional testing resources. However, practitioners typically create tests at a higher level of abstraction, which may span many files. In this paper, we study software testing, especially test resource prioritization, from a different perspective. We use topic models to generate topics that provide a high-level view of a system, allowing developers to look at the test case coverage from a different angle. We propose measures of how well tested and defect prone a topic is, allowing us to discover which topics are well tested and which are defect prone. We conduct case studies on the histories of Mylyn, Eclipse, and NetBeans. We find that 34-78% of topics are shared between source code and test files, indicating that we can use topic models to study testing; well-tested topics are usually less defect prone, and defect-prone topics are usually undertested; we can predict which topics are defect prone but not well tested with an average precision and recall of 75% and 77%, respectively; our approach complements traditional prediction-based approaches by saving testing and code inspection effort; and our approach is not particularly sensitive to the parameters that we use.

BibTeX:

@article{tse-hsunchen2017aesoteotocqutmacsosds,
    author = "Tse-Hsun Chen and Stephen W. Thomas and Hadi Hemmati and Meiyappan Nagappan and Ahmed E. Hassan",
    title = "An empirical study on the effect of testing on code quality using topic models: A case study on software development systems",
    year = "2017",
    pages = "806-824",
    journal = "IEEE Transactions on Reliability",
    volume = "66",
    number = "3"
}

Plain Text:

Tse-Hsun Chen, Stephen W. Thomas, Hadi Hemmati, Meiyappan Nagappan, and Ahmed E. Hassan, "An empirical study on the effect of testing on code quality using topic models: A case study on software development systems," IEEE Transactions on Reliability, vol. 66, no. 3, pp. 806-824, 2017.