Topic-based software defect explanation

Authors: Tse-Hsun Peter Chen, Weiyi Shang, Meiyappan Nagappan, Ahmed E. Hassan, Stephen W. Thomas

Venue: JSS (Journal of Systems and Software), Vol. 129, pp. 79-106, 2016

Year: 2016

Abstract: Researchers continue to propose metrics using measurable aspects of software systems to understand software quality. However, these metrics largely ignore the functionality, i.e., the conceptual concerns, of software systems. Such concerns are the technical concepts that reflect the system’s business logic. For instance, while lines of code may be a good general measure for defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this paper, we study the effect of concerns on software quality. We use a statistical topic modeling approach to approximate software concerns as topics (related words in source code). We propose various metrics using these topics to help explain file defect-proneness. Case studies on multiple versions of Firefox, Eclipse, Mylyn, and NetBeans show that (i) some topics are more defect-prone than others; (ii) defect-prone topics tend to remain so over time; (iii) our topic-based metrics provide additional explanatory power for software quality over existing structural and historical metrics; and (iv) our topic-based cohesion metric outperforms state-of-the-art topic-based cohesion and coupling metrics in terms of defect explanatory power, while being simpler to implement and more intuitive to interpret.
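To make the idea of a "defect-prone topic" more concrete, the following is a minimal sketch under stated assumptions; it is not the authors' exact metric definition. It assumes each file already has a topic-membership vector (for example, from an LDA run over its source code) and a known defect count, and it computes a membership-weighted defect density per topic, then scores a file by weighting those densities with its own memberships. The function names `topic_defect_density` and `file_topic_score` are hypothetical.

```python
from typing import List, Sequence


def topic_defect_density(theta: Sequence[Sequence[float]],
                         defects: Sequence[float]) -> List[float]:
    """Hypothetical per-topic defect density.

    theta[i][j] is the (assumed) membership of file i in topic j;
    defects[i] is the defect count of file i. For each topic j this
    returns sum_i(theta[i][j] * defects[i]) / sum_i(theta[i][j]).
    """
    n_topics = len(theta[0])
    densities = []
    for j in range(n_topics):
        membership = sum(row[j] for row in theta)
        weighted = sum(row[j] * d for row, d in zip(theta, defects))
        # Guard against topics that no file belongs to.
        densities.append(weighted / membership if membership > 0 else 0.0)
    return densities


def file_topic_score(memberships: Sequence[float],
                     densities: Sequence[float]) -> float:
    """Hypothetical file-level score: topic densities weighted by the file's memberships."""
    return sum(m * d for m, d in zip(memberships, densities))


if __name__ == "__main__":
    # Toy data (invented for illustration): three files, two topics.
    theta = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    defects = [0, 4, 2]
    dens = topic_defect_density(theta, defects)
    print(dens)
    print(file_topic_score([0.3, 0.7], dens))
```

In this toy data the second topic comes out more defect-prone because the file with the most defects belongs mostly to it. The weighting choice keeps files that only touch a topic lightly from dominating its density; a real study would also need to control for confounders such as file size, which is what the paper's comparison against structural and historical metrics addresses.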

BibTeX:

@article{tse-hsunpeterchen2016tsde,
    author = "Tse-Hsun Peter Chen and Weiyi Shang and Meiyappan Nagappan and Ahmed E. Hassan and Stephen W. Thomas",
    title = "Topic-based software defect explanation",
    year = "2016",
    pages = "79--106",
    journal = "Journal of Systems and Software",
    volume = "129"
}

Plain Text:

Tse-Hsun Peter Chen, Weiyi Shang, Meiyappan Nagappan, Ahmed E. Hassan, and Stephen W. Thomas, "Topic-based software defect explanation," Journal of Systems and Software, vol. 129, pp. 79-106, 2016.