The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub

Authors: Jacob G. Barnett, Charles K. Gathuru, Luke S. Soldano, Shane McIntosh

Venue: MSR, 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp. 496-499, 2016

Year: 2016

Abstract: Just-In-Time (JIT) defect prediction models aim to predict the commits that will introduce defects in the future. Traditionally, JIT defect prediction models are trained using metrics that are primarily derived from aspects of the code change itself (e.g., the size of the change, the author's prior experience). In addition to the code that is submitted during a commit, authors write commit messages, which describe the commit for archival purposes. It is our position that the level of detail in these commit messages can provide additional explanatory power to JIT defect prediction models. Hence, in this paper, we analyze the relationship between the defect proneness of commits and commit message volume (i.e., the length of the commit message) and commit message content (approximated using spam filtering technology). Through analysis of JIT models that were trained using 342 GitHub repositories, we find that our JIT models outperform random guessing models, achieving AUC and Brier scores that range between 0.63-0.96 and 0.01-0.21, respectively. Furthermore, our metrics that are derived from commit message detail provide a statistically significant boost to the explanatory power of the JIT models in 43%-80% of the studied systems, accounting for up to 72% of the explanatory power. Future JIT studies should consider adding commit message detail metrics.

BibTeX:

@inproceedings{jacobg.barnett2016trbcmdadpijpog,
    author = "Jacob G. Barnett and Charles K. Gathuru and Luke S. Soldano and Shane McIntosh",
    title = "The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub",
    year = "2016",
    pages = "496--499",
    booktitle = "Proceedings of the 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR)"
}

Plain Text:

Jacob G. Barnett, Charles K. Gathuru, Luke S. Soldano, and Shane McIntosh, "The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub," 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), pp. 496-499, 2016.