Studying the relationship between logging characteristics and the code quality of platform software

Authors: Weiyi Shang, Meiyappan Nagappan, Ahmed E. Hassan

Venue: EMSE (Empirical Software Engineering), Vol. 20, No. 1, pp. 1-27, 2015

Year: 2015

Abstract: Platform software plays an important role in speeding up the development of large-scale applications. Such platforms provide functionalities and abstractions on which applications can be rapidly developed and easily deployed. Hadoop and JBoss are examples of popular open source platform software. Such platform software generates logs to assist operators in monitoring the applications that run on it. These logs capture the doubts, concerns, and needs of developers and operators of platform software. We believe that such logs can be used to better understand code quality. However, logging characteristics and their relation to quality have never been explored. In this paper, we sought to empirically study this relation through a case study on four releases each of Hadoop and JBoss. Our findings show that files with logging statements have higher post-release defect densities than those without logging statements in 7 out of 8 studied releases. Inspired by prior studies on code quality, we defined log-related product metrics, such as the number of log lines in a file, and log-related process metrics, such as the number of changed log lines. We find that the correlations between our log-related metrics and post-release defects are as strong as their correlations with traditional process metrics, such as the number of pre-release defects, which is known to be one of the metrics with the strongest correlation with post-release defects. We also find that log-related metrics can complement traditional product and process metrics, resulting in up to a 40% improvement in the explanatory power of defect proneness. Our results show that logging characteristics provide strong indicators of defect-prone source code files. However, we note that removing logs is not the answer to better code quality. Instead, our results show that it might be the case that developers often relay their concerns about a piece of code through logs. Hence, code quality improvement efforts (e.g., testing and inspection) should focus more on the source code files with large amounts of logs or with large amounts of log churn.

Preprint: PDF

BibTeX:

@article{weiyishang2015strblcatcqops,
    author = "Weiyi Shang and Meiyappan Nagappan and Ahmed E. Hassan",
    title = "Studying the relationship between logging characteristics and the code quality of platform software",
    year = "2015",
    pages = "1--27",
    journal = "Empirical Software Engineering",
    volume = "20",
    number = "1"
}

Plain Text:

Weiyi Shang, Meiyappan Nagappan, and Ahmed E. Hassan, "Studying the relationship between logging characteristics and the code quality of platform software," Empirical Software Engineering, vol. 20, no. 1, pp. 1-27, 2015.