An industrial case study of automatically identifying performance regression-causes

Authors: Thanh H. D. Nguyen, Meiyappan Nagappan, Ahmed E. Hassan, Mohamed Nasser, Parminder Flora

Venue: 11th Working Conference on Mining Software Repositories (MSR), pp. 232-241, 2014

Year: 2014

Abstract: Even the addition of a single extra field or control statement in the source code of a large-scale software system can lead to performance regressions. Such regressions can considerably degrade the user experience. Working closely with the members of a performance engineering team, we observe that they face a major challenge in identifying the cause of a performance regression given the large number of performance counters (e.g., memory and CPU usage) that must be analyzed. We propose the mining of a regression-causes repository (where the results of performance tests and causes of past regressions are stored) to assist the performance team in identifying the regression-cause of a newly-identified regression. We evaluate our approach on an open-source system, and a commercial system for which the team is responsible. The results show that our approach can accurately (up to 80% accuracy) identify performance regression-causes using a reasonably small number of historical test runs (sometimes as few as four test runs per regression-cause).
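The abstract does not say how a new regression is matched against the regression-causes repository. The sketch below only illustrates the general idea under stated assumptions and is not the authors' method. It assumes that each past test run is summarized as a "signature" of relative changes in a few performance counters versus a baseline run, and that past regressions with the same known cause are averaged into one centroid. A new regression is then labeled with the cause of the nearest centroid (a plain nearest-centroid classifier, swapped in here for illustration). The counter names, cause labels, function names, and numbers are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical counter names; a real test harness records many more counters.
COUNTERS = ["cpu_pct", "memory_mb", "disk_io_kbps", "response_ms"]


def signature(run, baseline):
    """Relative change of each counter in `run` versus the `baseline` run."""
    return [(run[c] - baseline[c]) / baseline[c] for c in COUNTERS]


def build_centroids(repository):
    """Average the signatures of past regressions that share a known cause.

    `repository` is a list of (signature, cause) pairs, standing in for the
    stored test results and causes of past regressions (an assumption).
    """
    sums = {}
    counts = defaultdict(int)
    for sig, cause in repository:
        if cause not in sums:
            sums[cause] = [0.0] * len(sig)
        sums[cause] = [s + v for s, v in zip(sums[cause], sig)]
        counts[cause] += 1
    return {cause: [s / counts[cause] for s in total] for cause, total in sums.items()}


def identify_cause(new_sig, centroids):
    """Return the cause whose centroid is closest (Euclidean) to `new_sig`."""
    return min(centroids, key=lambda cause: math.dist(new_sig, centroids[cause]))


# Hypothetical data: two past regressions per cause ("extra_loop" mostly
# raises CPU and response time, "extra_field" mostly raises memory).
baseline = {"cpu_pct": 40.0, "memory_mb": 1000.0, "disk_io_kbps": 200.0, "response_ms": 50.0}
repository = [
    (signature({"cpu_pct": 60.0, "memory_mb": 1010.0, "disk_io_kbps": 200.0, "response_ms": 70.0}, baseline), "extra_loop"),
    (signature({"cpu_pct": 58.0, "memory_mb": 1000.0, "disk_io_kbps": 210.0, "response_ms": 68.0}, baseline), "extra_loop"),
    (signature({"cpu_pct": 41.0, "memory_mb": 1400.0, "disk_io_kbps": 200.0, "response_ms": 55.0}, baseline), "extra_field"),
    (signature({"cpu_pct": 40.0, "memory_mb": 1350.0, "disk_io_kbps": 205.0, "response_ms": 52.0}, baseline), "extra_field"),
]
centroids = build_centroids(repository)

# A newly identified regression whose cause is unknown.
new_run = {"cpu_pct": 42.0, "memory_mb": 1380.0, "disk_io_kbps": 200.0, "response_ms": 54.0}
print(identify_cause(signature(new_run, baseline), centroids))  # extra_field
```

Relative changes are used instead of raw values so that counters measured in different units (percent, megabytes, milliseconds) contribute on a comparable scale; with only a handful of labeled runs per cause, a centroid per cause is about the simplest model that can still separate distinct counter patterns.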


BibTeX:

@inproceedings{thanhh.d.nguyen2014aicsoaipr,
    author = "Thanh H. D. Nguyen and Meiyappan Nagappan and Ahmed E. Hassan and Mohamed Nasser and Parminder Flora",
    title = "An industrial case study of automatically identifying performance regression-causes",
    year = "2014",
    pages = "232--241",
    booktitle = "Proceedings of the 11th Working Conference on Mining Software Repositories"
}

Plain Text:

Thanh H. D. Nguyen, Meiyappan Nagappan, Ahmed E. Hassan, Mohamed Nasser, and Parminder Flora, "An industrial case study of automatically identifying performance regression-causes," 11th Working Conference on Mining Software Repositories (MSR), pp. 232-241, 2014.