A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models

Authors: Baljinder Ghotra, Shane McIntosh, Ahmed E. Hassan

Venue: MSR, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp. 146-157, 2017

Year: 2017

Abstract: The performance of a defect classification model depends on the features that are used to train it. Feature redundancy, correlation, and irrelevance can hinder the performance of a classification model. To mitigate this risk, researchers often use feature selection techniques, which transform or select a subset of the features in order to improve the performance of a classification model. Recent studies compare the impact of different feature selection techniques on the performance of defect classification models. However, these studies compare a limited number of classification techniques and have arrived at contradictory conclusions about the impact of feature selection techniques. To address this limitation, we study 30 feature selection techniques (11 filter-based ranking techniques, six filter-based subset techniques, 12 wrapper-based subset techniques, and a no-feature-selection configuration) and 21 classification techniques when applied to 18 datasets from the NASA and PROMISE corpora. Our results show that a correlation-based filter-subset feature selection technique with a BestFirst search method outperforms other feature selection techniques across the studied datasets (it outperforms in 70%-87% of the PROMISE-NASA datasets) and across the studied classification techniques (it outperforms for 90% of the techniques). Hence, we recommend the application of such a selection technique when building defect classification models.

BibTeX:

@inproceedings{baljinderghotra2017alsotiofstodcm,
    author = "Baljinder Ghotra and Shane McIntosh and Ahmed E. Hassan",
    title = "A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models",
    year = "2017",
    pages = "146--157",
    booktitle = "Proceedings of the 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)"
}

Plain Text:

Baljinder Ghotra, Shane McIntosh, and Ahmed E. Hassan, "A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models," 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pp. 146-157, 2017.