Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?

Authors: Durham Abric, Oliver E. Clark, Matthew Caminiti, Keheliya Gallaba, Shane McIntosh

Venue: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 230-234, 2019

Year: 2019

Abstract: Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question. Stack Overflow suggests that duplicate questions should not be discussed by users, but rather that attention should be redirected to their previously posted counterparts. Roughly 53% of closed Stack Overflow posts are closed due to duplication. Despite their supposed overlapping content, user activity suggests duplicates may generate additional or superior answers. Approximately 9% of duplicates receive more views than their original counterparts despite being closed. In this paper, we analyze duplicate questions from two perspectives. First, we analyze the experience of those who post duplicates using activity and reputation-based heuristics. Second, we compare the content of duplicates in terms of both their questions and answers to determine the degree of similarity between each duplicate pair. Through analysis of the MSR challenge dataset, we find that although duplicate questions are more likely to be created by inexperienced users, they often receive dissimilar answers to their original counterparts. Indeed, supplementary textual analysis using Natural Language Processing (NLP) techniques suggests duplicate questions provide additional information about the underlying concepts being discussed. We recommend that Stack Overflow's duplication policy be revised to account for the benefits that leaving duplicate questions open may have for the developer community.

BibTeX:

@inproceedings{durhamabric2019cdqosobtsdc,
    author = "Durham Abric and Oliver E. Clark and Matthew Caminiti and Keheliya Gallaba and Shane McIntosh",
    title = "Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?",
    year = "2019",
    pages = "230--234",
    booktitle = "Proceedings of the 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)"
}

Plain Text:

Durham Abric, Oliver E. Clark, Matthew Caminiti, Keheliya Gallaba, and Shane McIntosh, "Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?" 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 230-234, 2019.