The complexity of software systems has grown exponentially in recent years, pushing developers to face more challenging tasks during evolution and maintenance activities. A developer’s ability to cope with a task may be hindered when the task impacts several software components, possibly spread across different subsystems. In this paper, we propose two measures that capture the structural and semantic distance between the components a developer works on in a given time period. We call these measures structural and semantic confusion, and we hypothesize that they affect the likelihood of introducing defects during code change activities. To validate this conjecture, we use the defined measures to build a bug prediction model. We evaluated our model on five open source systems and compared it with two competing techniques: the first is a prediction model that uses the change-proneness of code components as its predictor variable, while the second is the Basic Code Change Model proposed by Hassan, which uses code change entropy. The results show that our model outperforms both competing approaches and that the defined confusion measures are orthogonal to the standard predictors commonly used in the literature.
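
The Basic Code Change Model used as a baseline relies on code change entropy. As a rough illustration only, and not the implementation used in this study, the sketch below computes the Shannon entropy of how modifications are distributed across files in one time period. The function name `change_entropy` and the sample file names are hypothetical.

```python
import math
from collections import Counter


def change_entropy(changed_files):
    """Shannon entropy (in bits) of the distribution of changes over files.

    `changed_files` holds one entry per modification in the period,
    so a file modified three times appears three times.
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p), summed over every file touched in the period
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical change logs for two periods with the same number of changes.
focused = ["A.java"] * 8                                   # all changes in one file
scattered = ["A.java", "B.java", "C.java", "D.java"] * 2   # spread evenly over four files

focused_entropy = change_entropy(focused)
scattered_entropy = change_entropy(scattered)
```

Under this sketch, a period whose changes are concentrated in one file yields lower entropy than a period whose changes are spread evenly over many files.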

Experimental Material

Raw data

Classifier choice

Comparison between Basic Code Change Model and Changes Model
