Abstract
Code smells are symptoms of poor design and implementation choices that may hinder code comprehensibility and maintainability. Despite the considerable effort the research community has invested in studying code smells, there is still a lack of large empirical investigations into the relevance of code smell presence, their diffusion, and the magnitude of their effects on software maintainability. In this paper we present a large-scale empirical investigation of code smell diffusion, co-occurrence, and evolution in software projects. We also analyze the impact of code smells on both change- and fault-proneness. The study was conducted on a total of 395 releases of 30 open source projects and considers 17,350 manually validated instances of 13 different types of code smells. Our results show that: (i) smells characterized by long and/or complex code (e.g., Complex Class) are highly diffused, (ii) six pairs of code smells (e.g., Message Chain and Refused Bequest) frequently co-occur, (iii) more often than not, the number of smell instances affecting the software systems increases over time, and (iv) smelly classes have statistically significantly higher change- and fault-proneness than smell-free classes.
Experimental Material
Raw Data
- Code Smell Detection Tool
- Dataset used in the Study
- Evolution Data for All the Subject Systems
- Analysis of the normalized change-proneness
- Analysis of the normalized fault-proneness
- Analysis of the change-proneness considering smelly and non-smelly classes of different size
- Analysis of the fault-proneness considering smelly and non-smelly classes of different size