ACM Sigsoft Distinguished Paper Awards

  • ASE 2017

    Automatically Assessing Code Understandability: How Far Are We?
    by S. Scalabrino, G. Bavota, C. Vendome, M. Linares-Vasquez, D. Poshyvanyk, R. Oliveto

    32nd IEEE/ACM International Conference on Automated Software Engineering

    Program understanding plays a pivotal role in software maintenance and evolution: a deep understanding of code is the stepping stone for most software-related activities, such as bug fixing or testing. Being able to measure the understandability of a piece of code might help in estimating the effort required for a maintenance activity, in comparing the quality of alternative implementations, or even in predicting bugs. Unfortunately, there are no existing metrics specifically designed to assess the understandability of a given code snippet. In this paper we perform a first step in this direction, by studying the extent to which several types of metrics computed on code, documentation and developers correlate with code understandability. To perform such an investigation we ran a study with 46 participants who were asked to understand eight code snippets each. We collected a total of 324 evaluations aiming at assessing the perceived understandability, the actual level of understanding and the time needed to understand a code snippet. Our results demonstrate that none of the (existing and new) metrics we considered is able to capture code understandability, not even the ones assumed to assess quality attributes strongly related with it, such as code readability and complexity.

  • ICPC 2016

    Improving Code Readability Models with Textual Features
    by S. Scalabrino, M. Linares-Vasquez, D. Poshyvanyk, R. Oliveto

    24th International Conference on Program Comprehension

    Code reading is one of the most frequent activities in software maintenance; before implementing changes, it is necessary to fully understand source code often written by other developers. Thus, readability is a crucial aspect of source code that might significantly influence program comprehension effort. In general, models used to estimate software readability take into account only structural aspects of source code, e.g., line length and a number of comments. However, code is a particular form of text; therefore, a code readability model should not ignore the textual aspects of source code encapsulated in identifiers and comments. In this paper, we propose a set of textual features that could be used to measure code readability. We evaluated the proposed textual features on 600 code snippets manually evaluated (in terms of readability) by 5K+ people. The results show that the proposed features complement classic structural features when predicting readability judgments. Consequently, a code readability model based on a richer set of features, including the ones proposed in this paper, achieves a significantly better accuracy as compared to all the state-of-the-art readability models.

  • ICSE 2015

    When and Why Your Code Starts to Smell Bad
    by M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, and D. Poshyvanyk

    37th International Conference on Software Engineering

    In past and recent years, the issues related to managing technical debt received significant attention by researchers from both industry and academia. There are several factors that contribute to technical debt. One of these is represented by code bad smells, i.e. symptoms of poor design and implementation choices. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced. To fill this gap, we conducted a large empirical study over the change history of 200 open source projects from different software ecosystems and investigated when bad smells are introduced by developers, and the circumstances and reasons behind their introduction. Our study required the development of a strategy to identify smell-introducing commits, the mining of over 0.5M commits, and the manual analysis of 9,164 of them (i.e. those identified as smell-introducing). Our findings mostly contradict common wisdom stating that smells are being introduced during evolutionary tasks. In the light of our results, we also call for the need to develop a new generation of recommendation systems aimed at properly planning smell refactoring activities.

  • ESEC

    Optimizing Energy Consumption of GUIs in Android Apps: A Multi-objective Approach by M. Linares-Vasquez, G. Bavota, C. Bernal-Cardenas, R. Oliveto, M. Di Penta, D. Poshyvanyk

    10th Joint Meeting of the European Software Engineering Conference and the 23rd ACM SIGSOFT Symposium on the Foundations of Software Engineering

    The wide diffusion of mobile devices has motivated research towards optimizing energy consumption of software systems - including apps - targeting such devices. Besides efforts aimed at dealing with various kinds of energy bugs, the adoption of Organic Light-Emitting Diode (OLED) screens has motivated research towards reducing energy consumption by choosing an appropriate color palette. Whilst past research in this area aimed at optimizing energy while keeping an acceptable level of contrast, this paper proposes an approach, named GEMMA (Gui Energy Multi-objective optiMization for Android apps), for generating color palettes using a multi-objective optimization technique, which produces color solutions optimizing energy consumption and contrast while using consistent colors with respect to the original color palette. An empirical evaluation that we performed on 25 Android apps demonstrates not only significant improvements in terms of the three different objectives, but also confirmed that in most cases users still perceived the choices of colors as attractive. Finally, for several apps we interviewed the original developers, who in some cases expressed the intent to adopt the proposed choice of color palette, whereas in other cases pointed out directions for future improvements.

  • ASE 2013

    Detecting Bad Smells in Source Code Using Change History Information
    by F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk

    28th IEEE/ACM International Conference on Automated Software Engineering

    Code smells represent symptoms of poor implementation choices. Previous studies found that these smells make source code more difficult to maintain, possibly also increasing its fault-proneness. There are several approaches that identify smells based on code analysis techniques. However, we observe that many code smells are intrinsically characterized by how code elements change over time. Thus, relying solely on structural information may not be sufficient to detect all the smells accurately. We propose an approach to detect five different code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy, by exploiting change history information mined from versioning systems. We applied approach, coined as HIST (Historical Information for Smell deTection), to eight software projects written in Java, and wherever possible compared with existing state-of-the-art smell detectors based on source code analysis. The results indicate that HIST's precision ranges between 61% and 80%, and its recall ranges between 61% and 100%. More importantly, the results confirm that HIST is able to identify code smells that cannot be identified through approaches solely based on code analysis.

Best Paper Awards

  • SCAM 2012

    When does a Refactoring Induce Bugs? An Empirical Study
    by G. Bavota, B. De Carluccio, A. De Lucia, M. Di Penta, R. Oliveto, O. Strollo

    12th IEEE International Working Conference on Source Code Analysis and Manipulation

    Refactorings are - as defined by Fowler - behavior preserving source code transformations. Their main purpose is to improve maintainability or comprehensibility, or also reduce the code footprint if needed. In principle, refactorings are defined as simple operations so that are "unlikely to go wrong" and introduce faults. In practice, refactoring activities could have their risks, as other changes. This paper reports an empirical study carried out on three Java software systems, namely Apache Ant, Xerces, and Ar-go UML, aimed at investigating to what extent refactoring activities induce faults. Specifically, we automatically detect (and then manually validate) 15,008 refactoring operations (of 52 different kinds) using an existing tool (Ref-Finder). Then, we use the SZZ algorithm to determine whether it is likely that refactorings induced a fault. Results indicate that, while some kinds of refactorings are unlikely to be harmful, others, such as refactorings involving hierarchies (e.g., pull up method), tend to induce faults very frequently. This suggests more accurate code inspection or testing activities when such specific refactorings are performed.

  • ICPC 2011

    Improving IR-based Traceability Recovery Using Smoothing Filters
    by A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, S. Panichella

    19th International Conference on Program Comprehension

    Information Retrieval methods have been largely adopted to identify traceability links based on the textual similarity of software artifacts. However, noise due to word usage in software artifacts might negatively affect the recovery accuracy. We propose the use of smoothing filters to reduce the effect of noise in software artifacts and improve the performances of traceability recovery methods. An empirical evaluation performed on two repositories indicates that the usage of a smoothing filter is able to significantly improve the performances of Vector Space Model and Latent Semantic Indexing. Such a result suggests that other than being used for traceability recovery the proposed filter can be used to improve performances of various other software engineering approaches based on textual analysis.

  • ICSM

    Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness
    by V. Arnaoudova, L. Eshkevari, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol

    26th IEEE International Conference on Software Maintenance - ERA Track

    Poorly-chosen identifiers have been reported in the literature as misleading and increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy and term context coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. Entropy measures the physical dispersion of terms in a program: the higher the entropy, the more scattered across the program the terms. Context coverage measures the conceptual dispersion of terms: the higher their context coverage, the more unrelated the methods using them. We compute term entropy and context coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context coverage are more fault-prone than others.