To appear

A Developer Centered Bug Prediction Model

D. Di Nucci, F. Palomba*, G. De Rosa, G. Bavota, R. Oliveto, A. De Lucia
Journal Paper: IEEE Transactions on Software Engineering, to appear. IEEE Press.

Abstract

Several techniques have been proposed to accurately predict software defects. These techniques generally exploit characteristics of the code artefacts (e.g., size, complexity, etc.) and/or of the process adopted during their development and maintenance (e.g., the number of developers working on a component) to identify components likely containing bugs. While these bug prediction models achieve good levels of accuracy, they mostly ignore the major role played by human-related factors in the introduction of bugs. Previous studies have demonstrated that focused developers are less prone to introducing defects than non-focused developers. According to this observation, software components changed by focused developers should also be less error prone than components changed by less focused developers. We capture this observation by measuring the scattering of changes performed by developers working on a component and use this information to build a bug prediction model. Such a model has been evaluated on 26 systems and compared with four competitive techniques. The achieved results show the superiority of our model and its high complementarity with respect to predictors commonly used in the literature. Based on this result, we also present a "hybrid" prediction model that combines our predictors with the existing ones.
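
As a concrete illustration, here is a minimal sketch of such a "hybrid" model, assuming scikit-learn is available; the feature set, the toy data, and the logistic-regression learner are illustrative assumptions, not the paper's actual setup:

```python
# Minimal sketch (not the paper's implementation): a "hybrid" bug prediction
# model that appends a developer-scattering feature to standard metrics.
# Feature values and labels below are made up for demonstration.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Each row: [LOC, cyclomatic complexity, #developers, change scattering]
X = [
    [1200, 35, 4, 0.82],
    [ 300,  8, 1, 0.10],
    [ 780, 21, 3, 0.55],
    [  90,  3, 1, 0.05],
    [1500, 44, 6, 0.91],
    [ 410, 12, 2, 0.20],
]
y = [1, 0, 1, 0, 1, 0]  # 1 = the component turned out to be buggy

model = LogisticRegression(max_iter=1000)
print(cross_val_score(model, X, y, cv=2, scoring="accuracy").mean())
```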

When and Why Your Code Starts to Smell Bad (and Whether the Smells Go Away)

M. Tufano, F. Palomba*, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia
Journal Paper: IEEE Transactions on Software Engineering, to appear. IEEE Press.

Abstract

Technical debt is a metaphor introduced by Cunningham to indicate "not quite right code which we postpone making it right". One noticeable symptom of technical debt is represented by code smells, defined as symptoms of poor design and implementation choices. Previous studies showed the negative impact of code smells on the comprehensibility and maintainability of code. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced, what their survivability is, and how they are removed by developers. To empirically corroborate such anecdotal evidence, we conducted a large empirical study over the change history of 200 open source projects. This study required the development of a strategy to identify smell-introducing commits, the mining of over half a million commits, and the manual analysis and classification of over 10K of them. Our findings mostly contradict common wisdom, showing that most smell instances are introduced when an artifact is created and not as a result of its evolution. At the same time, 80% of smells survive in the system. Also, among the 20% of removed instances, only 9% are removed as a direct consequence of refactoring operations.

There and Back Again: Can you Compile that Snapshot?

M. Tufano, F. Palomba*, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk
Journal Paper: Journal of Software: Evolution and Process, to appear. Wiley InterScience Press.

Abstract

A broken snapshot is a snapshot from a project's change history that cannot be compiled. Broken snapshots can have significant implications for researchers, as they hinder any analysis of past project history that requires code to be compiled. Notably, while some broken snapshots may be observable in change history repositories (e.g., due to dependencies that are no longer available), some of them may not have actually been broken during development. In this paper, we systematically study compilability across 219,395 snapshots belonging to 100 Java projects from the Apache Software Foundation, all relying on Maven as an automated build tool. We investigated broken snapshots from two perspectives: (i) how frequently they happen and (ii) the likely causes behind them. The empirical results indicate that broken snapshots occur in most (96%) of the projects we studied and that they are mainly due to problems related to the resolution of dependencies. On average, only 38% of the change history of the analyzed systems is currently successfully compilable.

An Empirical Study on Developer Related Factors Characterizing Fix-Inducing Commits

M. Tufano, G. Bavota, D. Poshyvanyk, M. Di Penta, R. Oliveto, A. De Lucia
Journal Paper: Journal of Software: Evolution and Process, to appear. Wiley InterScience Press.

Abstract

This paper analyzes developer-related factors that could influence the likelihood for a commit to induce a fix. Specifically, we focus on factors that could potentially hinder developers' ability to correctly understand the code components involved in the change to be committed as follows: (i) the coherence of the commit (i.e., how much it is focused on a specific topic); (ii) the experience level of the developer on the files involved in the commit; and (iii) the interfering changes performed by other developers on the files involved in past commits. The results of our study indicate that "fix-inducing" commits (i.e., commits that induced a fix) are significantly less coherent than "clean" commits (i.e., commits that did not induce a fix). Surprisingly, "fix-inducing" commits are performed by more experienced developers; yet, those are the developers performing more complex changes in the system. Finally, "fix-inducing" commits have a higher number of past interfering changes as compared with "clean" commits. Our empirical study sheds light on previously unexplored factors and presents significant results that can be used to improve approaches for defect prediction.

2017

Supporting Software Developers with a Holistic Recommender System

L. Ponzanelli, S. Scalabrino*, G. Bavota, A. Mocci, R. Oliveto, M. Di Penta, M. Lanza
Conference Paper: 39th International Conference on Software Engineering, 11 pages, Buenos Aires, Argentina, 2017. Acceptance Rate: 68/398 (17%).

Abstract

The promise of recommender systems is to provide intelligent support to developers during their programming tasks. Such support ranges from suggesting program entities to retrieving pertinent Q&A pages. However, current recommender systems limit the context analysis to change history and developers' activities in the IDE, without considering what a developer has already consulted or perused, e.g., by performing searches from the Web browser. Given the faceted nature of many programming tasks, and the incompleteness of the information provided by a single artifact, several heterogeneous resources are required to obtain the broader picture needed by a developer to accomplish a task. We present Libra, a holistic recommender system. It supports the process of searching and navigating the needed information by constructing a holistic meta-information model of the resources perused by a developer, analyzing their semantic relationships, and augmenting the web browser with a dedicated interactive navigation chart. The quantitative and qualitative evaluation of Libra provides evidence that a holistic analysis of a developer's information context can indeed offer comprehensive and contextualized support to information navigation and retrieval during software development.

2016

Using Cohesion and Coupling for Software Remodularization: Is it Enough?

I. Candela*, G. Bavota, B. Russo, R. Oliveto
Journal Paper: ACM Transactions on Software Engineering and Methodology, 25(3): 1-28, 2016. ACM Press.

Abstract

Refactoring and, in particular, remodularization operations can be performed to repair the design of a software system and remove the erosion caused by software evolution. Various approaches have been proposed to support developers during the remodularization of a software system. Most of these approaches are based on the underlying assumption that developers pursue an optimal balance between quality metrics—such as cohesion and coupling—when modularizing the classes of their systems. Thus, a remodularization recommender proposes a solution that implicitly provides a (near) optimal balance between such quality metrics. However, there is still a lack of empirical evidence that such a balance is what developers actually pursue. This paper aims at bridging this gap by analyzing the phenomenon both objectively and subjectively. Specifically, we present the results of (i) a large study analyzing the modularization quality, in terms of package cohesion and coupling, of 100 open source systems, and (ii) a survey conducted with 34 developers aimed at understanding the driving factors they consider when performing modularization tasks. The results have been used to distill a set of lessons learned that might be considered to design more effective remodularization recommenders.

Turning the IDE into a Self-confident Programming Assistant

L. Ponzanelli, G. Bavota, M. Di Penta, R. Oliveto, M. Lanza
Journal Paper: Empirical Software Engineering, 21(5): 2190-2231, 2016. Springer Press.

Abstract

Developers often require knowledge beyond what they possess, which boils down to asking co-workers for help or consulting additional sources of information, such as Application Programming Interface (API) documentation, forums, and Q&A websites. However, it requires time and energy to formulate one's problem and to peruse and process the results. We propose a novel approach that, given a context in the Integrated Development Environment (IDE), automatically retrieves pertinent discussions from Stack Overflow, evaluates their relevance using a multi-faceted ranking model, and, if a given confidence threshold is surpassed, notifies the developer. We have implemented our approach in Prompter, an Eclipse plug-in. Prompter was evaluated in two empirical studies. The first study was aimed at evaluating Prompter's ranking model and involved 33 participants. The second study was conducted with 12 participants and aimed at evaluating Prompter's usefulness when supporting developers during development and maintenance tasks. Since Prompter uses "volatile information" crawled from the web, we also replicated Study I after one year to assess the impact of such "volatility" on recommenders like Prompter. Our results indicate that (i) Prompter recommendations were positively evaluated in 74% of the cases on average, (ii) Prompter significantly helps developers to improve the correctness of their tasks by 24% on average, but also (iii) 78% of the provided recommendations are "volatile" and can change within one year. While Prompter proved to be effective, our studies also point out issues to consider when building recommenders based on information available on online forums.

Parameterizing and Assembling IR-Based Solutions for SE Tasks Using Genetic Algorithms

A. Panichella, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia
Conference Paper: 23rd International Conference on Software Analysis, Evolution, and Reengineering, pages 314-325, Osaka, Japan, 2016. Acceptance Rate: 52/140 (37%).

Abstract

Information Retrieval (IR) approaches are nowadays used to support various software engineering tasks, such as feature location, traceability link recovery, clone detection, or refactoring. However, previous studies showed that an inadequate instantiation of an IR technique and its underlying process could significantly affect the performance of such approaches in terms of precision and recall. This paper proposes the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process for software engineering tasks. The approach (named GA-IR) determines the (near) optimal solution to be used for each stage of the IR process, i.e., term extraction, stop word removal, stemming, indexing, and the calibration of an algebraic IR method. We applied GA-IR to two different software engineering tasks, namely traceability link recovery and identification of duplicate bug reports. The results of the study indicate that GA-IR outperforms approaches previously published in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised and combinatorial approach.
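
To make the idea tangible, the sketch below encodes an IR pipeline as a chromosome of categorical genes and evolves it with a toy GA. The gene values and the fitness stub are assumptions for demonstration only; GA-IR's actual configuration space and objective (e.g., retrieval accuracy on a training set) differ:

```python
# Minimal sketch of a GA-IR-style search over IR pipeline configurations.
import random

GENES = {
    "stopwords": [True, False],
    "stemmer":   ["none", "porter", "snowball"],
    "weighting": ["tf", "tfidf", "boolean"],
    "model":     ["vsm", "lsi"],
}
KEYS = list(GENES)

def random_individual():
    return {k: random.choice(v) for k, v in GENES.items()}

def fitness(ind):
    # Stand-in for the real objective (e.g., mean average precision of
    # traceability recovery achieved with this configuration).
    score = 0.3 if ind["stopwords"] else 0.0
    score += {"none": 0.0, "porter": 0.25, "snowball": 0.2}[ind["stemmer"]]
    score += {"tf": 0.1, "tfidf": 0.3, "boolean": 0.0}[ind["weighting"]]
    score += {"vsm": 0.1, "lsi": 0.15}[ind["model"]]
    return score

def crossover(a, b):
    cut = random.randrange(1, len(KEYS))  # one-point crossover over genes
    return {k: (a if i < cut else b)[k] for i, k in enumerate(KEYS)}

def mutate(ind, rate=0.2):
    return {k: (random.choice(GENES[k]) if random.random() < rate else v)
            for k, v in ind.items()}

pop = [random_individual() for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                      # keep the best half
    pop = elite + [mutate(crossover(*random.sample(elite, 2)))
                   for _ in range(10)]
print(max(pop, key=fitness))              # best configuration found
```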

Search-based Testing of Procedural Programs: Iterative Single-target or Multi-target Approach?

S. Scalabrino*, G. Grano, D. Di Nucci, R. Oliveto, A. De Lucia
Conference Paper: 8th International Symposium on Search Based Software Engineering, pages 64-79, Austin, Texas, USA, 2016.

Abstract

In the context of testing Object-Oriented (OO) software systems, researchers have recently proposed search-based approaches to automatically generate whole test suites by simultaneously considering all targets (e.g., branches) defined by the coverage criterion (multi-target approach). The goal of whole suite approaches is to overcome the problem of wasting search budget that iterative single-target approaches (which iteratively generate test cases for each target) can encounter in the case of infeasible targets. However, whole suite approaches had not been implemented and evaluated in the context of procedural programs. In this paper we present OCELOT (Optimal Coverage sEarch-based tooL for sOftware Testing), a test data generation tool for C programs which implements both a state-of-the-art whole suite approach and an iterative single-target approach designed for a parsimonious use of the search budget. We also present an empirical study conducted on 35 open-source C programs to compare the two approaches implemented in OCELOT. The results indicate that the iterative single-target approach provides a higher efficiency while achieving the same or an even higher level of coverage than the whole suite approach.
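
The budget-parsimonious single-target strategy can be sketched roughly as follows; `search_for_target` is a hypothetical stand-in for the per-target search (in reality a search algorithm deriving test inputs), and the point being illustrated is the accounting of collateral coverage and remaining budget:

```python
# Minimal sketch of a budget-aware iterative single-target loop.
import random

def search_for_target(target, budget):
    """Hypothetical stand-in for a per-target search (e.g., a GA run)."""
    spent = random.randint(1, max(1, budget // 10))
    covered = {target} | {random.randrange(100) for _ in range(3)}
    return spent, covered

targets = set(range(100))              # e.g., branch IDs of the program
covered, budget, suite = set(), 2000, []
for t in sorted(targets):
    if t in covered:
        continue                       # never spend budget on served targets
    if budget <= 0:
        break
    spent, hit = search_for_target(t, budget)
    budget -= spent
    covered |= hit                     # collateral coverage is credited too
    suite.append(f"test_for_branch_{t}")

print(f"{len(covered)} targets covered, {budget} budget left, {len(suite)} tests")
```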

An Empirical Investigation into the Nature of Test Smells

M. Tufano, F. Palomba*, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk
Conference Paper: 31st IEEE/ACM International Conference on Automated Software Engineering, pages 4-15, Singapore, 2016. Acceptance Rate: 57/298 (19%).

Abstract

Test smells have been defined as poorly designed tests and, as reported by recent empirical studies, their presence may negatively affect the comprehension and maintenance of test suites. Despite this, no automated tools are available to support the identification and repair of test smells. In this paper, we first investigate developers' perception of test smells in a study with 19 participants. The results show that developers generally do not recognize (potentially harmful) test smells, highlighting that automated tools for identifying such smells are much needed. However, to build effective tools, deeper insights into the test smells phenomenon are required. To this aim, we conducted a large-scale empirical investigation aimed at analyzing (i) when test smells occur in source code, (ii) what their survivability is, and (iii) whether their presence is associated with the presence of design problems in production code (code smells). The results indicate that test smells are usually introduced when the corresponding test code is committed to the repository for the first time, and they tend to remain in a system for a long time. Moreover, we found various unexpected relationships between test and code smells. Finally, we show how the results of this study can be used to build effective automated tools for test smell detection and refactoring.

Smells like Teen Spirit: Improving Bug Prediction Performance Using the Intensity of Code Smells

F. Palomba*, M. Zanoni, F. Arcelli Fontana, A. De Lucia, R. Oliveto
Conference Paper: International Conference on Software Maintenance and Evolution, 12 pages, Raleigh, USA, 2016. Acceptance Rate: 37/125 (29%).

Abstract

Code smells are symptoms of poor design and implementation choices. Previous studies empirically assessed the impact of smells on code quality and clearly indicate their negative impact on maintainability, including a higher bug-proneness of components affected by code smells. In this paper we build on previous findings on bug-proneness to construct a specialized bug prediction model for smelly classes. Specifically, we evaluate the contribution of a measure of the severity of code smells (i.e., code smell intensity) by adding it to existing bug prediction models and comparing the results of the new model against the baseline model. Results indicate that the accuracy of a bug prediction model increases when the code smell intensity is added as a predictor. We also evaluate the actual gain provided by the intensity index with respect to the other metrics in the model, including the ones used to compute the code smell intensity. We observe that the intensity index is much more important than the other metrics used for predicting the bugginess of smelly classes.

Automatic Test Case Generation: What if Test Code Quality Matters?

F. Palomba*, A. Panichella, A. Zaidman, R. Oliveto, A. De Lucia
Conference Paper: International Symposium on Software Testing and Analysis, 12 pages, Saarbrücken, Germany, 2016. Acceptance Rate: 37/147 (25%).

Abstract

Test case generation tools that optimize code coverage have been extensively investigated. Recently, researchers have suggested adding other, non-coverage criteria, such as memory consumption or readability, to increase the practical usefulness of generated tests. In this paper, we observe that test code quality metrics, and test cohesion and coupling in particular, are valuable candidates as additional criteria. Indeed, tests with low cohesion and/or high coupling have been shown to have a negative impact on future maintenance activities. In an exploratory investigation we show that most generated tests are indeed affected by poor test code quality. For this reason, we incorporate cohesion and coupling metrics into the main loop of a search-based algorithm for test case generation. Through an empirical study we show that our approach not only generates tests that are more cohesive and less coupled, but can also (i) increase branch coverage up to 10% when enough time is given to the search and (ii) result in statistically shorter tests.
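
A minimal sketch of how such quality metrics might enter the fitness function follows; the metric definitions (Jaccard overlap of hypothetical test vocabularies) are simplified assumptions, not the cohesion/coupling formulations used in the paper:

```python
# Minimal sketch: coverage dominates the fitness; cohesion/coupling act as
# a tie-breaker between suites covering the same branches.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def fitness(suite, all_branches, w_quality=0.1):
    covered = set().union(*(t["branches"] for t in suite))
    coverage = len(covered) / len(all_branches)
    # cohesion: how well each test's statements relate to its focal method
    cohesion = sum(jaccard(t["vocab"], t["focal_vocab"]) for t in suite) / len(suite)
    # coupling: vocabulary shared across different tests (lower is better)
    coupling = sum(jaccard(a["vocab"], b["vocab"])
                   for i, a in enumerate(suite) for b in suite[i + 1:])
    coupling /= max(1, len(suite) * (len(suite) - 1) // 2)
    return coverage + w_quality * (cohesion - coupling)

suite = [
    {"branches": {1, 2}, "vocab": {"push", "size"}, "focal_vocab": {"push"}},
    {"branches": {3},    "vocab": {"pop", "size"},  "focal_vocab": {"pop"}},
]
print(fitness(suite, all_branches={1, 2, 3, 4}))
```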

Improving Code Readability Models with Textual Features

S. Scalabrino*, M. Linares-Vasquez, D. Poshyvanyk, R. Oliveto
Conference Paper: 24th International Conference on Program Comprehension, 10 pages, Austin, Texas, USA, 2016. Acceptance Rate: 20/67 (30%).

Abstract

Code reading is one of the most frequent activities in software maintenance; before implementing changes, it is necessary to fully understand source code that was often written by other developers. Thus, readability is a crucial aspect of source code that might significantly influence program comprehension effort. In general, models used to estimate software readability take into account only structural aspects of source code, e.g., line length and the number of comments. However, code is a particular form of text; therefore, a code readability model should not ignore the textual aspects of source code encapsulated in identifiers and comments. In this paper, we propose a set of textual features that could be used to measure code readability. We evaluated the proposed textual features on 600 code snippets manually evaluated (in terms of readability) by 5K+ people. The results show that the proposed features complement classic structural features when predicting readability judgments. Consequently, a code readability model based on a richer set of features, including the ones proposed in this paper, achieves significantly better accuracy than all the state-of-the-art readability models.
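
As a flavor of what a textual feature can look like, the sketch below computes a simple "comments-identifiers consistency" score as term overlap after camelCase splitting; this is an illustrative simplification, and the paper's exact feature formulations may differ:

```python
# Minimal sketch of one textual readability feature: term overlap between
# comment words and identifier words of a snippet.
import re

def terms(text):
    # extract words, split camelCase, and lowercase everything
    words = re.findall(r"[A-Za-z]+", text)
    parts = []
    for w in words:
        parts += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", w)
    return {p.lower() for p in parts}

comment_terms = terms("compute the average order total")
identifier_terms = terms("computeAverage orders")

# consistency = Jaccard overlap between the two vocabularies
overlap = len(comment_terms & identifier_terms) / len(comment_terms | identifier_terms)
print(round(overlap, 2))   # higher overlap suggests more consistent, readable code
```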

A Textual-based Technique for Smell Detection

F. Palomba*, A. Panichella, A. De Lucia, R. Oliveto, A. Zaidman
Conference Paper: 24th International Conference on Program Comprehension, 10 pages, Austin, Texas, USA, 2016. Acceptance Rate: 20/67 (30%).

Abstract

In this paper, we present TACO (Textual Analysis for Code Smell Detection), a technique that exploits textual analysis to detect a family of smells that differ in nature and in level of granularity. We ran TACO on 10 open source projects, comparing its performance with existing smell detectors purely based on structural information extracted from code components. The analysis of the results indicates that TACO's precision ranges between 67% and 77%, while its recall ranges between 72% and 84%. Also, TACO often outperforms alternative structural approaches, confirming, once again, the usefulness of information that can be derived from the textual part of code components.
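
A rough sketch of the underlying intuition (with illustrative preprocessing and threshold, not TACO's actual detection rules): if the methods of a class talk about unrelated vocabularies, their average pairwise textual similarity is low, which can signal a smell such as Blob:

```python
# Minimal sketch of a textual cohesion check over a class's methods.
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

methods = {  # method name -> terms from identifiers/comments (hypothetical)
    "addOrder":   Counter("add order cart total price".split()),
    "orderTotal": Counter("order total price sum".split()),
    "exportPdf":  Counter("export pdf render page font".split()),
}
docs = list(methods.values())
pairs = [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))]
avg_sim = sum(cosine(docs[i], docs[j]) for i, j in pairs) / len(pairs)
print("textual cohesion:", round(avg_sim, 2),
      "-> possibly smelly" if avg_sim < 0.3 else "-> ok")
```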

On the Diffusion of Test Smells in Automatically Generated Test Code: An Empirical Study

F. Palomba*, D. Di Nucci, A. Panichella, R. Oliveto, A. De Lucia
Conference Paper: 9th International Workshop on Search-based Software Testing, 10 pages, Austin, Texas, USA, 2016. Acceptance Rate: 11/15 (73%).

Abstract

The role of software testing in the software development process is widely recognized as key to successful projects. This is the reason why, in the last decade, several automatic unit test generation tools have been proposed, focusing particularly on high code coverage. Despite the effort spent by the research community, there is still a lack of empirical investigation aimed at analyzing the characteristics of the produced test code. Indeed, while some studies inspected the effectiveness and the usability of these tools in practice, it is still unknown whether the generated test code is maintainable. In this paper, we conducted a large scale empirical study in order to analyze the diffusion of bad design solutions, namely test smells, in automatically generated unit test classes. Results of the study show the high diffusion of test smells as well as the frequent co-occurrence of different types of design problems. Finally, we found that all test smells have strong positive correlations with structural characteristics of the systems, such as size or number of classes.

Too Long; Didn't Watch! Extracting Relevant Fragments from Software Development Video Tutorials

L. Ponzanelli, G. Bavota, A. Mocci, M. Di Penta, R. Oliveto, M. Hasan, B. Russo, S. Haiduc, M. Lanza
Conference Paper: 38th International Conference on Software Engineering, pages 261-272, Austin, Texas, USA, 2016. Acceptance Rate: 101/530 (19%).

Abstract

When facing difficulties solving a task at hand, and knowledgeable colleagues are not available, developers resort to offline and online resources, e.g., official documentation, third-party tutorials, mailing lists, and Q&A websites. These, however, need to be found, read, and understood, which takes its toll in terms of time and mental energy. A more immediate and accessible resource is video tutorials found on the web, which in recent years have seen a steep increase in popularity. Nonetheless, videos are an intrinsically noisy data source, and finding the right piece of information might be even more cumbersome than using the previously mentioned resources. We present CodeTube, an approach which mines video tutorials found on the web and enables developers to query their contents. The video tutorials are processed and split into coherent fragments, so that only the fragments related to the query are returned. As an added benefit, the relevant video fragments are complemented with information from additional sources, such as Stack Overflow discussions. The results of two studies to assess CodeTube indicate that video tutorials - if appropriately processed - represent a useful, yet still under-utilized source of information for software development.

Release Planning of Mobile Apps based on User Reviews

L. Villarroel, G. Bavota, B. Russo, R. Oliveto, M. Di Penta
Conference Paper: 38th International Conference on Software Engineering, pages 14-24, Austin, Texas, USA, 2016. Acceptance Rate: 101/530 (19%).

Abstract

Developers have to constantly improve their apps by fixing critical bugs and implementing the most desired features in order to gain shares in the continuously increasing and competitive market of mobile apps. A precious source of information to plan such activities is represented by the reviews users leave on the app store. However, in order to exploit such information, developers need to manually analyze these reviews. This is not feasible when, as frequently happens, the app receives hundreds of reviews per day. In this paper we introduce CLAP (Crowd Listener for releAse Planning), a thorough solution to (i) categorize user reviews based on the information they carry (e.g., bug reporting), (ii) cluster together related reviews (e.g., all reviews reporting the same bug), and (iii) automatically prioritize the clusters of reviews to be addressed when planning the subsequent app release. We evaluated all the steps behind CLAP, showing its high accuracy in categorizing and clustering reviews and the meaningfulness of the recommended prioritizations. Also, given the availability of CLAP as a working tool, we assessed its practical applicability in industrial environments.

CodeTube: extracting relevant fragments from software development video tutorials

L. Ponzanelli, G. Bavota, A. Mocci, M. Di Penta, R. Oliveto, B. Russo, S. Haiduc, M. Lanza
Tool Demo Paper: 38th International Conference on Software Engineering - Demonstration Track, pages 645-648, Austin, Texas, USA, 2016. Acceptance Rate: 18/56 (32%).

Abstract

Nowadays developers heavily rely on sources of informal documentation, including Q&A forums, slides, or video tutorials, the latter being particularly useful to provide introductory notions for a piece of technology. The current practice is that developers have to browse sources individually, which in the case of video tutorials is cumbersome, as they are lengthy and cannot be searched based on their contents. We present CodeTube, a Web-based recommender system that analyzes the contents of video tutorials and is able to provide, given a query, cohesive and self-contained video fragments, along with links to relevant Stack Overflow discussions. CodeTube relies on a combination of textual analysis and image processing applied on video tutorial frames and speech transcripts to split videos into cohesive fragments, index them and identify related Stack Overflow discussions.

2015

An Experimental Investigation on the Innate Relationship between Quality and Refactoring

G. Bavota, A. De Lucia, M. Di Penta, R. Oliveto, F. Palomba*
Journal Paper: Journal of Systems and Software, 107: 1-14, 2015. Elsevier Press.

Abstract

Previous studies have investigated the reasons behind refactoring operations performed by developers, and proposed methods and tools to recommend refactorings based on quality metric profiles or on the presence of poor design and implementation choices, i.e., code smells. Nevertheless, the existing literature lacks observations about the relations between metrics/code smells and the refactoring operations performed by developers. In other words, the characteristics of code components that push developers to refactor them are still unknown. This paper aims at bridging this gap by analyzing which code characteristics trigger developers' refactoring attention. Specifically, we mined the evolution history of three Java open source projects to investigate whether developers' refactoring activities occur on code components for which certain indicators - such as quality metrics or the presence of smells as detected by tools - suggest there might be a need for refactoring. Results indicate that, more often than not, quality metrics do not show a clear relationship with refactoring. In other words, refactoring operations performed by developers generally focus on code components for which quality metrics do not suggest a need for refactoring. We also observed that code components having a high change-proneness attract more refactoring operations aimed at improving code readability. Finally, 42% of refactoring operations are performed on code components affected by code smells. However, the effectiveness of such operations is quite low; only 7% of the performed operations actually remove the code smells from the affected class.

Defect Prediction as a Multi-Objective Optimization Problem

G. Canfora, A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*, S. Panichella
Journal Paper: Software Testing, Verification and Reliability, 25(4): 426-459, 2015. Wiley Press.

Abstract

In this paper we formalize the defect prediction problem as a multi-objective optimization problem. Specifically, we propose an approach, coined as MODEP (Multi-Objective DEfect Predictor), based on multi-objective forms of machine learning techniques (specifically, logistic regression and decision trees) trained using a genetic algorithm. The multi-objective approach allows software engineers to choose predictors achieving a specific compromise between the number of likely defect-prone classes, or the number of defects that the analysis would likely discover (effectiveness), and the LOC to be analyzed/tested (which can be considered a proxy of the cost of code inspection). Results of an empirical evaluation on 10 datasets from the PROMISE repository indicate the quantitative superiority of MODEP with respect to single-objective predictors, and with respect to a trivial baseline that ranks classes by size in ascending or descending order. Also, MODEP outperforms an alternative approach for cross-project prediction based on local prediction over clusters of similar classes.
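
The multi-objective framing can be illustrated with a small Pareto-dominance filter over hypothetical (cost, effectiveness) points; the numbers below are made up, and MODEP itself evolves the predictors rather than merely filtering fixed candidates:

```python
# Minimal sketch of the multi-objective view: each candidate predictor is a
# point (LOC to inspect, defects likely found); keep the Pareto-optimal
# ones and let the engineer pick a compromise.
candidates = {            # name -> (cost: KLOC to inspect, defects found)
    "threshold=0.9": (12, 35),
    "threshold=0.7": (25, 60),
    "threshold=0.5": (48, 72),
    "rank-by-size":  (50, 58),   # trivial baseline
}

def dominates(a, b):
    # a dominates b: cheaper-or-equal AND finds more-or-equal, not identical
    return a[0] <= b[0] and a[1] >= b[1] and a != b

pareto = [name for name, p in candidates.items()
          if not any(dominates(q, p) for q in candidates.values())]
print(pareto)  # the baseline is dominated and drops out
```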

How the Apache Community Upgrades Dependencies: An Evolutionary Study

G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, S. Panichella
Journal Paper: Empirical Software Engineering, 20(5): 1275-1317, 2015. Springer Press.

Abstract

Software ecosystems consist of multiple software projects, often interrelated by means of dependency relations. When one project undergoes changes, other projects may decide to upgrade their dependency. For example, a project could use a new version of a component from another project because the latter has been enhanced or subject to some bug-fixing activities. In this paper we study the evolution of dependencies between projects in the Java subset of the Apache ecosystem, consisting of 147 projects, over a period of 14 years, resulting in 1,964 releases. Specifically, we investigate (i) how dependencies between projects evolve over time as the ecosystem grows, (ii) what product and process factors are likely to trigger dependency upgrades, (iii) how developers discuss the needs and risks of such upgrades, and (iv) what the likely impact of upgrades on client projects is. The study results—qualitatively confirmed by observations made by analyzing the developers' discussions—indicate that when a new release of a project is issued, it triggers an upgrade when the new release includes major changes (e.g., new features/services) as well as a large number of bug fixes. Instead, developers are reluctant to perform an upgrade when some APIs are removed. The impact of upgrades is generally low, unless they are related to frameworks/libraries used in crosscutting concerns. The results of this study can support the understanding of the library/component upgrade phenomenon, and provide the basis for a new family of recommenders aimed at supporting developers in the complex (and risky) activity of managing library/component upgrades within their software projects.

Are Test Smells Really Harmful? An Empirical Study

G. Bavota*, A. Qusef*, R. Oliveto, A. De Lucia, D. Binkley
Journal Paper: Empirical Software Engineering, 20(4): 1052-1094, 2015. Springer Press.

Abstract

Bad code smells have been defined as indicators of potential problems in source code. Techniques to identify and mitigate bad code smells have been proposed and studied. Recently, bad test code smells (test smells for short) have been put forward as a kind of bad code smell specific to tests, such as unit tests. What has been missing is empirical investigation into the prevalence and impact of bad test code smells. Two studies aimed at providing this missing empirical data are presented. The first study finds that there is a high diffusion of test smells in both open source and industrial software systems, with 86% of JUnit tests exhibiting at least one test smell and six tests having six distinct test smells. The second study provides evidence that test smells have a strong negative impact on program comprehension and maintenance. Highlights from this second study include the finding that comprehension is 30% better in the absence of test smells.

Mining Version Histories for Detecting Code Smells

F. Palomba*, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk
Journal Paper: IEEE Transactions on Software Engineering, 41(5): 462-489, 2015. IEEE Press.

Abstract

Code smells are symptoms of poor design and implementation choices that may hinder code comprehension, and possibly increase change- and fault-proneness. While most detection techniques rely only on structural information, many code smells are intrinsically characterized by how code elements change over time. In this paper, we propose HIST (Historical Information for Smell deTection), an approach exploiting change history information to detect instances of five different code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy. We evaluate HIST in two empirical studies. The first, conducted on twenty open source projects, was aimed at assessing the accuracy of HIST in detecting instances of the code smells mentioned above. The results indicate that the precision of HIST ranges between 72% and 86%, and its recall ranges between 58% and 100%. Also, results of the first study indicate that HIST is able to identify code smells that cannot be identified by competitive approaches solely based on code analysis of a single system's snapshot. Then, we conducted a second study aimed at investigating to what extent the code smells detected by HIST (and by competitive code analysis techniques) reflect developers' perception of poor design and implementation choices. We involved twelve developers of four open source projects, who recognized more than 75% of the code smell instances identified by HIST as actual design/implementation problems.
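
For intuition, the sketch below shows a simplified history-based check in the spirit of a Divergent Change detector; the commit data and the "more than one group means divergent" rule are illustrative, not HIST's actual detection rules and thresholds:

```python
# Minimal sketch: methods of one class that co-change in disjoint groups
# suggest the class serves unrelated responsibilities.
from itertools import combinations

commits = [                      # per-commit changed methods (hypothetical)
    ["Cart.add", "Cart.total"],
    ["Cart.add", "Cart.total"],
    ["Cart.exportPdf", "Cart.renderPage"],
    ["Cart.exportPdf", "Cart.renderPage"],
]

edges = set()                    # co-change graph over the class's methods
for changed in commits:
    edges |= {frozenset(p) for p in combinations(changed, 2)}

methods = {m for c in commits for m in c}
groups, seen = [], set()
for m in methods:                # connected components of the graph
    if m in seen:
        continue
    comp, todo = set(), [m]
    while todo:
        cur = todo.pop()
        if cur in comp:
            continue
        comp.add(cur)
        todo += [n for n in methods if frozenset((cur, n)) in edges]
    groups.append(comp)
    seen |= comp

print(len(groups), "co-change groups; divergent change?", len(groups) > 1)
```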

The Impact of API Change- and Fault-Proneness on the User Ratings of Android Apps

G. Bavota, M. Linares-Vasquez, C. Bernal-Cardenas, M. Di Penta, R. Oliveto, D. Poshyvanyk
Journal Paper: IEEE Transactions on Software Engineering, 41(4): 384-407, 2015. IEEE Press.

Abstract

The mobile apps market is one of the fastest growing areas in information technology. To carve out their market share, developers must pay attention to building robust and reliable apps. In fact, users easily get frustrated by repeated failures, crashes, and other bugs; hence, they abandon some apps in favor of their competition. In this paper we investigate how the fault- and change-proneness of APIs used by Android apps relates to their success, estimated as the average rating provided by the users to those apps. First, in a study conducted on 5,848 (free) apps, we analyzed how the ratings that an app had received correlated with the fault- and change-proneness of the APIs such an app relied upon. After that, we surveyed 45 professional Android developers to assess (i) to what extent developers experienced problems when using APIs, and (ii) how much they felt these problems could be the cause of unfavorable user ratings. The results of our studies indicate that apps having high user ratings use APIs that are less fault- and change-prone than the APIs used by low-rated apps. Also, most of the interviewed Android developers observed, in their development experience, a direct relationship between problems experienced with the adopted APIs and the user ratings that their apps received.

Improving Multi-Objective Test Case Selection by Injecting Diversity in Genetic Algorithms

A. Panichella*, R. Oliveto, M. Di Penta, A. De Lucia
Journal Paper: IEEE Transactions on Software Engineering, 41(4): 358-383, 2015. IEEE Press.

Abstract

A way to reduce the cost of regression testing consists of selecting or prioritizing subsets of test cases from a test suite according to some criteria. Besides greedy algorithms, cost-cognizant additional greedy algorithms, multi-objective optimization algorithms, and Multi-Objective Genetic Algorithms (MOGAs) have also been proposed to tackle this problem. However, previous studies have shown that there is no clear winner between greedy algorithms and MOGAs, and that their combination does not necessarily produce better results. In this paper we show that the optimality of MOGAs can be significantly improved by diversifying the solutions (subsets of the test suite) generated during the search process. Specifically, we introduce a new MOGA, coined as DIV-GA (DIversity based Genetic Algorithm), based on the mechanisms of orthogonal design and orthogonal evolution that increase diversity by injecting new orthogonal individuals during the search process. Results of an empirical study conducted on eleven programs show that DIV-GA outperforms both greedy algorithms and traditional MOGAs from the optimality point of view. Moreover, the solutions (subsets of the test suite) provided by DIV-GA are able to detect more faults than those of the other algorithms, while keeping the same test execution cost.
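
The sketch below illustrates only the general idea of injecting diversity into a GA population for test case selection. It is a deliberate simplification: it injects the candidate farthest (in Hamming distance) from the current population, whereas DIV-GA relies on orthogonal design and orthogonal evolution:

```python
# Minimal sketch of diversity injection for test case selection.
import random

N_TESTS, POP_SIZE = 12, 8   # bit i = 1 means "test i is selected"
population = [[random.randint(0, 1) for _ in range(N_TESTS)]
              for _ in range(POP_SIZE)]

def min_hamming(ind, pop):
    return min(sum(a != b for a, b in zip(ind, other)) for other in pop)

def inject_diverse(pop, n_candidates=200):
    cands = [[random.randint(0, 1) for _ in range(N_TESTS)]
             for _ in range(n_candidates)]
    newcomer = max(cands, key=lambda c: min_hamming(c, pop))
    pop[random.randrange(len(pop))] = newcomer  # replace a random individual
    return pop

population = inject_diverse(population)
print("min pairwise distance:",
      min(min_hamming(p, [q for q in population if q is not p])
          for p in population))
```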

A Fine-grained Analysis of the Support Provided by UML Class Diagrams and ER Diagrams During Data Model Maintenance

G. Bavota, C. Gravino, R. Oliveto, A. De Lucia, G. Tortora, M. Genero, J. A. Cruz-Lemus
Journal Paper: Software and Systems Modeling, 14(1): 287-306, 2015. Springer Press.

Abstract

This paper presents the results of an empirical study aimed at comparing the support provided by ER and UML class diagrams during the maintenance of data models. We performed one controlled experiment and two replications that focused on comprehension activities (the first activity in the maintenance process) and another controlled experiment on modification activities related to the implementation of given change requests. The results achieved were analyzed at a fine-grained level, aiming at comparing the support given by each single building block of the two notations. Such an analysis is used to identify weaknesses (i.e., building blocks that are not easy to comprehend) in a notation and/or can justify the need to prefer ER or UML for data modeling. The analysis revealed that UML class diagrams generally provided better support for both comprehension and modification activities performed on data models as compared to ER diagrams. Nevertheless, the former have some weaknesses related to three building blocks, i.e., multi-value attribute, composite attribute, and weak entity. These findings suggest that an extension of UML class diagrams should be considered to overcome these weaknesses and improve the support provided during the maintenance of data models.

On the Role of Developer’s Scattered Changes in Bug Prediction

D. Di Nucci, F. Palomba*, S. Siravo*, G. Bavota, R. Oliveto, A. De Lucia
Conference Paper: 31st IEEE International Conference on Software Maintenance and Evolution, pages 241-250, Bremen, Germany, 2015. Acceptance Rate: 32/148 (21.6%).

Abstract

The importance of human-related factors in the introduction of bugs has recently been the subject of a number of empirical studies. However, these observations have not yet been captured in bug prediction models, which simply exploit product metrics or process metrics based on the number and type of changes or on the number of developers working on a software component. Some previous studies have demonstrated that focused developers are less prone to introducing defects than non-focused developers. According to this observation, software components changed by focused developers should also be less error prone than software components changed by less focused developers. In this paper we capture this observation by measuring the structural and semantic scattering of changes performed by the developers working on a software component and use these two measures to build a bug prediction model. Such a model has been evaluated on five open source systems and compared with two competitive prediction models: the first exploits the number of developers working on a code component in a given time period as a predictor, while the second is based on the concept of code change entropy. The achieved results show the superiority of our model with respect to the two competitive approaches, and the complementarity of the defined scattering measures with respect to standard predictors commonly used in the literature.
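
A minimal sketch of a structural scattering measure in this spirit follows; the path-distance definition below is a simplification for illustration, not the paper's exact formulation. The farther apart in the package tree the classes a developer touched in a time window, the higher the scattering:

```python
# Minimal sketch of "structural scattering": average package-tree distance
# among the classes one developer changed in a given period.
from itertools import combinations

def package_distance(a, b):
    pa, pb = a.split(".")[:-1], b.split(".")[:-1]   # drop the class name
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return (len(pa) - shared) + (len(pb) - shared)  # hops via common ancestor

def structural_scattering(changed_classes):
    pairs = list(combinations(changed_classes, 2))
    if not pairs:
        return 0.0
    return sum(package_distance(a, b) for a, b in pairs) / len(pairs)

touched = ["org.app.ui.CartView", "org.app.ui.MenuView", "org.app.db.OrderDao"]
print(structural_scattering(touched))   # 0 would mean all in one package
```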

User Reviews Matter! Tracking Crowdsourced Reviews to Support Evolution of Successful Apps

F. Palomba*, M. Linares-Vasquez, G. Bavota, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia
Conference Paper: 31st IEEE International Conference on Software Maintenance and Evolution, pages 291-300, Bremen, Germany, 2015. Acceptance Rate: 32/148 (21.6%).

Abstract

Nowadays software applications, and especially mobile apps, undergo frequent release updates through app stores. After installing/updating apps, users can post reviews and provide ratings, expressing their level of satisfaction with the apps, and possibly pointing out bugs or desired features. In this paper we show—by performing a study on 100 Android apps—how applications that address user reviews increase their success in terms of rating. Specifically, we devise an approach, named CRISTAL, for tracing informative crowd reviews onto source code changes, and for monitoring the extent to which developers accommodate crowd requests and follow-up user reactions as reflected in their ratings. The results indicate that developers implementing user reviews are rewarded in terms of ratings. This poses the need for specialized recommendation systems aimed at analyzing informative crowd reviews and prioritizing the feedback to be satisfied in order to increase an app's success.

Optimizing Energy Consumption of GUIs in Android Apps: A Multi-objective Approach

M. Linares-Vasquez, G. Bavota, C. Bernal-Cardenas, R. Oliveto, M. Di Penta, D. Poshyvanyk
Conference Paper: 10th Joint Meeting of the European Software Engineering Conference and the 23rd ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 143-154, Bergamo, Italy, 2015. Acceptance Rate: 74/291 (25.4%).

Abstract

The wide diffusion of mobile devices has motivated research towards optimizing the energy consumption of software systems - including apps - targeting such devices. Besides efforts aimed at dealing with various kinds of energy bugs, the adoption of Organic Light-Emitting Diode (OLED) screens has motivated research towards reducing energy consumption by choosing an appropriate color palette. Whilst past research in this area aimed at optimizing energy while keeping an acceptable level of contrast, this paper proposes an approach, named GEMMA (Gui Energy Multi-objective optiMization for Android apps), for generating color palettes using a multi-objective optimization technique, which produces color solutions optimizing energy consumption and contrast while using colors consistent with the original color palette. An empirical evaluation performed on 25 Android apps not only demonstrates significant improvements in terms of the three different objectives, but also confirms that in most cases users still perceived the choices of colors as attractive. Finally, for several apps we interviewed the original developers, who in some cases expressed the intent to adopt the proposed color palettes, whereas in other cases they pointed out directions for future improvements.
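
Two of the objectives can be sketched as follows; the energy weights are illustrative rather than a calibrated OLED power model, and the contrast ratio uses a linear luminance approximation (sRGB gamma omitted for brevity), so the numbers only convey the shape of the trade-off:

```python
# Minimal sketch of two palette objectives: display energy vs. text contrast.
def pixel_energy(rgb, w=(0.6, 1.0, 1.4)):        # blue is typically costliest
    r, g, b = rgb
    return w[0] * r + w[1] * g + w[2] * b

def luminance(rgb):
    r, g, b = (c / 255 for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b  # linear approximation

def contrast(fg, bg):                            # WCAG-style contrast ratio
    l1, l2 = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

original = {"bg": (255, 255, 255), "fg": (20, 20, 20)}
proposal = {"bg": (0, 0, 0),       "fg": (230, 230, 230)}   # dark variant
for name, pal in (("original", original), ("proposal", proposal)):
    print(name,
          "energy:", round(pixel_energy(pal["bg"])),
          "contrast:", round(contrast(pal["fg"], pal["bg"]), 1))
```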

Query-based Configuration of Text Retrieval Solutions for Software Engineering Tasks

L. Moreno, G. Bavota, S. Haiduc, M. Di Penta, R. Oliveto, B. Russo, A. Marcus
Conference Paper: 10th Joint Meeting of the European Software Engineering Conference and the 23rd ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 567-578, Bergamo, Italy, 2015. Acceptance Rate: 74/291 (25.4%).

Abstract

Text Retrieval (TR) approaches have been used to leverage the textual information contained in software artifacts to address a multitude of software engineering tasks. However, TR approaches need to be configured properly in order to lead to good results. Current approaches for automatic TR configuration in SE configure a single TR approach and then use it for all possible queries that can be formulated. In this paper, we show that such a configuration strategy leads to suboptimal results and propose QUEST, the first approach bringing TR configuration selection to the query level. QUEST recommends the best TR configuration for a given query, based on a supervised learning approach which determines the TR configuration that performs the best for each query based on its properties. We evaluated QUEST in the context of feature and bug localization, using a dataset with more than 1,000 queries. We found that QUEST is able to recommend one of the top three TR configurations for a query with a 69% accuracy, on average. We compared the results obtained with the configurations recommended by QUEST for every query with those obtained using a single TR configuration for all queries in a system and in the entire dataset. We found that using QUEST we obtain better results than with any of the considered TR configurations.

Landfill: an Open Dataset of Code Smells with Public Evaluation

F. Palomba*, D. Di Nucci, M. Tufano, G. Bavota, R. Oliveto, D. Poshyvanyk, and A. De Lucia
Tool Demo Paper: 12th Working Conference on Mining Software Repositories, pages 482-485, Florence, Italy, 2015. Acceptance Rate: 17/25 (68%).

Abstract

Code smells are symptoms of poor design and implementation choices that may hinder code comprehension and possibly increase the change- and fault-proneness of source code. Several techniques have been proposed in the literature for detecting code smells. These techniques are generally evaluated by comparing their accuracy on a set of detected candidate code smells against a manually-produced oracle. Unfortunately, with only a few exceptions, such comprehensive sets of annotated code smells are not available in the literature. In this paper we contribute (i) a dataset of 243 instances of five types of code smells identified from 20 open source software projects, (ii) a systematic procedure for validating code smell datasets, (iii) LANDFILL, a Web-based platform for sharing code smell datasets, and (iv) a set of APIs for programmatically accessing LANDFILL's contents. Anyone can contribute to LANDFILL by (i) improving existing datasets (e.g., adding missing instances of code smells, flagging possibly incorrectly classified instances), and (ii) sharing and posting new datasets. LANDFILL is available at www.sesa.unisa.it/landfill/, while the video demonstrating its features in action is available at http://www.sesa.unisa.it/tools/landfill.jsp.

Extract Package Refactoring in ARIES

F. Palomba*, M. Tufano, G. Bavota, R. Oliveto, A. Marcus, D. Poshyvanyk, and A. De Lucia
Tool Demo Paper: 37th International Conference on Software Engineering, pages 669-672, Florence, Italy, 2015. Acceptance Rate: 25/42 (59%).

Abstract

Software evolution often leads to the degradation of software design quality. In Object-Oriented (OO) systems, this often results in packages that are hard to understand and maintain, as they group together heterogeneous classes with unrelated responsibilities. In such cases, state-of-the-art re-modularization tools solve the problem by proposing a new organization of the existing classes into packages. However, as indicated by recent empirical studies, such approaches require changing thousands of lines of code to implement the recommended modularization. In this demo, we present the implementation of an Extract Package refactoring approach in ARIES (Automated Refactoring In EclipSe), a tool supporting refactoring operations in Eclipse. Unlike state-of-the-art approaches, ARIES automatically identifies and removes single low-cohesive packages from software systems, which represent localized design flaws in the package organization, with the aim of incrementally improving the overall quality of the software modularization.

When and Why Your Code Starts to Smell Bad

M. Tufano, F. Palomba*, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, and D. Poshyvanyk
Conference Paper: 37th International Conference on Software Engineering, pages 403-414, Florence, Italy, 2015. Acceptance Rate: 84/452 (18%).

Abstract

In past and recent years, the issues related to managing technical debt have received significant attention from researchers in both industry and academia. Several factors contribute to technical debt. One of these is represented by bad code smells, i.e., symptoms of poor design and implementation choices. While the repercussions of smells on code quality have been empirically assessed, there is still only anecdotal evidence on when and why bad smells are introduced. To fill this gap, we conducted a large empirical study over the change history of 200 open source projects from different software ecosystems and investigated when bad smells are introduced by developers, and the circumstances and reasons behind their introduction. Our study required the development of a strategy to identify smell-introducing commits, the mining of over 0.5M commits, and the manual analysis of 9,164 of them (i.e., those identified as smell-introducing). Our findings mostly contradict the common wisdom that smells are introduced during evolutionary tasks. In the light of our results, we also call for a new generation of recommendation systems aimed at properly planning smell refactoring activities.

How Can I Use This Method?

L. Moreno, G. Bavota, M. Di Penta, R. Oliveto, and A. Marcus
Conference Paper: 37th International Conference on Software Engineering, pages 880-890, Florence, Italy, 2015. Acceptance Rate: 84/452 (18%).

Abstract

Code examples are small source code fragments whose purpose is to illustrate how a programming language construct, an API, or a specific function/method works. Since code examples are not always available in the software documentation, researchers have proposed techniques to automatically extract them from existing software or to mine them from developer discussions. In this paper we propose MUSE (Method USage Examples), an approach for mining and ranking actual code examples that show how to use a specific method. MUSE combines static slicing (to simplify examples) with clone detection (to group similar examples), and uses heuristics to select and rank the best examples in terms of reusability, understandability, and popularity. MUSE has been empirically evaluated using examples mined from six libraries, by performing three studies involving a total of 140 developers to: (i) evaluate the selection and ranking heuristics, (ii) gather their perception of the usefulness of the selected examples, and (iii) perform specific programming tasks using the MUSE examples. The results indicate that MUSE selects and ranks examples close to how humans do, that most of the code examples (82%) are perceived as useful, and that they actually help when performing programming tasks.

Anti-Pattern Detection: Methods, Challenges, and Open Issues

F. Palomba*, G. Bavota, R. Oliveto, A. De Lucia
Book Chapter: Advances in Computers, volume 95, pages 201-238, A. Memon (ed.), 2015. Elsevier Press.

Abstract

Anti-patterns are poor solutions to recurring design problems. They occur in object-oriented systems when developers unwittingly introduce them while designing and implementing the classes of their systems. Several empirical studies have highlighted that anti-patterns have a negative impact on the comprehension and maintainability of software systems. Consequently, their identification has recently received more attention from both researchers and practitioners, who have proposed various approaches to detect them. This chapter discusses the approaches proposed in the literature. In addition, from the analysis of the state of the art, we (i) derive a set of guidelines for building and evaluating recommendation systems supporting the detection of anti-patterns, and (ii) discuss some problems that are still open, to trace future research directions in the field. For this reason, the chapter provides support both to researchers who are interested in comprehending the results achieved so far in the identification of anti-patterns, and to practitioners who are interested in adopting a tool to identify anti-patterns in their software systems.

2014

Automating Extract Class Refactoring: an Improved Method and its Evaluation

G. Bavota*, A. De Lucia, A. Marcus, and R. Oliveto
Journal Paper: Empirical Software Engineering, 19(6): 1617-1664, 2014. Springer Press.

Abstract

During software evolution the internal structure of the system undergoes continuous modifications. These continuous changes push away the source code from its original design, often reducing its quality, including class cohesion. In this paper we propose a method for automating the Extract Class refactoring. The proposed approach analyzes (structural and semantic) relationships between the methods in a class to identify chains of strongly related methods. The identified method chains are used to define new classes with higher cohesion than the original class, while preserving the overall coupling between the new classes and the classes interacting with the original class. The proposed approach has been first assessed in an artificial scenario in order to calibrate the parameters of the approach. The data was also used to compare the new approach with previous work. Then it has been empirically evaluated on real Blobs from existing open source systems in order to assess how good and useful the proposed refactoring solutions are considered by software engineers and how well the proposed refactorings approximate refactorings done by the original developers. We found that the new approach outperforms a previously proposed approach and that developers find the proposed solutions useful in guiding refactorings.

Labeling Source Code with Information Retrieval Methods: An Empirical Study

A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*, S. Panichella
Journal Paper: Empirical Software Engineering, 19(5): 1383-1420, 2014. Springer Press.

Abstract

To support program comprehension, software artifacts can be labeled—for example within software visualization tools—with a set of representative words, hereby referred to as labels. Such labels can be obtained using various approaches, including Information Retrieval (IR) methods or other simple heuristics. They provide a bird's-eye view of the source code, allowing developers to look over software components quickly and make more informed decisions on which parts of the source code they need to analyze in detail. However, few empirical studies have been conducted to verify whether the extracted labels make sense to software developers. This paper investigates (i) to what extent various IR techniques and other simple heuristics overlap with (and differ from) labeling performed by humans; (ii) what kinds of source code terms humans use when labeling software artifacts; and (iii) what factors—in particular, what characteristics of the artifacts to be labeled—influence the performance of automatic labeling techniques. We conducted two experiments in which we asked a group of students (38 in total) to label 20 classes from two Java software systems, JHotDraw and eXVantage. Then, we analyzed to what extent the words identified with automated techniques—including Vector Space Models, Latent Semantic Indexing (LSI), and latent Dirichlet allocation (LDA), as well as customized heuristics extracting words from specific source code elements—overlap with those identified by humans. Results indicate that, in most cases, simpler automatic labeling techniques—based on the use of words extracted from class and method names as well as from class comments—better reflect human-based labeling. Clustering-based approaches (LSI and LDA) are instead more worthwhile for source code artifacts having high verbosity, as well as for artifacts requiring more effort to be manually labeled. The obtained results help to define guidelines on how to build effective automatic labeling techniques, and provide some insights on the actual usefulness of automatic labeling techniques during program comprehension tasks.
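
As a concrete example of the simplest kind of labeling technique considered here, the sketch below ranks a class's terms by TF-IDF against the rest of a hypothetical corpus and keeps the top ones as the label; the corpus, class names, and term lists are made up for illustration:

```python
# Minimal sketch of TF-IDF-based source code labeling.
import math
from collections import Counter

corpus = {  # class name -> terms extracted from identifiers/comments
    "CartController": "add order cart total price checkout cart order".split(),
    "PdfExporter":    "export pdf render page font page".split(),
    "UserDao":        "user query insert update database user".split(),
}

def tfidf_label(doc_name, k=3):
    tf = Counter(corpus[doc_name])
    n_docs = len(corpus)
    def score(term):
        df = sum(term in terms for terms in corpus.values())
        return tf[term] * math.log(n_docs / df)
    return sorted(tf, key=score, reverse=True)[:k]

print(tfidf_label("CartController"))  # the class's most characteristic terms
```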

Methodbook: Recommending Move Method Refactorings via Relational Topic Models

G. Bavota*, R. Oliveto, M. Gethers, D. Poshyvanyk, A. De Lucia
Journal PaperIEEE Transactions on Software Engineering, 40(7): 671-694, 2014. IEEE press.

Abstract

During software maintenance and evolution the internal structure of the software system undergoes continuous changes. These modifications drift the source code away from its original design, thus deteriorating its quality, including cohesion and coupling of classes. Several refactoring methods have been proposed to overcome this problem. In this paper we propose a novel technique to identify Move Method refactoring opportunities and remove the Feature Envy bad smell from source code. Our approach, coined as Methodbook, is based on Relational Topic Models (RTM), a probabilistic technique for representing and modeling topics, documents (in our case methods), and the known relationships among them. Methodbook uses RTM to analyze both structural and textual information gleaned from software to better support move method refactoring. We evaluated Methodbook in two case studies. The first study was executed on six software systems to analyze whether the move method operations suggested by Methodbook help to improve the design quality of the systems as captured by quality metrics. The second study was conducted with eighty developers who evaluated the refactoring recommendations produced by Methodbook. The achieved results indicate that Methodbook provides accurate and meaningful recommendations for move method refactoring operations.

REPENT: Analyzing the Nature of Identifier Renamings

V. Arnaoudova, L. Eshkevari, M. Di Penta, R. Oliveto, G. Antoniol, Y.-G. Guéhéneuc
Journal PaperIEEE Transactions on Software Engineering, 40(5): 502-532, 2014. IEEE press.

Abstract

Source code lexicon plays a paramount role in software quality: poor lexicon can lead to poor comprehensibility and even increase software fault-proneness. For this reason, renaming a program entity, i.e., altering the entity identifier, is an important activity during software evolution. Developers rename when they feel that the name of an entity is not (anymore) consistent with its functionality, or when such a name may be misleading. A survey that we performed with 71 developers suggests that 39 percent perform renaming from a few times per week to almost every day and that 92 percent of the participants consider that renaming is not straightforward. However, despite the cost associated with renaming, renamings are seldom if ever documented—for example, less than 1 percent of the renamings in the five programs that we studied. This explains why participants largely agree on the usefulness of automatically documenting renamings. In this paper we propose REnaming Program ENTities (REPENT), an approach to automatically document—detect and classify—identifier renamings in source code. REPENT detects renamings based on a combination of source code differencing and data flow analyses. Using a set of natural language tools, REPENT classifies renamings into the different dimensions of a taxonomy that we defined. Using the documented renamings, developers will be able to, for example, look up methods that are part of the public API (as they impact client applications), or look for inconsistencies between the name and the implementation of an entity that underwent a high-risk renaming (e.g., towards the opposite meaning). We evaluate the accuracy and completeness of REPENT on the evolution history of five open-source Java programs. The study indicates a precision of 88 percent and a recall of 92 percent. In addition, we report an exploratory study investigating and discussing how identifiers are renamed in the five programs, according to our taxonomy.

Improving Software Modularization via Automated Analysis of Latent Topics and Dependencies

G. Bavota*, M. Gethers, R. Oliveto, D. Poshyvanyk, A. De Lucia
Journal PaperACM Transactions on Software Engineering and Methodology, 23(1): 4, 2014. ACM press.

Abstract

Oftentimes, during software maintenance the original program modularization decays, thus reducing its quality. One of the main reasons for such architectural erosion is suboptimal placement of source-code classes in software packages. To alleviate this issue, we propose an automated approach to help developers improve the quality of software modularization. Our approach analyzes underlying latent topics in source code as well as structural dependencies to recommend (and explain) refactoring operations aiming at moving a class to a more suitable package. The topics are acquired via Relational Topic Models (RTM), a probabilistic topic modeling technique. The resulting tool, coined as R3 (Rational Refactoring via RTM), has been evaluated in two empirical studies. The results of the first study conducted on nine software systems indicate that R3 provides a coupling reduction from 10% to 30% among the software modules. The second study with 62 developers confirms that R3 is able to provide meaningful recommendations (and explanations) for move class refactoring. Specifically, more than 70% of the recommendations were considered meaningful from a functional point of view.

Recovering Test-To-Code Traceability Using Slicing and Textual Analysis

A. Qusef*, G. Bavota*, R. Oliveto, A. De Lucia, D. Binkley
Journal PaperJournal of Systems and Software, 88: 147-168, 2014. Elsevier press.

Abstract

Test suites are a valuable source of up-to-date documentation, as developers continuously modify them to reflect changes in the production code and preserve an effective regression suite. While maintaining traceability links between unit tests and the classes under test can be useful to selectively retest code after a change, the value of having traceability links goes far beyond this potential saving. One key use is to help developers better comprehend the dependencies between tests and classes and to maintain consistency during refactoring. Despite its importance, test-to-code traceability is not common in software development and, when needed, traceability information has to be recovered during software development and evolution. We propose an advanced approach, named SCOTCH+ (Source code and COncept based Test to Code traceability Hunter), to support the developer during the identification of links between unit tests and tested classes. Given a test class, represented by a JUnit class, the approach first exploits dynamic slicing to identify a set of candidate tested classes. Then, external and internal textual information associated with the classes retrieved by slicing is analyzed to refine this set and identify the final set of candidate tested classes. The external information is derived from the analysis of the class name, while the internal information is derived from identifiers and comments. The approach is evaluated on five software systems. The results indicate that the accuracy of the proposed approach far exceeds that of the leading techniques found in the literature.
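
The refinement step can be sketched as follows. This is our own simplified rendition, not the SCOTCH+ implementation: the dynamic-slicing step is replaced by a given candidate set, the external information is reduced to a class-name match against the test name, the internal information to a cosine similarity over identifier terms, and the name-bonus weight is invented.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bags of words."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = sqrt(sum(v * v for v in va.values())) * \
           sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def refine(test_name, test_terms, candidates, name_bonus=0.3):
    """Rank candidate tested classes (e.g., coming from dynamic slicing).
    External info: the test name minus 'Test'; internal info: terms."""
    target = test_name.replace('Test', '').lower()
    scored = []
    for cls, cls_terms in candidates.items():
        score = cosine(test_terms, cls_terms)
        if cls.lower() == target:          # class-name match
            score += name_bonus
        scored.append((score, cls))
    return sorted(scored, reverse=True)

candidates = {                  # classes touched by the dynamic slice
    'ShoppingCart': ['add', 'item', 'total', 'cart'],
    'Logger':       ['log', 'level', 'append'],
}
print(refine('ShoppingCartTest',
             ['add', 'item', 'assert', 'total'], candidates))
```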

Enhancing Software Artefact Traceability Recovery Processes with Link Count Information

G. Bavota*, A. De Lucia, R. Oliveto, G. Tortora
Journal PaperInformation and Software Technology, 56(2): 163-182, 2014. Elsevier press.

Abstract

Context: The intensive human effort needed to manually manage traceability information has increased the interest in using semi-automated traceability recovery techniques. In particular, Information Retrieval (IR) techniques have been largely employed in the last ten years to partially automate the traceability recovery process. Aim: Previous studies mainly focused on the analysis of the performance of IR-based traceability recovery methods, and several enhancing strategies have been proposed to improve their accuracy. Very few papers investigate how developers (i) use IR-based traceability recovery tools and (ii) analyse the list of suggested links to validate correct links or discard false positives. We focus on this issue and suggest exploiting link count information in IR-based traceability recovery tools to improve the performance of developers during a traceability recovery process. Method: Two empirical studies have been conducted to evaluate the usefulness of link count information. The two studies involved 135 university students who had to perform (with and without link count information) traceability recovery tasks on two software project repositories. Then, we evaluated the quality of the recovered traceability links in terms of links correctly and erroneously traced by the students. Results: The results achieved indicate that the use of link count information significantly increases the number of correct links identified by the participants. Conclusions: The results can be used to derive guidelines on how to effectively use traceability recovery approaches and tools proposed in the literature.

Automatic Generation of Release Notes

L. Moreno, G. Bavota, M. Di Penta, R. Oliveto, A. Marcus, G. Canfora
Conference paper22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering, pages 484-495, Hong Kong, 2014. ACM Press. Acceptance Rate: 61/273 (22%).

Abstract

This paper introduces ARENA (Automatic RElease Notes generAtor), an approach for the automatic generation of release notes. ARENA extracts changes from the source code, summarizes them, and integrates them with information from versioning systems and issue trackers. It was designed based on the manual analysis of 1,000 existing release notes. To evaluate the quality of the ARENA release notes, we performed three empirical studies involving a total of 53 participants (45 professional developers and 8 students). The results indicate that the ARENA release notes are very good approximations of those produced by the developers and often include important information that is missing in the manually produced release notes.

Recommending Refactorings based on Team Co-Maintenance Patterns

G. Bavota, S. Panichella, N. Tsantalis, M. Di Penta, R. Oliveto, G. Canfora
Conference paper29th IEEE/ACM International Conference on Automated Software Engineering, 6 pages, Västerås, Sweden, 2014. ACM Press. Acceptance Rate: 82/337 (24%).

Abstract

Refactoring aims at restructuring existing source code when undisciplined development activities have deteriorated its comprehensibility and maintainability. There exist various approaches for suggesting refactoring opportunities, based on different sources of information, e.g., structural, semantic, and historical. In this paper we claim that an additional source of information for identifying refactoring opportunities, sometimes orthogonal to the ones mentioned above, is team development activity. When the activity of a team working on common modules is not aligned with the current design structure of a system, it would be possible to recommend appropriate refactoring operations (e.g., extract class/method/package) to adjust the design according to the team's activity patterns. Results of a preliminary study, conducted in the context of extract class refactoring, show the feasibility of the approach, and also suggest that this new refactoring dimension can be complemented with others to build better refactoring recommendation tools.

Prompter: A Self-confident Recommender System

L. Ponzanelli, G. Bavota, M. Di Penta, R. Oliveto, M. Lanza
Tool demo paper30th International Conference on Software Maintenance and Evolution, 4 pages, Victoria, Canada, 2014. IEEE Press. Acceptance Rate: 14/27 (52%).

Abstract

Developers often consult different sources of information, such as Application Programming Interface (API) documentation, forums, and Q&A websites, with the aim of gathering additional knowledge for the programming task at hand. The process of searching and identifying valuable pieces of information requires developers to spend time and energy in formulating the right queries, assessing the returned results, and integrating the obtained knowledge into the code base. All of this is often done manually. We present Prompter, a plug-in for the Eclipse IDE which automatically searches and identifies relevant Stack Overflow discussions, evaluates their relevance given the code context in the IDE, and notifies the developer if and only if a user-defined confidence threshold is surpassed.

Do they Really Smell Bad? A Study on Developers' Perception of Code Bad Smells

F. Palomba*, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia
Conference paper30th International Conference on Software Maintenance and Evolution, 10 pages, Victoria, Canada, 2014. IEEE Press. Acceptance Rate: 40/210 (19%).

Abstract

In the last decade several catalogues have been defined to characterize bad code smells, i.e., symptoms of poor design and implementation choices. On top of such catalogues, researchers have defined methods and tools to automatically detect and/or remove bad smells. Nevertheless, there is an ongoing debate regarding the extent to which developers perceive bad smells as serious design problems. Indeed, there seems to be a gap between theory and practice, i.e., what is believed to be a problem (theory) and what is actually a problem (practice). This paper presents a study aimed at providing empirical evidence on how developers perceive bad smells. In this study, we showed developers code entities from three systems, some affected by bad smells and some not, and we asked them to indicate whether the code contained a potential design problem and, if so, the nature and severity of the problem. The study involved both original developers from the three projects and outsiders, namely industrial developers and Master's students. The results provide insights on characteristics of bad smells not yet sufficiently explored. Also, our findings could guide future research on approaches for the detection and removal of bad smells.

How the Evolution of Emerging Collaborations Relates to Code Changes: an Empirical Study

S. Panichella, G. Canfora, R. Oliveto, M. Di Penta
Conference paper22nd International Conference on Program Comprehension, pages 117-188, Hyderabad, India, 2014. ACM press. Acceptance Rate: 29/85 (34%).

Abstract

Developers contributing to open source projects spontaneously group into "emerging" teams, reflected by messages exchanged over mailing lists, issue trackers, and other communication means. Previous studies suggested that such teams somewhat mirror the software modularity. This paper empirically investigates how, when a project evolves, emerging teams re-organize themselves, e.g., by splitting or merging. We relate the evolution of teams to the files they change, to investigate whether teams split to work on cohesive groups of files. Results of this study, conducted on the evolution history of four open source projects (namely Apache httpd, Eclipse JDT, Netbeans, and Samba), provide indications of what happens in the project when teams reorganize. Specifically, we found that emerging team splits imply working on more cohesive groups of files, and emerging team merges imply working on groups of files that are cohesive from a structural perspective. Such indications serve to better understand the evolution of software projects. More importantly, the observation of how emerging teams change can serve to suggest software remodularization actions.

How do API Changes Trigger Stack Overflow Discussions? A Study on the Android SDK

M. L. Vásquez, G. Bavota, M. Di Penta, R. Oliveto, D. Poshyvanyk
Conference paper22nd International Conference on Program Comprehension, pages 83-94, Hyderabad, India, 2014. ACM press. Acceptance Rate: 29/85 (34%).

Abstract

The growing number of questions related to mobile development in StackOverflow highlights an increasing interest of software developers in mobile programming. For the Android platform, 213,836 questions were tagged with Android-related labels in StackOverflow between July 2008 and August 2012. This paper aims at investigating how changes occurring to Android APIs trigger questions and activity in StackOverflow, and whether this is particularly true for certain kinds of changes. Our findings suggest that Android developers usually have more questions when the behavior of APIs is modified. In addition, deleting public methods from APIs is a trigger for questions that are (i) more discussed and of major interest for the community, and (ii) posted by more experienced developers. In general, results of this paper provide important insights about the use of social media to learn about changes in software ecosystems, and establish solid foundations for building new recommenders for notifying developers/managers about important changes and recommending them relevant crowdsourced solutions.

Mining Energy-Greedy API Usage Patterns in Android Apps: an Empirical Study

M. L. Vásquez, G. Bavota, C. Bernal-Cárdenas, R. Oliveto, M. Di Penta, D. Poshyvanyk
Conference paper11th Working Conference on Mining Software Repositories, pages 2-11, Hyderabad, India, 2014. ACM press. Acceptance Rate: 29/85 (34%).

Abstract

Energy consumption of mobile applications is nowadays a hot topic, given the widespread use of mobile devices. The high demand for features and improved user experience, given the available powerful hardware, tends to increase the apps' energy consumption. However, excessive energy consumption in mobile apps could also be a consequence of energy-greedy hardware, bad programming practices, or particular API usage patterns. We present the largest quantitative and qualitative empirical investigation to date into the categories of API calls and usage patterns that—in the context of the Android development framework—exhibit particularly high energy consumption profiles. By using a hardware power monitor, we measure the energy consumption of method calls when executing typical usage scenarios in 55 mobile apps from different domains. Based on the collected data, we mine and analyze energy-greedy APIs and usage patterns. We zoom in and discuss the cases where either the anomalous energy consumption is unavoidable or where it is due to suboptimal usage or choice of APIs. Finally, we synthesize our findings into actionable knowledge and recipes for developers on how to reduce energy consumption while using certain categories of Android APIs and patterns.

Mining StackOverflow to Turn the IDE into a Self-confident Programming Prompter

L. Ponzanelli, G. Bavota, M. Di Penta, R. Oliveto, M. Lanza
Conference paper11th Working Conference on Mining Software Repositories, pages 102-111, Hyderabad, India, 2014. ACM press. Acceptance Rate: 29/85 (34%).

Abstract

Developers often require knowledge beyond what they possess, which often boils down to consulting sources of information like Application Programming Interface (API) documentation, forums, Q&A websites, etc. Knowing what to search for, and how, is non-trivial, and developers spend time and energy to formulate their problems as queries and to peruse and process the results. We propose a novel approach that, given a context in the IDE, automatically retrieves pertinent discussions from Stack Overflow, evaluates their relevance, and, if a given confidence threshold is surpassed, notifies the developer about the available help. We have implemented our approach in Prompter, an Eclipse plug-in. Prompter has been evaluated through two studies. The first was aimed at evaluating the devised ranking model, while the second was conducted to evaluate the usefulness of Prompter.

In Medio Stat Virtus: Extract Class Refactoring through Nash Equilibria

G. Bavota*, R. Oliveto, A. De Lucia, A. Marcus, Y.-G. Guéhéneuc, G. Antoniol
Conference paper1st Software Evolution Week (joint meeting of the 21st International Working Conference on Reverse Engineering and the 18th European Conference on Software Maintenance and Reengineering), pages 214-223, Antwerp, Belgium, 2014. IEEE press. Acceptance Rate: 27/87 (31%).

Abstract

Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually because of the high complexity of the class to be refactored. In this paper, we present an approach based on game theory to identify refactoring solutions that provide a compromise between the desired increment in cohesion and the undesired increment in coupling. The results of an empirical evaluation indicate that the approach identifies meaningful ECRs from a developer's point of view.

Cross-project Defect Prediction Models: L'Union fait la force

A. Panichella*, R. Oliveto, A. De Lucia
Conference paper1st Software Evolution Week (joint meeting of the 21st International Working Conference on Reverse Engineering and the 18th European Conference on Software Maintenance and Reengineering), pages 164-173, Antwerp, Belgium, 2014. IEEE press. Acceptance Rate: 27/87 (31%).

Abstract

Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contradictory and do not show a clear winner. In this paper we present an empirical study aimed at statistically analyzing the equivalence of different defect predictors. We also propose a combined approach, coined as CODEP (COmbined DEfect Predictor), that employs the classifications provided by different machine learning techniques to improve the detection of defect-prone entities. The study was conducted on 10 open source software systems, in the context of cross-project defect prediction, which represents one of the main challenges in the defect prediction field. The statistical analysis of the results indicates that the investigated classifiers are not equivalent and that they can complement each other. This is also confirmed by the superior prediction accuracy achieved by CODEP when compared to stand-alone defect predictors.
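
The combination idea can be sketched with off-the-shelf components. The snippet below is a stacking-style illustration in the spirit of CODEP, not the paper's exact setup: synthetic data stands in for the defect datasets, and the base classifiers and the combiner are illustrative choices.

```python
# Stacking sketch: the predictions of several base classifiers become
# the features of a combining logistic model.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
# stand-in for the cross-project split: train on one set, test on another
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

codep_like = StackingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('tree', DecisionTreeClassifier(max_depth=5)),
                ('nb', GaussianNB())],
    final_estimator=LogisticRegression())   # the combining model
print('combined accuracy:', codep_like.fit(X_tr, y_tr).score(X_te, y_te))
```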

Recommending Refactoring Operations in Large Software Systems

G. Bavota*, A. De Lucia, A. Marcus, R. Oliveto
Book chapterRecommendation Systems in Software Engineering. M. Robillard, W. Maalej, R. J. Walker, and T. Zimmermann (eds.), 2014. Springer press.

Abstract

During its lifecycle, the internal structure of a software system undergoes continuous modifications. These changes push the source code away from its original design, often reducing its quality. In such cases, refactoring techniques can be applied to improve the readability and reduce the complexity of source code, to improve the architecture, and to provide better software extensibility. Despite its advantages, performing refactoring in large and nontrivial software systems might be very challenging. Thus, a lot of effort has been devoted to the definition of automatic or semi-automatic approaches to support developers during software refactoring. Many of the proposed techniques aim at recommending refactoring operations. In this chapter, we present guidelines on how to build such recommendation systems and how to evaluate them. We also highlight some of the challenges that exist in the field, pointing toward future research directions.

Search Based Software Maintenance: Methods and Tools

G. Bavota, M. Di Penta, R. Oliveto
Book chapterEvolving Software Systems. T. Mens, A. Serebrenik, A. Cleve (eds.), 2014. Springer press.

Abstract

Software evolution is an effort-prone activity that requires developers to make complex and difficult decisions. This calls for automated approaches supporting various software evolution tasks, for example by suggesting refactoring or remodularization actions. Finding a solution to these problems is intrinsically NP-hard, and exhaustive approaches are not viable due to the size and complexity of many software projects. Therefore, during recent years, several software-evolution problems have been formulated as optimization problems and solved with meta-heuristics. This chapter overviews how search-based optimization techniques can support software engineers in a number of software evolution tasks. For each task, we illustrate how the problem can be encoded as a search-based optimization problem, and how meta-heuristics can be used to solve it. Where possible, we refer to tools that can be used to deal with such tasks.

2013

Evaluating Test-to-Code Traceability Recovery Methods through Controlled Experiments

A. Qusef*, G. Bavota*, R. Oliveto, A. De Lucia, D. Binkley
Journal PaperEmpirical Software Engineering journal, 18(5): 901-932, 2013. Springer Press.

Abstract

Recently, different methods and tools have been proposed to automate or semi-automate test-to-code traceability recovery. Among these, Slicing and Coupling based Test to Code trace Hunter (SCOTCH) exploits slicing and conceptual coupling to identify the classes tested by a JUnit test. However, until now the evaluation of test-to-code traceability recovery methods has been limited to experiments assessing their tracing accuracy, rather than the actual support these methods provide to a software engineer during traceability recovery tasks. Indeed, a research method or tool has a better chance of being transferred to practitioners if it is supported by empirical evidence. In this paper, we present the results of two controlled experiments carried out to evaluate the support given by SCOTCH during traceability recovery, when compared with other traceability recovery methods. The results show that SCOTCH is able to suggest a higher number of correct links with higher accuracy, thus noticeably improving the performance of software engineers during test-to-code traceability recovery tasks.

Using Structural and Semantic Measures to Improve Software Modularization

G. Bavota*, A. De Lucia, A. Marcus, R. Oliveto
Journal PaperEmpirical Software Engineering journal, 18(5): 901-932, 2013. Springer Press.

Abstract

Changes during software evolution and poor design decisions often lead to packages that are hard to understand and maintain, because they usually group together classes with unrelated responsibilities. One way to improve such packages is to decompose them into smaller, more cohesive packages. The difficulty lies in the fact that most definitions and interpretations of cohesion are rather vague and the multitude of measures proposed by researchers usually capture only one aspect of cohesion. We propose a new technique for automatic re-modularization of packages, which uses structural and semantic measures to decompose a package into smaller, more cohesive ones. The paper presents the new approach as well as an empirical study, which evaluates the decompositions proposed by the new technique. The results of the evaluation indicate that the decomposed packages have better cohesion without a deterioration of coupling, and that the re-modularizations proposed by the tool are also meaningful from a functional point of view.

Improving IR-based Traceability Recovery via Noun-based Indexing of Software Artifacts

G. Capobianco, A. De Lucia, R. Oliveto, A. Panichella*, S. Panichella*
Journal PaperJournal of Software: Evolution and Process, 25(7): 743-762, 2013. Wiley InterScience Press.

Abstract

One of the most successful applications of textual analysis in software engineering is the use of information retrieval (IR) methods to reconstruct traceability links between software artifacts. Unfortunately, because of the limitations of both the humans developing artifacts and the IR techniques, any IR-based traceability recovery method fails to retrieve some of the correct links, while it also retrieves links that are not correct. This limitation has posed challenges for researchers, who have proposed several methods to improve the accuracy of IR-based traceability recovery by removing the "noise" in the textual content of software artifacts (e.g., by removing common words or increasing the importance of critical terms). In this paper, we propose a heuristic to remove the "noise" taking into account the linguistic nature of the words in software artifacts. In particular, the language used in software documents can be classified as a technical language, where the words that provide the strongest indication of a document's semantics are the nouns. The results of a case study conducted on five software artifact repositories indicate that characterizing software artifacts by considering only nouns significantly improves the accuracy of IR-based traceability recovery methods.
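
A minimal rendition of noun-based indexing, using NLTK's off-the-shelf part-of-speech tagger as a stand-in for whatever tagging machinery the paper relied on (the resource names below match classic NLTK releases and may differ in newer ones):

```python
# Keep only noun tokens before building the term index used by the
# IR engine; everything here is an illustrative simplification.
import nltk
nltk.download('punkt', quiet=True)
nltk.download('averaged_perceptron_tagger', quiet=True)

def nouns_only(artifact_text):
    """Return the noun tokens of a software artifact's text."""
    tokens = nltk.word_tokenize(artifact_text.lower())
    return [tok for tok, tag in nltk.pos_tag(tokens)
            if tag.startswith('NN')]        # NN, NNS, NNP, NNPS

req = "The system shall encrypt the password before storing it"
print(nouns_only(req))   # e.g., ['system', 'password']
```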

Applying a Smoothing Filter to Improve IR-based Traceability Recovery Processes: An Empirical Investigation

A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*, S. Panichella*
Journal PaperInformation and Software Technology, 55(4): 741-754, 2013. Elsevier press.

Abstract

Context: Traceability relations among software artifacts often tend to be missing, outdated, or lost. For this reason, various traceability recovery approaches—based on Information Retrieval (IR) techniques—have been proposed. The performance of such approaches is often influenced by "noise" contained in software artifacts (e.g., recurring words in document templates or other words that do not contribute to the retrieval itself). Aim: As a complement and alternative to stop word removal approaches, this paper proposes the use of a smoothing filter to remove "noise" from the textual corpus of artifacts to be traced. Method: We evaluate the effect of a smoothing filter in traceability recovery tasks involving different kinds of artifacts from five software projects, and applying three different IR methods, namely Vector Space Models, Latent Semantic Indexing, and the Jensen–Shannon similarity model. Results: Our study indicates that, with the exception of some specific kinds of artifacts (i.e., tracing test cases to source code), the proposed approach is able to significantly improve the performance of traceability recovery, and to remove "noise" that simple stop word filters cannot remove. Conclusions: The obtained results not only help to develop traceability recovery approaches able to work in the presence of noisy artifacts, but also suggest that smoothing filters can be used to improve the performance of other software engineering approaches based on textual analysis.
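
One plausible reading of the smoothing filter, sketched below with scikit-learn: the corpus-wide average term vector is treated as the shared "noise" and subtracted from each artifact's vector before computing similarities. The clipping of negative weights to zero and the toy corpus are our simplifications.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["use case: user logs in with password",
        "use case: user updates the shopping cart",
        "class LoginController handles password checks"]

X = TfidfVectorizer().fit_transform(docs).toarray()
mean_vec = X.mean(axis=0)                   # "noise" shared by all artifacts
X_smooth = np.clip(X - mean_vec, 0, None)   # remove it from every artifact

print(cosine_similarity(X[:1], X[2:3]))                  # before smoothing
print(cosine_similarity(X_smooth[:1], X_smooth[2:3]))    # after smoothing
```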

Detecting Bad Smells in Source Code Using Change History Information

F. Palomba*, G. Bavota, M. Di Penta, R. Oliveto, A. De Lucia, D. Poshyvanyk
Conference paper28th IEEE/ACM International Conference on Automated Software Engineering, pages 268-278, Palo Alto, California, USA, 2013. ACM Press. ACM Distinguished Paper Award. Acceptance Rate: 74/317 (23%).

Abstract

Code smells represent symptoms of poor implementation choices. Previous studies found that these smells make source code more difficult to maintain, possibly also increasing its fault-proneness. There are several approaches that identify smells based on code analysis techniques. However, we observe that many code smells are intrinsically characterized by how code elements change over time. Thus, relying solely on structural information may not be sufficient to detect all the smells accurately. We propose an approach to detect five different code smells, namely Divergent Change, Shotgun Surgery, Parallel Inheritance, Blob, and Feature Envy, by exploiting change history information mined from versioning systems. We applied the approach, coined HIST (Historical Information for Smell deTection), to eight software projects written in Java and, wherever possible, compared it with existing state-of-the-art smell detectors based on source code analysis. The results indicate that HIST's precision ranges between 61% and 80%, and its recall ranges between 61% and 100%. More importantly, the results confirm that HIST is able to identify code smells that cannot be identified through approaches solely based on code analysis.
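
To illustrate the intuition behind one of the history-based detectors, the sketch below flags Shotgun Surgery candidates as methods whose changes repeatedly co-occur with changes in many other classes. The commit history, the co-change rule, and the threshold are all invented for illustration; HIST's actual detectors are more refined.

```python
from collections import defaultdict
from itertools import combinations

# Each commit lists the (class, method) pairs it changed (toy history).
commits = [
    {('Order', 'save'), ('Db', 'write'), ('Log', 'info'), ('Ui', 'refresh')},
    {('Order', 'save'), ('Db', 'write'), ('Log', 'info'), ('Mail', 'send')},
    {('Order', 'save'), ('Ui', 'refresh'), ('Mail', 'send')},
    {('Cart', 'add'), ('Cart', 'total')},
]

co_changed = defaultdict(set)
for commit in commits:
    for (c1, m1), (c2, m2) in combinations(commit, 2):
        if c1 != c2:                       # cross-class co-change only
            co_changed[(c1, m1)].add(c2)
            co_changed[(c2, m2)].add(c1)

THRESHOLD = 3   # illustrative: co-changes with >= 3 distinct classes
for method, classes in co_changed.items():
    if len(classes) >= THRESHOLD:
        print('Shotgun Surgery candidate:', method, '->', sorted(classes))
```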

API Change- and Fault-proneness: a Threat to the Success of Android Apps

M. L. Vásquez, G. Bavota, C. Bernal-Cárdenas, M. Di Penta, R. Oliveto, D. Poshyvanyk
Conference paper9th joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 477-487, Saint Petersburg, Russian Federation, 2013. ACM Press. Acceptance Rate: 51/251 (20%).

Abstract

During recent years, the market of mobile software applications (apps) has maintained an impressive upward trajectory. Many small and large software development companies invest considerable resources to target the available opportunities. As of today, the markets for such devices feature 850K+ apps for Android and 900K+ for iOS. Availability, cost, functionality, and usability are just some factors that determine the success or lack of success of a given app. Among the other factors, reliability is an important criterion: users easily get frustrated by repeated failures, crashes, and other bugs, hence abandoning some apps in favor of others. This paper reports a study analyzing how the fault- and change-proneness of APIs used by 7,097 (free) Android apps relates to the applications' lack of success, estimated from user ratings. Results of this study provide important insights into a crucial issue: making heavy use of fault- and change-prone APIs can negatively impact the success of these apps.

The Evolution of Project Inter-Dependencies in a Software Ecosystem: the Case of Apache

G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, S. Panichella
Conference paper29th IEEE International Conference on Software Maintenance, pages 280-289, Eindhoven, the Netherlands, 2013. IEEE Press. Acceptance Rate: 36/163 (22%).

Abstract

Software ecosystems consist of multiple software projects, often interrelated by means of dependency relations. When one project undergoes changes, other projects may decide to upgrade the dependency. For example, a project could use a new version of another project because the latter has been enhanced or subject to some bug-fixing activities. This paper reports an exploratory study aimed at observing the evolution of the Java subset of the Apache ecosystem, consisting of 147 projects, for a period of 14 years, and resulting in 1,964 releases. Specifically, we analyze (i) how dependencies change over time, (ii) whether a dependency upgrade is due to different kinds of factors, such as different kinds of API changes or licensing issues, and (iii) how an upgrade impacts a related project. Results of this study help to comprehend the phenomenon of library/component upgrade, and provide the basis for a new family of recommenders aimed at supporting developers in the complex (and risky) activity of managing library/component upgrades within their software projects.

An Empirical Investigation on Documentation Usage Patterns in Maintenance Tasks

G. Bavota, G. Canfora, M. Di Penta, R. Oliveto, S. Panichella
Conference paper29th IEEE International Conference on Software Maintenance, pages 210-219, Eindhoven, the Netherlands, 2013. IEEE Press. Acceptance Rate: 36/163 (22%).

Abstract

When developers perform a software maintenance task, they need to identify the artifacts (e.g., classes or, more specifically, methods) that need to be modified. To this aim, they can browse various kinds of artifacts, for example use case descriptions, UML diagrams, or source code. This paper reports the results of a study, conducted with 33 participants, aimed at investigating (i) to what extent developers use different kinds of documentation when identifying artifacts to be changed, and (ii) whether they follow specific navigation patterns among different kinds of artifacts. Results indicate that, although participants spent a conspicuous proportion of the available time focusing on source code, they browse back and forth between source code and either static (class) or dynamic (sequence) diagrams. Less frequently, participants, especially more experienced ones, follow an "integrated" approach by using different kinds of artifacts.

Orthogonal exploration of the search space in evolutionary test case generation

F. M. Kifetew, A. Panichella*, A. De Lucia, R. Oliveto, P. Tonella
Conference paperInternational Symposium on Software Testing and Analysis, pages 257-267, Lugano, Switzerland, 2013. ACM Press. Acceptance Rate: 32/124 (26%).

Abstract

The effectiveness of evolutionary test case generation based on Genetic Algorithms (GAs) can be seriously impacted by genetic drift, a phenomenon that inhibits the ability of such algorithms to effectively diversify the search and look for alternative potential solutions. In such cases, the search becomes dominated by a small set of similar individuals that lead GAs to converge to a sub-optimal solution and to stagnate, without reaching the desired objective. This problem is particularly common for hard-to-cover program branches, associated with an extremely large solution space. In this paper, we propose an approach to solve this problem by integrating a mechanism for orthogonal exploration of the search space into the standard GA. The diversity of the population is enriched by adding individuals in orthogonal directions, hence providing a more effective exploration of the solution space. To the best of our knowledge, no prior work has explicitly addressed the issue of evolution-direction-based diversification in the context of evolutionary testing. Results achieved on 17 Java classes indicate that the proposed enhancements make GAs much more effective and efficient in automating the testing process. In particular, effectiveness (coverage) was significantly improved in 47% of the subjects, and efficiency (search budget consumed) was improved in 85% of the subjects on which effectiveness remained the same.
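
The idea of injecting individuals along orthogonal directions can be illustrated for real-coded individuals with a QR (Gram-Schmidt-style) construction. This is our own sketch of the mechanism, not the paper's exact operator, which works on test-case chromosomes:

```python
import numpy as np

def orthogonal_individuals(population):
    """Given real-coded individuals (rows), return new individuals
    pointing in directions orthogonal to those already spanned by
    the population, via a complete QR decomposition."""
    P = np.asarray(population, dtype=float)
    q, _ = np.linalg.qr(P.T, mode='complete')  # columns: orthonormal basis
    used = P.shape[0]
    return q[:, used:].T    # directions the population does not cover

pop = [[1.0, 0.0, 0.0],
       [1.0, 1.0, 0.0]]     # two similar individuals (genetic drift)
for ind in orthogonal_individuals(pop):
    print(np.round(ind, 2))  # e.g., a vector along the unexplored axis
```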

Multi-Objective Cross-Project Defect Prediction

G. Canfora, A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*, S. Panichella
Conference paper6th IEEE International Conference on Software Testing, Verification and Validation, pages 252-261, Luxembourg, Luxembourg, 2013. IEEE Press. Acceptance Rate: 28/152 (25%).

Abstract

Cross-project defect prediction is very appealing because (i) it allows predicting defects in projects for which the availability of data is limited, and (ii) it allows producing generalizable prediction models. However, existing research suggests that cross-project prediction is particularly challenging and, due to the heterogeneity of projects, prediction accuracy is not always very good. This paper proposes a novel, multi-objective approach for cross-project defect prediction, based on a multi-objective logistic regression model built using a genetic algorithm. Instead of providing the software engineer with a single predictive model, the multi-objective approach allows software engineers to choose predictors achieving a compromise between the number of likely defect-prone artifacts (effectiveness) and the LOC to be analyzed/tested (which can be considered a proxy for the cost of code inspection). Results of an empirical evaluation on 10 datasets from the Promise repository indicate the superiority and the usefulness of the multi-objective approach with respect to single-objective predictors. Also, the proposed approach outperforms an alternative approach for cross-project prediction, based on local prediction upon clusters of similar classes.
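
The outcome of such a multi-objective search is a set of non-dominated predictors rather than a single model. The sketch below shows just the Pareto-filtering step on hypothetical (effectiveness, LOC-to-inspect) scores; the candidate configurations are invented.

```python
def pareto_front(candidates):
    """Keep the non-dominated predictor configurations: higher
    effectiveness is better, lower LOC-to-inspect is better."""
    front = []
    for c in candidates:
        dominated = any(o['effectiveness'] >= c['effectiveness'] and
                        o['loc'] <= c['loc'] and o != c
                        for o in candidates)
        if not dominated:
            front.append(c)
    return front

# Hypothetical logistic-regression configurations evolved by the GA
candidates = [
    {'name': 'A', 'effectiveness': 0.90, 'loc': 40000},
    {'name': 'B', 'effectiveness': 0.75, 'loc': 15000},
    {'name': 'C', 'effectiveness': 0.70, 'loc': 20000},  # dominated by B
    {'name': 'D', 'effectiveness': 0.50, 'loc': 6000},
]
for p in pareto_front(candidates):
    print(p['name'], p['effectiveness'], p['loc'])  # A, B, D survive
```

The engineer then picks a point on the front matching the inspection budget, instead of trusting a single one-size-fits-all predictor.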

Using Code Ownership to Improve IR-based Traceability Link Recovery

D. Diaz, G. Bavota*, A. Marcus, R. Oliveto, S. Takahashi, A. De Lucia
Conference paper21st IEEE International Conference on Program Comprehension, pages 123-132, San Francisco, California, USA, 2013. IEEE Press. Acceptance Rate: 19/63 (30%).

Abstract

Information Retrieval (IR) techniques have gained widespread acceptance as a method for automating traceability recovery. These techniques recover links between software artifacts based on their textual similarity, i.e., the higher the similarity, the higher the likelihood that there is a link between the two artifacts. A common problem with all IR-based techniques is filtering out noise from the list of candidate links, in order to improve the recovery accuracy. Indeed, software artifacts may be related in many ways, and the textual information captures only one aspect of their relationships. In this paper we propose to leverage code ownership information to capture relationships between source code artifacts and improve the recovery of traceability links between documentation and source code. Specifically, we extract the author of each source code component and, for each author, we identify the "context" she worked on. Thus, for a given query from the external documentation, we compute the similarity between the query and each author's context. When retrieving classes related to a given query using a standard IR-based approach, we reward the classes developed by the authors whose context is most similar to the query, by boosting their similarity to the query. The proposed approach, named TYRION (TraceabilitY link Recovery using Information retrieval and code OwNership), has been instantiated for the recovery of traceability links between use cases and Java classes of two software systems. The results indicate that code ownership information can be used to improve the accuracy of an IR-based traceability link recovery technique.
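
The re-ranking step can be condensed into a few lines. This is our schematic reading of the boosting idea, with invented scores and an illustrative boost factor gamma:

```python
def rerank(query_sim, author_ctx_sim, ownership, gamma=0.5):
    """Boost the IR similarity of classes owned by authors whose past
    change context resembles the query (weights illustrative)."""
    boosted = {}
    for cls, sim in query_sim.items():
        author = ownership[cls]
        boosted[cls] = sim * (1 + gamma * author_ctx_sim[author])
    return sorted(boosted.items(), key=lambda kv: -kv[1])

query_sim = {'Login': 0.42, 'CartView': 0.40}   # plain IR scores
ownership = {'Login': 'alice', 'CartView': 'bob'}
author_ctx_sim = {'alice': 0.8, 'bob': 0.1}     # context vs. the use case
print(rerank(query_sim, author_ctx_sim, ownership))
```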

Configuring Topic Models for Software Engineering Tasks in TraceLab

B. Dit, A. Panichella*, E. Moritz, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia
Conference paper7th International Workshop on Traceability in Emerging Forms of Software Engineering - Challenge track, pages 105-109, San Francisco, California, USA, 2013. ACM Press. Acceptance Rate: 15/25 (60%).

Abstract

A number of approaches in traceability link recovery and other software engineering tasks incorporate topic models, such as Latent Dirichlet Allocation (LDA). Although in theory these topic models can produce very good results if they are configured properly, in practice their potential may be undermined by improper calibration of their parameters (e.g., number of topics, hyper-parameters), which could lead to sub-optimal results. In our previous work we addressed this issue and proposed LDA-GA, an approach that uses Genetic Algorithms (GA) to find a near-optimal configuration of parameters for LDA, which was shown to produce results superior to reported ad-hoc configurations for traceability link recovery and other tasks. LDA-GA works by optimizing the coherence of topics produced by LDA for a given dataset. In this paper, we instantiate LDA-GA as a TraceLab experiment, making publicly available all the implemented components, the datasets, and the results from our previous work. In addition, we provide guidelines on how to extend our LDA-GA approach to other IR techniques and other software engineering tasks using existing TraceLab components.

The Role of Artefact Corpus in LSI-based Traceability Recovery

G. Bavota*, A. De Lucia, R. Oliveto, A. Panichella*, F. Ricci*, G. Tortora
Conference paper7th International Workshop on Traceability in Emerging Forms of Software Engineering - Challenge track, pages 83-89, San Francisco, California, USA, 2013. ACM Press. Acceptance Rate: 15/25 (60%).

Abstract

Latent Semantic Indexing (LSI) is an advanced method widely and successfully employed in Information Retrieval (IR). It extends the Vector Space Model (VSM) and is able to outperform VSM in canonical IR scenarios, where it is used on very large document repositories. LSI has also been used to semi-automatically generate traceability links between software artefacts. However, in such a scenario LSI is not able to outperform VSM. This contradictory result is probably due to the different characteristics of software artefact repositories as compared to document repositories. In this paper we present a preliminary empirical study analyzing how the size of the repository, in terms of number of documents, and its vocabulary, in terms of number of terms, affect the retrieval accuracy. Although replications are needed to generalize our findings, the study presented in this paper provides some insights that might be used as guidelines for selecting the most adequate method for traceability recovery depending on the particular application context.
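
For readers unfamiliar with the two methods, the snippet below contrasts VSM and LSI on a toy corpus using scikit-learn, where LSI is obtained as a truncated SVD of the TF-IDF matrix; the artifacts and the choice of k are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

artifacts = ["user login password authentication",
             "authenticate credentials of the user",
             "render shopping cart page"]

tfidf = TfidfVectorizer().fit_transform(artifacts)      # VSM
print('VSM:', cosine_similarity(tfidf[0], tfidf[1]))    # low: one shared term

k = 2                                                   # LSI dimensions
lsi = TruncatedSVD(n_components=k, random_state=0).fit_transform(tfidf)
print('LSI:', cosine_similarity(lsi[:1], lsi[1:2]))     # often higher: both
                                                        # load on one concept
```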

When and How Using Structural Information to Improve IR-based Traceability Recovery

A. Panichella*, C. McMillan, E. Moritz, D. Palmieri*, R. Oliveto, D. Poshyvanyk, A. De Lucia
Conference paper17th European Conference on Software Maintenance and Reengineering, pages 199-208, Genova, Italy, 2013. IEEE Press. Acceptance Rate: 29/80 (36%).

Abstract

Information Retrieval (IR) has been widely accepted as a method for automated traceability recovery based on the textual similarity among the software artifacts. However, a notorious difficulty for IR-based methods is that artifacts may be related even if they are not textually similar. A growing body of work addresses this challenge by combining IR-based methods with structural information from source code. Unfortunately, the accuracy of such methods is highly dependent on the IR methods. If the IR methods perform poorly, the combined approaches may perform even worse. In this paper, we propose to use the feedback provided by the software engineer when classifying candidate links to regulate the effect of using structural information. Specifically, our approach only considers structural information when the traceability links from the IR methods are verified by the software engineer and classified as correct links. An empirical evaluation conducted on three systems suggests that our approach outperforms both a pure IR-based method and a simple approach for combining textual and structural information.

Query Quality Prediction and Reformulation for Source Code Search: the Refoqus Tool

S. Haiduc, G. De Rosa, G. Bavota*, R. Oliveto, A. De Lucia, A. Marcus
Tool demo paper35th IEEE/ACM International Conference on Software Engineering, pages 1307-1310, San Francisco, California, USA, 2013. ACM Press. Acceptance Rate: 16/52 (31%).

Abstract

Developers search source code frequently during their daily tasks, to find pieces of code to reuse, to find where to implement changes, etc. Code search based on text retrieval (TR) techniques has been widely used in the software engineering community during the past decade. The accuracy of the TR-based search results depends largely on the quality of the query used. We introduce Refoqus, an Eclipse plugin which is able to automatically detect the quality of a text retrieval query and to propose reformulations for it, when needed, in order to improve the results of TR-based code search.

YODA: Young and newcOmer Developer Assistant

G. Canfora, M. Di Penta, S. Giannantonio*, R. Oliveto, S. Panichella
Tool demo paper35th IEEE/ACM International Conference on Software Engineering, pages 1331-1334, San Francisco, California, USA, 2013. ACM Press. Acceptance Rate: 16/52 (31%).

Abstract

Mentoring project newcomers is a crucial activity in software projects, and requires identifying people with good communication and teaching skills, in addition to high expertise on specific technical topics. In this demo we present Yoda (Young and newcOmer Developer Assistant), an Eclipse plugin that identifies and recommends mentors for newcomers joining a software project. Yoda mines developers' communication (e.g., mailing lists) and project versioning systems to identify mentors, using an approach inspired by what ArnetMiner does when mining advisor/student relations. Then, it recommends appropriate mentors based on the specific expertise required by the newcomer. The demo shows Yoda in action, illustrating how the tool is able to identify and visualize mentoring relations in a project, and to suggest appropriate mentors for a developer who is going to work on certain source code files or on a given topic.

An Empirical Study on the Developers' Perception of Software Coupling

G. Bavota*, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia
Conference paper35th IEEE/ACM International Conference on Software Engineering, pages 692-701, San Francisco, California, USA, 2013. ACM Press. Acceptance Rate: 85/461 (19%).

Abstract

Coupling is a fundamental property of software systems, and numerous coupling measures have been proposed to support various development and maintenance activities. However, little is known about how developers actually perceive coupling, what mechanisms constitute coupling, and whether existing measures align with this perception. In this paper we bridge this gap by empirically investigating how class coupling, as captured by structural, dynamic, semantic, and logical coupling measures, aligns with developers' perception of coupling. The study has been conducted on three Java open-source systems (namely ArgoUML, JHotDraw, and jEdit) and involved 64 students, academics, and industrial practitioners from around the world, as well as 12 active developers of these three systems. We asked participants to assess the coupling between the given pairs of classes and provide their ratings and some rationale. The results indicate that the peculiarity of the semantic coupling measure allows it to better estimate the mental model of developers than the other coupling measures. This is because, in several cases, the interactions between classes are encapsulated in the source code vocabulary, and cannot be easily derived by only looking at structural relationships, such as method calls.

Automatic Query Reformulations for Text Retrieval in Software Engineering

S. Haiduc, G. Bavota*, A. Marcus, R. Oliveto, A. De Lucia, T. Menzies
Conference paper35th IEEE/ACM International Conference on Software Engineering, pages 842-851, San Francisco, California, USA, 2013. ACM Press. Acceptance Rate: 85/461 (19%).

Abstract

There are more than twenty distinct software engineering tasks addressed with text retrieval (TR) techniques, such as traceability link recovery, feature location, refactoring, and reuse. A common issue with all TR applications is that the results of the retrieval depend largely on the quality of the query. When a query performs poorly, it has to be reformulated, and this is a difficult task for someone who had trouble writing a good query in the first place. We propose a recommender (called Refoqus) based on machine learning, which is trained with a sample of queries and relevant results. Then, for a given query, it automatically recommends a reformulation strategy that should improve its performance, based on the properties of the query. We evaluated Refoqus empirically against four baseline approaches that are used in natural language document retrieval. The data used for the evaluation corresponds to changes from five open source systems in Java and C++ and is used in the context of TR-based concept location in source code. Refoqus outperformed the baselines, and its recommendations lead to query performance improvement or preservation in 84% of the cases (on average).

How to Effectively Use Topic Models for Software Engineering Tasks? An Approach based on Genetic Algorithms

A. Panichella*, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia
Conference paper35th IEEE/ACM International Conference on Software Engineering, pages 522-531, San Francisco, California, USA, 2013. ACM Press. Acceptance Rate: 85/461 (19%).

Abstract

Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the same settings and parameters) because the underlying assumption was that source code and natural language documents are similar. However, applying topic models on software data using the same settings as for natural language text did not always produce the expected results. Recent research investigated this assumption and showed that source code is much more repetitive and predictable than natural language text. Our paper builds on this new fundamental finding and proposes a novel solution to adapt, configure, and effectively use a topic modeling technique, namely Latent Dirichlet Allocation (LDA), to achieve better (acceptable) performance across various SE tasks. Our paper introduces a novel solution, called LDA-GA, which uses Genetic Algorithms (GA) to determine a near-optimal configuration for LDA in the context of three different SE tasks: (1) traceability link recovery, (2) feature location, and (3) software artifact labeling. The results of our empirical studies demonstrate that LDA-GA is able to identify robust LDA configurations, which lead to a higher accuracy on all the datasets for these SE tasks as compared to previously published results, heuristics, and the results of a combinatorial search.
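
A minimal sketch of the configuration-search idea using scikit-learn, with plain random search standing in for the genetic algorithm: each candidate configuration (number of topics and the two Dirichlet priors) is scored by the silhouette coefficient of the documents clustered by their dominant topic, a coherence proxy in the spirit of the paper's fitness. The corpus and the search budget are toy-sized.

```python
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics import silhouette_score

docs = ["parse token stream", "token stream lexer parse",
        "render widget screen", "screen widget layout render",
        "open socket connection", "socket connection close open"]
X = CountVectorizer().fit_transform(docs)

def fitness(n_topics, alpha, beta):
    """Coherence proxy: silhouette of documents clustered by their
    dominant topic (higher means crisper, more coherent topics)."""
    lda = LatentDirichletAllocation(n_components=n_topics,
                                    doc_topic_prior=alpha,
                                    topic_word_prior=beta,
                                    random_state=0).fit(X)
    theta = lda.transform(X)              # document-topic distributions
    labels = theta.argmax(axis=1)
    if len(set(labels)) < 2:              # degenerate clustering
        return -1.0
    return silhouette_score(theta, labels)

random.seed(0)
best_cfg, best_fit = None, -2.0
for _ in range(15):           # plain random search stands in for the GA
    cfg = (random.randint(2, 4),
           random.uniform(0.01, 1.0), random.uniform(0.01, 1.0))
    f = fitness(*cfg)
    if f > best_fit:
        best_cfg, best_fit = cfg, f
print('best (k, alpha, beta):', best_cfg, '-> silhouette', round(best_fit, 3))
```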

2012

Who is going to Mentor Newcomers in Open Source Projects?

G. Canfora, M. Di Penta, R. Oliveto, S. Panichella
Conference paper20th ACM SIGSOFT International Symposium On Foundations of Software Engineering, pages 44-53, North Carolina, USA, 2012. ACM Press. Acceptance Rate: 35/201 (17%).

Abstract

When newcomers join a software project, they need to be properly trained to understand the technical and organizational aspects of the project. Inadequate training could likely lead to project delay or failure. In this paper we propose an approach, named Yoda (Young and newcOmer Developer Assistant), aimed at identifying and recommending mentors in software projects by mining data from mailing lists and versioning systems. Candidate mentors are identified among experienced developers who actively interact with newcomers. Then, when a newcomer joins the project, Yoda recommends a mentor who, among the available ones, has already discussed topics relevant to the newcomer. Yoda has been evaluated on the software repositories of five open source projects. We have also surveyed some developers of these projects to understand whether mentoring was actually performed in their projects, and asked them to evaluate the mentoring relations Yoda identified. Results indicate that top committers are not always the most appropriate mentors, and show the potential usefulness of Yoda as a recommendation system to aid project managers in supporting newcomers joining a software project.

Automatic Query Performance Assessment during the Retrieval of Software Artifacts

S. Haiduc, G. Bavota*, R. Oliveto, A. De Lucia, A. Marcus
Conference paper27th IEEE/ACM International Conference On Automated Software Engineering, pages 90-99, Essen, Germany, 2012. IEEE Press. Acceptance Rate: 21/138 (15%).

Abstract

Text-based search and retrieval is used by developers in the context of many SE tasks, such as concept location, traceability link retrieval, reuse, and impact analysis. Solutions for software text search range from regular expression matching to complex techniques using text retrieval. In all cases, the results of a search depend on the query formulated by the developer. A developer needs to run a query and look at the results before realizing that it needs reformulating. Our aim is to automatically assess the performance of a query before it is executed. We introduce an automatic query performance assessment approach for software artifact retrieval, which uses 21 measures from the field of text retrieval. We evaluate the approach in the context of concept location in source code. The evaluation shows that our approach is able to predict the performance of queries with 79% accuracy, using very little training data.
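
Measures of this kind are often pre-retrieval predictors computable from query and corpus statistics alone. As an illustration (not necessarily one of the paper's 21 measures), here is the classic average-IDF predictor:

```python
from math import log

def avg_idf(query, corpus):
    """Average IDF of the query terms: a classic pre-retrieval
    predictor. Rare (high-IDF) terms usually signal a more specific,
    easier query; common terms signal a harder one."""
    n = len(corpus)
    idfs = []
    for term in query.lower().split():
        df = sum(term in doc.lower().split() for doc in corpus)
        idfs.append(log(n / df) if df else 0.0)
    return sum(idfs) / len(idfs)

corpus = ["open file dialog", "file reader buffer", "audio codec decoder"]
print(avg_idf("audio codec", corpus))   # specific terms -> high score
print(avg_idf("file", corpus))          # generic term -> low score
```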

When does a Refactoring Induce Bugs? An Empirical Study

G. Bavota*, B. De Carluccio, A. De Lucia, M. Di Penta, R. Oliveto, O. Strollo
Conference paper12th IEEE International Working Conference on Source Code Analysis and Manipulation, pages 104-113, Riva del Garda, Trento, Italy, 2012. IEEE Press. Best Paper Award. Acceptance Rate: 16/40 (40%).

Abstract

Refactorings are, as defined by Fowler, behavior-preserving source code transformations. Their main purpose is to improve maintainability or comprehensibility, or to reduce the code footprint if needed. In principle, refactorings are defined as simple operations, so that they are "unlikely to go wrong" and introduce faults. In practice, like any other change, refactoring activities carry risks. This paper reports an empirical study carried out on three Java software systems, namely Apache Ant, Xerces, and ArgoUML, aimed at investigating to what extent refactoring activities induce faults. Specifically, we automatically detect (and then manually validate) 15,008 refactoring operations (of 52 different kinds) using an existing tool (Ref-Finder). Then, we use the SZZ algorithm to determine whether it is likely that a refactoring induced a fault. Results indicate that, while some kinds of refactorings are unlikely to be harmful, others, such as refactorings involving hierarchies (e.g., pull up method), tend to induce faults very frequently. This suggests more accurate code inspection or testing activities when such specific refactorings are performed.
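
For readers unfamiliar with SZZ, here is a deliberately simplified sketch of its core step: the lines deleted by a fix commit are blamed on the fix's parent revision, and the commits that last touched them become the candidate fault-inducing changes. It assumes a local git repository and a hypothetical fix hash; the real algorithm adds further filtering (e.g., ignoring cosmetic changes).

```python
# SZZ-lite sketch: blame the lines a fix deletes on the fix's parent.
# Assumes git is on PATH and FIX names a real bug-fix commit in the
# current repository (the hash below is a placeholder).
import re
import subprocess

FIX = 'abc1234'   # hypothetical bug-fix commit

def git(*args):
    return subprocess.run(['git', *args], capture_output=True,
                          text=True, check=True).stdout

def fault_inducing(fix=FIX):
    inducing = set()
    path = None
    diff = git('show', '--unified=0', '--format=', fix)
    for line in diff.splitlines():
        if line.startswith('--- a/'):
            path = line[6:]                        # old-side file path
        m = re.match(r'@@ -(\d+)(?:,(\d+))? ', line)
        if m and path:
            start, count = int(m.group(1)), int(m.group(2) or 1)
            for n in range(start, start + count):  # each deleted line
                blame = git('blame', '-l', '-L', f'{n},{n}',
                            f'{fix}^', '--', path)
                inducing.add(blame.split()[0])     # last commit touching it
    return inducing

print(fault_inducing())
```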

Putting the Developer in-the-loop: an Interactive GA for Software Re-Modularization

G. Bavota*, F. Carnevale*, A. De Lucia, M. Di Penta, R. Oliveto
Conference paper4th Symposium on Search Based Software Engineering, pages 75-89, Riva del Garda, Trento, Italy, 2012. LNCS Press. Acceptance Rate: 15/34 (44%).

Abstract

This paper proposes the use of Interactive Genetic Algorithms (IGAs) to integrate the developer's knowledge in a re-modularization task. Specifically, the proposed algorithm uses a fitness function composed of automatically-evaluated factors (accounting for the modularization quality achieved by the solution) and a human-evaluated factor, penalizing cases where the way the re-modularization places components into modules is considered meaningless by the developer. The proposed approach has been evaluated to re-modularize two software systems, SMOS and GESA. The obtained results indicate that the IGA is able to produce solutions that, from a developer's perspective, are more meaningful than those generated using the fully automated GA. While taking feedback into account, the approach does not sacrifice modularization quality, and it may require only a very limited amount of feedback, thus allowing its application also to large systems without substantial human effort.
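
As a rough illustration of the interactive fitness idea (not the paper's actual formulation), the sketch below blends an automatically computed modularization-quality score with a periodic developer penalty; quality and ask_developer are hypothetical callables.

def interactive_fitness(solution, quality, ask_developer, generation,
                        feedback_every=10, penalty_weight=0.5):
    # Fitness of a re-modularization candidate: an automatically computed
    # quality score, periodically corrected by a developer-provided penalty
    # for meaningless class placements. `quality` and `ask_developer` are
    # hypothetical callables standing in for a modularization-quality
    # metric and the human oracle.
    score = quality(solution)
    if generation % feedback_every == 0:
        # Feedback (a penalty in [0, 1]) is requested only every few
        # generations to keep the human effort low.
        score -= penalty_weight * ask_developer(solution)
    return score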

TraceME: Traceability Management in Eclipse

G. Bavota, L. Colangelo, A. De Lucia, S. Fusco, R. Oliveto, A. Panichella*
Tool demo paper28th IEEE International Conference on Software Maintenance, pages 642-645, Lago di Garda, Italy, 2012. IEEE Press. Acceptance Rate: 9/12 (75%).

Abstract

In this demo we present TraceME (Traceability Management in Eclipse), an Eclipse plug-in that supports the software engineer in capturing and maintaining traceability links between different types of artifacts. A comparative analysis of the functionalities of the tools supporting traceability recovery highlights that TraceME is the most comprehensive tool for supporting such a critical activity during software development.

An Empirical Analysis of the Distribution of Unit Test Smells and Their Impact on Software Maintenance

G. Bavota*, A. Qusef*, R. Oliveto, A. De Lucia, D. Binkley
Conference paper28th IEEE International Conference on Software Maintenance, pages 56-65, Lago di Garda, Italy, 2012. IEEE Press. Acceptance Rate: 46/181 (25%).

Abstract

Unit testing represents a key activity in software development and maintenance. Test suites with high internal quality facilitate maintenance activities, such as code comprehension and regression testing. Several guidelines have been proposed to help developers write good test suites. Unfortunately, such rules are not always followed, resulting in the presence of bad test code smells (or simply test smells). Test smells have been defined as poorly designed tests and their presence may negatively affect the maintainability of test suites and production code. Despite the many studies that address code smells in general, until now there has been no empirical evidence regarding (i) the distribution of test smells in software systems or (ii) their impact on the maintainability of software systems. This paper fills this gap by presenting two empirical studies. The first study is an exploratory analysis of 18 software systems (two industrial and 16 open source) aimed at analyzing the distribution of test smells in source code. The second study, a controlled experiment involving twenty master students, is aimed at analyzing whether the presence of test smells affects the comprehension of source code during software maintenance. The results show that (i) test smells are widely spread throughout the software systems studied and (ii) most of the test smells have a strong negative impact on the comprehensibility of test suites and production code.

Using IR Methods for Labeling Source Code Artifacts: Is it Worthwhile?

A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*, S. Panichella
Conference paper20th IEEE International Conference on Program Comprehension, pages 193-202, Passau, Germany, 2012. IEEE Press. Acceptance Rate: 21/51 (41%).

Abstract

Information Retrieval (IR) techniques have been used for various software engineering tasks, including the labeling of software artifacts by extracting “keywords” from them. Such techniques include Vector Space Models, Latent Semantic Indexing, Latent Dirichlet Allocation, as well as customized heuristics extracting words from specific source code elements. This paper investigates how source code artifact labeling performed by IR techniques would overlap (and differ) from labeling performed by humans. This has been done by asking a group of subjects to label 20 classes from two Java software systems, JHotDraw and eXVantage. Results indicate that, in most cases, automatic labeling would be more similar to human-based labeling if using simpler techniques - e.g., using words from class and method names - that better reflect how humans behave. Instead, clustering-based approaches (LSI and LDA) are much more worthwhile to be used on source code artifacts having a high verbosity, as well as for artifacts requiring more effort to be manually labeled.

On the Role of Diversity Measures for Multi-Objective Test Case Selection

A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*
Conference paper7th International Workshop on Automation of Software Test, pages 145-151, Zurich, Switzerland, 2012. ACM Press. Acceptance Rate: 22/33 (67%).

Abstract

Test case selection has recently been formulated as a multi-objective optimization problem trying to satisfy conflicting goals, such as code coverage and computational cost. This paper introduces the concept of asymmetric distance preserving, useful to improve the diversity of non-dominated solutions produced by multi-objective Pareto-efficient genetic algorithms, and proposes two techniques to achieve this objective. Results of an empirical study conducted over four programs from the SIR benchmark show how the proposed techniques (i) obtain non-dominated solutions having a higher diversity than the previously proposed multi-objective Pareto genetic algorithms; and (ii) improve the convergence speed of the genetic algorithms.
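
The asymmetric distance-preserving mechanism itself is not reproduced here; as background, the following sketch computes the standard NSGA-II crowding distance, the baseline diversity measure such techniques aim to improve upon.

def crowding_distance(front):
    # NSGA-II crowding distance over a non-dominated front of objective
    # vectors; larger values mean more isolated, hence more diverse,
    # solutions. Boundary solutions get infinite distance.
    n, m = len(front), len(front[0])
    dist = [0.0] * n
    for obj in range(m):
        order = sorted(range(n), key=lambda i: front[i][obj])
        lo, hi = front[order[0]][obj], front[order[-1]][obj]
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = (hi - lo) or 1.0
        for k in range(1, n - 1):
            gap = front[order[k + 1]][obj] - front[order[k - 1]][obj]
            dist[order[k]] += gap / span
    return dist

# Coverage (to maximize) vs. cost (to minimize) of three candidate suites.
print(crowding_distance([(0.9, 5.0), (0.7, 3.0), (0.5, 1.0)]))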

Estimating the Evolution Direction of Populations To Improve Genetic Algorithms

A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*
Conference paperGenetic and Evolutionary Computation Conference, pages 617-624, Philadelphia, USA, 2012. ACM Press.

Abstract

Meta-heuristics have been successfully used to solve a wide variety of problems. However, one issue many techniques share is the risk of being trapped in local optima, or of creating a limited variety of solutions (a problem known as "population drift"). Over the years, different kinds of techniques have been proposed to deal with population drift, for example hybridizing genetic algorithms with local search techniques or using niche techniques. This paper proposes a technique, based on Singular Value Decomposition (SVD), to enhance Genetic Algorithm (GA) population diversity. SVD helps to estimate the evolution direction and drive the next generations towards orthogonal dimensions. The proposed SVD-based GA has been evaluated on 11 benchmark problems and compared with a simple GA and a GA with a distance-crowding scheme. Results indicate that the SVD-based GA achieves significantly better solutions and exhibits quicker convergence than the alternative techniques.
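
A minimal sketch of the underlying intuition, assuming populations encoded as real-valued vectors: the SVD of the generation-to-generation displacement matrix yields the dominant direction of movement, which new individuals could then be steered away from. This is an illustration, not the paper's full algorithm.

import numpy as np

def dominant_direction(previous_pop, current_pop):
    # Estimate the direction a GA population is drifting towards: the
    # first right-singular vector of the generation-to-generation
    # displacement matrix (rows = per-individual moves). New individuals
    # could then be perturbed along directions orthogonal to it.
    displacement = np.asarray(current_pop) - np.asarray(previous_pop)
    _, _, vt = np.linalg.svd(displacement, full_matrices=False)
    return vt[0]

# Toy population of four 3-dimensional individuals that mostly moved
# along the first axis.
prev = np.zeros((4, 3))
curr = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1],
                 [1.1, -0.1, 0.0], [1.0, 0.0, -0.1]])
print(dominant_direction(prev, curr))  # approximately [+-1, 0, 0]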

Supporting Extract Class Refactoring in Eclipse: The ARIES Project

G. Bavota*, A. De Lucia, A. Marcus, R. Oliveto, F. Palomba*
Tool demo paper34th International Conference on Software Engineering, pages 1419-1422, Zurich, Switzerland, 2012. IEEE Press. Acceptance Rate: 16/52 (30%).

Abstract

During software evolution changes are inevitable. These changes may lead to design erosion and the introduction of inadequate design solutions, such as design antipatterns. Several empirical studies provide evidence that the presence of antipatterns is generally associated with lower productivity, greater rework, and more significant design efforts for developers. In order to improve the quality and remove antipatterns, refactoring operations are needed. In this demo, we present the Extract class features of ARIES (Automated Refactoring In EclipSe), an Eclipse plug-in that supports the software engineer in removing the "Blob" antipattern.

Evaluating the Specificity of Text Retrieval Queries to Support Software Engineering Tasks

S. Haiduc, G. Bavota*, R. Oliveto, A. Marcus, A. De Lucia
Conference paper34th International Conference on Software Engineering - NIER Track, pages 1273-1276, Zurich, Switzerland, 2012. IEEE Press. Acceptance Rate: 26/147 (17%).

Abstract

Text retrieval approaches have been used to address many software engineering tasks. In most cases, their use involves issuing a textual query to retrieve a set of relevant software artifacts from the system. The performance of all these approaches depends on the quality of the given query (i.e., its ability to describe the information need in such a way that the relevant software artifacts are retrieved during the search). Currently, the only way to tell that a query failed to lead to the expected software artifacts is by investing time and effort in analyzing the search results. In addition, it is often very difficult to ascertain what part of the query leads to poor results. We propose a novel pre-retrieval metric, which reflects the quality of a query by measuring the specificity of its terms. We exemplify the use of the new specificity metric on the task of concept location in source code. A preliminary empirical study shows that our metric is a good effort predictor for text retrieval-based concept location, outperforming existing techniques from the field of natural language document retrieval.

Teaching Software Engineering and Software Project Management: An Integrated and Practical Approach

G. Bavota*, A. De Lucia, F. Fasano, R. Oliveto, C. Zottoli*
Conference paper34th International Conference on Software Engineering - Software Engineering Education Track, pages 1155-1164, Zurich, Switzerland, 2012. IEEE Press. Acceptance Rate: 11/49 (22%).

Abstract

We present a practical approach for teaching two different courses of Software Engineering (SE) and Software Project Management (SPM) in an integrated way. The two courses are taught in the same semester, thus allowing us to build mixed project teams composed of five to eight Bachelor's students (with development roles) and one or two Master's students (with management roles). The main goal of our approach is to simulate a real-life development scenario, giving students the possibility to deal with issues arising from typical project situations, such as working in a team, organising the division of work, and coping with time pressure and strict deadlines.

Information Retrieval Methods for Automated Traceability Recovery

A. De Lucia, A. Marcus, R. Oliveto, D. Poshyvanyk
Book chapterSoftware and Systems Traceability. J. Cleland-Huang, O. Gotel, and A. Zisman (eds.), 2012. Springer Press.

Abstract

The potential benefits of traceability are well known and documented, as is the impracticability of recovering and maintaining traceability links manually. Indeed, the manual management of traceability information is an error-prone and time-consuming task. Consequently, despite the advantages that can be gained, explicit traceability is rarely established unless there is a regulatory reason for doing so. Extensive efforts have been brought forth in the software engineering community (both research and commercial) to improve the explicit connection of software artifacts. Promising results have been achieved using Information Retrieval (IR) techniques for traceability recovery. IR-based traceability recovery methods propose a list of candidate traceability links based on the similarity between the text contained in the software artifacts. Software artifacts have different structures, and the common element among many of them is the textual data, which most often captures the informal semantics of artifacts. For example, source code includes a large volume of textual data in the form of comments and identifiers. Consequently, IR-based approaches are very well suited to address the traceability recovery problem. The conjecture is that artifacts with high textual similarity are good candidates to be traced to each other since they share several concepts. In this chapter we overview the general process of using IR-based methods for traceability link recovery and describe some of them in greater detail: the probabilistic, vector space, and Latent Semantic Indexing models. Finally, we discuss common approaches to measuring the performance of IR-based traceability recovery methods and the latest advances in techniques for the analysis of candidate links.
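
To make the vector space flavour of these methods concrete, here is a minimal sketch of VSM-style candidate link ranking using TF-IDF and cosine similarity (via scikit-learn); the probabilistic and LSI variants discussed in the chapter are not shown, and the example artifacts are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def candidate_links(source_texts, target_texts):
    # Rank candidate traceability links by textual similarity, VSM-style:
    # TF-IDF weighting plus cosine similarity. One vocabulary is fitted
    # over all artifacts, then the two sets are compared pairwise.
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(list(source_texts) + list(target_texts))
    n = len(source_texts)
    sims = cosine_similarity(matrix[:n], matrix[n:])
    links = [(i, j, sims[i, j])
             for i in range(sims.shape[0]) for j in range(sims.shape[1])]
    return sorted(links, key=lambda t: t[2], reverse=True)

# Trace two requirements onto two classes.
reqs = ["the user logs in with a password", "reports are exported as pdf"]
code = ["class LoginManager checks the user password",
        "class PdfExporter writes pdf reports"]
print(candidate_links(reqs, code)[:2])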

2011

Identifying Extract Class Refactoring Opportunities Using Structural and Semantic Cohesion Metrics

G. Bavota*, A. De Lucia, and R. Oliveto
Journal PaperJournal of Systems and Software, 84(3): 397-414, 2011. Elsevier Press.

Abstract

Approaches for improving class cohesion identify refactoring opportunities using metrics that capture structural relationships between the methods of a class, e.g., attribute references. Semantic metrics, e.g., C3 metric, have also been proposed to measure class cohesion, as they seem to complement structural metrics. However, until now semantic relationships between methods have not been used to identify refactoring opportunities. In this paper we propose an Extract Class refactoring method based on graph theory that exploits structural and semantic relationships between methods. The empirical evaluation of the proposed approach highlighted the benefits provided by the combination of semantic and structural measures and the potential usefulness of the proposed method as a feature for software development environments.
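
As a rough illustration of how structural and semantic information can be blended per method pair (the paper's graph-based algorithm is not reproduced), consider the following sketch; the attrs field and the semantic_sim callable are illustrative assumptions.

def method_similarity(m1, m2, alpha=0.5, semantic_sim=None):
    # Combined similarity between two methods: a structural part (Jaccard
    # overlap of the instance attributes they reference) blended with a
    # semantic part (textual similarity of identifiers/comments, supplied
    # by the hypothetical `semantic_sim` callable).
    attrs1, attrs2 = set(m1["attrs"]), set(m2["attrs"])
    union = attrs1 | attrs2
    structural = len(attrs1 & attrs2) / len(union) if union else 0.0
    semantic = semantic_sim(m1, m2) if semantic_sim else 0.0
    return alpha * structural + (1 - alpha) * semantic

parse = {"attrs": ["buffer", "pos"], "text": "parse the token stream"}
render = {"attrs": ["buffer"], "text": "render the buffer to the view"}
print(method_similarity(parse, render))  # structural part only: 0.25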

Improving Source Code Lexicon using Information Retrieval

A. De Lucia, M. Di Penta, and R. Oliveto
Journal PaperIEEE Transactions on Software Engineering, 37(2): 205-227, 2011. IEEE Press.

Abstract

The paper presents an approach helping developers to maintain source code identifiers and comments consistent with high-level artifacts. Specifically, the approach computes and shows the textual similarity between source code and related high-level artifacts. Our conjecture is that developers are induced to improve the source code lexicon, i.e., terms used in identifiers or comments, if the software development environment provides information about the textual similarity between the source code under development and the related high-level artifacts. The proposed approach also recommends candidate identifiers built from high-level artifacts related to the source code under development and has been implemented as an Eclipse plug-in, called COde Comprehension Nurturant Using Traceability (COCONUT). The paper also reports on two controlled experiments performed with master's and bachelor's students. The goal of the experiments is to evaluate the quality of identifiers and comments (in terms of their consistency with high-level artifacts) in the source code produced when using or not using COCONUT. The achieved results confirm our conjecture that providing the developers with similarity between code and high-level artifacts helps to improve the quality of source code lexicon. This indicates the potential usefulness of COCONUT as a feature for software development environments.

SCOTCH: Slicing and Coupling based Test to Code trace Hunter

A. Qusef*, G. Bavota*, R. Oliveto, A. De Lucia, D. Binkley
Tool demo paper18th Working Conference on Reverse Engineering, pages 443-444, Limerick, Ireland, 2011. IEEE Press. Acceptance Rate: 7/10 (70%).

Abstract

Maintaining traceability links between unit tests and tested classes is an important factor for effectively managing the development and evolution of software systems. Exploiting traceability links helps in program comprehension and maintenance by ensuring consistency between unit tests and tested classes during maintenance activities. Unfortunately, it is often the case that such links are not explicitly maintained and thus they have to be recovered manually during software evolution. A novel automated solution to this problem, based on dynamic slicing and conceptual coupling, is presented. The resulting tool, SCOTCH (Slicing and Coupling based Test to Code trace Hunter), is empirically evaluated on three systems: an open source system and two industrial systems. The results indicate that SCOTCH identifies traceability links between unit test classes and tested classes with a high accuracy and greater stability than existing techniques, highlighting its potential usefulness as a feature within a software development environment.

Identifying the Weaknesses of UML Class Diagrams during Data Model Comprehension

G. Bavota*, C. Gravino, R. Oliveto, A. De Lucia, G. Tortora, M. Genero, and J. A. Cruz-Lemus
Conference paper14th International Conference on Model Driven Engineering Languages and Systems, pages 168-182, Wellington, New Zealand, 2011. LNCS Press. Acceptance Rate: 34/167 (20%).

Abstract

In this paper we present an experiment and two replications aimed at comparing the support provided by ER and UML class diagrams during comprehension activities, focusing on the single building blocks of the two notations. This kind of analysis can be used to identify weaknesses in a notation and/or justify the need to prefer ER or UML for data modeling. The results reveal that UML class diagrams are generally more comprehensible than ER diagrams, even if the former have some weaknesses related to three building blocks, i.e., multi-value attributes, composite attributes, and weak entities. These findings suggest that a UML class diagram extension should be considered to overcome these weaknesses and improve the comprehensibility of the notation.

On Integrating Orthogonal Information Retrieval Methods to Improve Traceability Recovery

M. Gethers, R. Oliveto, D. Poshyvanyk, and A. De Lucia
Conference paper27th International Conference on Software Maintenance, pages 133-142, Williamsburg, USA, 2011. IEEE Press. Acceptance Rate: 36/127 (28%).

Abstract

Different Information Retrieval (IR) methods have been proposed to recover traceability links among software artifacts. Until now no single method sensibly outperforms the others; however, it has been empirically shown that some methods recover different, yet complementary, traceability links. In this paper, we exploit this empirical finding and propose an integrated approach to combine orthogonal IR techniques, which have been statistically shown to produce dissimilar results. Our approach combines the following IR-based methods: the Vector Space Model (VSM), the probabilistic Jensen and Shannon (JS) model, and Relational Topic Modeling (RTM), which had not been used in the context of traceability link recovery before. The empirical case study conducted on six software systems indicates that the integrated method outperforms stand-alone IR methods as well as any other combination of non-orthogonal methods with a statistically significant margin.
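
A minimal sketch of the general idea of combining scores from orthogonal IR methods, assuming a simple min-max normalization and equal weights; the paper's actual combination strategy may differ.

def combine_scores(score_lists, weights=None):
    # Combine the candidate-link scores of several IR methods after a
    # min-max normalization, so that methods on different scales (VSM
    # cosines, JS divergences, RTM probabilities) can be summed.
    def normalize(scores):
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0
        return [(s - lo) / span for s in scores]
    weights = weights or [1.0 / len(score_lists)] * len(score_lists)
    normalized = [normalize(s) for s in score_lists]
    return [sum(w * col[i] for w, col in zip(weights, normalized))
            for i in range(len(score_lists[0]))]

vsm = [0.9, 0.2, 0.4]  # the same three candidate links scored by
js = [0.1, 0.5, 0.3]   # three orthogonal methods
rtm = [0.7, 0.6, 0.1]
print(combine_scores([vsm, js, rtm]))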

SCOTCH: Test-to-Code Traceability using Slicing and Conceptual Coupling

A. Qusef*, G. Bavota, R. Oliveto, A. De Lucia, D. Binkley
Conference paper27th International Conference on Software Maintenance, pages 63-72, Williamsburg, USA, 2011. IEEE Press. Acceptance Rate: 36/127 (28%).

Abstract

Maintaining traceability links between unit tests and tested classes is an important factor for effectively managing the development and evolution of software systems. Exploiting traceability links helps in program comprehension and maintenance by ensuring consistency between unit tests and tested classes during maintenance activities. Unfortunately, it is often the case that such links are not explicitly maintained and thus they have to be recovered manually during software evolution. A novel automated solution to this problem, based on dynamic slicing and conceptual coupling, is presented. The resulting tool, SCOTCH (Slicing and Coupling based Test to Code trace Hunter), is empirically evaluated on three systems: an open source system and two industrial systems. The results indicate that SCOTCH identifies traceability links between unit test classes and tested classes with a high accuracy and greater stability than existing techniques, highlighting its potential usefulness as a feature within a software development environment.

Improving IR-based Traceability Recovery Using Smoothing Filters

A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella*, S. Panichella*
Conference paper19th International Conference on Program Comprehension, pages 21-30, Kingston, Canada, 2011. IEEE Press. Best paper award. Acceptance Rate: 18/76 (24%).

Abstract

Information Retrieval methods have been largely adopted to identify traceability links based on the textual similarity of software artifacts. However, noise due to word usage in software artifacts might negatively affect the recovery accuracy. We propose the use of smoothing filters to reduce the effect of noise in software artifacts and improve the performance of traceability recovery methods. An empirical evaluation performed on two repositories indicates that the usage of a smoothing filter is able to significantly improve the performance of the Vector Space Model and Latent Semantic Indexing. Such a result suggests that, besides traceability recovery, the proposed filter can be used to improve the performance of various other software engineering approaches based on textual analysis.
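
A minimal sketch of a smoothing filter in this spirit, assuming artifacts are represented as rows of a term-weight matrix: the collection-wide average vector is subtracted so that terms common to all artifacts stop dominating the similarity computation. Clipping to non-negative weights is an added simplification, not necessarily the paper's choice.

import numpy as np

def smooth(artifact_term_matrix):
    # Subtract the collection-wide average vector from each artifact
    # vector (rows = artifacts, columns = term weights), damping terms
    # common to all artifacts that carry little discriminating power.
    m = np.asarray(artifact_term_matrix, dtype=float)
    smoothed = m - m.mean(axis=0, keepdims=True)
    return np.clip(smoothed, 0.0, None)

# Three artifacts over four terms; the first term, common to all
# artifacts, is zeroed out by the filter.
print(smooth([[1.0, 0.0, 2.0, 0.0],
              [1.0, 3.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 4.0]]))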

An Exploratory Study of Identifier Renamings

L. M. Eshkevari, V. Arnaoudova, M. Di Penta, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol
Conference paper8th Working Conference on Mining Software Repositories, pages 33-42, Hawaii, USA, 2011. ACM Press. Acceptance Rate: 20/61 (33%).

Abstract

Identifiers play an important role in source code understandability, maintainability, and fault-proneness. This paper reports a study of identifier renamings in software systems, studying how terms (identifier atomic components) change in source code identifiers. Specifically, the paper (i) proposes a term renaming taxonomy, (ii) presents an approximate lightweight code analysis approach to detect and classify term renamings automatically into the taxonomy dimensions, and (iii) reports an exploratory study of term renamings in two open-source systems, Eclipse-JDT and Tomcat. We thus report evidence that not only synonyms are involved in renamings but also (in a small fraction) more unexpected changes occur: surprisingly, we detected hypernym (a more abstract term, e.g., size vs. length) and hyponym (a more concrete term, e.g., restriction vs. rule) renamings, and antonym renamings (a term replaced with one having the opposite meaning, e.g., closing vs. opening). Despite being only a fraction of all renamings, synonym, hyponym, hypernym, and antonym renamings may hint at some program understanding issues and, thus, could be used in a renaming-recommendation system to improve code quality.

Identifying Method Friendships to Remove the Feature Envy Bad Smell

R. Oliveto, M. Gethers, G. Bavota*, D. Poshyvanyk, and A. De Lucia
Conference paper33rd IEEE/ACM International Conference on Software Engineering - NIER Track, pages 820-823, Hawaii, USA, 2011. ACM Press. Acceptance Rate: 46/198 (23%).

Abstract

We propose a novel approach to identify Move Method refactoring opportunities and remove the Feature Envy bad smell from source code. The proposed approach analyzes both structural and conceptual relationships between methods and uses Relational Topic Models to identify sets of methods that share several responsibilities, i.e., 'friend methods'. The analysis of the friendships of a given method can be used to pinpoint the target class (envied class) into which the method should be moved. The results of a preliminary empirical evaluation indicate that the proposed approach provides meaningful refactoring opportunities.

CodeTopics: Which Topic Am I Coding Now?

M. Gethers, T. Savage, M. Di Penta, R. Oliveto, D. Poshyvanyk, and A. De Lucia
Conference paper33rd IEEE/ACM International Conference on Software Engineering - Formal Tool Demo, pages 1034-1036, Hawaii, USA, 2011. ACM Press. Acceptance Rate: 22/60 (37%).

Abstract

Recent studies indicated that showing the similarity between the source code being developed and related high-level artifacts (HLAs), such as requirements, helps developers improve the quality of source code identifiers. In this paper, we present CodeTopics, an Eclipse plug-in that, in addition to showing the similarity between source code and HLAs, also highlights to what extent the code under development covers the topics described in the HLAs. Such views complement the information derived by showing only the similarity between source code and HLAs, helping (i) developers to identify functionalities that are not yet implemented or (ii) newcomers to comprehend source code artifacts by showing them the topics that these artifacts relate to.

2010

Fine-grained Management of Software Artefacts: The ADAMS System

A. De Lucia, F. Fasano, R. Oliveto, and G. Tortora
Journal PaperSoftware: Practice and Experience, 40(11):1007-1034, 2010. Wiley InterScience Press.

Abstract

We present ADvanced Artefact Management System (ADAMS), a web-based system that integrates project management features, such as work-breakdown structure definition, resource allocation, and schedule management as well as artefact management features, such as artefact versioning, traceability management, and artefact quality management. In this article we focus on the fine-grained artefact management approach adopted in ADAMS, which is a valuable support to high-level documentation and traceability management. In particular, the traceability layer in ADAMS is used to propagate events concerning changes to an artefact to the dependent artefacts, thus also increasing the context-awareness in the project. We also present the results of experimenting with the system in software projects developed at the University of Salerno.

An Experimental Comparison of ER and UML Class Diagrams for Data Modelling

A. De Lucia, C. Gravino, R. Oliveto, and G. Tortora
Journal PaperEmpirical Software Engineering, 15(5):455-489, 2010. Springer Press.

Abstract

We present the results of three sets of controlled experiments aimed at analysing whether UML class diagrams are more comprehensible than ER diagrams during data model maintenance. In particular, we considered the support given by the two notations in the comprehension and interpretation of data models, the comprehension of the change to perform to meet a change request, and the detection of defects contained in a data model. The experiments involved university students with different levels of ability and experience. The results demonstrate that subjects using UML class diagrams achieved better comprehension levels. With regard to the support given by the two notations during maintenance activities, the results demonstrate that the two notations give the same support, while in general UML class diagrams provide better support than ER diagrams during verification activities.

Software Re-Modularization based on Structural and Semantic Metrics

G. Bavota*, A. De Lucia, A. Marcus, and R. Oliveto
Conference paper17th IEEE Working Conference on Reverse Engineering, pages 195-204, Beverly, Massachusetts, USA, 2010. IEEE Press. Acceptance Rate: 21/67 (31%).

Abstract

The structure of a software system has a major impact on its maintainability. To improve maintainability, software systems are usually organized into subsystems using the constructs of packages or modules. However, during software evolution the structure of the system undergoes continuous modifications, drifting away from its original design and often reducing its quality. In this paper we propose an approach for helping maintainers to improve the quality of software modularization. The proposed approach analyzes the (structural and semantic) relationships between classes in a package, identifying chains of strongly related classes. The identified chains are used to define new packages with higher cohesion than the original package. The proposed approach has been empirically evaluated through a case study. The context of the study is represented by an open source system, JHotDraw, and two software systems developed by teams of students at the University of Salerno. The analysis of the results reveals that the proposed approach generates meaningful re-modularizations of the studied systems, which can lead to higher quality.

A Two-Step Technique for Extract Class Refactoring

G. Bavota*, A. De Lucia, A. Marcus, and R. Oliveto
Conference paper25th IEEE/ACM International Conference on Automated Software Engineering, pages 151-154, Antwerp, Belgium, 2010. ACM Press. Acceptance Rate: 31+34/191 (16+18%).

Abstract

We propose a novel approach supporting the Extract Class refactoring. The proposed approach analyzes the (structural and semantic) similarity of the methods in a class in order to identify chains of strongly related methods. The identified method chains are used to define new classes with higher cohesion than the original class. A preliminary evaluation reveals that the approach is able to identify meaningful refactoring operations.

Recovering Traceability Links between Unit Tests and Classes Under Test: An Improved Approach

A. Qusef*, R. Oliveto, and A. De Lucia
Conference paper26th IEEE International Conference on Software Maintenance, pages 129-138, Timisoara, Romania, 2010. IEEE Press. Acceptance Rate: 36/133 (27%).

Abstract

Unit tests are valuable as a source of up-to-date documentation, as developers continuously change them to reflect changes in the production code and keep an effective regression suite. Maintaining traceability links between unit tests and classes under test can help developers to comprehend parts of a system. In particular, unit tests show how parts of a system are executed and, as such, how they are supposed to be used. Moreover, the dependencies between unit tests and classes can be exploited to maintain consistency during refactoring. Generally, such dependencies are not explicitly maintained and have to be recovered during software development. Some guidelines and naming conventions have been defined to describe the testing environment in order to easily identify related tests for a programming task. However, very often these guidelines are not followed, making the identification of links between unit tests and classes a time-consuming task. Thus, automatic approaches to recover such links are needed. In this paper a traceability recovery approach based on Data Flow Analysis (DFA) is presented. In particular, the approach retrieves as tested classes all the classes that affect the result of the last assert statement in each method of the unit test class. The accuracy of the proposed method has been empirically evaluated on two systems, an open source system and an industrial system. As a benchmark, we compare the accuracy of the DFA-based approach with the accuracy of the previously used traceability recovery approaches, namely Naming Convention (NC) and Last Call Before Assert (LCBA), which seem to provide the most accurate results. The results show that the proposed approach is the most accurate method, demonstrating the effectiveness of DFA. However, the case study also highlights the limitations of the experimented traceability recovery approaches, showing that detecting the class under test cannot be fully automated and some issues are still under study.

Physical and Conceptual Identifier Dispersion: Measures and Relation to Fault Proneness

V. Arnaoudova, L. Eshkevari, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol
Conference paper26th IEEE International Conference on Software Maintenance - ERA Track, 4 pages, Timisoara, Romania, 2010. IEEE Press. Best paper award. Acceptance Rate: 18/43 (41%).

Abstract

Poorly-chosen identifiers have been reported in the literature as misleading and as increasing the program comprehension effort. Identifiers are composed of terms, which can be dictionary words, acronyms, contractions, or simple strings. We conjecture that the use of identical terms in different contexts may increase the risk of faults. We investigate our conjecture using a measure combining term entropy and term context coverage to study whether certain terms increase the odds ratios of methods to be fault-prone. Entropy measures the physical dispersion of terms in a program: the higher the entropy, the more scattered the terms are across the program. Context coverage measures the conceptual dispersion of terms: the higher their context coverage, the more unrelated the methods using them are. We compute term entropy and context coverage of terms extracted from identifiers in Rhino 1.4R3 and ArgoUML 0.16. We show statistically that methods containing terms with high entropy and context coverage are more fault-prone than others.
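
To make the entropy part concrete, here is a minimal sketch computing the occurrence entropy of a term over the methods of a program; the dictionary-based program representation is an illustrative assumption, and context coverage is not shown.

import math
from collections import Counter

def term_entropy(term, methods):
    # Physical dispersion of `term`: entropy of its occurrence
    # distribution over the methods of a program. `methods` maps a
    # method name to the list of terms extracted from its identifiers.
    # Higher entropy means the term is scattered across more methods.
    counts = {name: Counter(terms)[term] for name, terms in methods.items()}
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total, 2)
                for c in counts.values() if c)

methods = {"parse": ["size", "token", "size"],
           "render": ["size", "view"],
           "close": ["handle"]}
print(term_entropy("size", methods))  # spread over 2 of 3 methods (~0.92)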

Playing with Refactoring: Identifying Extract Class Opportunities through Game Theory

G. Bavota*, R. Oliveto, A. De Lucia, G. Antoniol, Y-G. Guéhéneuc
Conference paper26th IEEE International Conference on Software Maintenance - ERA Track, 4 pages, Timisoara, Romania, 2010. IEEE Press. Acceptance Rate: 18/43 (41%).

Abstract

In software engineering, developers must often find solutions to problems balancing competing goals, e.g., quality versus cost, time to market versus resources, or cohesion versus coupling. Finding a suitable balance between contrasting goals is often complex, and recommendation systems are useful to support developers and managers in performing such a complex task. We believe that contrasting goals can often be dealt with using game theory techniques. Indeed, game theory is successfully used in other fields, especially in economics, to mathematically propose solutions to strategic situations in which an individual's success in making choices depends on the choices of others. To demonstrate the applicability of game theory to software engineering and to understand its pros and cons, we propose an approach based on game theory that recommends extract-class refactoring opportunities. A preliminary evaluation inspired by mutation testing demonstrates the applicability and the benefits of the proposed approach.

On the Equivalence of Information Retrieval Methods for Automated Traceability Link Recovery

R. Oliveto, M. Gethers, D. Poshyvanyk, A. De Lucia
Conference paper18th International Conference on Program Comprehension, pages 68-71, Braga, Portugal, 2010. IEEE Press. Acceptance Rate: 15+10/76 (20+13%).

Abstract

We present an empirical study to statistically analyze the equivalence of several traceability recovery methods based on Information Retrieval (IR) techniques. The analysis is based on Principal Component Analysis and on the analysis of the overlap of the set of candidate links provided by each method. The studied techniques are the Jensen-Shannon (JS) method, Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA). The results show that while JS, VSM, and LSI are almost equivalent, LDA is able to capture a dimension unique to the set of techniques which we considered.

Investigating Tabu Search for Web Effort Estimation

F. Ferrucci, C. Gravino, E. Mendes, R. Oliveto, and F. Sarro
Conference paper36th EUROMICRO Conference on Software Engineering and Advanced Applications, pages 350-357, Lille, France, 2010. IEEE Press.

Abstract

Tabu Search is a meta-heuristic approach successfully used to address optimization problems in several contexts. This paper reports the results of an empirical study carried out to investigate the effectiveness of Tabu Search in estimating Web application development effort. The dataset employed in this investigation is part of the Tukutuku database. This database has been used in several studies to assess the effectiveness of various effort estimation techniques, such as Linear Regression and Case-Based Reasoning. Our results are encouraging given that Tabu Search outperformed all the other estimation techniques against which it has been compared.

Genetic Programming for Effort Estimation: an Analysis of the Impact of Different Fitness Functions

F. Ferrucci, C. Gravino, R. Oliveto, and F. Sarro
Conference paper2nd International Symposium on Search Based Software Engineering, pages 89-98, Benevento, Italy, 2010. IEEE Press.

Abstract

Context: The use of search-based methods has recently been proposed for software development effort estimation, and some case studies have been carried out to assess the effectiveness of Genetic Programming (GP). The results reported in the literature showed that GP can provide an estimation accuracy comparable with or slightly better than some widely used techniques, and encouraged further research to investigate whether the estimation accuracy can be improved by varying the fitness function. Aim: Starting from these considerations, in this paper we report on a case study aiming to analyse the role played by the fitness function in the accuracy of the estimates. Method: We performed a case study based on a publicly available dataset, i.e., Desharnais, by applying a 3-fold cross validation and employing summary measures and statistical tests for the analysis of the results. Moreover, we compared the accuracy of the obtained estimates with those achieved using some widely used estimation methods, namely Case-Based Reasoning (CBR) and Manual Step Wise Regression (MSWR). Results: The obtained results highlight that the choice of the fitness function significantly affected the estimation accuracy. The results also revealed that GP provided significantly better estimates than CBR and estimates comparable with those of MSWR for the considered dataset.
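
As an example of the kind of fitness function such studies compare (MMRE is a common choice in effort-estimation research, though not necessarily one of the exact functions used in the paper), consider this minimal sketch:

def mmre(predicted, actual):
    # Mean Magnitude of Relative Error over the projects in a dataset:
    # the average of |actual - predicted| / actual. Lower is better, so
    # a GP engine would minimize it as a fitness value.
    errors = [abs(a - p) / a for p, a in zip(predicted, actual) if a > 0]
    return sum(errors) / len(errors)

print(mmre([100, 250], [120, 200]))  # (20/120 + 50/200) / 2 ~= 0.21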

Numerical Signatures of Antipatterns: An Approach based on B-Splines

R. Oliveto, F. Khomh, G. Antoniol, and Y-G. Guéhéneuc
Conference paper14th European Conference on Software Maintenance and Reengineering, pages 257-260, Madrid, Spain, 2010. IEEE Press. Acceptance Rate: 21+11/80 (26+14%).

Abstract

Antipatterns are poor object-oriented solutions to recurring design problems. The identification of occurrences of antipatterns in systems has received recently some attention but current approaches have two main limitations: either (1) they classify classes strictly as being or not antipatterns, and thus cannot report accurate information for borderline classes, or (2) they return the probabilities of classes to be antipatterns but they require an expensive tuning by experts to have acceptable accuracy. To mitigate such limitations, we introduce a new identification approach, ABS (Antipattern identification using B-Splines), based on a numerical analysis technique. The results of a preliminary study show that ABS generally outperforms previous approaches in terms of accuracy when used to identify Blobs.

Using Evolutionary Based Approaches to Estimate Software Development Effort

F. Ferrucci, C. Gravino, R. Oliveto, and F. Sarro
Book chapterEvolutionary Computation and Optimization Algorithms in Software Engineering: Applications and Techniques. M. Chis (ed.). IGI Global, 2010.

Abstract

Software development effort estimation is a critical activity for the competitiveness of a software company; it is crucial for planning and monitoring project development and for delivering the product on time and within budget. In recent years, some attempts have been made to apply search-based approaches to estimate software development effort. In particular, some genetic algorithms have been defined and some empirical studies have been performed with the aim of assessing the effectiveness of the proposed approaches for estimating software development effort. The results reported in those studies seem to be promising. The objective of this chapter is to present the state of the art in the field by reporting on the most significant empirical studies undertaken so far. Furthermore, some suggestions for future research directions are also provided.

2009

Assessing IR-based Traceability Recovery Tools through Controlled Experiments

A. De Lucia, R. Oliveto, and G. Tortora
Journal PaperEmpirical Software Engineering, 14(1):57-93, 2009. Springer Press.

Abstract

We report the results of a controlled experiment and a replication performed with different subjects, in which we assessed the usefulness of an Information Retrieval-based traceability recovery tool during the traceability link identification process. The main result achieved in the two experiments is that the use of a traceability recovery tool significantly reduces the time spent by the software engineer with respect to manual tracing. Replication with different subjects allowed us to investigate if subjects’ experience and ability play any role in the traceability link identification process. In particular, we made some observations concerning the retrieval accuracy achieved by the software engineers with and without the tool support and with different levels of experience and ability.

Traceability Recovery using Numerical Analysis

G. Capobianco, A. De Lucia, R. Oliveto, A. Panichella*, and S. Panichella*
Conference paper16th International Working Conference on Reverse Engineering, pages 195-204, Lille, France, 2009. IEEE Press.

Abstract

The paper proposes a novel information retrieval technique based on numerical analysis for recovering traceability links between code and software documentation. The results of a reported case study demonstrate that the proposed approach significantly outperforms two vector-based IR models, i.e., the vector space model and latent semantic indexing, and is comparable to and sometimes better than a probabilistic model, i.e., the Jensen-Shannon method. The paper also discusses how each method is influenced by the type and language of the artifacts considered.

Using Tabu Search to Estimate Software Development Effort

F. Ferrucci, C. Gravino, R. Oliveto, and F. Sarro
Conference paper4th International Conference on Software Process and Product Measurement, pages 307-320, Amsterdam, The Netherlands, 2009. LNCS Press.

Abstract

The use of optimization techniques has recently been proposed to build models for software development effort estimation. In particular, some studies have been carried out using search-based techniques, such as genetic programming, and the reported results seem to be promising. To the best of our knowledge, nobody has analyzed the effectiveness of Tabu Search for development effort estimation. Tabu Search is a meta-heuristic approach successfully used to address several optimization problems. In this paper we report on an empirical analysis carried out by exploiting Tabu Search on a publicly available dataset, i.e., the Desharnais dataset. The achieved results show that Tabu Search provides estimates comparable with those achieved with some widely used estimation techniques.

The Role of the Coverage Analysis during IR-based Traceability Recovery: a Controlled Experiment

A. De Lucia, R. Oliveto, G. Tortora
Conference paper25th International Conference on Software Maintenance, pages 371-380, Edmonton, Canada, 2009. IEEE Press. Acceptance Rate: 35/162 (22%).

Abstract

This paper presents a two-step process aimed at improving the tracing performance of the software engineer when using an IR-based traceability recovery tool. In the first step the software engineer performs an incremental coarse-grained traceability recovery between a set of source artefacts and a set of target artefacts. During this step he/she traces as many links as possible while keeping the effort to discard false positives low. In the second step he/she uses a coverage link analysis aimed at identifying poorly traced source artefacts and guiding focused fine-grained traceability recovery sessions to recover links missed in the first step. The results achieved in a reported controlled experiment demonstrate that the proposed approach significantly increases the amount of correct links traced by the software engineer with respect to a traditional process.

On the Role of the Nouns in IR-based Traceability Link Recovery

G. Capobianco, A. De Lucia, R. Oliveto, A. Panichella*, and S. Panichella*
Conference paper17th International Conference on Program Comprehension, pages 140-157, Vancouver, British Columbia, Canada, 2009. IEEE Press. Acceptance Rate: 20/74 (27%).

Abstract

The intensive human effort needed to manually manage traceability information has increased the interest in utilising semi-automated traceability recovery techniques. This paper presents a simple way to improve the accuracy of traceability recovery methods based on Information Retrieval techniques. The proposed method acts on the artefact indexing, considering only the nouns contained in the artefact content to define the semantics of an artefact. The rationale behind such a choice is that the language used in software documents can be classified as a sectorial language, where the terms that provide the most indication of the semantics of a document are the nouns. The results of a reported case study demonstrate that the proposed artefact indexing significantly improves the accuracy of traceability recovery methods based on probabilistic or vector space based IR models.
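
A minimal sketch of nouns-only indexing using NLTK's part-of-speech tagger; the tagger resource name (which may differ across NLTK versions) and the simple whitespace tokenization are assumptions, not the paper's implementation.

import nltk

# One-off setup; the resource name may differ across NLTK versions.
nltk.download("averaged_perceptron_tagger", quiet=True)

def nouns_only(artifact_text):
    # Index an artifact by its nouns only; POS tags starting with "NN"
    # mark nouns (NN, NNS, NNP, NNPS). Whitespace tokenization is a
    # simplification.
    tokens = artifact_text.lower().split()
    return [word for word, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]

print(nouns_only("the system shall store the user password in the database"))
# nouns such as 'system', 'password', 'database'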

2008

Traceability Management for Impact Analysis

A. De Lucia, F. Fasano, R. Oliveto
Conference paper24th International Conference on Software Maintenance - Frontiers of Software Maintenance, pages 21-30, Beijing, China, 2008. IEEE Press.

Abstract

Software change impact analysis is the activity of the software maintenance process that determines the possible effects of proposed software changes. This activity is necessary to become aware of the ripple effects caused by a change and to record them so that nothing is overlooked. A change has an impact not only on the source code, but also on other related software artefacts, such as requirements, design, and tests. For this reason, impact analysis can be efficiently supported through traceability information. In this paper we review traceability management in the context of impact analysis and discuss the main challenges and research directions.

Using Structural and Semantic Metrics to Improve Class Cohesion

A. De Lucia, R. Oliveto, and L. Vorraro*
Conference paper24th IEEE International Conference on Software Maintenance, pages 27-36, Beijing, China, 2008. IEEE Press. Acceptance rate: 40/156 (26%).

Abstract

Several refactoring methods have been proposed in the literature to improve the cohesion of classes. Very often, refactoring operations are guided by cohesion metrics based on structural information of the source code, such as attribute references in methods. In this paper we present a novel approach to guide the extract class refactoring (M. Fowler, 1999), taking into account both structural and semantic cohesion metrics. The proposed approach has been evaluated in a case study conducted on JHotDraw, an open source software system. The results revealed that the proposed approach significantly outperforms methods considering only structural or semantic information. The proposed approach has also been integrated in the Eclipse platform.

IR-based Traceability Recovery Processes: an Empirical Comparison of "One-Shot" and Incremental Processes

A. De Lucia, R. Oliveto, and G. Tortora
Conference paper23rd IEEE/ACM International Conference on Automated Software Engineering, pages 39-48, L'Aquila, Italy, 2008. ACM Press. Acceptance rate: 34/280 (12%).

Abstract

We present the results of a controlled experiment aiming at analysing the role played by the approach adopted during an IR-based traceability recovery process. In particular, we compare the tracing performances achieved by subjects using the "one-shot" process, where the full ranked list of candidate links is proposed, and the incremental process, where a similarity threshold is used to cut the ranked list and the links are classified step-by-step. The analysis of the achieved results shows that, in general, the incremental process improves the tracing accuracy and reduces the effort to analyse the proposed links.

Data Model Comprehension: an Empirical Comparison of ER and UML Class Diagram

A. De Lucia, C. Gravino, R. Oliveto, and G. Tortora
Conference paper16th IEEE International Conference on Program Comprehension, pages 93-102, Amsterdam, The Netherlands, 2008. IEEE Press. Acceptance rate: 20/57 (35%).

Abstract

We present the results of two controlled experiments to compare ER and UML class diagrams, in order to find out which of the models provides better support during the comprehension of data models. The experiment involved Master and Bachelor students performing comprehension tasks on data models represented by ER or UML class diagrams. The achieved results show that UML class diagrams significantly improve the comprehension level achieved by subjects. Moreover, having different subjects with different levels of ability and experience allowed us to also make some considerations on the influence of such factors on the comprehension performances.

ADAMS Re-Trace: Traceability Link Recovery via Latent Semantic Indexing

A. De Lucia, R. Oliveto, and G. Tortora
Conference paper30th International Conference on Software Engineering, pages 839-842, Leipzig, Germany, 2008. ACM Press. Acceptance rate: 18/88 (11%).

Abstract

In this demonstration we present the traceability recovery tool developed in ADAMS, a fine-grained artefact management system. The tool is based on an information retrieval technique, namely latent semantic indexing, and aims at supporting the software engineer in the identification of traceability links between artefacts of different types. The tool has also been integrated in the Eclipse-based client of ADAMS.

Enhancing IBM Requisite Pro with IR-based Traceability Recovery Features

A. De Lucia, R. Landi*, R. Oliveto, G. Tortora
Conference paper3rd Italian Workshop on Eclipse Technologies, pages 77-86, Bari, Italy, 2008. CEUR Workshop Proceedings Press.

Abstract

The potential benefits of traceability are well known, as well as the impracticability of maintaining traceability links manually. Recently, Information Retrieval (IR) techniques have been proposed in order to support the software engineer during the traceability link identification process. Clearly, a research method/tool has more chance of being transferred to practitioners if its usefulness is investigated through empirical user studies and it can be integrated within a commercial and widely used CASE tool. In this paper we work towards this result by showing how IBM Requisite Pro can be enriched with IR-based traceability recovery features.

Assessing the Support of ER and UML Class Diagrams during Maintenance Activities on Data Models

A. De Lucia, C. Gravino, R. Oliveto, and G. Tortora
Conference paper12th European Conference on Software Maintenance and Reengineering, pages 173-182, Athens, Greece, 2008. IEEE Press. Acceptance rate: 24/86 (28%).

Abstract

We present the results of two controlled experiments carried out to compare the support given by ER and UML class diagrams during the maintenance of data models. The experiments involved master and bachelor students performing maintenance tasks on data models represented by ER and UML class diagrams. The results reveal that the two notations give in general the same support. In particular, the correctness level achieved by a subject performing the task on a data model represented by an ER diagram is comparable with the correctness level achieved by the same subject performing the task on a different data model represented by a UML class diagram. Moreover, by discriminating the levels of ability (high vs. low) and experience (graduate vs. undergraduate) of subjects, we also provide some considerations on the influence of such factors on the correctness level achieved by subjects. In particular, we observe that UML class diagrams better support subjects with high ability than ER diagrams do, while no difference can be observed considering subjects with low ability. Regarding the experience factor, the results reveal no difference in the correctness level achieved by graduate and undergraduate students.

Traceability Management meets Information Retrieval Methods: Strengths and Limitations

R. Oliveto
Conference paper12th European Conference on Software Maintenance and Reengineering, pages 302-305, Athens, Greece, 2008. IEEE Press.

Abstract

This research abstract analyses the strengths and limitations of the application of information retrieval (IR) methods for traceability link recovery between software artefacts. This work also shows how the ideas behind an IR-based traceability recovery process combined with traceability information can be used to improve and monitor software artefact quality during software development.

2007

Recovering Traceability Links in Software Artefact Management Systems using Information Retrieval Methods

A. De Lucia, F. Fasano, R. Oliveto, and G. Tortora
Journal PaperACM Transactions on Software Engineering and Methodology, 16(4): 13 (article number), 2007. ACM Press.

Abstract

The main drawback of existing software artifact management systems is the lack of automatic or semi-automatic traceability link generation and maintenance. We have improved an artifact management system with a traceability recovery tool based on Latent Semantic Indexing (LSI), an information retrieval technique. We have assessed LSI to identify strengths and limitations of using information retrieval techniques for traceability recovery and devised the need for an incremental approach. The method and the tool have been evaluated during the development of seventeen software projects involving about 150 students. We observed that although tools based on information retrieval provide useful support for the identification of traceability links during software development, they are still far from supporting a complete semi-automatic recovery of all links. The results of our experience have also shown that such tools can help to identify quality problems in the textual description of traced artifacts.

eWorkbook: a Computer Aided Assessment System

G. Costagliola, F. Ferrucci, V. Fuccella, and R. Oliveto
Journal PaperInternational Journal of Distance Educational Technologies, 5(3):24-41, 2007. Idea Group Press.

Abstract

Computer aided assessment (CAA) tools are more and more widely adopted in academic environments, alongside other assessment means. In this article, we present a CAA Web application, named eWorkbook, which can be used for evaluating a learner's knowledge by letting tutors create, and learners take, on-line tests based on multiple choice, multiple response and true/false question types. Its use is suitable within the academic environment in a blended learning approach, by providing tutors with an additional assessment tool, and learners with a means of distance self-assessment. In the article, the main characteristics of the tool are presented together with the rationale behind them and an outline of the architectural design of the system.

Improving Context Awareness in Subversion through Fine-Grained Versioning of Java Code

A. De Lucia, F. Fasano, R. Oliveto, and D. Santonicola*
Tool demo paperInternational Workshop on Principles of Software Evolution, pages 110-114, Dubrovnik, Croatia, 2007. ACM Press.

Abstract

In this paper, we present an extension of the Subversion command line to support fine-grained versioning of Java code. To this aim, for each Java file under versioning, an XML-based file representing the logical structure of the original file is automatically built by parsing the code. An additional XML-based file is also built to model collaboration constraints. This information is useful to enrich the context awareness by providing developers information about changes made by others to the same logical unit (i.e., class, method, or attribute) of the Java file. Finally, we present an extension of Subclipse, a Subversion front-end implemented as an Eclipse plug-in, aiming to support the fine-grained versioning in Subversion.

Software Artefact Traceability: the Never-ending Challenge

R. Oliveto, G. Antoniol, A. Marcus, and J. Hayes
Conference paper23rd International Conference on Software Maintenance, pages 485-488, Paris, France, 2007. IEEE Press.

Recovering Traceability Links using Information Retrieval Tools: a Controlled Experiment

A. De Lucia, R. Oliveto, and G. Tortora
Conference paperInternational Symposium on Grand Challenges in Traceability, pages 46-55, Lexington, Kentucky, 2007. ACM Press.


2006

Incremental Approach and User Feedbacks: a Silver Bullet for Traceability Recovery?

A. De Lucia, R. Oliveto, and P. Sgueglia*
Conference paper22nd IEEE International Conference on Software Maintenance, pages 299-309, Sheraton Society Hill, Philadelphia, Pennsylvania, USA, 2006. IEEE Press.

Abstract

Several authors have applied information retrieval (IR) techniques to recover traceability links between software artefacts. The use of user feedback (i.e., the classification of retrieved links as correct or false positives) has been proposed to improve the retrieval performance of these techniques. In this paper we present a critical analysis of using feedback within an incremental traceability recovery process. In particular, we analyse the trade-off between the improvement of the retrieval performance and the link classification effort required to train the IR-based traceability recovery tool. We also present the results achieved in case studies and show that, even though retrieval performance generally improves with the use of feedback, IR-based approaches are still far from solving the problem of recovering all correct links with a low classification effort.
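Feedback mechanisms of this kind are typically built around the standard Rocchio formula, which moves the artefact's query vector towards links the user confirmed and away from those rejected as false positives. A minimal sketch, with purely illustrative coefficients and vectors (not the configuration evaluated in the paper):

    # Rocchio-style relevance feedback sketch; all numbers are illustrative.
    import numpy as np

    def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
        """Reformulate the query using the user's link classifications."""
        q = alpha * query
        if len(relevant):
            q += beta * np.mean(relevant, axis=0)
        if len(irrelevant):
            q -= gamma * np.mean(irrelevant, axis=0)
        return np.clip(q, 0.0, None)  # keep term weights non-negative

    # Toy 4-term vector space: one confirmed and one rejected link.
    query = np.array([1.0, 0.5, 0.0, 0.0])
    relevant = np.array([[0.9, 0.4, 0.1, 0.0]])
    irrelevant = np.array([[0.0, 0.1, 0.8, 0.6]])
    print(rocchio(query, relevant, irrelevant))

The trade-off studied in the paper arises because every classification used to feed such a formula costs the engineer inspection effort.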

COCONUT: COde COmprehension Nurturant Using Traceability

A. De Lucia, M. Di Penta, R. Oliveto, and F. Zurolo*
Tool demo paper22nd IEEE International Conference on Software Maintenance, pages 274-275, Sheraton Society Hill, Philadelphia, Pennsylvania, USA, 2006. IEEE Press.

Abstract

In this paper we present an Eclipse plug-in, called COCONUT (COde COmprehension Nurturant Using Traceability), which shows the similarity level between the source code under development and the high-level artefacts the source code should be traced onto. The plug-in also suggests candidate source code identifiers based on the domain terms contained in the corresponding high-level artefacts. Experiments showed that the plug-in helps to produce source code that is easier to understand.
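One way to picture the identifier-suggestion feature: split the camelCase identifiers of the code under development into terms and report the domain terms of the traced high-level artefact that the code does not use yet. A toy sketch (names invented; this is not COCONUT's actual algorithm):

    # Suggest domain terms missing from the code's identifiers (toy sketch).
    import re

    def split_identifier(identifier):
        """camelCase/PascalCase -> lower-case terms, e.g. getAccountBalance."""
        return [p.lower()
                for p in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)]

    code_identifiers = ["getBalance", "AccountManager", "tmp"]
    artefact_terms = {"account", "balance", "deposit", "withdrawal"}

    code_terms = {t for ident in code_identifiers for t in split_identifier(ident)}
    print("domain terms not yet used in the code:",
          sorted(artefact_terms - code_terms))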

Can Information Retrieval Techniques Effectively Support Traceability Link Recovery?

A. De Lucia, F. Fasano, R. Oliveto, and G. Tortora
Conference paper14th IEEE International Conference on Program Comprehension, pages 307-316, Athens, Greece, 2006. IEEE Press.

Abstract

Applying information retrieval (IR) techniques to retrieve all correct links between software artefacts is in general impractical, as it usually requires high effort to discard too many false positives. We show that the only practical way to recover traceability links with IR methods is to identify an "optimal" threshold that achieves an acceptable balance between traced links and false positives. Unfortunately, such a threshold is not known a priori. For this reason, we propose an incremental traceability recovery approach that gradually identifies the threshold at which it is most convenient to stop the recovery process, and we provide evidence of this in a case study. We also report the experience of using incremental traceability recovery during the development of software projects.
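The incremental process can be pictured as follows: propose links in slices of decreasing similarity, let the engineer classify each slice, and stop once a slice yields mostly false positives. A minimal sketch with invented data and an illustrative stop rule:

    # Incremental traceability recovery sketch; stop rule and data invented.
    def incremental_recovery(ranked_links, classify,
                             start=0.9, step=0.1, max_fp_rate=0.5):
        threshold, accepted = start, []
        while threshold > 0.0:
            slice_ = [l for l in ranked_links
                      if threshold <= l["sim"] < threshold + step]
            if slice_:
                judged = [(l, classify(l)) for l in slice_]
                accepted += [l for l, ok in judged if ok]
                fp_rate = sum(1 for _, ok in judged if not ok) / len(judged)
                if fp_rate > max_fp_rate:
                    break  # no longer convenient to lower the threshold
            threshold = round(threshold - step, 10)
        return accepted, threshold

    # Stand-in for the engineer's judgement: links above 0.5 are correct.
    links = [{"id": i, "sim": s} for i, s in enumerate([0.95, 0.85, 0.55, 0.40])]
    print(incremental_recovery(links, lambda l: l["sim"] > 0.5))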

Improving Comprehensibility of Source Code via Traceability Information

A. De Lucia, M. Di Penta, R. Oliveto, and F. Zurolo*
Conference paper14th IEEE International Conference on Program Comprehension, pages 317-326, Athens, Greece, 2006. IEEE Press.

Abstract

The presence of traceability links between software artefacts is very important to achieve high comprehensibility and maintainability. This is confirmed by several research efforts and tools aiming at supporting traceability link maintenance and recovery. We propose to use traceability information, combined with Information Retrieval techniques, within an Eclipse plug-in that shows the software engineer the similarity between the source code components being developed and the high-level artefacts they should be traced onto. Such similarity suggests actions to improve the correct usage of identifiers and comments in source code and, as a consequence, the traceability and comprehensibility of the system. The approach and the tool have been assessed in a controlled experiment with master's students.

ADAMS: ADvanced Artefact Management System

A. De Lucia, F. Fasano, R. Oliveto, and G. Tortora
Tool demo paper10th European Conference on Software Maintenance and Reengineering, pages 349-350, Bari, Italy, 2006. IEEE Press.

Abstract

In this paper, we present ADAMS (ADvanced Artefact Management System), a Web-based system that integrates project management and artefact management with context-awareness and artefact traceability features. In particular, we focus on two features of the tool: hierarchical versioning and traceability support.

2005

Traceability Management in ADAMS

A. De Lucia, F. Fasano, F. Francese, and R. Oliveto
Conference paperInternational Workshop on Distributed Software Development, pages 135-149, Paris, France, 2005. Austrian Computer Society Press.

Abstract

Maintaining traceability links (dependencies) between artefacts enables the flexible management of changes during incremental and iterative software development. In this paper we present the traceability environment offered by ADAMS (ADvanced Artefact Management System). The traceability layer is used to propagate events concerning changes to an artefact to the dependent artefacts, thus also increasing context awareness within the project. However, the proliferation of messages generated by the system could slow down the system and lead developers to ignore notifications. Therefore, ADAMS also includes a visualisation tool that enables the software engineer to browse the dependencies concerning a given artefact and selectively subscribe to the events he/she is interested in.
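The propagation mechanism can be pictured as a small publish/subscribe layer over the dependency graph. A toy sketch (class and artefact names invented; this is not ADAMS's actual API):

    # Toy event propagation over traceability links with selective subscription.
    from collections import defaultdict

    class TraceabilityLayer:
        def __init__(self):
            self.dependents = defaultdict(set)   # artefact -> artefacts tracing onto it
            self.subscribers = defaultdict(set)  # artefact -> interested developers

        def add_link(self, source, target):
            """`target` depends on `source` (e.g., a design traced onto a use case)."""
            self.dependents[source].add(target)

        def subscribe(self, developer, artefact):
            self.subscribers[artefact].add(developer)

        def notify_change(self, artefact):
            # Propagate the change event transitively to dependent artefacts,
            # notifying only developers who subscribed to the affected ones.
            seen, stack = set(), [artefact]
            while stack:
                a = stack.pop()
                if a in seen:
                    continue
                seen.add(a)
                for dev in self.subscribers[a]:
                    print(f"notify {dev}: '{a}' is affected by a change to '{artefact}'")
                stack.extend(self.dependents[a])

    layer = TraceabilityLayer()
    layer.add_link("UC1", "Design1")
    layer.add_link("Design1", "ClassA")
    layer.subscribe("alice", "ClassA")  # alice only cares about ClassA
    layer.notify_change("UC1")          # alice is notified transitively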

ADAMS ReTrace: a Traceability Recovery Tool

A. De Lucia, F. Fasano, R. Oliveto, and G. Tortora
Conference paper9th European Conference on Software Maintenance and Reengineering, pages 32-41, Manchester, UK, 2005. IEEE Press.

Abstract

We present the traceability recovery tool developed in the ADAMS artefact management system. The tool is based on an Information Retrieval technique, namely Latent Semantic Indexing, and aims at supporting the software engineer in the identification of traceability links between artefacts of different types. We also present a case study involving seven student projects, which represented an ideal workbench for the tool. The results emphasise the benefits provided by the tool in terms of new traceability links discovered, in addition to the links manually traced by the software engineer. Moreover, the tool was also helpful in identifying cases of lack of similarity between artefacts manually traced by the software engineer, thus revealing inconsistencies in the usage of domain terms in these artefacts. This information is valuable to assess the quality of the produced artefacts.

2004

Enhancing an Artefact Management System with Traceability Recovery Features

A. De Lucia, F. Fasano, R. Oliveto, and G. Tortora
Conference paper20th IEEE International Conference on Software Maintenance, pages 306-315, Chicago IL, USA, 2004. IEEE Press.

Abstract

We present a traceability recovery method and tool based on Latent Semantic Indexing (LSI) in the context of an artefact management system. The tool highlights both the candidate links not yet identified by the software engineer and the links identified by the software engineer but missed by the tool, probably due to inconsistencies in the usage of domain terms in the traced software artefacts. We also present a case study of using the traceability recovery tool on software artefacts belonging to different categories of documents, including requirement, design, and testing documents, as well as code components.

Recovering Traceability Links between Requirement Artefacts: a Case Study

A. De Lucia, F. Fasano, F. Francese, and R. Oliveto
Conference paper16th International Conference on Software Engineering and Knowledge Engineering - Workshop on Knowledge Oriented Maintenance, pages 453-466, Banff, Alberta, Canada, 2004. Knowledge Systems Institute Press.

Abstract

Recently, researchers have addressed the problem of recovering traceability links between code and documentation using information retrieval techniques. We present a case study of applying Latent Semantic Indexing to recover traceability links between artefacts produced during the requirements phase of a software development process, and we discuss the application of our approach within an artefact management system.

These documents are made available as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each copyright holder. These works may not be reposted without the explicit permission of the copyright holder.