Automatically Assessing Code Understandability: How Far Are We?

Experimental material and raw data

Simone Scalabrino¹, Gabriele Bavota², Christopher Vendome³,
Mario Linares-Vásquez⁴, Denys Poshyvanyk³, Rocco Oliveto¹

¹ University of Molise, Pesche (IS), Italy

² Università della Svizzera Italiana (USI), Lugano, Switzerland

³ The College of William and Mary, Williamsburg, Virginia (USA)

⁴ Universidad de los Andes, Bogotá, Colombia

Abstract

Program understanding plays a pivotal role in software maintenance and evolution: a deep understanding of code is the stepping stone for most software-related activities, such as bug fixing or testing. Being able to measure the understandability of a piece of code might help in estimating the effort required for a maintenance activity, in comparing the quality of alternative implementations, or even in predicting bugs. Unfortunately, there are no existing metrics specifically designed to assess the understandability of a given code snippet. In this paper we perform a first step in this direction, by studying the extent to which several types of metrics computed on code, documentation and developers correlate with code understandability. To perform such an investigation we ran a study with 46 participants who were asked to understand eight code snippets each. We collected a total of 324 evaluations aiming at assessing the perceived understandability, the actual level of understanding and the time needed to understand a code snippet. Our results demonstrate that none of the (existing and new) metrics we considered is able to capture code understandability, not even the ones assumed to assess quality attributes strongly related with it, such as code readability and complexity.

Presentation at ASE 2017

Dataset

This dataset contains 50 methods from 10 open source projects.

We asked 46 software developers to (i) read the methods and (ii) state whether they understood each one. Participants who said they understood a method then had to answer 3 verification questions about it. We collected the evaluations through a web application, in which the participants were able to:

read the method (with syntax highlighting)
answer the questions
browse the classes referenced by the method they had to evaluate.

Besides measuring the number of correct answers, we also recorded the number of seconds needed to understand each method. The figure below shows the main evaluation page, which contains (1) the method and (2) links to related classes. The participants could answer "I understood" or "I did not understand" (3).

The participants answered "I understood the method" in 228 of the 324 evaluations (70%). The mean time needed to understand a method was 154 seconds, while the median was 72 seconds. The figure below shows the estimated distribution of the time needed to answer.

We correlated 121 metrics with four proxies of understandability. As a first step, for each pair of metrics exhibiting a strong correlation (i.e., a Kendall's |τ| ≥ 0.7), we excluded the metric with the higher number of missing values or, when the two had the same number, one chosen at random. This reduced the number of investigated metrics from 121 to 74. Click here to view the list of excluded metrics (and their correlation with the retained metric).
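The pruning step can be illustrated with a minimal sketch. It is not the script used in the study: the function names, the toy metric values, the choice of the tau-b variant, and the order in which pairs are visited are all assumptions made for illustration. Missing values are represented as None, and each correlation is computed only over the observations where both metrics are present.

```python
import random
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b over the observations where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    concordant = discordant = ties_x = ties_y = 0
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        dx, dy = a1 - a2, b1 - b2
        if dx == 0 and dy == 0:
            continue  # tied on both metrics: counted nowhere
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = ((untied + ties_x) * (untied + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else 0.0


def prune_correlated(metrics, threshold=0.7, rng=None):
    """Drop one metric from every strongly correlated pair.

    The metric with more missing values is dropped; on a tie, one of the
    two is dropped at random. Pair order is an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    missing = {name: sum(v is None for v in vals) for name, vals in metrics.items()}
    kept = list(metrics)
    for a, b in combinations(list(metrics), 2):
        if a not in kept or b not in kept:
            continue  # one of the two was already excluded
        if abs(kendall_tau_b(metrics[a], metrics[b])) >= threshold:
            if missing[a] != missing[b]:
                drop = a if missing[a] > missing[b] else b
            else:
                drop = rng.choice([a, b])
            kept.remove(drop)
    return kept


# Hypothetical toy data, not taken from the dataset.
toy_metrics = {
    "loc": [10, 20, 30, 40, 50],
    "statements": [8, 18, None, 35, 44],
    "comments": [3, 1, 4, 1, 5],
}
print(prune_correlated(toy_metrics))
```

In this toy input, "statements" is perfectly rank-correlated with "loc" and has one missing value, so it is the one excluded; "comments" is only weakly correlated with "loc" and is kept.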

Raw data

Raw dataset. Each row includes information about (i) the evaluator, (ii) the method (including all the metrics) and (iii) all the understandability variables
Total number of upvotes for all the external APIs in the systems taken into account
Popularity of Java classes
Verification questions (the first is always the correct answer; in the webapp the answers are shuffled)
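When reusing the verification questions, the answers have to be shuffled without losing track of the correct one. The following is a minimal sketch of one way to do this, assuming each question's answers are available as a list whose first element is the correct answer; the function name and the example answers are hypothetical, and the web application's actual implementation is not described here.

```python
import random


def shuffle_answers(answers, rng=None):
    """Return (shuffled_answers, index_of_correct_answer).

    Assumes answers[0] is the correct answer, as in the raw file.
    """
    rng = rng or random.Random()
    order = list(range(len(answers)))
    rng.shuffle(order)
    shuffled = [answers[i] for i in order]
    # Position 0 of the original list is the correct answer;
    # find where it ended up after shuffling.
    return shuffled, order.index(0)


# Hypothetical answers, not taken from the dataset.
shuffled, correct_index = shuffle_answers(
    ["correct answer", "distractor A", "distractor B"], random.Random(1)
)
print(shuffled, correct_index)
```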
