Improving Code Readability Models with Textual Featres

Experimental material and raw data

Simone Scalabrino¹, Mario Linares-Vásquez², Denys Poshyvanyk², Rocco Oliveto¹

¹ University of Molise, Pesche (IS), Italy

² The College of William and Mary, Williamsburg, Virginia (USA)

Abstract

Code reading is one of the most frequent activities in software maintenance; before implementing changes, it is necessary to fully understand source code often written by other developers. Thus, readability is a crucial aspect of source code that might significantly influence program comprehension effort. In general, software readability models take into account only structural aspects of source code, e.g., line length and a number of comments. However, code is a particular form of text; therefore, a code readability model should not ignore the textual aspects of source code encapsulated in identifiers and comments. In this paper, we propose a set of textual features that could be used to measure code readability. We evaluated the proposed textual features on 600 code snippets manually evaluated (in terms of readability) by 5K+ people. The results show that the proposed features complement classic structural features when predicting readability judgments. Consequently, a code readability model based on a richer set of features, including the ones proposed in this paper, achieves a significantly better accuracy as compared to all the state-of-the-art readability models.

Dataset

This dataset contains 200 method from four open source projects, namely:

The methods were manually evaluated by 9 Computer Science studens from The College of William and Mary, Williamsburg, VA (USA). The participants were asked to evaluate the readability of each method using a five-point Likert scale ranging between 1 (very unreadable) and 5 (very readable). We collected the rankings through a web application where participants were able to:

read the method (with syntax highlighting);
evaluate its readability;
write comments about the method.

The participants were allowed to complete the evaluation in multiple rounds (e.g, evaluate the first 100 methods in one day and the remaining after one week). This is a screenshot of the web application:

For each single snippet, it was computed the average score, in order to set a proper cut-off to distinguish readable snippets from unreadable ones. The following plots show (i) the frequency of each score and (ii) the (estimated) probability density of the average scores (bandwidth: 0.05):

Material

Extension

Check the extesion of our work: A Comprehensive Model for Code Readability (Online Appendix with the readability tool)