A large ecosystem study to understand the effect of programming languages on code quality

B. Ray*; D. Posnett†
* University of Virginia, Charlottesville, VA, United States
† University of California, Davis, CA, United States

Abstract

Developers are often tasked with choosing a language for a project. Although such decisions are usually based on prior experience or legacy requirements, as reported by Meyerovich et al. [1], language choice is generally believed to affect code quality. For example, statically typed languages catch type errors early, at compile time, while dynamically typed languages catch them at run time, if they arise at all. Other language properties, such as strong versus weak typing, procedural versus functional style, and managed versus unmanaged memory, can also influence the number of bugs in the resulting code.

In this chapter, we empirically investigate the impact of language choice on code quality. We analyzed the 17 most popular languages on GitHub (C, C++, C#, Objective-C, Go, Java, Ruby, PHP, Python, Perl, JavaScript, CoffeeScript, TypeScript, Clojure, Erlang, Haskell, and Scala) across 728 GitHub projects. Based on our previous study [2], we found that the overall effect of language on code quality is rather “modest.”

Keywords

Language choice; Code quality; GitHub projects; Regression models

Comparing Languages

Comparing programming languages across multiple projects is a nontrivial task. Language choice may depend on many things: the purpose of the project, the underlying conditions, the size of the team, and the programmers themselves. Beyond that, how can the language itself matter? It is the properties of a language, such as its type system or memory management, that are important. To estimate the effect of language on code quality, all of these factors need to be considered.

One method for answering this type of question is the controlled study. For example, recent studies monitored students while they programmed in different languages and then compared outcomes such as development effort and program quality [3]. Although such controlled studies are precise, they are typically limited in their tasks and context. Studies that employ students to execute tasks that can be completed in a short period are often criticized, sometimes unfairly [4], as not emulating real-world development.

An alternative is the observational study on a small set of chosen projects. For example, Bhattacharya et al. [5] studied four projects developed in both C and C++ and found that the software components developed in C++ are in general more reliable than those developed in C. While less limited than the controlled study, such data often suffers from a lack of diversity. Further, if it is collected from a single organization, eg, the Apache Software Foundation, it might be tainted with organizational bias.

What we can do, however, is seek data sources that naturally capture diversity among software developers, organizations, software languages, and projects [2]. The availability of a large number of open source projects written in multiple languages, and collected in software forges such as GitHub, facilitates the study of questions such as “which language is more defect prone” in an observational setting.

Study Design and Analysis

GitHub projects vary substantially across size, age, and number of developers. Each project repository provides a detailed record, including contribution history, project size, authorship, and defect repair. This is great because we want diversity in our dataset. Diversity helps us make the case for generalizability [6].

This same diversity, however, presents challenges. For example, the quality of software may depend on software engineering practices, the development environment, developer knowledge, and so on. It is important to reduce such extraneous effects as much as possible. To that end, we focused on projects that are under active development, that are popular in the open source community, and that are written in widely used languages. These choices helped us mitigate the effect of unmaintained code and poor development practices. Within each project, we further excluded any language that has too few examples in the repository. This ensures that the studied languages have significant activity within the projects.

In total, we analyzed the 17 most popular languages on GitHub, spanning 728 projects, 63 million lines of code, and 1.5 million commits from 29,000 developers. Without such a large software ecosystem, it would be challenging to obtain enough data to statistically compare development across 17 different languages.

To understand the effect of these languages on code quality, we modeled the number of bugfix commits against language. A number of confounding factors, including code size, team size, and age/maturity, were also included in the study design to capture the variance attributable to them. Failure to include important metrics can lead to spurious relationships with the outcome. The classic example of this is code size. Larger code will, on average, contain more defects. Therefore, languages that are favored in larger projects may appear to be more defect prone than other languages.
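
The size confound described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the six toy projects, the two languages A and B, and the use of ordinary least squares on log-transformed counts, which stands in for whatever regression model a real analysis would choose. It is not the model or the data from the study.

```python
import math

# Hypothetical toy data: (language, size in KLOC, bugfix commits).
# "A" is used in large projects and "B" in small ones. B has more
# bugfix commits per KLOC (3 vs. 2) but far fewer in absolute terms.
PROJECTS = [
    ("A", 100, 200), ("A", 200, 400), ("A", 400, 800),
    ("B", 10, 30), ("B", 20, 60), ("B", 40, 120),
]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


log_bugs = [math.log(bugs) for _, _, bugs in PROJECTS]
is_b = [1.0 if lang == "B" else 0.0 for lang, _, _ in PROJECTS]
log_kloc = [math.log(kloc) for _, kloc, _ in PROJECTS]

# Naive model: log(bugs) ~ intercept + is_B (code size is ignored).
naive = ols([[1.0, b] for b in is_b], log_bugs)

# Controlled model: log(bugs) ~ intercept + is_B + log(KLOC).
controlled = ols([[1.0, b, s] for b, s in zip(is_b, log_kloc)], log_bugs)

print("naive effect of B:     ", round(naive[1], 3))
print("controlled effect of B:", round(controlled[1], 3))
print("size coefficient:      ", round(controlled[2], 3))
```

On this toy data the naive model gives B a negative coefficient, so B looks less defect prone simply because its projects are small. Once log size is included as a control, the coefficient for B turns positive, matching the higher bugfix rate per KLOC that was built into the data.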

Moreover, comparing 17 different languages can be messy. What we want to say is that language X causes N additional defects over language Y. However, attempting to compare every language with every other would complicate interpretation. Instead, we compare each language to a common baseline. To make the comparison fair, we weight each language by the number of examples written in it. We then compare each language to the average number of defects across all languages, so that the impact of every language is measured against the same background.
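
One way to read the baseline comparison above is sketched below: the overall average is taken over all projects, so each language contributes in proportion to its number of projects, and each language is then reported as a deviation from that average. The defect values and the languages X, Y, and Z are hypothetical, and the function name `weighted_deviations` is ours; the study's actual coding scheme may differ in detail.

```python
# Hypothetical toy data: one defect measure per project, grouped by language.
DEFECTS = {
    "X": [4.0, 6.0, 5.0, 5.0],  # 4 projects
    "Y": [9.0, 11.0],           # 2 projects
    "Z": [2.0],                 # 1 project
}


def weighted_deviations(groups):
    """Return the project-weighted grand mean and each language's deviation from it."""
    counts = {lang: len(values) for lang, values in groups.items()}
    means = {lang: sum(values) / len(values) for lang, values in groups.items()}
    total = sum(counts.values())
    # Each language mean is weighted by its number of projects.
    grand_mean = sum(counts[lang] * means[lang] for lang in groups) / total
    deviations = {lang: means[lang] - grand_mean for lang in groups}
    return grand_mean, deviations


grand_mean, deviations = weighted_deviations(DEFECTS)
unweighted_mean = sum(sum(v) / len(v) for v in DEFECTS.values()) / len(DEFECTS)

print("weighted baseline:  ", grand_mean)
print("unweighted baseline:", round(unweighted_mean, 3))
for lang, dev in deviations.items():
    print(lang, "deviates from the weighted baseline by", dev)
```

With this weighting, a language with a single project (Z here) pulls the baseline less than a language with many projects, and the deviations multiplied by their project counts sum to zero.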

Results

Using a combination of regression modeling, text analytics, and visualization, we examined the interactions of language, language properties, application domain, and defect type. We found that:

1. Functional languages (eg, Clojure, Erlang, Haskell, Scala) produce slightly better results, ie, produce fewer bugs, than procedural languages (eg, C, C++, C#, Objective-C, Java, Go); strong typing (eg, C#, Java, Python, etc.) is somewhat better than weak typing (eg, C, C++, etc.), and static typing (eg, C, C++, C#, etc.) is modestly better than dynamic typing (eg, JavaScript, Python, Perl, etc.).

2. Defect proneness of languages in general is not associated with software domains, ie, the type of project does not mediate this relationship to a large degree.

3. Languages are more strongly associated with individual bug categories (memory, concurrency, etc.) than with bugs overall.

The modest effects of language choice on code quality are overwhelmingly dominated by the other process factors that we controlled for in our regression models. We hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, eg, developers’ expertise or the preference of certain personality types for functional, static, and strongly typed languages.

Summary

We have presented a large-scale study of the impact of programming language choice on software quality. The GitHub data we used is characterized by its complexity and by variance along multiple dimensions: language, language type, usage domain, amount of code, sizes of commits, and the various characteristics of the many issue types. We report that programming language choice helps to reduce certain types of errors, such as memory errors and concurrency errors. However, contrary to common belief, language choice in general does not affect software quality much.

References

[1] Meyerovich L.A., Rabkin A.S. Empirical analysis of programming language adoption. ACM SIGPLAN Notices. 2013;48(10):1–18.

[2] Ray B., Posnett D., Filkov V., Devanbu P. A large scale study of programming languages and code quality in github. In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering; New York: ACM; 2014:155–165.

[3] Hanenberg S. An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time. ACM SIGPLAN Notices. 2010;45(10):22–35.

[4] Murphy-Hill E., Murphy G.C., Griswold W.G. Understanding context: creating a lasting impact in experimental software engineering research. In: Proceedings of the FSE/SDP workshop on future of software engineering research; ACM; 2010:255–258.

[5] Bhattacharya P., Neamtiu I. Assessing programming language impact on development and maintenance: a study on C and C++. In: 33rd international conference on software engineering (ICSE), 2011; IEEE; 2011:171–180.

[6] Nagappan M., Zimmermann T., Bird C. Diversity in software engineering research. In: Proceedings of the 2013 9th joint meeting on foundations of software engineering; ACM; 2013:466–476.
