Static code analysis tools summarize the static properties of your code, giving insight into aspects such as its complexity, modifiability, and readability.
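Python has a lot of third-party tool support for measuring static aspects of code, such as conformance to style conventions like PEP-8, complexity metrics like the McCabe index, errors such as unused imports and undefined names, and code smells.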
Popular tools in the Python ecosystem that perform such static analysis include Pylint, Flake8, Pyflakes, and the mccabe package, all of which are used later in this section.
Code smells are surface symptoms of deeper problems with your code. They usually indicate problems with the design, which can cause bugs in the future or negatively impact development of the particular piece of code.
Code smells are not bugs themselves, but they are patterns that indicate that the approach to solving problems adopted in the code is not right and should be fixed by refactoring.
Some of the common code smells are as follows:
At the class level, there are the following:
At the method/function level, there are the following:
A related antipattern is the design smell: a surface symptom in the design of a system that indicates a deeper underlying problem in its architecture.
Cyclomatic complexity is a measure of complexity of a computer program. It is computed as the number of linearly independent paths through the program's source code from start to finish.
For a piece of code with no branches at all, such as the one given next, the Cyclomatic complexity would be 1, as there is just one path through the code:
    """ Module power.py """

    def power(x, y):
        """ Return power of x to y """
        return x**y
A piece of code with two branches, like the following one, will have a complexity of 2:
    """ Module factorial.py """

    def factorial(n):
        """ Return factorial of n """
        if n == 0:
            return 1
        else:
            return n*factorial(n-1)
Cyclomatic complexity as a metric based on the control graph of a program was developed by Thomas J. McCabe in 1976. Hence, it is also called McCabe complexity or the McCabe index.
To measure the metric, the control graph can be pictured as a directed graph, where the nodes represent the blocks of the program and edges represent control flow from one block to another.
With respect to the control graph of a program, the McCabe complexity can be expressed as follows:
M = E − N + 2P
In the preceding equation, E is the number of edges in the graph, N is the number of nodes, and P is the number of connected components.
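The formula can be checked by hand on the two small functions shown earlier. The following is a minimal sketch, assuming control graphs drawn by hand as lists of edges; the helper name `cyclomatic_complexity` and the block labels are illustrative and not part of any tool.

```python
def cyclomatic_complexity(edges, num_components=1):
    """Return M = E - N + 2P for a control graph given as (source, target) edges."""
    # Every block that appears at either end of an edge is a node.
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components


# Hand-drawn graph for factorial(): one test block, two return blocks, one exit.
factorial_cfg = [
    ("test n == 0", "return 1"),
    ("test n == 0", "return n*factorial(n-1)"),
    ("return 1", "exit"),
    ("return n*factorial(n-1)", "exit"),
]

# A function without branches, such as power(), is a single edge to the exit.
power_cfg = [("function body", "exit")]

print(cyclomatic_complexity(factorial_cfg))  # E=4, N=4, P=1
print(cyclomatic_complexity(power_cfg))      # E=1, N=2, P=1
```

With four edges, four nodes, and one component, the factorial graph gives M = 4 − 4 + 2 = 2, and the straight-line graph gives M = 1 − 2 + 2 = 1, which agrees with the values stated for the two examples.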
In Python, the mccabe package, originally written by Ned Batchelder, can be used to measure a program's Cyclomatic complexity. It can be used as a standalone module or as a plugin to programs such as Flake8 or Pylint.
For example, here is how we measure the Cyclomatic complexity of the two code pieces given earlier:
The --min argument tells the mccabe module to start measuring and reporting from the given McCabe index.
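To get a feel for what such a tool computes, the count can be approximated with the standard library alone. The sketch below is not the mccabe package: it is a simplified counter, assuming that complexity is one plus the number of decision points found in each function, and its numbers may differ from what mccabe reports on larger code. The name `approximate_complexity` is hypothetical.

```python
import ast

# Node types treated as decision points in this simplified count.
_DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approximate_complexity(source):
    """Map each function name in `source` to 1 + its number of decision points."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            count = 1
            # Note: branches of nested functions are also counted in the outer one.
            for child in ast.walk(node):
                if isinstance(child, _DECISIONS):
                    count += 1
                elif isinstance(child, ast.BoolOp):
                    # Each extra operand of and/or adds a short-circuit path.
                    count += len(child.values) - 1
            results[node.name] = count
    return results


POWER = '''
def power(x, y):
    return x ** y
'''

FACTORIAL = '''
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
'''

print(approximate_complexity(POWER))
print(approximate_complexity(FACTORIAL))
```

The function without branches scores 1 and the one with a single if/else scores 2, matching the values given for the two examples.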
Let's now try a few of the aforementioned tools and use them on an example module to find out what kind of information these tools report.
The purpose of the following sections is not to teach you the usage of these tools or their command-line options—these can be picked up via the tool's documentation. Instead, the purpose is to explore the depth and richness of information that these tools provide with respect to the style, logic, and other issues with the code.
For purposes of this testing, the following contrived module example has been used. It is written purposefully with a lot of coding errors, style errors, and coding smells.
Since the tools we are using list errors by line number, the code has been presented with numbered lines so that it is easy to follow the output of the tools back to the code:
     1  """
     2  Module metrictest.py
     3
     4  Metric example - Module which is used as a testbed for static checkers.
     5  This is a mix of different functions and classes doing different things.
     6
     7  """
     8  import random
     9
    10  def fn(x, y):
    11      """ A function which performs a sum """
    12      return x + y
    13
    14  def find_optimal_route_to_my_office_from_home(start_time,
    15                                      expected_time,
    16                                      favorite_route='SBS1K',
    17                                      favorite_option='bus'):
    18
    19
    20      d = (expected_time - start_time).total_seconds()/60.0
    21
    22      if d<=30:
    23          return 'car'
    24
    25      # If d>30 but <45, first drive then take metro
    26      if d>30 and d<45:
    27          return ('car', 'metro')
    28
    29      # If d>45 there are a combination of options
    30      if d>45:
    31          if d<60:
    32              # First volvo,then connecting bus
    33              return ('bus:335E','bus:connector')
    34          elif d>80:
    35              # Might as well go by normal bus
    36              return random.choice(('bus:330','bus:331',':'.join((favorite_option,
    37                           favorite_route))))
    38          elif d>90:
    39              # Relax and choose favorite route
    40              return ':'.join((favorite_option,
    41                               favorite_route))
    42
    43
    44  class C(object):
    45      """ A class which does almost nothing """
    46
    47      def __init__(self, x,y):
    48          self.x = x
    49          self.y = y
    50
    51      def f(self):
    52          pass
    53
    54      def g(self, x, y):
    55
    56          if self.x>x:
    57              return self.x+self.y
    58          elif x>self.x:
    59              return x+ self.y
    60
    61  class D(C):
    62      """ D class """
    63
    64      def __init__(self, x):
    65          self.x = x
    66
    67      def f(self, x,y):
    68          if x>y:
    69              return x-y
    70          else:
    71              return x+y
    72
    73      def g(self, y):
    74
    75          if self.x>y:
    76              return self.x+y
    77          else:
    78              return y-self.x
Let's see what Pylint has to say about our rather horrible-looking piece of test code.
$ pylint --reports=n metrictest.py
[Two screenshots of the detailed Pylint report output]
Let's focus on those very interesting last 10-20 lines of the Pylint report, skipping the earlier styling and convention warnings.
Here are the errors, classified into a table. We have skipped similar occurrences to keep the table short:
Error | Occurrences | Explanation | Type of Code Smell |
---|---|---|---|
Invalid function name | The fn function | The name fn is too short | Too short identifier |
Invalid variable name | The x and y variables | The names x and y are too short | Too short identifier |
Invalid function name | Function name, find_optimal_route_to_my_office_from_home | The function name is too long | Too long identifier |
Invalid variable name | The d variable | The name d is too short | Too short identifier |
Invalid class name | Class C | The name C is too short | Too short identifier |
Invalid method name | Class C, method f | The name f is too short | Too short identifier |
Invalid __init__ method | Class D | Doesn't call base class __init__ | Breaks contract with base class |
Arguments of f differ in class D | Class D | Method signature breaks contract with base class signature | Refused bequest |
Arguments of g differ in class D | Class D | Method signature breaks contract with base class signature | Refused bequest |
As you can see, Pylint has detected a number of code smells, which we discussed in the previous section. Some of the most interesting ones are how it detected the absurdly long function name and how the subclass D breaks the contract with the base class, C, in its __init__ method and other methods.
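To make the broken contract concrete, here is a minimal sketch of a subclass that keeps it instead. The classes `Base` and `Derived` are hypothetical stand-ins for C and D, and the method bodies are simplified for illustration; this is one possible shape of a fix, not the refactoring the test module calls for.

```python
import inspect


class Base:
    """Stands in for class C: stores two values."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def g(self, x, y):
        return max(self.x, x) + self.y


class Derived(Base):
    """Stands in for class D, but honours the base-class contract."""

    def __init__(self, x, y=0):
        # Delegate to the base initializer so that self.y is always set.
        super().__init__(x, y)

    def g(self, x, y):
        # Same parameters as Base.g, so callers can treat both classes alike.
        return super().g(x, y) + y


# The overriding method now has the same signature as the one it replaces.
print(inspect.signature(Base.g) == inspect.signature(Derived.g))
print(Derived(3).g(5, 2))
```

Because the initializer delegates to the base class and the overridden method keeps the same parameters, code written against the base class keeps working when handed an instance of the subclass.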
Let's see what flake8 has to tell us about our code. We will run it in order to report the statistics and summary of error counts:
$ flake8 --statistics --count metrictest.py
The preceding command gives the following output:
As you would expect from a tool that is written mostly to enforce PEP-8 conventions, the errors reported are all styling and convention errors. Fixing them improves the readability of the code and brings it closer to the PEP-8 style guidelines.
Now is a good time to check the complexity of our code. First, we will use mccabe directly and then call it via Flake8:
As expected, the complexity of the office-route function is too high, as it has too many branches and sub-branches.
As flake8 prints too many styling errors, we will grep specifically for the report on complexity:
As expected, Flake8 reports the find_optimal_route_to_my_office_from_home function as too complex.
As a last step, let's run pyflakes on the code:
There is no output! So, Pyflakes finds no issues with the code. The reason is that Pyflakes is a basic checker that does not report anything beyond the obvious syntax and logic errors, unused imports, missing variable names, and the like.
Let's add some errors into our code and rerun Pyflakes. Here is the adjusted code with line numbers:
     1  """
     2  Module metrictest.py
     3
     4  Metric example - Module which is used as a testbed for static checkers.
     5  This is a mix of different functions and classes doing different things.
     6
     7  """
     8  import sys
     9
    10  def fn(x, y):
    11      """ A function which performs a sum """
    12      return x + y
    13
    14  def find_optimal_route_to_my_office_from_home(start_time,
    15                                      expected_time,
    16                                      favorite_route='SBS1K',
    17                                      favorite_option='bus'):
    18
    19
    20      d = (expected_time - start_time).total_seconds()/60.0
    21
    22      if d<=30:
    23          return 'car'
    24
    25      # If d>30 but <45, first drive then take metro
    26      if d>30 and d<45:
    27          return ('car', 'metro')
    28
    29      # If d>45 there are a combination of options
    30      if d>45:
    31          if d<60:
    32              # First volvo,then connecting bus
    33              return ('bus:335E','bus:connector')
    34          elif d>80:
    35              # Might as well go by normal bus
    36              return random.choice(('bus:330','bus:331',':'.join((favorite_option,
    37                           favorite_route))))
    38          elif d>90:
    39              # Relax and choose favorite route
    40              return ':'.join((favorite_option,
    41                               favorite_route))
    42
    43
    44  class C(object):
    45      """ A class which does almost nothing """
    46
    47      def __init__(self, x,y):
    48          self.x = x
    49          self.y = y
    50
    51      def f(self):
    52          pass
    53
    54      def g(self, x, y):
    55
    56          if self.x>x:
    57              return self.x+self.y
    58          elif x>self.x:
    59              return x+ self.y
    60
    61  class D(C):
    62      """ D class """
    63
    64      def __init__(self, x):
    65          self.x = x
    66
    67      def f(self, x,y):
    68          if x>y:
    69              return x-y
    70          else:
    71              return x+y
    72
    73      def g(self, y):
    74
    75          if self.x>y:
    76              return self.x+y
    77          else:
    78              return y-self.x
    79
    80  def myfunc(a, b):
    81      if a>b:
    82          return c
    83      else:
    84          return a
Take a look at the following output:
Pyflakes now returns some useful information in terms of a missing name (random), unused import (sys), and an undefined name (the c variable in the newly introduced function, myfunc). So it does perform some useful static analysis on the code. For example, the information on the missing and undefined names is useful to fix obvious bugs in the preceding code.
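The unused-import check is simple enough to sketch with the standard library. The following is an illustration of the general idea only, assuming a single pass over the syntax tree; it is not how Pyflakes is implemented, and the function name `find_unused_imports` and the sample source are hypothetical.

```python
import ast


def find_unused_imports(source):
    """Return the imported names that are never read anywhere in `source`."""
    imported = set()
    used = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the top-level name "a".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            # random.choice(...) reads the name "random".
            used.add(node.id)
    return sorted(imported - used)


SAMPLE = '''
import sys
import random

def pick(options):
    return random.choice(options)
'''

print(find_unused_imports(SAMPLE))
```

Here sys is reported because it is imported but never read, while random is not, since it is used inside the function. A real checker also has to track scopes, which this sketch deliberately ignores.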