When optimizing, always perform just enough to get the job done so as to not introduce new bugs. In this recipe, we introduce some simple optimizations that require minimal modification to the source code and test how effective they are at decreasing the execution time.
The following steps walk you through establishing a new performance baseline for the code and implementing several small optimizations:
main
function, edit the line that sets the number of sphere points, changing it from 960
to 60
:n_sphere = 60
time
command to benchmark the new code configuration and compute the new result:time python asa.py 1R0R.pdb 12114.7 angstrom squared. python asa.py 1R0R.pdb 11.05s user 0.04s system 97% cpu 11.334 total
asa.py
into a new file as a local code-versioning system. You can also just use Git if you are comfortable using it to roll back to previous versions of the code. At the command line, enter the following command:cp asa.py asa_v1.py
calculate_asa
function, change the original version:for point in sphere_points: is_accessible = True test_point.x = point[0]*radius + atom_i.pos.x test_point.y = point[1]*radius + atom_i.pos.y test_point.z = point[2]*radius + atom_i.pos.z
To:
atom_i_pos_x = atom_i.pos.x atom_i_pos_y = atom_i.pos.y atom_i_pos_z = atom_i.pos.z n_accessible_point = 0 for point in sphere_points: is_accessible = True test_point.x = point[0]*radius + atom_i_pos_x test_point.y = point[1]*radius + atom_i_pos_y test_point.z = point[2]*radius + atom_i_pos_z
math.sqrt
functions in the find_neighbor_indices
function, transforming the original code:dist = pos_distance(atom_k.pos, atom_i.pos) if dist < radius + atom_i.radius:
Into:
dist = pos_distance_sq(atom_k.pos, atom_i.pos) if dist < (radius + atom_i.radius)**2:
time
command again to see the end result:time python asa_v1.py 1R0R.pdb 12114.7 angstrom squared. python asa_v1.py 1R0R.pdb 9.42s user 0.03s system 99% cpu 9.500 total
12114.7
, is the same value that was computed previously (which it is).With this first optimization effort, we tried to make only very small tweaks to the code and not introduce any new libraries or radical restructuring.
The first trick was to use local variables inside the loops to remove unneeded dereferencing. In the original code, , we must locate the pos
attribute of the atom_i
object for each point in sphere_points
and then access the x
attribute of the vector3d.Vector3d
object:
for point in sphere_points: … test_point.x = point[0]*radius + atom_i.pos.x
We perform this same lookup for each iteration of this loop despite the fact that atom_i
has not changed. The simple solution is to pull this lookup to just before the loop and cache the value in a local variable used inside the loop. This is shown in the new version of the code as follows:
atom_i_pos_x = atom_i.pos.x … for point in sphere_points: … test_point.x = point[0]*radius + atom_i_pos_x
The second optimization we used was to remove an unnecessary call to the square root. Remember, square roots are painfully expensive in comparison to other mathematical operations that can be tested:
In [1]: %timeit sqrt(29111) 10000000 loops, best of 3: 116 ns per loop In [2]: %timeit 29111 + 372.1 10000000 loops, best of 3: 30.9 ns per loop
This factor of approximately 4x
differential is actually on the low side and we have seen square root calculations require ten times the time of additions and multiplications.
In the original code, a square root was concealed in the pos_distance
function call. We then test the result against the sum of radius
and atom_i.radius
:
dist = pos_distance(atom_k.pos, atom_i.pos) if dist < radius + atom_i.radius:
Assuming that the smallest possible dist
value is greater than 1
, we can square both sides of the inequality and remove the need for the original square root:
dist = pos_distance_sq(atom_k.pos, atom_i.pos) if dist < (radius + atom_i.radius)**2:
The two simple changes in the preceding code resulted in an approximately 15 percent decrease in the execution time of the code. We can take these two ideas and roll them out to all other locations in asa.py
. We can also implement a number of other simple changes such as preallocating lists that could offer additional performance increases. However, our target is to get the code execution time down from 60 seconds to 1 second, and it is unlikely that these little changes will yield the improvement we need.
3.145.166.167