5. Refactoring

image

© Jennifer M. Kohnke

The only factor becoming scarce in a world of abundance is human attention.

—Kevin Kelly, in Wired

This chapter is about human attention, about paying attention to what you are doing and making sure that you are doing your best. It is about the difference between getting something to work and getting something right. It is about the value we place in the structure of our code.

In Refactoring, his classic book, Martin Fowler defines refactoring as “the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.”1 But why would we want to improve the structure of working code? What about “If it’s not broken, don’t fix it!”?

Every software module has three functions. First is the function it performs while executing. This function is the reason for the module’s existence. The second function of a module is to afford change. Almost all modules will change in the course of their lives, and it is the responsibility of the developers to make sure that such changes are as simple as possible to make. A module that is difficult to change is broken and needs fixing, even though it works. The third function of a module is to communicate to its readers. Developers who are not familiar with the module should be able to read and understand it without undue mental gymnastics. A module that does not communicate is broken and needs to be fixed.

What does it take to make a module easy to read and easy to change? Much of this book is dedicated to principles and patterns whose primary goal is to help you create modules that are flexible and adaptable. But it takes something more than just principles and patterns to make a module that is easy to read and change. It takes attention. It takes discipline. It takes a passion for creating beauty.

A Simple Example of Refactoring: Generating Primes

Consider the code in Listing 5-1. This program generates prime numbers. It is one big function with many single-letter variables and comments to help us read it.


Listing 5-1. GeneratePrimes.cs, version 1

/// <remark>
/// This class Generates prime numbers up to a user specified
/// maximum. The algorithm used is the Sieve of Eratosthenes.
///
/// Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya --
/// d. c. 194, Alexandria. The first man to calculate the
/// circumference of the Earth. Also known for working on
/// calendars with leap years and ran the library at
/// Alexandria.
///
/// The algorithm is quite simple. Given an array of integers
/// starting at 2. Cross out all multiples of 2. Find the
/// next uncrossed integer, and cross out all of its multiples.
/// Repeat until you have passed the square root of the
/// maximum value.
///
/// Written by Robert C. Martin on 9 Dec 1999 in Java
/// Translated to C# by Micah Martin on 12 Jan 2005.

///</remark>

using System;

/// <summary>
/// author: Robert C. Martin
/// </summary>
public class GeneratePrimes
{
  ///<summary>
  /// Generates an array of prime numbers.
  ///</summary>
  ///
  /// <param name="maxValue">The generation limit.</param>
  public static int[] GeneratePrimeNumbers(int maxValue)
  {
    if (maxValue >= 2) // the only valid case
    {
      // declarations
      int s = maxValue + 1; // size of array
      bool[] f = new bool[s];
      int i;

      // initialize array to true.
      for (i = 0; i < s; i++)
        f[i] = true;

      // get rid of known non-primes
      f[0] = f[1] = false;

      // sieve
      int j;
      for (i = 2; i < Math.Sqrt(s) + 1; i++)
      {
        if(f[i]) // if i is uncrossed, cross its multiples.
        {
          for (j = 2 * i; j < s; j += i)
            f[j] = false; // multiple is not prime
        }
      }

      // how many primes are there?
      int count = 0;
      for (i = 0; i < s; i++)
      {
        if (f[i])
          count++; // bump count.
      }

      int[] primes = new int[count];

      // move the primes into the result
      for (i = 0, j = 0; i < s; i++)

      {
        if (f[i])             // if prime
          primes[j++] = i;
      }

      return primes;  // return the primes
    }
    else // maxValue < 2
      return new int[0]; // return null array if bad input.
  }
}


Unit Testing

The unit test for GeneratePrimes is shown in Listing 5-2. It takes a statistical approach, checking whether the generator can generate primes up to 0, 2, 3, and 100. In the first case, there should be no primes. In the second, there should be one prime, and it should be 2. In the third, there should be two primes, and they should be 2 and 3. In the last case, there should be 25 primes, the last of which is 97. If all these tests pass, I make the assumption that the generator is working. I doubt that this is foolproof, but I can’t think of a reasonable scenario in which these tests would pass but the function fail.


Listing 5-2. GeneratePrimesTest.cs

using NUnit.Framework;

[TestFixture]
public class GeneratePrimesTest
{
  [Test]
  public void TestPrimes()
  {
    int[] nullArray = GeneratePrimes.GeneratePrimeNumbers(0);
    Assert.AreEqual(nullArray.Length, 0);

    int[] minArray = GeneratePrimes.GeneratePrimeNumbers(2);
    Assert.AreEqual(minArray.Length, 1);
    Assert.AreEqual(minArray[0], 2);

    int[] threeArray = GeneratePrimes.GeneratePrimeNumbers(3);
    Assert.AreEqual(threeArray.Length, 2);
    Assert.AreEqual(threeArray[0], 2);
    Assert.AreEqual(threeArray[1], 3);

    int[] centArray = GeneratePrimes.GeneratePrimeNumbers(100);
    Assert.AreEqual(centArray.Length, 25);
    Assert.AreEqual(centArray[24], 97);
  }
}


Refactoring

To help me refactor this program, I am using Visual Studio with the ReSharper refactoring add-in from JetBrains. This tool makes it trivial to extract methods and rename variables and classes.

It seems pretty clear that the main function wants to be three separate functions. The first initializes all the variables and sets up the sieve. The second executes the sieve, and the third loads the sieved results into an integer array. To expose this structure more clearly, I extracted those functions into three separate methods (Listing 5-3). I also removed a few unnecessary comments and changed the name of the class to PrimeGenerator. The tests all still ran.

Extracting the three functions forced me to promote some of the variables of the function to static fields of the class. This makes it much clearer which variables are local and which have wider influence.


Listing 5-3. PrimeGenerator.cs, version 2

///<remark>
/// This class Generates prime numbers up to a user specified
/// maximum. The algorithm used is the Sieve of Eratosthenes.
/// Given an array of integers starting at 2:
/// Find the first uncrossed integer, and cross out all its
/// multiples. Repeat until there are no more multiples
/// in the array.
///</remark>
using System;

public class PrimeGenerator
{
  private static int s;
  private static bool[] f;
  private static int[] primes;

  public static int[] GeneratePrimeNumbers(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else
    {
      InitializeSieve(maxValue);
      Sieve();
      LoadPrimes();
      return primes;  // return the primes
    }
  }
  private static void LoadPrimes()
  {
    int i;
    int j;

    // how many primes are there?
    int count = 0;
    for (i = 0; i < s; i++)
    {
      if (f[i])
        count++; // bump count.
    }

    primes = new int[count];

    // move the primes into the result
    for (i = 0, j = 0; i < s; i++)
    {
      if (f[i])             // if prime
        primes[j++] = i;
    }
  }

  private static void Sieve()
  {
    int i;
    int j;
    for (i = 2; i < Math.Sqrt(s) + 1; i++)
    {
      if(f[i]) // if i is uncrossed, cross its multiples.
      {
        for (j = 2 * i; j < s; j += i)
          f[j] = false; // multiple is not prime
      }
    }
  }

  private static void InitializeSieve(int maxValue)
  {
    // declarations
    s = maxValue + 1; // size of array
    f = new bool[s];
    int i;

    // initialize array to true.
    for (i = 0; i < s; i++)
      f[i] = true;

    // get rid of known non-primes
    f[0] = f[1] = false;
  }
}


The InitializeSieve function is a little messy, so I cleaned it up considerably (Listing 5-4). First, I replaced all usages of the s variable with f.Length. Then I changed the names of the three functions to something a bit more expressive. Finally, I rearranged the innards of InitializeArrayOfIntegers (née InitializeSieve) to be a little nicer to read. The tests all still ran.


Listing 5-4. PrimeGenerator.cs, version 3 (partial)

public class PrimeGenerator
{
  private static bool[] f;
  private static int[] result;

  public static int[] GeneratePrimeNumbers(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else
    {
      InitializeArrayOfIntegers(maxValue);
      CrossOutMultiples();
      PutUncrossedIntegersIntoResult();
      return result;
    }
  }

  private static void InitializeArrayOfIntegers(int maxValue)
  {
     // declarations
     f = new bool[maxValue + 1];
     f[0] = f[1] = false; //neither primes nor multiples.
     for (int i = 2; i < f.Length; i++)
       f[i] = true;
  }
}


Next, I looked at CrossOutMultiples. There were a number of statements in this function, and in others, of the form if(f[i] == true). The intent was to check whether i was uncrossed, so I changed the name of f to unCrossed. But this led to ugly statements, such as unCrossed[i] = false. I found the double negative confusing. So I changed the name of the array to isCrossed and changed the sense of all the Booleans. The tests all still ran.

I got rid of the initialization that set isCrossed[0] and isCrossed[1] to true and simply made sure that no part of the function used the isCrossed array for indexes less than 2. I extracted the inner loop of the CrossOutMultiples function and called it CrossOutMultiplesOf. I also thought that if (isCrossed[i] == false) was confusing, so I created a function called NotCrossed and changed the if statement to if (NotCrossed(i)). The tests all still ran.

I spent a bit of time writing a comment that tried to explain why you have to iterate only up to the square root of the array size. This led me to extract the calculation into a function where I could put the explanatory comment. In writing the comment, I realized that the square root is the maximum prime factor of any of the integers in the array. So I chose that name for the variables and functions that dealt with it. The result of all these refactorings are in Listing 5-5. The tests all still ran.


Listing 5-5. PrimeGenerator.cs, version 4 (partial)

public class PrimeGenerator
{
  private static bool[] isCrossed;
  private static int[] result;

  public static int[] GeneratePrimeNumbers(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else
    {
      InitializeArrayOfIntegers(maxValue);
      CrossOutMultiples();
      PutUncrossedIntegersIntoResult();
      return result;
    }
  }

  private static void InitializeArrayOfIntegers(int maxValue)
  {
    isCrossed = new bool[maxValue + 1];
    for (int i = 2; i < isCrossed.Length; i++)
      isCrossed[i] = false;
  }

  private static void CrossOutMultiples()
  {
    int maxPrimeFactor = CalcMaxPrimeFactor();
    for (int i = 2; i < maxPrimeFactor + 1; i++)
    {
      if(NotCrossed(i))
        CrossOutputMultiplesOf(i);
    }
  }

  private static int CalcMaxPrimeFactor()
  {
    // We cross out all multiples of p, where p is prime.
    // Thus, all crossed out multiples have p and q for
    // factors.  If p > sqrt of the size of the array, then
    // q will never be greater than 1. Thus p is the
    // largest prime factor in the array and is also
    // the iteration limit.

    double maxPrimeFactor = Math.Sqrt(isCrossed.Length) + 1;
    return (int) maxPrimeFactor;
  }

  private static void CrossOutputMultiplesOf(int i)
  {
    for (int multiple = 2*i;
       multiple < isCrossed.Length;
       multiple += i)
      isCrossed[multiple] = true;
  }


  private static bool NotCrossed(int i)
  {
    return isCrossed[i] == false;
  }
}


The last function to refactor is PutUncrossedIntegersIntoResult. This method has two parts. The first counts the number of uncrossed integers in the array and creates the result array of that size. The second moves the uncrossed integers into the result array. I extracted the first part into its own function and did some miscellaneous cleanup (Listing 5-6). The tests all still ran.


Listing 5-6. PrimerGenerator.cs, version 5 (partial)

private static void PutUncrossedIntegersIntoResult()
{
  result = new int[NumberOfUncrossedIntegers()];
  for (int j = 0, i = 2; i < isCrossed.Length; i++)
  {
    if (NotCrossed(i))
      result[j++] = i;
  }
}

private static int NumberOfUncrossedIntegers()
{
  int count = 0;
  for (int i = 2; i < isCrossed.Length; i++)
  {
    if (NotCrossed(i))
      count++; // bump count.
  }
  return count;
}


The Final Reread

Next, I made one final pass over the whole program, reading it from beginning to end, rather like one would read a geometric proof. This is an important step. So far, I’ve been refactoring fragments. Now I want to see whether the whole program hangs together as a readable whole.

image

First, I realize that I don’t like the name InitializeArrayOfIntegers. What’s being initialized is not, in fact, an array of integers but an array of Booleans. But InitializeArrayOfBooleans is not an improvement. What we are really doing in this method is uncrossing all the relevant integers so that we can then cross out the multiples. So I change the name to UncrossIntegersUpTo. I also realize that I don’t like the name isCrossed for the array of Booleans. So I change it to crossedOut. The tests all still run.

One might think that I’m being frivolous with these name changes, but with a refactoring browser, you can afford to do these kinds of tweaks; they cost virtually nothing. Even without a refactoring browser, a simple search and replace is pretty cheap. And the tests strongly mitigate any chance that we might unknowingly break something.

I don’t know what I was smoking when I wrote all that maxPrimeFactor stuff. Yikes! The square root of the size of the array is not necessarily prime. That method did not calculate the maximum prime factor. The explanatory comment was simply wrong. So I rewrote the comment to better explain the rationale behind the square root and rename all the variables appropriately.2 The tests all still run.

What the devil is that +1 doing in there? It must have been paranoia. I was afraid that a fractional square root would convert to an integer that was too small to serve as the iteration limit. But that’s silly. The true iteration limit is the largest prime less than or equal to the square root of the size of the array. I’ll get rid of the +1.

The tests all run, but that last change makes me pretty nervous. I understand the rationale behind the square root, but I’ve got a nagging feeling that there may be some corner cases that aren’t being covered. So I’ll write another test that checks that there are no multiples in any of the prime lists between 2 and 500. (See the TestExhaustive function in Listing 5-8.) The new test passes, and my fears are allayed.

The rest of the code reads pretty nicely. So I think we’re done. The final version is shown in Listings 5-7 and 5-8.


Listing 5-7. PrimeGenerator.cs (final)

///<remark>
/// This class Generates prime numbers up to a user specified
/// maximum. The algorithm used is the Sieve of Eratosthenes.
/// Given an array of integers starting at 2:
/// Find the first uncrossed integer, and cross out all its
/// multiples. Repeat until there are no more multiples
/// in the array.
///</remark>
using System;

public class PrimeGenerator
{
  private static bool[] crossedOut;
  private static int[] result;

  public static int[] GeneratePrimeNumbers(int maxValue)
  {
    if (maxValue < 2)
      return new int[0];
    else
    {
      UncrossIntegersUpTo(maxValue);
      CrossOutMultiples();
      PutUncrossedIntegersIntoResult();
      return result;
    }
  }

  private static void UncrossIntegersUpTo(int maxValue)
  {
    crossedOut = new bool[maxValue + 1];
    for (int i = 2; i < crossedOut.Length; i++)
      crossedOut[i] = false;
  }

  private static void PutUncrossedIntegersIntoResult()
  {
    result = new int[NumberOfUncrossedIntegers()];
    for (int j = 0, i = 2; i < crossedOut.Length; i++)
    {
      if (NotCrossed(i))
        result[j++] = i;
    }
  }

  private static int NumberOfUncrossedIntegers()
  {
    int count = 0;
    for (int i = 2; i < crossedOut.Length; i++)
    {
      if (NotCrossed(i))
        count++; // bump count.
    }
    return count;
  }

  private static void CrossOutMultiples()
  {
    int limit = DetermineIterationLimit();
    for (int i = 2; i <= limit; i++)
    {
      if(NotCrossed(i))
        CrossOutputMultiplesOf(i);
    }
  }

  private static int DetermineIterationLimit()
  {
    // Every multiple in the array has a prime factor that
    // is less than or equal to the root of the array size,
    // so we don't have to cross off multiples of numbers
    // larger than that root.
    double iterationLimit = Math.Sqrt(crossedOut.Length);
    return (int) iterationLimit;
  }

  private static void CrossOutputMultiplesOf(int i)
  {
    for (int multiple = 2*i;
       multiple < crossedOut.Length;
       multiple += i)
      crossedOut[multiple] = true;
  }

  private static bool NotCrossed(int i)
  {
    return crossedOut[i] == false;
  }
}



Listing 5-8. GeneratePrimesTest.cs (final)

using NUnit.Framework;

[TestFixture]
public class GeneratePrimesTest
{
  [Test]
  public void TestPrimes()
  {
    int[] nullArray = PrimeGenerator.GeneratePrimeNumbers(0);
    Assert.AreEqual(nullArray.Length, 0);

    int[] minArray = PrimeGenerator.GeneratePrimeNumbers(2);
    Assert.AreEqual(minArray.Length, 1);
    Assert.AreEqual(minArray[0], 2);

    int[] threeArray = PrimeGenerator.GeneratePrimeNumbers(3);
    Assert.AreEqual(threeArray.Length, 2);
    Assert.AreEqual(threeArray[0], 2);
    Assert.AreEqual(threeArray[1], 3);

    int[] centArray = PrimeGenerator.GeneratePrimeNumbers(100);
    Assert.AreEqual(centArray.Length, 25);
    Assert.AreEqual(centArray[24], 97);
  }

  [Test]
  public void TestExhaustive()
  {
    for (int i = 2; i<500; i++)
      VerifyPrimeList(PrimeGenerator.GeneratePrimeNumbers(i));
  }

  private void VerifyPrimeList(int[] list)
  {
    for (int i=0; i<list.Length; i++)
      VerifyPrime(list[i]);
  }

  private void VerifyPrime(int n)
  {
    for (int factor=2; factor<n; factor++)
      Assert.IsTrue(n%factor != 0);
  }
}


Conclusion

The end result of this program reads much better than it did at the start. It also works a bit better. I’m pretty pleased with the outcome. The program is much easier to understand and is therefore much easier to change. Also, the structure of the program has isolated its parts from one another. This also makes the program much easier to change.

You might be worried that extracting functions that are called only once might adversely affect performance. I think that the increased readability is worth a few extra nanoseconds in most cases. However, there may be deep inner loops where those few nanoseconds will be costly. My advice is to assume that the cost will be negligible and wait to be proved wrong.

Was this worth the time we invested in it? After all, the function worked when we started. I strongly recommend that you always practice such refactoring for every module you write and for every module you maintain. The time investment is very small compared to the effort you’ll be saving yourself and others in the near future.

Refactoring is like cleaning up the kitchen after dinner. The first time you skip cleaning up, you are done with dinner sooner. But the lack of clean dishes and clear working space makes dinner take longer to prepare the next day. This makes you want to skip cleaning again. Indeed, you can always finish dinner faster today if you skip cleaning. But the mess builds and builds. Eventually, you are spending an inordinate amount of time hunting for the right cooking utensils, chiseling the encrusted dried food off the dishes, scrubbing them down so they are suitable to cook with, and so on. Dinner takes forever. Skipping the cleanup does not really make dinner go more quickly.

The goal of refactoring, as depicted in this chapter, is to clean your code every day, every hour, and every minute. We don’t want the mess to build. We don’t want to have to chisel and scrub the encrusted bits that accumulate over time. We want to be able to extend and modify our systems with a minimum of effort. The most important enabler of that ability is the cleanliness of the code.

I can’t stress this enough. All the principles and patterns in this book come to naught if the code they are used within is a mess. Before investing in principles and patterns, invest in clean code.

Bibliography

[Fowler99] Martin Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.189.22.136