C H A P T E R  13

image

Legacy Code

Back in 2004 Robert C. Martin mercilessly wrote that “software systems almost always degrade into a mess. What starts as a clean crystalline design in the minds of the programmers rots, over time, like a piece of bad meat.” [LEGACY] We would bet that anyone reading this book has faced this situation during her life as a developer. Our best intentions are to build a perfect code base, but time constraints and the tendency to entropy claim their fee. We have always tried to blame this on the customers and their senseless will to keep changing requirements, but the reality is that we don't have to change customers as much as we have to change ourselves.

Reality being what it is and not what we would like it to be, if customers need to change their requirements we better upgrade our code and manage software projects in a way more closely matching this fact. Developers who best meet that challenge while preserving the underlying quality of their code will be able to better position themselves in the market, slowly overwhelming developers still sticking to the old way of blaming software projects and their stakeholders for changing.

The key to embracing changes and indeed to seeing them as an opportunity to improve the system is to avoid writing stiff code that doesn't tolerate changing requirements. This is the real goal of every software project. This book shows you many techniques to improve the quality of your design step by step by changing the code, with no effect on external functionality. By the way, solutions never come too early, so everyone has come to some nasty unmanageable code in his life. Furthermore, even the most committed developer will need time to master the refactoring techniques, and thus he will keep writing some ugly code in the meantime.

That ugly code has many names. In this book we will call it legacy code. Legacy code is what you get when for some reason you haven't been able to fight entropy. Legacy code is something you cannot get rid of in a single refactoring step or by some other sort of magic. For this reason, we decided to dedicate this last part of our book to some higher-level ways to distill a better structure from your old or new legacy code. We hope it will help those PHP developers out there who are still working on old applications but are willing to join the wonders of well-structured and safe programming. In this chapter we will introduce a simple application built with a procedural style, and in the following chapters we will show you how to refactor it towards a better architecture.

Ugly Code

This section will be based on a sample application Francesco Trucchia created for one of his workshops on refactoring techniques. It's nothing more than a system to manage a contact list: you can list, add, edit, and delete names, phone numbers, and mobile phone numbers, as shown in Figures 13-1 through 13-3.

image

Figure 13-1. The list view of our application

image

Figure 13-2. The contact creation form of our application

image

Figure 13-3. The form to edit a contact is the same as that in the contact creation screen, but filled with editable data.

We kept its code intentionally ugly. Some of you reading the code in the next chapters might be amazed by what we are calling ugly and argue that the principles it seems written on are the same applied by many open-source and well-known PHP applications out there. We agree—that's true. Right now we prefer not to explain why we think that's ugly code—we will return to that later in this chapter. Now it's time to check our code.

The application is built mainly upon a set of nine files, each focusing on a single task and including code belonging to every logical layer, from database access to HTML code rendering through SQL and plain PHP logic. Those files are the following:

  • config.php
  • edit.php
  • footer.php
  • _form.php
  • functions.php
  • header.php
  • index.php
  • new.php
  • remove.php

We could be tempted to point out here a well-known web application pattern called page controller, according to which we could have one path leading to a file for each single logical page request to handle. Though this would be a simple but well-structured way to organize our application, we must avoid getting misled by the outer appearance of that code tree. Evil lies beneath the surface, and to better understand this, let's have a look at the files.

index.php

This is the main application access file. It is responsible for contact retrieval and printing.

<?php
include_once('config.php'),

$db = @mysql_connect($database['host'], $database['username'], $database['password']) orimage
die('Can't connect to database'),
@mysql_select_db($database['name']) or die('The database selected does not exists'),

$query = 'SELECT * FROM contacts ORDER BY lastname';
$rs = mysql_query($query);

if (!$rs)
{
  die_with_error(mysql_error(), $query);
}

$num = mysql_num_rows($rs);

?>

<?php include_once('header.php') ?>

<div class="actions">
  <a href="new.php">New contact</a>
</div>

<?php if ($num) : ?>
  <table border="1" cellspacing="0" cellpadding="5">
  <tr>
    <th>Last Name</th>
    <th>First Name</th>
    <th>Phone</th>
    <th>Mobile</th>
    <th>&nbsp;</th>
  </tr>
  <?php while($row = mysql_fetch_assoc($rs)) : ?>
    <tr>
      <td><a href="edit.php?id=<?php echo $row['id']?>" title="Edit"><?php echo image
$row['lastname']?></a></td>
      <td><?php echo $row['firstname']?></a></td>
      <td><a href="callto://<?php echo $row['phone']?>"><?php echo $row['phone']?></a></td>
      <td><a href="callto://<?php echo $row['mobile']?>"><?php echo $row['mobile']?></a></td>
      <td>[<a href="remove.php?id=<?php echo $row['id']?>" title="Delete" onclick="if image
(confirm('Are you sure?')) {return true;} return false;">X</a>]</td>
    </tr>
  <?php endwhile;?>
</table>


<?php else : ?>
  Database is empty
<?php endif ?>

<?php include_once('footer.php') ?>

<?php
  mysql_free_result($rs);
  mysql_close($db);
?>

We start by including config.php, a system configuration file. Then we connect to a MySQL database and execute the query to retrieve all contacts ordered by lastname. After having included the header.php template file, we print some additional HTML code and perform a conditional branching based on the amount of records returned by the query. In the case of a non-empty record set, we draw a table iterating over the recordset. Thereafter we include another template file called footer.php, and then we close the MySQL connection.

Even at a first glance you may well see how data-oriented code is interleaved with business logic code and template view code. In spite of being a small file it features an amazingly tangled portion of code, where database queries are next to HTML code. We know that separating code into layers is one of the most widely acknowledged ways to keep code readable and maintainable. In this case we are completely missing this goal.

config.php

This is the main application configuration file. It is responsible for setting database access parameters. It also takes care of including a mandatory application specific library.

<?php

$database['host'] = 'localhost';
$database['username'] = 'root';
$database['password'] = null;
$database['name'] = 'contacts';

require_once('functions.php'),

?>
functions.php

This is the main application library file. It is responsible for the definition of a couple of functions used throughout the application.

<?php

/**
* Validate form
*
* @param array $mandatory_fields
* @param array $fields
* @return array
*/
function validate($mandatory_fields, $fields)
{
  $errors = array();

  foreach ($mandatory_fields as $field)
  {
    if($fields[$field] == '')
    {
      $errors[] = 'The ' . $field . ' field is mandatory';
    }
  }

  return $errors;
}

function die_with_error($error_msg, $query)
{
  $message  = 'Invalid query: ' . $error_msg. " ";
  $message .= 'Whole query: ' . $query;
  die($message);
}

?>

We define a validate() function that will be used by files requiring the validation of values coming from a form. Then we define an SQL error management function called die_with_error().

header.php

This is the main template file. It is responsible for the HTML code to be printed before our main page content.

<html>
  <head>
    <title>Contacts Book</title>

    <style>
    body{
      font-family: Helvetica;
      font-size: 14px;
    }

    label{
      display: block;
      background-color: #ddd;
      margin: 5px 0px;
      padding: 4px;
      font-weight: bold;
      width: 250px;
    }

    form{
      margin-left: 10px;
    }

    input[type="text"]{
width: 250px;
    }

    .actions{
      margin-bottom: 10px;
    }

    #footer{
      padding: 5px;
      border-top: 1px solid #000;
      margin-top: 10px;
    }
    </style>
  </head>

  <body>

  <div id="header">
    <h1>Contacts Book</h1>
  </div>

<div id="content">

No PHP code or output buffering here: this HTML code gets sent to the client as soon as the header.php file is included.

footer.php

This is the footer template file. It is responsible for printing the HTML after our main page content.

</div>

    <div id="footer">
    All &copy; Francesco Trucchia & Jacopo Romei - Pro PHP Refactoring
    </div>
  </body>
</html>

The HTML code printed here must closely match the HTML code featured in header.php. Any mismatch could make the overall HTML code not well formed, thus leading to rendering errors in the browser and, at least, to invalid HTML code.

new.php

This is the new contact page. It is responsible for the creation of a new contact.

<?php
include_once('config.php'),

if($_SERVER['REQUEST_METHOD'] == 'POST')
{
  $errors = validate(array('firstname', 'lastname', 'phone'), $_POST);

  if(count($errors) == 0)
  {
    $db = @mysql_connect($database['host'], $database['username'], $database['password']) or image
die('Can't connect do database'),
    @mysql_select_db($database['name']) or die('The database selected does not exists'),

    $query = sprintf("INSERT INTO contacts (firstname, lastname, phone, mobile) VALUES ('%s', image
'%s', '%s', '%s')",
                       mysql_real_escape_string($_POST['firstname']),
                       mysql_real_escape_string($_POST['lastname']),
                       mysql_real_escape_string($_POST['phone']),
                       mysql_real_escape_string($_POST['mobile'])
                      );

    $rs = mysql_query($query);

    if (!$rs)
    {
      die_with_error(mysql_error(), $query);
    }

    mysql_close($db);

    header('Location: index.php'),

  }
}
?>

<?php include_once('header.php') ?>

<?php include_once('_form.php') ?>
<?php include_once('footer.php') ?>

We start by including the configuration file and, after checking the request HTTP method and form parameter validity, we perform a MySQL connection and an INSERT query. After a successful query, since we haven't output any HTML code yet, we can redirect the client to index.php, the contact list where we will see the contact we just created. At the end of the file we manage the case of a GET request by including three template files to render a new contact form.

edit.php

This is the edit contact page. It is responsible for the editing of an existing contact.

<?php

include_once('config.php'),

if(!$_GET['id'])
{
die('Some error occured!!'),
}

$db = @mysql_connect($database['host'], $database['username'], $database['password']) or image
die('Can't connect do database'),
@mysql_select_db($database['name']) or die('The database selected does not exists'),
if($_SERVER['REQUEST_METHOD'] == 'POST')
{
  $errors = validate(array('id', 'firstname', 'lastname', 'phone'), $_POST);

  if(count($errors) == 0)
  {
    $query = sprintf("UPDATE contacts set firstname = '%s',
                                                                     lastname = '%s',
                                                                     phone = '%s',
                                                                     mobile = '%s' WHERE id = %s",
                       mysql_real_escape_string($_POST['firstname']),
                       mysql_real_escape_string($_POST['lastname']),
                       mysql_real_escape_string($_POST['phone']),
                       mysql_real_escape_string($_POST['mobile']),
                       mysql_real_escape_string($_POST['id'])
                      );

    $rs = mysql_query($query);

    if (!$rs)
    {
      die_with_error(mysql_error(), $query);
    }

    header('Location: index.php'),
  }
}
else
{
  $query = sprintf('SELECT * FROM contacts WHERE id = %s', mysql_real_escape_string($_GET['id']));

  $rs = mysql_query($query);

  if (!$rs)
  {
    die_with_error(mysql_error(), $query);
  }

  $row = mysql_fetch_assoc($rs);

  $_POST['id'] = $row['id'];
  $_POST['firstname'] = $row['firstname'];
  $_POST['lastname'] = $row['lastname'];
  $_POST['phone'] = $row['phone'];
  $_POST['mobile'] = $row['mobile'];
}

mysql_close($db);

?>

<?php include_once('header.php') ?>

<?php include_once('_form.php') ?>

<?php include_once('footer.php') ?>

We are now used to this: we start by including the configuration file. We connect to the MySQL database and then we perform actions similar to those we found in new.php: in the case of a POST request we perform the data validation, the query to update the right record, and the client redirection to index.php; in the case of a GET request or invalid POST parameters we render an HTML form loaded with default values coming from the requested record.

_form.php

This is the main contact form file. It is responsible for printing the form to create or edit a contact.

<form method="post">

<?php if (isset($errors) and count($errors)) : ?>
  <ul class="errors">
  <?php foreach ($errors as $error) : ?>
    <li><?php echo $error ?></li>
  <?php endforeach;?>
  </ul>
<?php endif ?>

<input type="hidden" name="id" value="<?php echo $_POST['id']?>" />

<label for="firstname">First Name*</label>
<input type="text" id="firstname" name="firstname" value="<?php echo $_POST['firstname'] ?>" />

<label for="lastname">Last Name*</label>
<input type="text" id="lastname" name="lastname" value="<?php echo $_POST['lastname'] ?>" />

<label for="phone">Phone*</label>
<input type="text" id="phone" name="phone" value="<?php echo $_POST['phone'] ?>" />

<label for="mobile">Mobile</label>
<input type="text" id="mobile" name="mobile" value="<?php echo $_POST['mobile'] ?>" />

<br/><br/>
<input type="submit" value="Save" />
<a href="index.php" >Cancel</a>
</form>
<em>(* Mandatory fields)</em>
remove.php

This is the remove contact file. It is responsible for contact removal.

<?php
include_once('config.php'),

if(!$_GET['id'])
{
die('Some error occured!!'),
}

$db = @mysql_connect($database['host'], $database['username'], $database['password']) or image
die('Can't connect do database'),
@mysql_select_db($database['name']) or die('The database selected does not exists'),
$query = sprintf('DELETE FROM contacts where ID = %s',
                 mysql_real_escape_string($_GET['id']));

if(!mysql_query($query))
{
  die_with_error(mysql_error(), $query);
}

mysql_close($db);

header('Location: index.php'),

?>

Also here, for the last time, we include the configuration file named config.php. We check whether the ID of the record we want to delete is defined, we connect to our MySQL database, and we perform the record deletion query before redirecting the client to index.php or dying because of a MySQL error.

Maintenance

This application is very hard to maintain. There is no separation of layers and no abstraction at all. Because the code is affected by a strong entanglement between business logic code, workflow, data-oriented code, and template rendering, it makes any change request a nightmare for the developer. We are not talking about new features but just about maintenance of this code. Let's see a few examples.

Example: SQL injection

After launching our Contacts Book application, we realize we are vulnerable to SQL injection attacks. Just by requesting this URL, we could delete all the data in our database:

http://<hostname>/remove.php?id=3%20or%20id%20is%20not%20null (or just http://<hostname>/remove.php?id=3 or id is not null on modern browsers).

In the remove.php file we execute the following code:

$query = sprintf('DELETE FROM contacts where ID = %s', mysql_real_escape_string($_GET['id']));

This lets us inject some foreign SQL by means of the $_GET['id'] variable. Then, requesting the preceding URL, our code would execute the following query:

DELETE FROM contacts where ID = 3 or id is not null;

This has easy-to-understand and ruinous consequences. To make things worse, it looks like we are exposing many other similar security holes. In edit.php we could perform the same attack by exploiting the $_POST['id'] variable.

Please don't think it's just a simple security hole that no serious PHP developer would introduce. It may be, but the point here is that with this application structure we cannot manage this maintenance issue in a single, clear, and well-defined spot. When called to fix such a simple security bug, we have to arrange a solution scattered throughout the code tree. This example application is obviously small and likely not worth this worry, but how would you manage the same issue if it featured hundreds of pages?

Example: Database Portability

One year after the first public launch of our Contacts Book application the managers made a request: the application must be ported to some proprietary database technology due to both new partnerships and sponsorships. Good for the business, but how good for the developers? In our application the code managing our database connections and queries is closely tangled with the rest of the PHP code and completely MySQL specific.

Please note we are not simply advocating the use of an abstraction layer here, as you may think. The real problem is having no single focal point to manage DB-oriented code. We could even agree on using native DB libraries offered by PHP, but the way our application is structured now forces us to cope with lots of code duplication and lots of chaos: what if our application were a real one with hundreds of files needing a connection to the DB? Wouldn't it be insane to go through all of them replacing the mysql_connect() call with the new one?

Many other examples could show you how hard the maintenance of an application like this would be. What if we need to change some HTTP header depending on the output of code executed after the header.php file is included, preventing us from sending further HTTP headers? What if we need an error log, considering we manage errors only by means of conditional logic, and thus only errors we thought of well in advance and not those emerging from the complexity of our application?

Keeping maintenance simple is the key to keeping software cheap. Keeping software cheap makes us successful and desirable developers, providing our customers with winning solutions, and no compromise on quality.

New Features

Things can get awkward when it comes to new features, too. A good design paves the way for new features to be put in place quickly and cheaply. Here we will provide just a couple of brief examples, but other similar situations are easy to imagine.

Dynamic Layouts

Our application grew up strong and full of data. Thousands of contacts populate our database and the list shown in index.php is very long now. The customer, together with her usability experts, decide to ask the development team for a few user-centered features: a box in every page containing the form to create a new contact without loading a whole new page by means of an AJAX asynchronous request, and another box showing the ten most recently added contacts.

From a customer point of view it should be no big deal, and you have to admit all the logic required is there already: we have a form, contact creation business logic, and a list template and a query to retrieve lists of contacts. Well, actually the listing query should be tweaked somehow to make it ordered by record creation time and limited to ten records, but we know it's absolutely something you cannot tell the customer it's hard to do. The truth is different, though.

We have a big problem here: a simple problem gets complicated because of our own code. Portions of our existing code are not suitable to be used out of their own context. We cannot just include the new.php file in a div tag in the header.php file to get a working new contact form in every page, since the new.php execution brings another HTML layout in with itself. If we were to include just the _form.php template we would miss the right action attribute and some server-side action would have to be performed to return a clean AJAX-oriented output.

Concerning the list of the ten most recently added contacts, we have similar issues to cope with. As we noted already, we have a query to retrieve lists of contacts in index.php:

$query = 'SELECT * FROM contacts ORDER BY lastname';

The first problem here is about query reuse. Though we could just append the needed LIMIT clause, being the query written in native SQL there is no way to edit the ORDER BY clause if we don't want to use some awkward string replacement routine. The worst has yet to come: that query is buried deep into the index.php file, once again tainted by the whole HTML page layout. So we have no ready-to-use query to base our new feature upon, leading to higher development costs.

Internationalization

At first we planned for a local user community and we didn't need an internationalization framework. Then our customer discovered her own web site had grown outside of national boundaries, and now she wants it to be capable of speaking its user's language. She plans a meeting with the development team and happily announces a translation team will be providing the developers with a folder of well-structured files with translations of every string featured on the web site for each planned new language, according to the format that the developers judge best to use. We get caught red-handed here. All in all the customer did her homework: she didn't ask the developers to do something other than developing, she kept her options open by making the translation team deliver the best material to be used by the developers, and she didn't even ask for the developers to extract all the strings used in the user interface.

The reason the upgrade of the system to feature translations seems so expensive lies on the design side. The missing separation of layers makes it very hard to spot all the user interface strings used across the application, and even the most patient developer who spotted all of them would have a hard time injecting the translation behavior with no loss in readability and no risk of side effects. With a real layered structure, tasks like this get a lot easier and coding new features gets lots cheaper.

Break the Cycle

Changing existing code always costs something. Introducing unintended changes along with desired ones costs a lot more. Most development teams facing the need for a change in code will react in a conservative way, in a struggle to minimize change. All in all, if it works, why change it? Consequently, new features are just plainly added as new lines right into the methods or functions that apparently need them, involving fewer changes and less uncertainty.

As time passes, we get ever-worse code, and we are progressively less confident in the changes you are requested to perform to maintain or upgrade the code base. The less confident we are, the less we are incentivized to improve our design, since it requires the code to be changed. This endless loop diverges to the point where a rewrite from scratch is needed, and we lose all the value injected for months or even years in the software we are about to throw away. It's an unsustainable damage for most companies.

OK, we know that avoiding change is bad, but how can we avoid the fear of change? Working with more attention, though considered a must for every professional, may be not enough. It would be like asking a free climber to be more cautious as a security policy instead of making climbing harnesses mandatory. To remove fear, we need a safety net. A safety net is something that leaves you free to move but gently saves you in case something goes wrong. What kind of safety net is available to us?

We already learned about a safety net tool while reading this book: automated tests. We can cover our legacy code with tests, and when we have a good set of tests wrapping our system, we can make changes and quickly find bad effects caused by our changes. We still preserve our care, due to the customer as professionals, but with the quick and complete feedback we get by means of tests, we are able to make changes faster and more carefully. When we are equipped with such a safety net, the fear of change starts to fade away, motivating us to transform legacy code into a maintainable code base able to host new features very cheaply.

As long as we define legacy code as the code that becomes unmaintainable with time, we can now come to the conclusion that with adequate test coverage, legacy code doesn't deserve the “legacy” attribute anymore. Legacy code exists only where an adequate set of tests doesn't. Those kinds of tests are called regression tests.

Summary

In this chapter we introduced a legacy application and defined what we mean by legacy code. We learned how legacy code is created by the absence of proper test coverage, and how, from a certain perspective, an apparently well-structured software application can be considered low-quality. In the next chapter, we will show you how to face the big challenge of making a good application from a poor one. Keep reading!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.141.31.222