The reality of professional software development is that you mostly work with existing code. The previous chapters had much to say about beginning a new code base, and how to go from zero to a working system as efficiently as possible. Greenfield development comes with its own set of challenges, but they’re different from the problems typically associated with making changes to an existing code base.
You’ll mostly be editing production code. Even if you do test-driven development, you’ll mostly be adding new tests, while you’ll often have to change existing production code.
The process of changing the structure of existing code without changing its behaviour is called refactoring. Other resources[34][53][27] already cover that ground, so I don’t intend to regurgitate that material here. Instead, I’ll focus on how to add new behaviour to a code base.
Informally, I tend to think of addition of behaviour as roughly falling into three buckets:
• Completely new functionality
• Enhancements to existing behaviour
• Bug-fixing
You’ll learn about bug-fixing in chapter 12, while this chapter covers the other two cases. Completely new behaviour is, in many ways, the easiest kind of change to make, so let’s start there.
When your task is to add a completely new feature, most of the code you’ll be writing will be new code; code that you add to the code base, rather than changes made to the existing code base.
Perhaps there’s existing code infrastructure that you can leverage, and perhaps you’ll have to make modifications to it before you can add the new feature, but for the most part, adding a new feature is smooth sailing. The biggest challenge that you’re likely to encounter1 is sticking to the practice of Continuous Integration.
1 Apart from, of course, that the feature itself may be difficult to implement.
As you learned in subsection 9.1.2, as a rule of thumb, you should merge your code with the master branch at least twice a day. In other words, you can, at most, work on something for four hours before you ought to integrate it. What if you can’t complete an entire feature in four hours?
Most people are uncomfortable with merging incomplete features into master, particularly if their team also practices Continuous Deployment. That would imply that incomplete features are deployed to the production system. Surely, that’s undesirable.
The solution is to distinguish between the feature itself and the code that implements it. You can deploy ‘incomplete code’ to your production system, as long as the behaviour that it implements is unavailable. Hide the functionality behind a feature flag[49].
Here’s an example from the restaurant code base. Once I was done with the functionality to make a reservation, I wanted to add a calendar feature to the system. This should enable a client to browse a month or a day to view how many remaining seats are available. This can be used by a user interface to display whether or not a date is even open for additional reservations, and so on.
Adding calendars is a complex undertaking. You need to enable navigation from month to month, calculate the maximum number of remaining seats for a given time slot, and so on. It’s unlikely you can do all of that in four hours; I couldn’t2.
2 If you examine the example code base, you can compare the commit that starts this work with the commit that ends it. Close to two months separate those two! Okay, so in between, I had a four-week summer vacation, did some other work for paying clients, etcetera. By a rough estimate, though, the entire work may still represent between one and two weeks of work. It definitely wasn’t done in four hours!
Before I started this work, the REST API’s ‘home’ resource responded with the JSON representation shown in listing 10.1.
GET / HTTP/1.1 HTTP/1.1 200 OK Content-Type: application/json { "links": [ { "rel": "urn:reservations", "href": "http://localhost:53568/reservations" } ] }
The system is a true RESTful API that uses hypermedia controls (i.e. links)[2] rather than OpenAPI (née Swagger) or the like. A client wishing to make a reservation requests the only documented URL of the API (the ‘home’ resource) and looks for a link with the relationship type "urn:reservations"
. The actual URL should be opaque to the client.
Before I started working on the calendar feature, the code that generated the response in listing 10.1 looked as listing 10.2.
When I started working on the calendar feature, I soon realised that it’d take me more than four hours, so I introduced a feature flag[49]. It enabled me to write the Get
method as shown in listing 10.3.
The enableCalendar
variable is a Boolean value (a flag) that ultimately originates from a configuration file. In the context of listing 10.3, it’s a class field supplied via the Controller’s constructor, as shown in listing 10.4.
The CalendarFlag
class is just a wrapper around a Boolean value. The wrapper is conceptually redundant, but is required because of a technical detail: The built-in ASP.NET Dependency Injection Container is responsible for composing classes with their dependencies, and it refuses to consider a value type3 a dependency. As a workaround for this issue, I introduced the CalendarFlag
wrapper4.
3 In C# known as a struct
.
4 I could live with this workaround because I knew that it was only going to be temporary. Once the feature is fully implemented, you can delete its feature flag. An alternative to introducing wrapper classes for primitive dependencies is to dispense with the built-in Dependency Injection Container altogether. I’d be inclined to do this in a code base if I had to maintain it for years, but I acknowledge that this comes with its own set of advantages and disadvantages. I don’t want to fight that battle here, but you can read how to do that in ASP.NET in Steven van Deursen’s and my book Dependency Injection Principles, Practices, and Patterns[25].
public IActionResult Get() { return Ok(new HomeDto { Links = new[] { CreateReservationsLink() } }); }
public IActionResult Get() { var links = new List<LinkDto>(); links.Add(CreateReservationsLink()); if (enableCalendar) { links.Add(CreateYearLink()); links.Add(CreateMonthLink()); links.Add(CreateDayLink()); } return Ok(new HomeDto { Links = links.ToArray() }); }
private readonly bool enableCalendar; public HomeController(CalendarFlag calendarFlag) { if (calendarFlag is null) throw new ArgumentNullException(nameof(calendarFlag)); enableCalendar = calendarFlag.Enabled; }
When the system starts, it reads various values from its configuration system. It uses those values to configure the appropriate services. Listing 10.5 shows how it reads the EnableCalendar
value and configures the CalendarFlag
‘service’.
var calendarEnabled = new CalendarFlag( Configuration.GetValue<bool>("EnableCalendar")); services.AddSingleton(calendarEnabled);
If the "EnableCalendar"
configuration value is missing, the GetValue
method returns the default value, which for Boolean values in .NET is false
. So I simply didn’t configure the feature, which meant that I could keep merging and deploying to production without exposing that behaviour.
In the automated integration tests, however, I overrode the configuration to turn on the feature. Listing 10.6 shows how. This means that I could still use integration tests to drive the behaviour of the new feature.
protected override void ConfigureWebHost(IWebHostBuilder builder) { if (builder is null) throw new ArgumentNullException(nameof(builder)); builder.ConfigureServices(services => { services.RemoveAll<IReservationsRepository>(); services.AddSingleton<IReservationsRepository>( new FakeDatabase()); services.RemoveAll<CalendarFlag>(); services.AddSingleton(new CalendarFlag(true)); }); }
Additionally, when I wanted to perform some exploratory testing by interacting with the new calendar feature in a more ad-hoc fashion, I could set the "EnableCalendar"
flag to true
in my local configuration file, and the behaviour would also light up.
Once, after weeks of work, I was finally able to complete the feature and turn it on in production. I deleted the CalendarFlag
class. That caused all the conditional code that relied on the flag to no longer compile. After that, it was basically a matter of leaning on the compiler[27] to simplify all the places where the flag was used. Deleting code is always so satisfying, because it means that there’s less code to maintain.
The ‘home’ resource now responds with the output shown in listing 10.7.
In this example, you’ve seen how to use a feature flag to hide a feature until it’s fully implemented. This example is based on a REST API, where it’s easy to hide incomplete behaviour: just don’t surface the new capability as a link. In other types of applications, you could use the flag to hide the corresponding user interface elements, and so on.
GET / HTTP/1.1 HTTP/1.1 200 OK Content-Type: application/json { "links": [ { "rel": "urn:reservations", "href": "http://localhost:53568/reservations" }, { "rel": "urn:year", "href": "http://localhost:53568/calendar/2020" }, { "rel": "urn:month", "href": "http://localhost:53568/calendar/2020/10" }, { "rel": "urn:day", "href": "http://localhost:53568/calendar/2020/10/20" } ] }
When you add a new feature, you can often do that by appending new code to the existing code base. Enhancing existing features is something else.
I once led an effort to refactor toward deeper insight[26]. My colleague and I had identified that the key to implementing a new feature would require changing a fundamental class in our code base.
While such an insight rarely arrives at an opportune time, we wanted to make the change, and our manager allowed it.
A week later, our code still didn’t compile.
I’d hoped that I could make the change to the class in question and then lean on the compiler[27] to identify the call sites that needed modification. The problem was that there was an abundance of compilation errors, and fixing them wasn’t a simple question of search-and-replace.
My manager finally took me aside to let me know that he wasn’t satisfied with the situation. I could only concur.
After a mild dressing down, he allowed me to continue the work, and a few more days of heroic5 effort saw the work completed.
5 To be clear, heroism isn’t an engineering practice. It’s too unpredictable, and also stimulates the development of sunk cost fallacies. Try to do without it.
That’s a failure I don’t intend to repeat.
As Kent Beck puts it:
“for each desired change, make the change easy (warning: this may be hard), then make the easy change”[6]
I did try to make the change easy, but failed to realise just how hard it would be. It doesn’t have to be that hard, though. Follow a simple rule of thumb:
For any significant change, don’t make it in-place; make it side-by-side.
This is also known as the Strangler pattern[35]. Despite its name, it has nothing to do with violence, but is named after the strangler fig, a vine that grows around a ‘host’ tree and over years may strangle it by stealing both light and water. At that time, the vine has itself grown strong enough to support itself. Left is a new, hollow tree approximately the size and shape of the old, dead tree, as illustrated by figure 10.1.
Martin Fowler originally described the pattern in the context of large-scale architecture, as a way to gradually replace a legacy system with a newer system. I’ve found it to be useful at almost any scale.
In object-oriented programming you can apply the pattern both at the method level and the class level. At the method level, you first add a new method, gradually move callers over, and finally delete the old method. At the class level, you first add a new class, gradually move callers over, and finally delete the old class.
You’ll see examples of both, starting at the method level.
When I was implementing the calendar feature discussed in section 10.1, I needed a way to read reservations for multiple dates. The current incarnation of the IReservationsRepository
interface, however, looked as in listing 10.8. The ReadReservations
method took a single DateTime
as input, and returned all the reservations for that date.
I needed a method that would return reservations for a range of dates. Your reaction to such a requirement might be to add a new method overload and leave it at that. Technically, that’s possible, but think of the maintenance tax. When you add more code, you have more code to maintain. An extra method on an interface means that you’ll have to maintain it on all implementers, too.
I’d rather prefer replacing the old ReadReservations
method with a new method. This is possible, because reading reservations for a range of dates instead of a single date actually weakens the preconditions. You can view the current method as a special case, where the range is just a single date.
public interface IReservationsRepository { Task Create(Reservation reservation); Task<IReadOnlyCollection<Reservation>> ReadReservations( DateTime dateTime); Task<Reservation?> ReadReservation(Guid id); Task Update(Reservation reservation); Task Delete(Guid id); }
If much of your code already calls the current method, however, making the change in one fell swoop might be overreaching. Instead, add the new method first, gradually migrate call sites, and finally delete the old method. Listing 10.9 shows the IReservationsRepository
interface with the new method added.
When you add a new method like that, the code fails to compile until you add it to all classes that implement the interface. The restaurant reservation code base only has two implementers: SqlReservationsRepository
and FakeDatabase
. I added the implementation to both classes in the same commit, but that’s all I had to do. Even with the SQL implementation, that represents perhaps 5-10 minutes of work.
Alternatively, I could also have added the new ReadReservations
overload to both SqlReservationsRepository
and FakeDatabase
, but left them throwing a NotImplementedException
. Then, in following commits, I could have used test-driven development to flush out the desired behaviour. At every point during this process, I’d have a set of commits that I could merge with master.
Yet another option would be to first add methods with identical signatures to the concrete classes, and only after all those are in place, add the method to the interface.
public interface IReservationsRepository { Task Create(Reservation reservation); Task<IReadOnlyCollection<Reservation>> ReadReservations( DateTime dateTime); Task<IReadOnlyCollection<Reservation>> ReadReservations( DateTime min, DateTime max); Task<Reservation?> ReadReservation(Guid id); Task Update(Reservation reservation); Task Delete(Guid id); }
In any case, you can incrementally develop the new method, because at this point, no code is using it.
When the new method is firmly in place, you can edit the call sites, one at a time. In this way, you can take as much time as you need. You can merge with master at any time during this process, even if that means deploying to production. Listing 10.10 shows a code fragment that now calls the new overload.
var min = res.At.Date; var max = min.AddDays(1).AddTicks(-1); var reservations = await Repository .ReadReservations(min, max) .ConfigureAwait(false);
I changed the calling code one call site at a time, and committed to Git after each change. After a few commits, I was done; there were no more code calling the original ReadReservations
method.
Finally, I could delete the original ReadReservations
method, leaving the IReservationsRepository
interface as shown in listing 10.11.
When you delete a method from an interface, remember to also remove it from all implementing classes. The compiler isn’t going to complain if you let them stay, but that’s a maintenance burden you don’t need to take on.
You can also apply the Strangler pattern on the class level. If you have a class that you’d like to refactor, but you’re concerned that it’ll take too long to change it in place, you can add a new class, move callers over one by one, and finally delete the old class.
You can find a few examples of that in the online restaurant reservation code base. In one case, I found that I’d over-engineered a feature6. I needed to model the allocation of reservations to tables at a given time, so I’d added a generic Occurrence<T>
class that could associate any type of object with a time. Listing 10.12 shows its constructor and properties to give you a sense of it.
6 Yes, even when I try my best to follow all practices that I present in this book, I, too, err. Despite admonitions to do the simplest thing that could possibly work[22], I occasionally make things too complicated because ‘I’m certainly going to need it later’. Hitting yourself over the head for your errors, however, isn’t productive. When you realise your mistake, just acknowledge and correct it.
public interface IReservationsRepository { Task Create(Reservation reservation); Task<IReadOnlyCollection<Reservation>> ReadReservations( DateTime min, DateTime max); Task<Reservation?> ReadReservation(Guid id); Task Update(Reservation reservation); Task Delete(Guid id); }
After I’d implemented the features where I needed the Occurrence<T>
class, I realised that I didn’t really need it to be generic. All the code that used the object contained a collection of tables with associated reservations.
Generics do make code slightly more complex. While I find them useful in the right circumstance, they also make things more abstract. For example, I had a method with the signature shown in listing 10.13.
Consider the advice from subsection 8.1.5. By looking at the types, can you figure out what the Schedule
method does? How do you think about a type like IEnumerable<Occurrence<IEnumerable<Table>>>
?
Wouldn’t the method be easier to understand if it had the signature shown in listing 10.14?
IEnumerable<TimeSlot>
seems like a more palatable return type, so I wanted to refactor from the Occurrence<T>
class to such a TimeSlot
class.
There was already enough code that used Occurrence<T>
that I didn’t feel comfortable that I could perform such a refactoring in a brief enough time span. Instead, I decided to use the Strangler pattern: first add the new TimeSlot
class, then migrate callers one by one, and finally delete the Occurrence<T>
class.
public Occurrence(DateTime at, T value) { At = at; Value = value; } public DateTime At { get; } public T Value { get; }
public IEnumerable<Occurrence<IEnumerable<Table>>> Schedule(
IEnumerable<Reservation> reservations)
public IEnumerable<TimeSlot> Schedule(
IEnumerable<Reservation> reservations)
I first added the TimeSlot
class to the code base. Listing 10.15 shows its constructor and properties so that you can get a sense of how it looks.
public TimeSlot(DateTime at, IReadOnlyCollection<Table> tables) { At = at; Tables = tables; } public DateTime At { get; } public IReadOnlyCollection<Table> Tables { get; }
As soon as I’d added this class I could commit it to Git and merge it with the master branch. That didn’t break any functionality.
I could then start to migrate code from using Occurrence<T>
to use TimeSlot
. I started with some helper methods, like the one shown in listing 10.16.
private TimeDto MakeEntry(Occurrence<IEnumerable<Table>> occurrence)
Instead of taking an Occurrence<IEnumerable<Table>>
parameter I wanted to change it to take a TimeSlot
parameter, as shown in listing 10.17.
private static TimeDto MakeEntry(TimeSlot timeSlot)
The code that called this MakeEntry
helper method was itself a helper method that received an IEnumerable<Occurrence<IEnumerable<Table>>>
argument, and I wanted to gradually migrate callers. I realised that I could do that if I added the temporary conversion method in listing 10.18. This method supports the conversion between the old class and the new class. Once I completed the Strangler migration I deleted it together with the class itself.
internal static TimeSlot ToTimeSlot( this Occurrence<IEnumerable<Table>> source) { return new TimeSlot(source.At, source.Value.ToList()); }
I also had to migrate the Schedule
method in listing 10.13 to the version in listing 10.14. Since I had multiple callers, I wanted to migrate each caller separately, committing to Git between each change. This meant that I needed the two versions of Schedule
to exist side by side for a limited time. That’s not strictly possible because they differ only in their return type, and C# doesn’t support return-type overloading.
To get around that issue, I first used the Rename Method[34] refactoring to rename the original Schedule
method to ScheduleOcc
7. I then copied and pasted it, changed its return type and changed the new method’s name back to Schedule
. I now had the original method called ScheduleOcc
and the new method with a better return type, but no callers. Again, this is a point where you can commit your changes and merge with master.
7 Occ for Occurrence.
With two methods, I could now migrate callers one at a time, checking my changes into Git for each method. Again, this is work that you can do gradually without getting in the way of other work that you or your team mates perform. Once all callers called the new Schedule
method, I deleted the ScheduleOcc
method.
The Schedule
method wasn’t the only method that returned data that used Occurrence<T>
, but I could migrate the other methods to TimeSlot
using the same technique.
When I finally had completed the migration, I deleted the Occurrence<T>
class, including the conversion helper method in listing 10.18.
During this process, I was never more than five minutes from being able to do a commit, and all commits left the system in a consistent state that could be integrated and deployed.
Do yourself a favour: Read the Semantic Versioning specification[82]. Yes, all of it. It takes less than fifteen minutes. In short, it uses a major.minor.patch scheme. You only increment the major version when you introduce breaking changes; incrementing the minor version indicates the introduction of a new feature, and a patch version incrementation indicates a bug fix.
Even if you don’t decide to adopt Semantic Versioning, I believe that it’ll help you think more clearly about breaking and non-breaking changes.
If you’re developing and maintaining a monolithic application with no API, breaking changes may not matter, but as soon as other code depends on your code, it does.
This is true regardless of where that depending code lives. Obviously, backwards compatibility is crucial if you have external, paying customers who depend on your API. But even if the system that depends on your code is ‘just’ another code base in your organisation, it still pays to think about compatibility.
Every time you break compatibility, you’ll need to coordinate with your callers. Sometimes, this happens reactively, as in “your latest change broke our code!” It’d be better if you can give clients advance warning.
Things run smoother, though, if you can avoid breaking changes. In Semantic Versioning, this means staying on the same major version for a long time. This may take a little time getting used to.
I once maintained an open-source library that stayed on major version 3 for more than four years! The last version 3 release was 3.51.0. Apparently, we added 51 new features during those four years, but since we didn’t break compatibility, we didn’t increment the major version.
If you must break compatibility, be deliberate about it. If you can, warn users in advance. Consider the hierarchy of communication discussed in subsection 8.1.7 to figure out which communications channel will work best.
For example, some languages enable you to deprecate methods with an annotation. In .NET this is called [Obsolete]
, in Java @Deprecated
. Listing 10.19 shows an example. This’ll cause the C# compiler to emit a compiler warning for all code that calls that method.
[Obsolete("Use Get method with restaurant ID.")] [HttpGet("calendar/{year}/{month}")] public Task<ActionResult> LegacyGet(int year, int month)
If you realise that you must break compatibility, consider if you can bundle more than one breaking change into a single release. This isn’t always a good idea, but it sometimes can be. Every time you introduce a breaking change, you force client developers to deal with it. If you have multiple smaller breaking changes, it might make client developers’ lives easier if you bundle them into a single release.
On the other hand, it’s probably not a good idea to release multiple breaking changes if each of them forces client developers into massive rework. Exercise some judgment; this is, after all, the art of software engineering.
You work in existing code bases. As you add new features, or enhance the ones already there, or fix bugs, you make changes to existing code. Take care that you do so in small steps.
If you’re working on a feature that takes a long time to implement, it might be tempting to develop it on a feature branch. Don’t do that; it’ll lead to merge hell. Instead, hide the feature behind a feature flag and integrate often[49].
When you want to make a sizeable refactoring, consider using the Strangler pattern. Instead of performing the edit in situ, change the code by letting the new and the old ways co-exist for a while. This enables you to gradually migrate callers a little at a time. You can even do this as a maintenance task that you interleave with other work. Only when the migration is complete you delete the old method or class.
If the method or class is part of a published object-oriented API, deleting a method or class may constitute a breaking change. In such a case, you’ll need to explicitly consider versioning. First deprecate the old API to warn users about the impending change, and only delete the deprecated API when releasing a new major version.
3.146.105.137