Chapter 1. Software Efficiency Matters

The primary task of software engineers is the cost-effective development of maintainable and useful software.

Jon Louis Bentley, Writing Efficient Programs (1982)

Even after 40 years, Jon's definition is fairly accurate. The ultimate goal for any engineer is to create a useful product that can sustain user needs for the product lifetime. Unfortunately, nowadays, not every developer realizes the significance of software cost. The truth can be brutal; saying that the development process can be expensive might be an understatement. For instance, it took five years and 250 engineers for Rockstar to develop the popular Grand Theft Auto 5 video game, which was estimated to cost $137.5 million. Similarly, to create a usable, commercialized operating system, Apple had to spend way over $500 million before the first release of macOS in 2001.

Because of this high cost, when producing software, it's crucial to focus our efforts on the things that matter the most. Ideally, we don't want to waste engineering time and energy on unnecessary actions, like spending weeks on subjective code refactoring that does not reduce code complexity, or on deep micro-optimizations of rarely executed functions. The industry continually invents new patterns to pursue an efficient development process. Agile and Kanban methods that allow us to adapt to ever-changing requirements, specialized programming languages for mobile platforms like Kotlin, and frameworks for building websites like React are only some examples. Engineers innovate in those fields because every inefficiency increases the cost.

What makes it even more difficult is that when developing software now, we should also be aware of the future costs. Running and maintenance costs are typically higher than the initial development cost. Code changes to stay competitive, bug fixing, incidents, installations, or even the electricity consumed are only a few examples of the Total Cost of Ownership (TCO) of software that we, developers, tend to forget. Agile methodologies help to reveal this cost early by releasing software often and getting feedback sooner. Unfortunately, because of development complexity, it's a prevalent mistake to focus on the short-term goal of publishing the software quicker. Many dangerous shortcuts naturally come to our minds. Skipping tests, or not investing enough in security, documentation, clean code, or code performance might look like smart development efficiency and pragmatism. The problem is that if taken too far, these shortcuts can cause an enormous cost, ranging from your product not being useful on the market to extremes like taking airplanes down1.

At this point, based on the title “Efficient Go”, you might be asking how this book will motivate you to spend more of your precious time on software execution performance characteristics like speed or efficiency. If we, software creators, should care, as Jon wrote, about development cost-effectiveness, why not focus purely on the bare minimum needed for the software to work? Waiting a few seconds more for application execution never killed anyone. On top of that, the hardware is getting cheaper and faster every month. In 2021, it's not difficult to buy a smartphone with a dozen GBs of RAM. Finger-sized, 2TB SSD disks capable of 7 GB/s read and write throughput are available. Even home PC workstations are hitting never before seen performance scores. With 8 CPUs or more that can perform billions of cycles per second each, and with 2TB of RAM, we can compute things fast. After all, improving performance on the software level alone is a complicated topic. Especially when you are new, it is common to lose time optimizing without significant program speedups. And even if we start caring about the latency introduced by our code, things like the Java Virtual Machine or the Go compiler will apply their own optimizations anyway. Overall, spending more time on something tricky like performance, which can also sacrifice our code's reliability and maintainability, may sound like a bad idea. These are only a few of the numerous reasons why engineers typically put performance optimizations at the lowest position of the development priority list, far in the outskirts of the mentioned software bare minimum.

Unfortunately, as with everything extreme, there is a risk in such performance deprioritization. In essence, there is a difference between consciously postponing optimizations and making silly mistakes that cause inefficiencies and slowdowns. However, don't be worried! I will not try to convince you in this book that all of this is wrong and that you should now measure the number of nanoseconds each code line introduces or how many bits it allocates in memory before putting it in your software. You should not. I am far from trying to motivate you to put performance at the top of your development priority list. Instead, I would like to propose a subtle but essential change to how we, software engineers, should think about application performance. It will allow you to bring small but effective habits to your programming and development management cycle. Based on data, and as early as possible in the development cycle, you will learn how to tell when you can safely ignore or postpone program inefficiencies. And when you can't afford to skip performance optimizations, where and how to apply them effectively, and when to stop.

Machines have become increasingly cheap compared to people; any discussion of computer efficiency that fails to take this into account is short-sighted. “Efficiency” involves the reduction of overall cost - not just machine time over the life of the program, but also time spent by the programmer and by the users of the program.

Brian W. Kernighan and P. J. Plauger, The Elements of Programming Style (1978)

In “Motivation For This Book” you will learn what made me decide not to treat performance optimizations as an unnecessary evil. In “Behind Performance” we will unpack the word performance and learn how it is related to the efficiency in this book's title. Through “Common Performance Misconceptions” to “Be Vigilant to Simplifications”, we will challenge five serious misconceptions around efficiency and performance that often descope such work from developers' minds. You will learn that thinking about efficiency is not reserved only for “high performance” software. Finally, in “Efficiency: The Key to Pragmatic Code Performance” I will teach you why efficiency in particular will allow us to think about performance optimizations effectively without sacrificing time and other software qualities. This chapter might feel theoretical, but trust me. Those insights will train your essential programming judgment on how and whether to adopt particular efficiency optimizations, algorithms, and code improvements presented in Part 3 of this book. This chapter is also fully language agnostic, so it should be practical for non-Go developers too!

Motivation For This Book

When writing this book, I was 29 years old. That might not feel like much experience, but I started full-time, professional programming when I was 19. I studied Computer Science full-time in parallel with my work at Intel on infrastructure software. I coded initially in Python and C++, then jumped into Go (due to the Kubernetes hype) for most of those years. I wrote or reviewed tens of thousands of code lines for various software that had to run in production, be reliable, and scale. From around 2017, in London, UK, I was lucky enough to develop primarily open-source software in various projects written in Go. This includes a popular time-series database for monitoring purposes called Prometheus. I also had an opportunity to co-create a large distributed system project called Thanos. I would not be surprised if my code is running somewhere in your company infrastructure too!

I am grateful for those open-source opportunities. If not for them, most likely, I would not have decided to write this book. The reason is that nothing taught me as much about software development as working in the open. You interact with diverse people, from different places worldwide, with different backgrounds, goals, and needs. It's sometimes challenging. It would probably be easier to stick to working only with ex-Google, ex-Facebook, and ex-Amazon engineers, like I did before. However, I was always motivated to look around more and see the bigger picture of everyday software development problems and challenges. In my opinion, that picture does not look perfect. With more people programming overall, often without a computer science background, there are plenty of mistakes and misconceptions, especially related to software performance.

Overall, with the fantastic people I had a chance to work with, I believe we achieved amazing things. I was lucky to work in environments where high-quality code was more important than decreasing code review iterations or reducing time spent addressing style issues. We strove for good system design, code maintainability, and readability. We tried to bring those values to open source too, and I think we did a good job there. However, there is one important thing I would improve if I had a chance to write, for instance, the Thanos project again. You probably would not guess what. I would try to focus more on the pragmatic efficiency of my code and the algorithms we chose. I would focus on learning how to quickly gather data about my code module's performance, how to benchmark and profile, and on understanding the many performance tools Go and other languages provide. I would have avoided the many hours I spent making mistakes and figuring out the performance tooling and Go runtime behaviour. And don't get me wrong, the Thanos system nowadays is faster and uses much fewer resources than some competitors, but it took a lot of time, and there is still a massive amount of hardware resources we could use less of. If I had applied the knowledge, tips, and suggestions that you will learn in this book, I believe we could have cut the development cost in half, if not more, to have Thanos in the state we have it today. (I hope my ex-boss who paid for this work won't read that thesis!)

Don't you believe me? Well, hear this: Thanos was almost fully functional in February 2018, after three months of development by Fabian Reinartz and me. However, it took six more months to improve performance to the state where queries no longer crashed the process due to OOMs2 and latencies were low enough for initial production use.

What blocked me from caring about efficiency a little bit more from the start? First of all, a lack of skills. There was not much literature that would give me practical answers to our performance or scaling questions, especially for Go. And even when I found a way to improve our Go code's efficiency, almost no one could review it and verify it properly. In both open source and my organization, not many people around me had practical awareness of what to do about Go code performance. Fortunately, today we live in a better world, with consolidated literature about pragmatic efficiency. You are reading such literature right now!

The second reason is even harder to improve upon. It is the misleading perception of performance optimizations. The impression that “premature optimization is the root of all evil”, as Donald E. Knuth wrote3, spread around the world, giving everyone an excellent excuse to do less optimization work. Taken to the extreme, it demotivated people from even learning about efficiency practices and thinking about them, which had a strong impact on the software industry we know now. In this chapter, in “Common Performance Misconceptions”, I have collected five major misconceptions that we will unpack. Those and similar misconceptions are the main reasons why basic programs like Microsoft Excel are so slow, why the battery in your smartphone only lasts a few hours, and why your cloud provider bill is so large.

Let’s start with unpacking some of those misconceptions by first exploring what performance actually means.

Behind Performance

Before discussing why software efficiency or any form of optimizations matters and when to apply them, we must first demystify the overused word performance. In engineering, this word is used in too many contexts and can mean different things, so let’s unpack it to avoid confusion.

Did you know?

The word performant does not exist in the English dictionary. Sadly, our code cannot be “performant” then, indicating that there is always room to improve things4. The question is at what point we should say stop.

When people say “this application is performing poorly”, they usually mean that this particular program is executing slowly5. However, if the same people said, “Bartek is not performing well at work”, they probably would not mean that Bartek walks too slowly from the computer to the meeting room. In my experience, a significant number of people in software development consider the word performance a synonym for speed. For others, it means the overall quality of execution, which is the original definition of this word6. This phenomenon is sometimes called “semantic diffusion”. It occurs when a word starts to be used by larger groups with a different meaning than it originally had.

The word performance in computer performance means the same thing that performance means in other contexts, that is, it means “How well is the computer doing the work it is supposed to do?”

Computer Performance Analysis with Mathematica by Arnold O. Allen, Academic Press

I think Arnold's definition describes the performance world as accurately as possible, so it might be the first actionable item you can take from this book. Be specific.

Clarify when someone uses the word “performance”.

When reading documentation, code, or bug trackers, or attending conference talks, be careful when you hear that word. Ask follow-up questions and make sure you know what the author means.

In practice, performance, as the quality of overall execution, might contain much more than we typically think. It might feel picky, but if we want to improve software development’s cost-effectiveness, we must communicate clearly, efficiently and effectively!

I would suggest we avoid this word unless we can specify what we mean. Imagine you are reporting a bug in a bug tracker like GitHub Issues. Especially there, don't just mention “low performance”; instead, specify what exactly was the unexpected behaviour of the application you are describing. Similarly, when describing improvements for a software release in the changelog7, don't just mention “improving performance”. Describe what exactly was enhanced. Maybe part of the system is now less prone to user input errors, uses less RAM (if so, how much less, and in what circumstances?), or executes something faster (how many seconds faster, and for what kind of workloads?). Being explicit will save time for you and your users.

When you see the word performance in my book, in the context of application computation, you can refer to it as visualized in Figure 1-1.

Performance, meaning runtime quality, consists of accuracy, efficiency, and speed
Figure 1-1. Performance Definition

In principle, software performance means “how well software runs” and consists of three core execution elements you can improve (or sacrifice):

Accuracy

The number of errors you make while doing the work to accomplish the task. It can be measured for software by the number of wrong results your application produces. For example, how many requests finished with non-200 HTTP status codes in a web system.

Speed

How fast you do the work needed to accomplish the task; the timeliness of execution. It can be observed as operation latency or throughput. For example, compressing 1GB of data in memory typically takes around 10s (latency), allowing approximately 100MB per second of throughput.

Efficiency

The ratio of the useful energy delivered by a dynamic system to the energy supplied to it. In simpler words, it is an indicator of how many extra resources, energy, or work was used to accomplish the task; in other words, the waste. It is sometimes an easily measurable concept, quantified by the ratio of useful output to total input. For instance, if our operation of fetching 64 bytes of valuable data from disk allocates 420 bytes on RAM, our memory efficiency would equal 64/420 * 100% ≈ 15.23%. Note that this does not mean our operation is 15.23% efficient in total. We did not calculate energy, CPU time, heat, and other efficiencies. For practical purposes, we tend to specify what efficiency we have in mind. Alternatively, when talking about overall program efficiency, we mean that we don't waste significant effort.

performance = (accuracy * efficiency * speed)

Improving any of those enhances the performance of the running application or system. It can help with reliability, availability, resiliency, overall latency, and more. Similarly, ignoring any of those can make our software less useful. Those three elements might feel disjointed, but in fact, they are connected. For instance, we can achieve better reliability and availability without changing accuracy (without reducing the number of bugs): for example, through efficiency, since reducing memory consumption decreases the chances of running out of memory and crashing the application or host operating system. This book focuses on knowledge, techniques, and methods that allow you to increase the efficiency and speed of your running code without degrading accuracy.
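Coming back to the disk-read example under Efficiency, here is a trivial sketch of that arithmetic (the memoryEfficiency helper is made up purely for illustration, not taken from any real codebase):

func memoryEfficiency(usefulBytes, allocatedBytes float64) float64 {
   // Percentage of allocated bytes that ended up as useful output, e.g.
   // memoryEfficiency(64, 420) gives the ~15% figure mentioned above.
   return usefulBytes / allocatedBytes * 100
}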

It's no mistake that my book is titled “Efficient Go”.

My goal is to teach you pragmatic skills that allow you to produce high-quality, accurate, efficient, and fast code with minimum effort. For this purpose, when I mention the overall efficiency of code in my book (without naming a particular resource), I mean both speed and efficiency, as shown in Figure 1-1. Trust me, this will help us to get through the subject effectively. You will learn more about why in “Efficiency: The Key to Pragmatic Code Performance”.

Misleading use of the word performance might be the tip of the misconceptions iceberg in this area. We will now walk through many more serious stereotypes and tendencies that cause our development process and our software to worsen. In the best case, they result in programs that are more expensive to run or less useful. In the worst case, they cause severe social and financial organizational problems.

Common Performance Misconceptions

The number of times when I was asked, on my code reviews or sprint planning, to ignore performance “for now” is staggering. And you have probably heard that too! I also rejected someone else’s change-set for the same reasons numerous times. Perhaps our changes were dismissed at that time for good reasons, especially if they were micro-optimizations that added unnecessary complexity.

On the other hand, there were also cases where the reasons for rejection were based on common performance misconceptions taken as facts. Let's try to unpack five of the most damaging misunderstandings. Be cautious when you hear some of those generalized statements. Demystifying them might help you save enormous development costs in the long term.

Optimized Code is Not Readable

Undoubtedly, one of the most critical qualities of software code is its readability.

(…) it is more important to make the purpose of the code unmistakable than to display virtuosity. (…) The problem with obscure code is that debugging and modification become much more difficult, and these are already the hardest aspects of computer programming. Besides, there is the added danger that a too clever program may not say what you thought it said.

Brian W. Kernighan and P. J. Plauger, The Elements of Programming Style (1978)

When we think about ultra-fast code, the first things that sometimes come to our minds are those clever, low-level implementations with a bunch of byte shifts, magic byte paddings, and unrolled loops. Or worse, pure assembly code linked to your application. Yes, low-level optimizations in this direction can make our code significantly less readable, but let's be honest, those are extreme and rare cases and should be applied ultra carefully.

Code optimizations might produce some extra complexity, increase cognitive load, and make our code harder to maintain. But such risk exists if we add any other functionality or change the code for different reasons. The problem is that engineers tend to connect optimization with complexity to the extreme and avoid performance optimizations like fire. In their minds, it translates to an immediate negative readability impact. The point of this section is to show you that there are ways to make performance-optimized code clear. Efficiency and readability can coexist. Similarly, you can add a feature to your program that does not impact your ability to understand the code. Refusing to write more efficient code because of fear of losing readability is like refusing to add vital functionality to avoid complexity. Of course, we can consider descoping it, but we should evaluate the consequences first.

For example, when you want to add extra validation to the input, you can naively paste a complex, 50-line waterfall of if statements directly into the handling function, making the next reader of your code cry (or yourself when you revisit this code a month later). Or, you can encapsulate everything into a single func validate(input string) error function, which adds only slight complexity. Alternatively, to avoid modifying the handling block of code, you can design the code to validate on the caller side or in middleware. We can also rethink our system design and move the validation complexity to another system or component, thus not implementing this feature at all.
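For illustration, a hedged sketch of that encapsulation could look as follows (the handler, the validate helper, and its rules are hypothetical, only meant to show the shape of the code):

import (
   "errors"
   "net/http"
)

func handleCreateUser(w http.ResponseWriter, r *http.Request) {
   input := r.FormValue("name")
   if err := validate(input); err != nil { // One readable call instead of a waterfall of ifs.
      http.Error(w, err.Error(), http.StatusBadRequest)
      return
   }
   // ... handle the valid input ...
}

// validate keeps all input rules in one place, so the handler stays short.
func validate(input string) error {
   if input == "" {
      return errors.New("input must not be empty")
   }
   if len(input) > 256 {
      return errors.New("input must be shorter than 256 characters")
   }
   return nil
}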

How are performance improvements in our code different from extra features? I would argue they are not. You can design efficiency optimizations with readability in mind in the same manner as you do with features. Both can impact readability in a variety of ways—from trashing your code to adding little complexity. Both can be entirely transparent to the readers if hidden under abstractions8.

Yet, we tend to mark optimizations as the primary source of readability problems. The most damaging consequence of this and the other misconceptions in this chapter is that they are often used as an excuse to ignore performance improvements completely. This often leads to something called “premature pessimization”, the act of making the program less efficient, the opposite of optimization.

Easy on yourself, easy on the code: All other things being equal, notably code complexity and readability, certain efficient design patterns and coding idioms should just flow naturally from your fingertips and are no harder to write than the pessimized alternatives. This is not premature optimization; it is avoiding gratuitous [author: unnecessary] pessimization.

H. Sutter and A. Alexandrescu, C++ Coding Standards: 101 Rules

Readability is essential. I would even argue that unreadable code is rarely efficient over the long haul. When software evolves, it's easy to break a previously made, too-clever optimization because we misinterpret or misunderstand it. Similarly to bugs and mistakes, it's easier to cause performance issues in tricky code. In XREF HERE, you will see examples of pragmatic efficiency, where the code stays maintainable and easy to read despite having better performance.

Tip

It’s easier to optimize readable code than make heavily optimized code readable. If you can’t achieve both, in most cases, default to readability.

Optimization often results in less readable code because we don't design good efficiency into our software from the beginning. If you refuse to think about efficiency now, it might be too late to optimize the code later without impacting readability. It's much easier to introduce a simpler and more efficient way of doing things in fresh modules, where we have just started to design APIs and abstractions. As you will learn in XREF HERE, we can do performance optimizations on many different levels, not only via nit-picking and code tuning. Perhaps we can choose a more efficient algorithm, a faster data structure, or a different system tradeoff. These will likely result in much cleaner, more maintainable code and better performance than improving efficiency after releasing the software. Under many constraints, like backward compatibility, integrations, or strict interfaces, our only way to improve performance would be to introduce additional, often significant, complexity to the code or system.

So, if you add new code, don't sacrifice readability. Surprisingly, code after optimization can be more readable! Let's look at a few Go code examples. In Example 1-1, you will see sub-optimal code, potentially a “pessimization”, that I have personally seen hundreds of times when reviewing student or junior developer Go code:

Example 1-1. Simple calculation for the ratio of reported errors.
type ReportGetter interface {
   Get() []Report
}

func FailureRatio(reports ReportGetter) float64 { 1
   if len(reports.Get()) == 0 {
      return 0
   }

   var sum float64
   for _, report := range reports.Get() {
      if report.Error() != nil {
         sum++
      }
   }
   return sum / float64(len(reports.Get()))
}
1

This is a simplified example, but there is quite a popular pattern of passing a function or interface to get elements needed for operation instead of passing them directly. It is useful when elements are dynamically added, cached, or fetched from remote databases.

I think you would agree that the code from Example 1-1 would work for most cases and is simple and readable. Still, I would most likely not accept such code, mainly because of potential efficiency issues. I would suggest a simple modification as in Example 1-2 instead:

Example 1-2. Simple, more efficient calculation for the ratio of reported errors.
func FailureRatio(reports ReportGetter) float64 {
   got := reports.Get() 1
   if len(got) == 0 {
      return 0
   }

   var sum float64
   for _, report := range got {
      if report.Error() != nil {
         sum++
      }
   }
   return sum / float64(len(got))
}
1

Notice that, in comparison with Example 1-1, instead of calling the Get method in three places, I call it once and reuse the result via the got variable.

Some developers could argue that the FailureRatio function is potentially used very rarely, that it's not on a critical path, and that the current ReportGetter implementation is very cheap and fast. They could argue that Example 1-1 is more readable and only a few nanoseconds slower due to its three (instead of one) Get calls. They could call my suggestion a “premature optimization”.

However, I deem it a very popular case of premature pessimization. It is a silly case of rejecting more efficient code that does not speed things up a lot right now but does not harm anything either. On the contrary, I would argue that in our example, Example 1-2 is superior in many aspects:

Example 1-2 code is more efficient

Interfaces allow us to replace the implementation. They represent a certain contract between users and implementations. From the point of view of the FailureRatio function, we cannot assume anything beyond that contract. Most likely, we cannot assume that the ReportGetter's Get code will always be fast and cheap9. Tomorrow, someone might swap the Get code with an expensive I/O operation against a filesystem, an implementation with mutexes, or a call to a remote database10 (see the sketch after this list). Developers will most likely forget to optimize the function from Example 1-1 when that happens.

Example 1-2 code is safer

It is potentially not visible at first glance, but the code from Example 1-1 carries a considerable risk of introducing race conditions. We may hit a problem if the ReportGetter implementation is synchronized with other threads that dynamically change the Get() result over time. It's better to avoid races and ensure consistency within a function body. Race errors are the hardest to debug and detect, so it's better to be safe than sorry.

Example 1-2 code is more readable

We might be adding one more line and an extra variable, but in the end, the code in Example 1-2 explicitly tells us that we want to use the same result across three usages. By replacing three instances of the Get() call with a simple variable, we also minimize the potential side effects, making our FailureRatio purely functional (except for the first line). By all means, Example 1-2 is thus more readable than Example 1-1.
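To make the first point above concrete, here is a hedged sketch of a ReportGetter implementation that could appear tomorrow (the fileReportGetter type and readReportsFromFile helper are hypothetical, and the code assumes the Report type from Example 1-1 plus the standard sync package); with it, every extra Get call in Example 1-1 pays the locking and I/O cost again:

type fileReportGetter struct {
   mu   sync.Mutex
   path string
}

func (f *fileReportGetter) Get() []Report {
   f.mu.Lock()
   defer f.mu.Unlock()

   // Expensive I/O performed on every single call.
   reports, err := readReportsFromFile(f.path)
   if err != nil {
      return nil
   }
   return reports
}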

“Premature” optimization is the root of (readability) evil

Such a statement might be accurate, but evil is in the “premature” part. Not every performance optimization is premature. Furthermore, such a rule is not a license for rejecting or forgetting about more efficient solutions with comparable complexity.

Another example of optimized code yielding clarity is shown in Example 1-3 and Example 1-4:

Example 1-3. Simple loop without optimization
func createSlice(n int) (slice []string) { 1
   for i := 0; i < n; i++ {
      slice = append(slice, "I", "am", "going", "to", "take", "some", "space") 2
   }
   return slice
}
1

The named return parameter slice creates a variable holding an empty (nil) string slice at the start of the function call.

2

We append seven string items to the slice and repeat that n times.

Example 1-3 shows how we usually fill slices in Go, and one would say nothing is wrong here. It just works. However, I would argue that this is not how we should append in the loop if we know exactly how many elements we will append to the slice upfront. Instead, in my opinion, we should always write it as in Example 1-4.

Example 1-4. Simple loop with pre-allocation optimization. Is this less readable?
func createSlice(n int) []string {
   slice := make([]string, 0, n*7) 1
   for i := 0; i < n; i++ {
      slice = append(slice, "I", "am", "going", "to", "take", "some", "space") 2
   }
   return slice
}
1

We are creating a variable holding the string slice. We are also allocating space (capacity) for n * 7 strings for this slice.

2

We append seven string items to the slice and repeat that n times.

We will talk about efficiency optimizations like those in Example 1-2 and Example 1-4 in XREF HERE, with the more profound Go runtime knowledge from XREF HERE. In principle, both allow our program to do less work. In Example 1-4, thanks to the initial pre-allocation, the internal append implementation does not need to progressively extend the slice in memory. We do it once at the start. Now, I would like you to focus on the following question: is this code more or less readable?

Readability can often be subjective, but I would argue the more efficient code from Example 1-4 is more understandable. It adds one more line, so we could say the code is a bit more complex, but at the same time, it is explicit and clear in its message. Not only does it help the Go runtime perform less work, but it also hints to the reader about the purpose of this loop and exactly how many iterations we expect.

If you have never seen the raw usage of the built-in make function in Go, you would probably say that this code is less readable. That is fair. However, once you realize the benefit and start using this pattern consistently across the code, it becomes a good habit. Even more, thanks to that, any slice creation without such pre-allocation tells you something too. For instance, it could say that the number of iterations is unpredictable, so you know to be more careful. You know one thing before you even look at the loop's content! To make such a habit consistent across the Prometheus and Thanos codebases, we even added a related entry to the Thanos Go style guide.
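If you want to verify the difference yourself, a minimal benchmark sketch using Go's standard testing package could look like this (assuming both variants live in the same package, with the pre-allocating one from Example 1-4 renamed to createSlicePrealloc):

import "testing"

func BenchmarkCreateSlice(b *testing.B) {
   b.ReportAllocs() // Report allocations per operation next to timings.
   for i := 0; i < b.N; i++ {
      createSlice(1000)
   }
}

func BenchmarkCreateSlicePrealloc(b *testing.B) {
   b.ReportAllocs()
   for i := 0; i < b.N; i++ {
      createSlicePrealloc(1000)
   }
}

Running it with go test -bench=. should show fewer allocations per operation for the pre-allocated version.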

Readability is not written in stone. It is dynamic.

The ability to understand certain software code can change over time, even if the code never changes. Conventions come and go as the language community tries new things. With strict consistency, you can help the reader to understand even more complex pieces of your program by introducing a new, clear convention.

Finally, to understand Knuth's “premature optimization is the root of all evil” quote, we need to apply a specific context. While we can learn a lot about general programming from the past, there are many things we have improved enormously compared to what engineers had in 1974. For example, back then, it was popular to add information about the type of a variable to its name, as showcased in Example 1-511.

Example 1-5. Example of System Hungarian Notation applied to Go code.
type structSystem struct {
   sliceU32Numbers []uint32
   bCharacter      byte
   f64Ratio        float64
}

Hungarian notation was useful because compilers and Integrated Development Environments (IDEs) were not very mature at that point. However, nowadays, in our IDEs or even on repository websites like GitHub, we can hover over a variable to immediately know its type. We can go to the variable definition in milliseconds, read the commentary, and find all invocations and mutations. With smart code suggestions, advanced highlighting, and the dominance of object-oriented programming that started in the mid-1990s, we have tools in our hands that allow us to add features and efficiency optimizations (so, complexity) without significantly impacting the practical readability12. Furthermore, the accessibility and capabilities of observability and debugging tools have grown enormously, which we will explore in XREF HERE. This still does not permit overly clever code, but it allows us to understand bigger codebases much quicker.

To sum up, performance optimization is like another feature in our software, and we should treat it accordingly. It can add complexity, but there are ways to minimize the cognitive load required to understand our code13. As we learned in this chapter, there are even cases where a more efficient program is a side effect of simple, explicit, and understandable code. All practical suggestions and code examples you will see in this book have readability in mind.

Warning

We need performance, but our goal is to make the code both readable and efficient. Ideally, we should never sacrifice readability. Simplify, and use abstractions, encapsulation, and comments to avoid surprises and hard-to-follow code.

You Aren’t Going to Need It (YAGNI)

YAGNI is a powerful and popular rule that I use very often while writing or reviewing any software.

One of the most widely publicized principles of XP [Extreme Programming] is the You Aren’t Going to Need It (YAGNI) principle. The YAGNI principle highlights the value of delaying an investment decision in the face of uncertainty about the return on the investment. In the context of XP, this implies delaying the implementation of fuzzy features until uncertainty about their value is resolved.

Hakan Erdogmus and John Favaro

In principle, it means avoiding doing the extra work that is not strictly needed for the current requirements. It relies on the fact that requirements constantly change, and we have to embrace iterating rapidly on our software.

Let's imagine a potential situation where Katie, a senior software engineer, was assigned the task of creating a simple webserver. Nothing fancy, just an HTTP server that exposes some REST endpoint. Katie is an experienced developer who has created probably a hundred similar endpoints in the past. She went ahead, programmed the functionality, and tested the server in no time. With some time left, she decided to add extra functionality: a simple bearer token authorization layer (a simple token-based authorization technique). Katie knows that such a change is outside of the current requirements, but she has written hundreds of REST endpoints, and each of those had similar authorization. Experience tells her it's highly likely such requirements will come soon too, so she will be prepared. Do you think such a change would make sense and should be accepted?

While Katie has shown good intentions and solid experience, we should refrain from merging such a change in order to preserve the quality of the webserver code and overall development cost-effectiveness. In other words, we should apply the YAGNI rule. Why? In most cases, we cannot predict the future. Sticking to the requirements allows us to save time and complexity. There is a risk that the project will never need an authorization layer, for example, if the server runs behind a dedicated authorization proxy. In such a case, the extra code Katie wrote can bring a high cost even if not used. It is additional code to read, which adds to the cognitive load. Furthermore, it will be harder to change or refactor such code when needed.

Now, let’s step into a more grey area. We explained to Katie why we need to reject the authorization code. She agreed, and instead, she decided to add some critical monitoring to the server by instrumenting it with a few vital metrics. Does this change violate the YAGNI rule too?

If monitoring is part of the requirements, it does not violate YAGNI and should be accepted. If it's not, without knowing the full context, it's hard to say. Critical monitoring should be explicitly mentioned in the requirements. Still, even if it is not, webserver observability is the first thing that will be needed when we run such code anywhere. Otherwise, how will we know that it is even running? In this case, Katie is technically doing something important that is immediately useful. In the end, we should apply common sense and judgment, and either add monitoring to or explicitly remove it from the software requirements before merging such a change.
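For instance, a hedged sketch of such instrumentation with the Prometheus Go client could look as follows (the endpoint, metric name, and handler logic are illustrative, not Katie's actual code):

import (
   "net/http"

   "github.com/prometheus/client_golang/prometheus"
   "github.com/prometheus/client_golang/prometheus/promauto"
   "github.com/prometheus/client_golang/prometheus/promhttp"
)

var httpRequests = promauto.NewCounterVec(prometheus.CounterOpts{
   Name: "http_requests_total",
   Help: "Total number of handled HTTP requests.",
}, []string{"code"})

func handle(w http.ResponseWriter, r *http.Request) {
   // ... business logic of the REST endpoint ...
   httpRequests.WithLabelValues("200").Inc() // Count successful responses.
   w.WriteHeader(http.StatusOK)
}

func main() {
   http.HandleFunc("/endpoint", handle)
   http.Handle("/metrics", promhttp.Handler()) // Expose metrics for scraping.
   http.ListenAndServe(":8080", nil)
}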

After such a change, in her free time, Katie (who writes solid code very quickly!) decided to add a simple cache for a necessary computation, which enhances the read performance of a separate endpoint. She even wrote and performed a quick benchmark to verify the endpoint's latency and resource consumption improvements. Does that violate the YAGNI rule?

The sad truth about software development is that performance efficiency and response time are often missing from stakeholders' requirements. The target performance goal for an application is to “just work” and be “fast enough”, without details on what that means. We will discuss how to define practical software performance requirements in XREF HERE, commonly known nowadays as Service Level Objectives (SLOs). For this example, let's assume the worst. There was nothing in the requirements list about performance. Should we then apply YAGNI and reject Katie's change?

Again, hard to tell without full context. Implementing a robust and usable cache is not trivial, so how complex is the new code? Is the data we are working on easily “cachable”?14 Do we know how often such an endpoint will be used (is it a critical path)? How far should it scale? On the other hand, computing the same result for a heavily used endpoint is highly inefficient, so cache is a good pattern.

I would suggest Katie take a similar approach as she did with the monitoring change: consider discussing with the team to clarify the performance guarantees that the web service should offer. That will tell us whether the cache is required now or whether it violates the YAGNI rule.
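For illustration only, the kind of cache Katie might have added can be as small as the following sketch (the names are hypothetical, and a production cache would also bound its size and avoid holding the lock during computation):

import "sync"

type resultCache struct {
   mu      sync.Mutex
   results map[string]string
}

func newResultCache() *resultCache {
   return &resultCache{results: map[string]string{}}
}

func (c *resultCache) Get(key string, compute func(string) string) string {
   c.mu.Lock()
   defer c.mu.Unlock()

   if v, ok := c.results[key]; ok {
      return v // Cache hit: reuse the previously computed result.
   }
   v := compute(key)
   c.results[key] = v
   return v
}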

As the last change, Katie went ahead and applied a simple efficiency rule mentioned in the Thanos Go style guide. For instance, in relevant places, she implemented the slice pre-allocation improvement you learned from Example 1-4. Should we accept such a change?

I would be strict here and say yes. Maybe you noticed that when we talked about readability, I suggested always applying the pre-allocation from Example 1-4 when you know the number of elements upfront. Isn't that violating the core statement behind the YAGNI rule? Even if something is generally applicable, shouldn't you hold off until you are sure You Are Going to Need It?

I would argue that such small efficiency habits that do not reduce code readability (some even improve it) should generally be an essential part of the developer job, even if not explicitly mentioned in requirements. Similarly, no project requirements state basic best practices like code versioning, having small interfaces, or avoiding big dependencies.

The main takeaway here is that using the YAGNI rule helps, but it is not permission for developers to completely ignore performance efficiency. As you will learn from XREF HERE, it's usually thousands of small things that make up the excessive resource usage and latency of an application, not just a single thing we can fix later. Ideally, well-defined requirements help clarify your software's efficiency needs, but they will never cover all the details and best practices we should try to apply nevertheless.

Hardware is Getting Faster and Cheaper

Undoubtedly, hardware is more powerful and less expensive than ever before. We see technological advancement on almost every front every year or month. From single-core Pentium CPUs with a 200 MHz clock rate in 1995 to smaller, energy-efficient CPUs capable of 3-4 GHz speeds. RAM sizes increased from dozens of MB in 2000 to 64 GB in personal computers 20 years later, with faster access patterns. In the past, we had small-capacity hard disks, then moved to SSDs, and now have NVMe SSDs capable of 7 GB/s with a few TB of space. Network interfaces achieve 100 Gigabit throughput. In terms of removable storage, I remember floppy disks with 1.44 MB of space, then read-only CD-ROMs with a capacity of up to 553 MB; next we had Blu-ray, then read-write DVDs, and now it's easy to get SD cards with TB sizes.

Now let's add to the above facts the popular opinion that the amortized hourly cost of typical hardware is lower than the hourly cost of a developer. With all of this, one could say that it does not matter if a single function in code takes 1 MB more or does excessive disk reads. Why should we delay features, or educate or invest in performance-aware engineers, if we can buy bigger servers and overall pay less?

As you probably imagine, it is not that simple. Let's unpack this quite harmful argument for descoping efficiency and performance from the software development to-do list.

First of all, stating that spending more money on hardware is cheaper than investing expensive developer time in efficiency topics is very short-sighted. It is like claiming that we should buy a new car and sell the old one every time something breaks, only because we don't want to care about, learn, or pay for an automotive mechanic's job. Sure, that can work, but it would not be a very efficient approach. In many cases, it is simply a misleading oversimplification.

Let’s assume a software developer’s annual salary oscillates around $100,000. With other employment costs, let’s say the company has to pay $120,000 yearly, so $10,000 monthly. For $10,000 in 2021, you can buy a server with 1 TB of DDR4 memory, two high-end CPUs, 1 Gigabit network card, and 10 TB of hard disk space. Let’s ignore for now the energy consumption cost. Such a deal means that our software can over-allocate terabytes of memory every month, and we would still be better off than hiring an engineer to optimize this, right? Unfortunately, it does not work like this.

First of all, terabytes of allocation are more common than you think, and you don't need to wait a whole month to see them! Figure 1-2 shows a screenshot of the heap memory profile of a single replica (of six in total) of a single service (of dozens) running in a single cluster, and we run thousands of clusters at Red Hat. We will discuss how to read and use profiles in XREF HERE, but the particular screenshot in Figure 1-2 shows the total memory allocated since the last restart of the process five days earlier.

Profile was taken at 21 Mar, 11:25 am UTC, service started at 16 Mar, 11:47 am.
Figure 1-2. Snippet of a memory profile showing all memory allocations within five days made by a high-traffic service. See the full [profile here](https://share.polarsignals.com/378a246).

Most of that memory was already released, but notice that this software from the Thanos project used 17.61 TB in total for only five days of running15. If you write desktop applications or tools instead, you will hit a similar scale issue sooner or later. Taking the previous example, if one function over-allocates 1 MB, it is enough to run it 100 times for a critical operation in our application with only 100 desktop users to waste 10 GB in total, not in a month, but in a single session per user. Scale that to a product with 100,000 users, and the same small slip wastes 10 TB. As a result, slight inefficiencies can create overabundant hardware resource usage quickly.

There is more. To afford an over-allocation of 10 TB, it is not enough to buy a server with that much memory and pay for its energy consumption. The amortized cost, among other things, has to include writing, buying, or at least maintaining firmware, drivers, an operating system, and software to monitor, update, and operate such a server. Since extra hardware requires additional software, by definition it requires spending money on engineers, so we are back where we started. We might have saved engineering costs by not focusing on performance optimizations. In return, we would spend more on other engineers required to maintain the over-used resources, or pay a cloud provider that has already calculated such extra cost, plus its profit, into the cloud usage bill.

On the other hand, today, 10 TB of memory costs a lot, but tomorrow it might be a marginal cost due to technological advancements. What if we ignore performance problems and wait until server costs decrease or until more users replace their laptops or phones with faster ones? Waiting is easier than debugging tricky performance issues!

Unfortunately, we cannot skip software development efficiency and expect hardware advancements to mitigate our needs and performance mistakes. Hardware is getting faster and more powerful, yes. But, unfortunately, not fast enough. Let's go through the three main reasons behind this counterintuitive effect.

First of all, there is a saying that “software expands to fill the available memory”. This effect is known as Parkinson's Law16. It states that no matter how many resources we have, the demand tends to match the supply. For example, Parkinson's Law is heavily visible in universities. No matter how much time the professor gives for assignments or exam preparation, students will always use all of it and probably do most of the work at the last minute17. We can see similar behaviour in software development too.

The second reason is that software tends to get slower more rapidly than hardware becomes faster. Niklaus Wirth uses the term “fat software” to explain why there will always be demand for more hardware.

Increased hardware power has undoubtedly been the primary incentive for vendors to tackle more complex problems (…). But it is not the inherent complexity that should concern us; it is the self-inflicted complexity. There are many problems that were solved long ago, but for the same problems, we are not offered solutions wrapped in much bulkier software.

Niklaus Wirth, A Plea for Lean Software (1995)

Software is getting slower faster than hardware is getting more powerful because products have to invest in a better user experience to stay profitable. Prettier operating systems, glowing icons, complex animations, high-definition videos on websites, or fancy emojis that mimic your facial expressions thanks to facial recognition techniques. It's a never-ending battle for clients, which brings more complexity and thus higher computational demands.

On top of that, there is the rapid democratization of software thanks to better access to computers, servers, mobile phones, IoT devices, and every other kind of electronics. As Marc Andreessen said, “software is eating the world”. The COVID-19 pandemic that started in late 2019 accelerated digitalization even more as remote, internet-based services became the critical backbone of modern society. We might have more computation power available every day, but more functionalities and user interactions consume all of it and demand even more. In the end, I would argue that the aforementioned 1 MB over-used in a single function might become a critical bottleneck at such a scale pretty quickly.

If that still feels very hypothetical to you, just look at the software around you. We use social media, where Facebook alone generates 4 PB18 of data per day. We search online, causing Google to process 20 PB of data per day. However, one would say those are rare, planet-scale systems with billions of users. Typical developers don't have such problems, right? When I look at most of the software I have co-created or used, it hit some performance issue related to significant data usage sooner or later. The Prometheus UI page, written in React, was performing a search on millions of metric names or trying to fetch hundreds of MBs of compressed samples, causing browser latencies and explosive memory usage. A single Kubernetes cluster in our infrastructure, with low usage, generates 0.5 TB of logs per day (most of them never used). The excellent grammar checking tool I used to write this book makes too many network calls when the text has more than 20,000 words, slowing my browser considerably. Our simple script for formatting our documentation in markdown and checking links took minutes to process all elements. Our Go static analysis job and linting exceeded 4GB of memory and crashed our CI jobs. It used to take 20 minutes for my IDE to index all the code from our mono-repo, despite doing it on a top-shelf laptop. I still haven't edited my 4K ultra-wide videos from my GoPro because the software is too laggy. I could go on forever with such examples, but the point is that we live in a truly “Big Data” world.

When I started programming we not only had slow processors, we also had very limited memory — sometimes measured in kilobytes. So we had to think about memory and optimize memory consumption wisely.

V. Simonov, “Optimize for readability first” (2014), https://va.lent.in/optimize-for-readability-first/

While we can acknowledge that hardware was more constrained in the past, this does not mean that we can now ignore optimizations. Because of big data, we have to optimize memory and other resources wisely, if not more wisely than before. It will also be much worse in the future. Our software and hardware have to handle data growing at extreme rates, faster than any hardware development. We are just on the edge of introducing 5G networks capable of transfers up to 20 Gigabits per second. We are introducing mini-computers into almost every item we buy, like TVs, bikes, washing machines, freezers, desk lamps, or even deodorants! We call this movement the “Internet of Things” (IoT). Data from those devices is estimated to grow from 18.3 ZB in 2019 to 73.1 ZB by 202519. The industry can produce 8K TVs, rendering resolutions of 7,680 by 4,320, so approximately 33 million pixels. If you have ever written a computer game, you are probably as scared as I am. It will take a lot of effort to efficiently render so many pixels in highly realistic games with immersive, highly destructive environments at 60+ frames per second. Modern cryptocurrencies and blockchain algorithms also pose challenges in computational energy efficiency, with Bitcoin energy consumption reaching roughly 130 terawatt-hours in February 2021 (0.6% of global electricity consumption).

The last, but not least, reason behind hardware not progressing fast enough is that hardware advancement has stalled on some fronts, like CPU speed (clock rate) and memory access speeds. We will cover some challenges of that situation in XREF HERE, but I believe every developer should be aware of the fundamental technological limits we are hitting right now.

It would be odd to read a modern book about efficiency that does not mention Moore’s Law, right? You probably heard of it somewhere before. It was first stated in 1965 by former CEO and co-founder of Intel, Gordon Moore.

The complexity for minimum component costs [author: the number of transistors, with minimal manufacturing cost per chip] has increased at a rate of roughly a factor of two per year. (…) Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000.

Gordon E. Moore, Cramming more components onto integrated circuits

Mr Moore's observation had a big impact on the semiconductor industry. But decreasing the transistors' size would not have been that beneficial if not for Robert H. Dennard and his team. In 1974, their experiment revealed that power use stays proportional to the transistor dimension (constant power density)20. This means that smaller transistors were more power-efficient. In the end, both laws promised exponential performance-per-watt growth of transistors. This motivated investors to continuously research and develop ways to decrease the size of MOSFET21 transistors. We can also fit more of them on even smaller, denser microchips, which reduces manufacturing costs. The industry continuously decreased the amount of space needed to fit the same amount of computing power, enhancing every chip, from CPUs, through RAM and flash memory, to GPS receivers and high-definition camera sensors.

In practice, Moore's prediction has lasted not ten years as he thought, but nearly 60 so far, and it still holds. We continue to invent tinier, microscopic transistors, currently oscillating around ~70nm. Probably we can make them even smaller. Unfortunately, as we can see in the figure below, we reached the physical limit of Dennard's scaling around 200622.

Figure: Why we cannot have faster transistors: Moore's Law vs Dennard's Rule (images/hardware.png)

While technically the power usage of a higher density of tiny transistors remains constant, such dense chips heat up quickly. Beyond 3-4 GHz of clock speed, it takes significantly more power and other costs to cool those transistors and keep them running. As a result, unless you plan to run software on the bottom of the ocean23, you are not getting CPUs with faster instruction execution anytime soon. We can only have more cores.

So, what have we learned so far? Hardware speed is getting capped, software is getting bulkier, and we have to handle continuous growth in data and users. Unfortunately, that's not the end. There is a vital resource we tend to forget about while developing software: power. Every computation of our process takes electricity, which is heavily constrained on many platforms like mobile phones, smartwatches, IoT devices, or laptops. Counterintuitively, there is a strong correlation between energy efficiency and software speed and efficiency. I love the Chandler Carruth presentation, which explains this surprising relation well:

If you ever read about “power-efficient instructions” or “optimizing for power usage”, you should become very suspicious. (…) This is mostly total junk science. Here is the number one leading theory about how to save battery life. Finish running the program. Seriously, race to sleep. The faster your software runs, the less power it consumes. (…) Every single general-usage microprocessor you can get today, the way it conserves power is by turning itself off. As rapidly and as frequently as possible.

Chandler Carruth, Efficiency with Algorithms, Performance with Data Structures

To sum up, avoid the common trap of thinking about hardware as a continuously faster and cheaper resource that will save us from optimizing our code. It's a trap. Such a broken loop makes engineers gradually lower their coding standards in performance and demand more and faster hardware. Cheaper and more accessible hardware then creates even more mental room to skip efficiency, and so on. There are amazing innovations like Apple's recent M1 silicon24, the RISC-V standard25, and more practical quantum computing appliances, which promise a lot. Unfortunately, as of 2021, hardware is growing slower than software efficiency needs.

If you are not yet convinced that we should give a little more thought to software efficiency, there is one more very important and, sadly, often forgotten point. Software developers are often “spoiled” and detached from the typical human reality. It's often the case that engineers create and test software on premium, high-end laptops or mobile devices. We need to realize that many people and organizations are utilizing, and will keep utilizing, older hardware or worse internet connections26. People might have to run your applications on slower computers. It might be worth considering efficiency in our development process to improve our software's overall accessibility and inclusiveness.

We Can Scale Horizontally Instead

As we learned in the previous sections, we expect our software to handle more data sooner or later. But it’s unlikely your project will have billions of users from day one. We can avoid enormous software complexity and development cost by pragmatically choosing a much lower target number of users, operations, or data sizes to aim for at the beginning of our development cycle. We usually simplify the initial programming cycle by assuming a low number of notes in our mobile note-taking app, a lower number of requests per second in the proxy we build, or smaller files in the data converter tool our team is working on. It’s OK to simplify things, but as you will learn in XREF HERE, it’s also important to roughly predict performance requirements in the early design phase. Similarly, it’s essential to find the expected load and usage in the mid to long term of software deployment. A software design that guarantees similar performance levels, even with increased traffic, is called “scalable”. Generally, scalability is very difficult and expensive to achieve in practice.

Even if a system is working reliably today, that doesn’t mean it will necessarily work reliably in the future. One common reason for degradation is increased load: perhaps the system has grown from 10,000 concurrent users to 100,000 concurrent users, or from 1 million to 10 million. Perhaps it is processing much larger volumes of data than it did before. Scalability is the term we use to describe a system’s ability to cope with increased load.

Martin Kleppmann, Designing Data-Intensive Applications

Inevitably, while talking about performance, we might touch on some scalability topics in this book. For this chapter’s purpose, we can distinguish two types of scalability for our software.

The first and sometimes the simplest way of scaling our application is by running the software on hardware with more resources (“vertical” scalability). For example, we could introduce parallelism so the software uses not one but three CPU cores. If the load increases, we can provide more CPU cores. Similarly, if our process is memory intensive, we might bump the running requirements and ask for more RAM. The same goes for any other resource like disk, network or power. Obviously, that does not come without consequences. In the best case, you have that room in the target machine. Potentially you can make that room by rescheduling other processes to different machines (e.g. when running in the cloud) or closing them temporarily (useful when running on a laptop or a smartphone). In the worst case, you need to buy a bigger computer or a more capable smartphone or laptop. The latter option is usually very limited, especially if you provide software for customers to run on their non-cloud premises. In the end, the usability of resource-hungry applications or websites that scale only vertically is much lower. If you or your customers run your software in the cloud, the situation is a little bit better. You can “just” buy a bigger server. As of 2021, you can scale up your software on the AWS platform up to 128 CPU cores, almost 4 TB of RAM and 14 Gbps of bandwidth27. In extreme cases you can also buy an IBM mainframe with 190 cores and 40 TB of memory.
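
To make the parallelism part of vertical scaling concrete, here is a minimal Go sketch of a worker pool sized to the number of available CPU cores. The processItem function and the workload are purely hypothetical assumptions for illustration; the point is only that the same binary automatically uses whatever extra cores a bigger machine provides.

[source,go]
----
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// processItem is a hypothetical, CPU-bound unit of work.
func processItem(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	items := make([]int, 1000)
	for i := range items {
		items[i] = 100_000 + i
	}

	// Size the worker pool to the cores the machine offers. On a bigger
	// (vertically scaled) machine, the same binary simply uses more cores.
	workers := runtime.NumCPU()

	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				_ = processItem(n)
			}
		}()
	}
	for _, n := range items {
		jobs <- n
	}
	close(jobs)
	wg.Wait()

	fmt.Printf("processed %d items on %d workers\n", len(items), workers)
}
----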

It does not take much time to notice that vertical scalability has its limits on many fronts. Even in the cloud or in datacenters, we simply cannot infinitely scale up the hardware. First of all, giant machines are rare and expensive. Secondly, as we will learn in XREF HERE, bigger machines run into complex issues caused by many hidden single points of failure. Pieces like the memory bus, network interfaces, NUMA nodes, and the operating system itself can become overloaded and too slow28.

This is why engineers found a different way of fulfilling the demand, thanks to the Internet and advancements in network technologies. Instead of a bigger machine, we might try to offload and share the computation across multiple remote, smaller, less complex, and much cheaper devices! How does this look in practice? To search for messages with the word “home” in a mobile messaging app, we could fetch millions of past messages (or store them locally in the first place) and run regex matching on each. Instead, we can design an API and remotely call a backend system that splits the search into 100 jobs, each matching 1/100 of the dataset. Instead of building “monolith” software, we could distribute different functionalities to separate components and move to a “microservice” design. Instead of running a game that requires expensive CPUs and GPUs on a personal computer or gaming console, we can run it in the cloud and stream the input and output in high resolution.
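
The split-scatter-merge shape behind such a sharded search can be sketched in a few lines of Go. The dataset, the shard count and the searchShard function are illustrative assumptions; in a real system each shard would be a remote API call to a backend job rather than a local function, but the overall shape stays the same.

[source,go]
----
package main

import (
	"fmt"
	"strings"
	"sync"
)

// searchShard stands in for a remote call to one backend job that scans
// its 1/N slice of the dataset. In a real system this would be an RPC or
// HTTP call to a separate machine.
func searchShard(shard []string, word string) []string {
	var hits []string
	for _, msg := range shard {
		if strings.Contains(msg, word) {
			hits = append(hits, msg)
		}
	}
	return hits
}

func main() {
	messages := []string{"on my way home", "see you at work", "home sweet home", "lunch?"}

	const shards = 2 // e.g. 100 jobs, each matching 1/100 of the dataset
	results := make([][]string, shards)

	var wg sync.WaitGroup
	for i := 0; i < shards; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Every shard scans only its own slice of the data.
			start := i * len(messages) / shards
			end := (i + 1) * len(messages) / shards
			results[i] = searchShard(messages[start:end], "home")
		}(i)
	}
	wg.Wait()

	// Merge the partial results from all shards.
	var merged []string
	for _, r := range results {
		merged = append(merged, r...)
	}
	fmt.Println(merged)
}
----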

Figure 1-3. Vertical vs Horizontal scalability.

With horizontal and vertical scalability in mind, let me show you a specific scenario from the past. Many modern databases rely on a compaction process to store and look up data efficiently. During this process we can reuse many indices, deduplicate the same data, and gather fragmented pieces into a sequential stream of data for quicker reads. At the beginning of the Thanos project, we decided to reuse a very naive compaction algorithm for simplicity. We calculated that, in theory, we didn’t need to make the compaction process parallel within a single block of data. Given a steady stream of 100+ GB of eventually compacted data from a single source, we could rely on a single CPU, a minimal amount of memory and some disk space. The implementation was very naive and unoptimized at the very beginning, following the good patterns D. Knuth mentioned. We wanted to avoid the complexity and effort of optimization and focus instead on the project’s reliability and functionality features. As a result, users who deployed our project quickly hit compaction problems: it was either too slow to cope with incoming data or it consumed hundreds of GB of memory per operation. The cost was the first problem, but not the most urgent one. The bigger issue was that many Thanos users did not have bigger machines in their datacenters to scale the memory vertically.

At first glance, the compaction problem looked like a scalability problem. The compaction process depended on resources that we could not just add up infinitely. As users wanted a solution fast, together with the community we started brainstorming potential horizontal scalability techniques. We talked about introducing a compactor scheduler service that would assign compaction jobs to different machines, or an intelligent peer network using a gossip protocol. Without going into details, both solutions would add an enormous amount of complexity, probably doubling or tripling the complication of running the whole Thanos system. Luckily, it took a few days of brave and experienced developer time to redesign the code for efficiency and performance. It allowed the newer version of Thanos to make compaction twice as fast and to stream data directly from disk, keeping peak memory consumption minimal. Three years later, the Thanos project still does not need to bring horizontal scalability to this operation.
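
The general pattern behind that fix, streaming through the data instead of loading it all into memory, can be sketched in Go. This is not the actual Thanos compaction code; countLinesInMemory, countLinesStreaming and the data.txt input are illustrative assumptions that only contrast the two memory profiles.

[source,go]
----
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// countLinesInMemory loads the whole input into memory first: peak memory
// grows with the input size (the problematic pattern).
func countLinesInMemory(path string) (int, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, c := range b {
		if c == '\n' {
			n++
		}
	}
	return n, nil
}

// countLinesStreaming reads through an io.Reader in small chunks: peak
// memory stays roughly constant, no matter how big the input is.
func countLinesStreaming(r io.Reader) (int, error) {
	scanner := bufio.NewScanner(r)
	n := 0
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

func main() {
	f, err := os.Open("data.txt") // hypothetical input file
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	n, err := countLinesStreaming(f)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("lines:", n)
}
----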

It might feel funny now, but, personally, this story is quite scary. We were so close to bringing in enormous, distributed, system-level complexity. It would have been fun to develop, for sure, but it could also have killed the project’s adoption. A similar situation has repeated many times in my career, in both open and closed source, on smaller and bigger projects.

We follow two rules in the matter of optimization: Rule 1. Don’t do it. Rule 2 (for experts only). Don’t do it yet — that is, not until you have a clear and unoptimized solution. (…) optimization makes a system less reliable and harder to maintain, and therefore more expensive to build and operate.

Michael A. Jackson, Principles of Program Design

Hopefully, the previous misconceptions have taught you to be careful with strong “don’t do optimization” statements. Michael had an excellent intention of deferring unnecessary complexity. Unfortunately, as the “lucky” Thanos compaction situation shows, if you follow that rule blindly, you can quickly end up with a premature “pessimization” that forces premature horizontal scalability. In other words, avoiding complexity can bring even bigger complexity. This appears to me as an unnoticed but critical problem in the industry. It is also one of the main reasons why I wrote this book. If we defer software performance and efficiency improvements, we might hit the need for scalability practices significantly sooner than we thought. It is a massive trap because, with some optimization effort, we might avoid jumping into the complications of scalability methods altogether.

The complications come from the fact that complexity has to live somewhere. We don’t want to complicate the code, so we have to complicate the system, which, if built from inefficient components, wastes not only resources but also an enormous amount of developer and operator time. Horizontal scalability is complex. By design, it involves network operations. As we might know from the CAP theorem29, we inevitably hit either availability or consistency issues as soon as we start distributing our process. Trust me, mitigating those elemental constraints, dealing with race conditions, and understanding the world of network latencies and unpredictability is a hundred times more difficult than a bit of extra complexity in our code hidden behind an io.Reader interface.

It might feel like this section touches only infrastructure systems. That’s not true. It applies to all software. If you write frontend software or a dynamic website, there might be a temptation to move small client computations to the backend. We should probably only do that if such computation depends on the load and grows beyond user-space hardware capabilities. Moving it to the server prematurely might cost you in the complexity caused by extra network calls, more error cases to handle, and server saturation causing a DoS30. Another example comes from my experience. My master’s thesis was about a “Particle Engine Using Computing Cluster”. In principle, the goal was to add a particle engine to a game in Unity3D. The trick was that the particle engine was not supposed to operate on client machines, but to offload the “expensive” computation to a nearby supercomputer at my university, called “Tryton”31. Guess what? Despite the ultra-fast InfiniBand network32, all the particles I tried to simulate (realistic rain and crowds) were much slower and less reliable when offloaded to our supercomputer. It was not only less complex but also much faster to compute everything locally.

Summing up, when someone says, “Don’t optimize, we can just scale horizontally”, be very suspicious. Generally, it is simpler and cheaper to start with efficiency improvements before escalating to the scalability level. On the other hand, you also need the judgment to tell when optimizations are becoming too complex and scalability might be a better option. You will learn more about that in XREF HERE.

Time to Market is More Important

Time is expensive. One aspect of this is that software developer time and expertise cost a lot. The more features you want your application or system to have, the more time is needed to design, implement, test, secure and optimize the solution’s performance. The second aspect is that the more time a company or individual spends delivering the product or service, the longer their “time to market” is, which can hurt the financial results.

Once time was money. Now it is more valuable than money. A McKinsey study reports that, on average, companies lose 33% of after-tax profit when they ship products six months late, as compared with losses of 3.5% when they overspend 50% on product development.

Charles H. House and Raymond L. Price, The Return Map: Tracking Product Teams (1991)

It’s hard to measure such an impact, but your product might no longer be pioneering when you are “late” to market. You might miss valuable opportunities or respond too late to a competitor’s new product. That’s why companies mitigate this risk by adopting agile methodologies or POC (Proof of Concept) and MVP (Minimal Viable Product) patterns.

Agile and smaller iterations help, but in the end, to achieve faster development cycles, companies try other things too: scaling their teams (hiring more people, redesigning teams), simplifying the product, investing in more automation, or forming partnerships. Sometimes they simply try to reduce product quality. As Facebook’s proud initial motto, “Move fast and break things”33, suggests, it’s very common for companies to descope software quality in areas like code maintainability, reliability and performance to “beat” the market.

This is what our last, fifth misconception is all about. Descoping your software’s efficiency and performance quality to get to the market faster is not always the best idea. It’s good to know the consequences of such a decision. Know the risk first.

Optimization is a difficult and expensive process. Many engineers argue that this process delays entry into the marketplace and reduces profit. This may be true, but it ignores the cost associated with poor-performing products (particularly when there is competition in the marketplace).

Randall Hyde, The Fallacy of Premature Optimization (2009)

Bugs, security issues, and poor performance happen, but they can damage the company. Without looking too far, let’s look at a game released in late 2020 by the biggest Polish game publisher, CD Projekt. The “Cyberpunk 2077” game was known to be a very ambitious, massive, open-world, high-quality production. Well marketed and coming from a publisher with a good reputation, despite the delays, it gathered 8 million pre-orders from excited players around the world. Unfortunately, when released in December 2020, the otherwise excellent game had massive performance issues. It had bugs, crashes, and a low frame rate (slowness) on all consoles and most PC setups. On some older consoles like the PS4 or Xbox One, the game was claimed to be unplayable.

Interestingly, for me, the game was fabulous despite all the problems. I pre-ordered Cyberpunk 2077 and played it on a PS4 Pro from day one. The game had some issues, but they did not make the product unplayable, as others were claiming. It crashed once every couple of hours and it was sometimes super slow and laggy, but it was a fully playable, fun and otherwise polished game. There were, of course, follow-up updates with plenty of fixes and drastic improvements over the following months.

Unfortunately, it was too late. The damage was done. The issues, which for me felt somewhat minor, were enough to shake CD Projekt’s financial perspective. Five days after launch, the company lost one-third of its stock value, costing the founders more than one billion dollars. Millions of players asked for game refunds. Investors sued CD Projekt over the game’s issues, and famous lead developers left the company. Perhaps the publisher will survive and recover. Still, one can only imagine the implications of a broken reputation on future productions.

More experienced and mature organizations, especially those with client-facing products, know well the critical value of software performance. Amazon found that if its website loaded one second slower, it would lose $1.6 billion annually. It also reported that “100ms of latency costs 1% of profit”34. Google realized that slowing down its web search from 400ms to 900ms caused a 20% drop in traffic. For some businesses, it’s even worse. It was estimated that if a broker’s electronic trading platform is 5 milliseconds slower than the competition, it could lose 1% of its cash flow, if not more; if it’s 10ms slower, this number grows to a 10% drop in revenue.

Realistically speaking, it’s true that millisecond-level slowness might not matter in most software cases. For example, let’s say we want to implement a file converter from PDF to DOCX. Does it matter if the whole experience lasts 4 seconds or 100 milliseconds? In many cases, it does not. However, when someone puts that forward as market value and a competitor’s product has a latency of 200 milliseconds, code efficiency and speed suddenly become a matter of winning or losing customers. And if it’s physically possible to have such fast file conversion, competitors will try to achieve it sooner or later. This is also why so many projects, even in open source, are very loud about their performance results. While it sometimes feels like a cheap marketing trick, it works, because if you have two similar solutions with similar feature sets, you will pick the fastest one.

It’s not all about speed, either. In my experience as a consultant for infrastructure systems, I saw many cases where customers migrated away from solutions requiring larger amounts of RAM or disk storage, even if that meant some loss of functionality35.

To me, the verdict is simple. If you want to win the market, skipping efficiency and performance in your software might not be the best idea. Don’t postpone optimization until the last moment. On the other hand, time to market is critical, so the crucial aspect is to balance a good-enough amount of performance work into your software development process. This book aims to help you find that healthy threshold and reduce the time required to improve the efficiency of your software.

At this point, you should be aware of the five main misconceptions and why they can mislead us when planning software development. There is a single pattern to all of them: oversimplification. Let’s look at what that means.

Be Vigilant to Simplifications

All those misconceptions are not malicious in any form. It’s only natural for humans to oversimplify complex concepts and processes. Such generalization is often damaging: in the best case, it causes unnecessary battles over which programming language or IDE is best, or whether we should put this performance optimization ticket into the current sprint. In the worst case, simplifications are a source of more serious conflicts and stereotypes. Over the millennia, we developed the tactic of reducing complex topics to simple heuristics that can be intuitively executed by the “feeling” part of the brain when needed36. Such simplifications might feel helpful at first glance. However, they can also be incredibly misleading.

We talked through five misconceptions that serve as excuses to avoid thinking about efficiency or speed. To reinforce this, let’s quickly unpack the simplification built on Donald Knuth’s famous “premature optimization is the root of all evil” quote:

The full version of the [Donald Knuth’s] quote is “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” and I agree with this. It’s usually not worth spending a lot of time micro-optimizing code before it’s obvious where the performance bottlenecks are. But, conversely, when designing software at a system level, performance issues should always be considered from the beginning. A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine-tuning at a later stage will fix any problems.

Randall Hyde, The Fallacy of Premature Optimization (2009)

Interestingly enough, if one reads the whole article in which Donald Knuth put that quote, one notices that it argues more in favour of optimization than against it. The article was basically about writing efficient but also readable programs without goto statements. Nevertheless, premature optimization can be evil. The essence of that statement is to understand when efficiency and speed optimizations are premature. We will discuss that balance in detail in XREF HERE.

Nothing is purely white or black. There is always something in the middle. Simplifications are natural and will always be tempting. The only thing we can do is teach others and ourselves not to take things to the extreme. We cannot categorize performance focus as something incredibly wrong or good without full context.

Let’s now look at a pragmatic way to think about performance.

Efficiency: The Key to Pragmatic Code Performance

In “Behind Performance”, we learned that performance splits into accuracy, speed and efficiency. I mentioned that in this book I use the word efficiency to stand for both speed and efficiency. There is a practical suggestion hidden in that choice regarding how we should think about our code performing in production. The lesson here is to stop thinking about code performance only in terms of how fast it does things. Generally, for non-specialized software, speed matters only marginally. It is the waste and unnecessary consumption of resources, which translates into wasted time itself, that often stops us from achieving software goals like usability in a cost-effective way. Sadly, efficiency is often overlooked.

Let’s say you want to travel from city A to city B across the river. You can grab a fast car and drive the long way around through the nearby bridge to get to city B quickly. But if you jump into the water and slowly swim across the river, you will reach city B much sooner. Slower actions can still be faster when done efficiently, for example, by picking a shorter route. One could say that to improve travel performance and beat the swimmer, we could get a faster car, improve the road surface to reduce drag, or even add a rocket engine. We could potentially beat the swimmer, yes, but those drastic changes might be more expensive than simply doing less work and renting a boat instead.

Similar patterns exist in software. Let’s say our algorithm searches for certain words in a specific input, and it performs slowly. Given that we operate on persistent data, the slowest operation is usually the data access, especially if our algorithm does it extensively. It’s very tempting not to think about efficiency and instead find a way to convince users to use SSD instead of HDD storage. This way, we could potentially reduce latency by up to 10x. That would be improving performance by increasing the speed element of the equation. On the contrary, if we found a way to improve the current algorithm so that it reads the data only a few times instead of a million, we could achieve even lower latencies. That would mean we do less work to achieve the same effect, improving efficiency.
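
A minimal Go sketch of that “read the data once” idea follows. The messages.txt input and the word-count index are hypothetical assumptions; the point is only the shape of the improvement: one pass over the slow storage, then cheap in-memory lookups for every query instead of another full scan.

[source,go]
----
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// buildIndex scans the input exactly once and remembers, for each word,
// how many times it appears. Further lookups never touch the disk again.
func buildIndex(path string) (map[string]int, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	index := map[string]int{}
	scanner := bufio.NewScanner(f)
	scanner.Split(bufio.ScanWords)
	for scanner.Scan() {
		index[strings.ToLower(scanner.Text())]++
	}
	return index, scanner.Err()
}

func main() {
	// Hypothetical input file and queries.
	index, err := buildIndex("messages.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	for _, word := range []string{"home", "work", "lunch"} {
		// One disk pass earlier; each query is now a cheap map lookup.
		fmt.Printf("%q appears %d times\n", word, index[word])
	}
}
----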

I want to propose focusing our efforts on efficiency instead of mere execution speed. That is also why this book’s title is “Efficient Go”, not something more general and catchy37 like “Ultra Performance Go”.

It’s not that speed is less relevant. It is important, and as you will learn in XREF HERE, you can have more efficient code that is much slower, and vice versa. Sometimes it’s a tradeoff you will need to make. Both speed and efficiency are essential, and both can impact each other. In practice, when the program is doing less work on the critical path, it will most likely have lower latency. The other way around sometimes applies too. In the HDD vs SSD example above, changing to a faster disk might allow you to remove some caching logic, which results in better efficiency: less memory and CPU time used. As we learned in “Hardware is Getting Faster and Cheaper”, the faster your process is, the less energy it consumes, improving battery efficiency.

I would argue that we should generally focus on improving efficiency before speed as the first step when improving performance. In the context of this book and daily programming in Go, the efficiency element is typically the most impactful and essential. You might be surprised that sometimes, after improving efficiency alone, you achieve the desired latency! Let’s go through some further reasons why efficiency might be superior:

It is much harder to make efficient software slow

This is similar to the fact that readable code is easier to optimize. However, as I mentioned before, efficient code usually performs better simply because less work has to be done. In practice, this also translates to the fact that slow software is often inefficient.

Speed is more fragile

The latency of a software process depends on a huge number of external factors. One can optimize the code for fast execution in a dedicated and isolated environment, but it can be much slower when left running for a longer time. At some point, CPUs might be throttled due to thermal issues with the server. Other processes (e.g. a periodic backup) might surprisingly slow down your main software38 by preempting it from the CPU core or saturating the memory bus (throughput between your CPU and RAM is limited too). The network might be throttled. There are tons of hidden unknowns to consider when we program for mere execution speed. This is why efficiency is usually what we, as programmers, can control the most.

Speed is less portable

As above, if we optimize only for speed, we cannot assume it will work the same when moving our application from the developer machine to a server, or between various client devices. Different hardware, environments and operating systems can diametrically change the latency of our application. That’s why it’s critical to design software for efficiency. First of all, there are fewer things that can be affected. Secondly, if you make two calls to the database on your developer machine, the chances are that you will do the same number of calls no matter if you deploy it to an IoT device on the space station or to an ARM-based mainframe.

You can’t tell if the software is fast by looking at the code

Efficiency is easier to assess with static analysis alone. You either do one iteration or thousands. You either allocate memory for an array of a hundred integers or billions. We can still be surprised by underlying inefficiencies, so we have to understand the runtime resource usage fully. Still, we can usually spot and improve things faster than when optimizing for pure speed. As you will learn in XREF HERE, we cannot assess the speed of our code just by reading it. Experience helps, but in many cases we might be surprised by what’s slow and what’s fast. We need to spend time on benchmarking and load testing to be sure, and that takes time.
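
As a tiny illustration of how much efficiency you can read straight from the code, consider the hypothetical Go functions below. Without running anything, it is clear the first version asks the runtime for repeated slice reallocations while the second allocates once; judging which one is faster in wall-clock terms would still require a benchmark.

[source,go]
----
package main

import "fmt"

// collectSquaresNaive grows the slice as it goes. You can tell from the
// code alone that append will have to reallocate and copy several times.
func collectSquaresNaive(n int) []int {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

// collectSquaresPrealloc allocates once. Again, the allocation behavior is
// visible by just reading the code; no benchmark is needed to spot the
// difference in the amount of work requested.
func collectSquaresPrealloc(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

func main() {
	fmt.Println(len(collectSquaresNaive(1000)), len(collectSquaresPrealloc(1000)))
}
----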

Generally, efficiency is something we should consider right after, or together with, readability. We should start thinking about it from the very beginning of the software design. Healthy efficiency awareness, when not taken to the extreme, results in robust development hygiene. It allows us to avoid silly performance mistakes that are hard to fix in later development stages. Doing less work also often reduces the overall complexity of the code and improves code maintainability and extensibility.

Summary

I think it’s very common for developers to start their development process with compromises in mind. We often sit down with the attitude that we will have to negotiate and compromise certain software qualities from the very beginning. We are taught that we have to sacrifice efficiency, readability, testability etc., to accomplish our goals.

In this chapter, I would like to encourage you to hold out and not sacrifice or compromise any quality until it has been demonstrated that there is no reasonable way to achieve all of our goals. Maybe it’s not visible at the start, but many of those things do have solutions (see https://youtu.be/3WBaY61c9sE?t=2872).

Let’s try to be greedy in software development. Hopefully, at this point, you are aware that we have to think about performance, ideally from the early development stages. We learned what performance consists of. In addition, we learned that there are many misconceptions mentioned in “Common Performance Misconceptions” that are worth reconsidering or challenging when appropriate.

We need to be aware of the risk of premature pessimization and premature scalability as much as we need to think about avoiding premature optimizations.

Finally, we learned that the efficiency part of the performance equation might give us a bit of an advantage: it is easier to improve performance by improving efficiency first. This approach has helped my students and me many times to effectively tackle performance optimizations.

In the next chapter, we will walk through a detailed and opinionated introduction to Go. Of course, programming for efficiency is essential, but the key to that is knowledge. Let’s walk through things to help us achieve all our software quality goals!

1 One example could be an expensive software error in the Boeing 737 MAX automation that pilots struggled to take control over, contributing to two fatal crashes.

2 Out of memory is an undesired state of an application where it uses more memory than the host has, or more than the caller allocated for this process. We will learn more about this mechanism in XREF HERE.

3 Donald Knuth stated this twice, once in Structured Programming with goto Statements (1974) and again in Computer Programming as an Art (1974). There were debates about whether this quote should be attributed to C. A. R. Hoare instead, but more recent research argues against that. The fact that people were trying to find the actual author of the quote speaks to the matter’s importance.

4 In a practical sense, there are limits to how fast our software can be. H. J. Bremermann in 1962 suggested there is a computational physical limit that depends on the mass of the system. We can estimate that a 1 kg “ultimate laptop” can process ~10^50 bits per second, while a computer with the mass of planet Earth can process at most ~10^75 bits per second. While those numbers feel enormous, even such a large computer would take ages to brute-force all chess moves, estimated at 10^120 complexity. Those numbers have practical use in cryptography to assess the difficulty of cracking certain encryption algorithms.

5 I even did a small experiment on Twitter, proving this point.

6 UK Cambridge Dictionary defines the performance noun as “How well a person, machine, etc. does a piece of work or an activity”.

7 I would even recommend, with your changelog, sticking to common standard formats like https://keepachangelog.com/en/1.0.0/. This material also contains valuable tips on clean release notes.

8 It’s worth mentioning that hiding features or optimizations can sometimes lead to lower readability. Sometimes explicitness is much better and avoids surprises.

9 As part of the interface “contract”, there might be a comment stating that implementations should cache the result. Hence, the caller should be safe to call it many times. Still, I would argue that it’s better to avoid relying on something not assured by the type system, to prevent surprises.

10 All three of those examples of Get implementations could be considered costly to invoke. Input-output (I/O) operations against the filesystem are significantly slower than reading or writing something from memory. Something that involves mutexes means you potentially have to wait on other threads before accessing it. A call to a database usually involves all of those, plus potentially communication over the network.

11 This style is usually referred to as Hungarian notation, which was used extensively at Microsoft. There are two types of this notation: Apps and Systems Hungarian. Literature indicates that Apps Hungarian can still give many benefits.

12 It is worth highlighting that these days, it is recommended to write code in a way that is easily compatible with IDE functionalities, e.g. your code structure should be a “connected” graph. This means that you connect functions in a way that the IDE can assist with. Any dynamic dispatching, code injection and lazy loading disable those functionalities and should be avoided unless strictly necessary.

13 Cognitive load is the amount of “brain processing and memory” a person must use to understand a piece of code or function.

14 “Cachability” is often defined as the ability to be cached. Any information can be cached (saved) so it can be retrieved later, faster. However, the data might be valid only for a short time or only for a tiny number of requests. If the data depends on external factors (e.g. user or input) and changes frequently, it’s not well cachable.

15 That is a simplification, of course. The process might have used more memory. Profiles do not show memory used by memory maps, stack, and many other caches required for a modern application to work. We will learn more about this in XREF HERE.

16 Cyril Northcote Parkinson was a British historian who articulated the management phenomenon now known as Parkinson’s law. Stated as “Work expands to fill the time available for its completion,” it initially referred to government office efficiency, which correlates highly with the number of officials in the decision-making body.

17 At least that’s what my studying looked like. This phenomenon is also known as “Student Syndrome”.

18 PB means petabyte. One petabyte is 1,000 TB. If we assume an average two-hour-long 4K movie takes 100 GB, this means that with 1 PB we could store 10,000 movies, translating to roughly 2.3 years of constant watching.

19 1 zettabyte is 1 million PB, or one billion TB. I won’t even try to visualize this amount of data. (:

20 Dennard, Robert H., et al.; “Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions” (October 1974)

21 MOSFET stands for “metal–oxide–semiconductor field-effect transistor”, which is, simply speaking, an insulated-gate transistor used to switch electronic signals. This particular technology is behind most memory chips and microprocessors produced between 1960 and now. It has proven to be highly scalable and capable of miniaturization. It is the most frequently manufactured device in history, with 13 sextillion pieces produced between 1960 and 2018, https://en.wikipedia.org/wiki/MOSFET

22 Funnily enough, marketing reasons led companies to hide the inability to effectively reduce the size of transistors by switching the CPU generation naming convention from transistor gate length to the size of the process. 14nm-generation CPUs still have ~70nm transistors, and the same applies to 10, 7, and 5nm processes.

23 I am not joking; Microsoft has proven that running servers 40 meters underwater is a great idea that improves energy efficiency.

24 The M1 chip is a great example of an interesting tradeoff: choosing speed and both energy and performance efficiency over the flexibility of hardware scaling.

25 RISC-V is an open standard for an instruction set architecture, allowing easier manufacturing of compatible “reduced instruction set computer” chips. Such an instruction set is much simpler and allows more optimized and specialized hardware than general-usage CPUs.

26 To ensure developers understand and empathize with users who have slower connections, Facebook introduced “2G Tuesdays”, which turn on a simulated 2G network mode in the Facebook app.

27 That option is not as expensive as we might think. Instance type x1e.32xlarge costs $26.6 per hour, so “only” $19,418 per month.

28 Even hardware management has to be different for machines with an extremely large amount of hardware. That’s why Linux kernels have a special hugemem type of kernel that can manage up to 4 times more memory and ~8 times more logical cores on x86 systems.

29 CAP has long been a core system design principle. Its acronym comes from Consistency, Availability and Partition tolerance. It defines a simple rule that only two of those three can be achieved.

30 Denial of Service (DoS) is a state in which the system becomes unresponsive, usually due to a malicious attack. It can also be triggered “accidentally” by an unexpectedly large load.

31 Around 2015, it was the fastest supercomputer in Poland, offering 530.5 TFLOPS and almost 2,000 nodes, most of them with dedicated GPUs.

32 InfiniBand is a high-performance network communication standard, especially popular before fiber optics became widespread.

33 Funnily enough, at the F8 conference in 2014, Mark Zuckerberg announced a change of the famous motto to “Move fast with stable infra”.

34 http://radar.oreilly.com/2008/08/radar-theme-web-ops.html

35 One example I often see in the cloud-native world is moving the logging stack from Elasticsearch to simpler solutions like Loki. Despite the lack of configurable indexing, the Loki project can offer better logging read performance with a smaller amount of resources.

36 There are many good books on this split-brain phenomenon and its consequences. Some I can recommend are “Thinking, Fast and Slow” by Daniel Kahneman and “Everything Is F*cked” by Mark Manson.

37 There is also another reason. The “Efficient Go” name is very close to one of the best pieces of documentation you might find about the Go programming language: “Effective Go”! It might also have been one of the first pieces of information I read about Go. It’s specific, actionable, and I recommend reading it if you haven’t.

38 This situation is often called a “noisy neighbour”.
