Chapter 1. What You Need to Know to Write Great Code

Write Great Code will teach you how to write code you can be proud of, code that will impress other programmers, code that will satisfy customers and prove popular with users, and code that people (customers, your boss, and so on) won’t mind paying top dollar to obtain. In general, the volumes in the Write Great Code series will discuss how to write software that achieves legendary status, eliciting the awe of other programmers.

1.1 The Write Great Code Series

Write Great Code: Understanding the Machine is the first of four volumes in the Write Great Code series. Writing great code requires a combination of knowledge, experience, and skill that programmers usually obtain only after years of mistakes and discoveries. The purpose of this series is to share with both new and experienced programmers a few decades’ worth of observations and experience. I hope that these books will help shorten the time and reduce the frustration that it takes to learn things “the hard way.”

This first volume, Understanding the Machine, is intended to fill in the low-level details that are often skimmed over in a typical computer science or engineering curriculum. The information in this volume is the foundation upon which great software is built. You cannot write efficient code without this information, and the solutions to many problems require a thorough grounding in this subject. Though I’m attempting to keep each volume as independent as possible of the others, Understanding the Machine might be considered a prerequisite for all the following volumes.

The second volume, Thinking Low-Level, Writing High-Level, will immediately apply the knowledge gained in this first volume. Thinking Low-Level, Writing High-Level will teach you how to analyze code written in a high-level language to determine the quality of the machine code that a compiler would generate for that code. Armed with this knowledge, you will be able to write high-level language programs that are nearly as efficient as programs handwritten in assembly language. High-level language programmers often get the mistaken impression that optimizing compilers will always generate the best machine code possible, regardless of the source code the programmer gives them. This simply isn’t true. The statements and data structures you choose in your source files can have a big impact on the efficiency of the machine code a compiler generates. By teaching you how to analyze the machine code your compiler generates, Thinking Low-Level, Writing High-Level will teach you how to write efficient code without resorting to assembly language.

There are many other attributes of great code besides efficiency, and the third volume in this series, Engineering Software, will cover some of those. Engineering Software will discuss how to create source code that is easily read and maintained by other individuals and how to improve your productivity without burdening you with the “busy work” that many software engineering books discuss. Engineering Software will teach you how to write code that other programmers will be happy to work with, rather than code that causes them to use some choice words about your capabilities behind your back.

Great code works. Therefore, I would be remiss not to include a volume on testing, debugging, and quality assurance. Whether you view software testing with fear or with disgust, or you feel it’s something that only junior engineers should get stuck doing, an almost universal truth is that few programmers properly test their code. This generally isn’t because programmers actually find testing boring or beneath them, but because they simply don’t know how to test their programs, eradicate defects, and ensure the quality of their code. As a result, few applications receive high-quality testing, which has led the world at large to have a very low opinion of the software engineering profession. To help overcome this problem, the fourth volume in this series, Testing, Debugging, and Quality Assurance, will describe how to efficiently test your applications without all the drudgery engineers normally associate with this task.

1.2 What This Volume Covers

In order to write great code, you need to know how to write efficient code, and to write efficient code, you must understand how computer systems execute programs and how abstractions found in programming languages map to the low-level hardware capabilities of the machine. This first volume teaches you the details of the underlying machine so you’ll know how to write software that best uses the available hardware resources. While efficiency is not the only attribute great code possesses, inefficient code is never great. So if you’re not writing efficient code, you’re not writing great code.

In the past, learning great coding techniques has required learning assembly language. While this is not a bad approach, it is overkill. Learning assembly language involves learning two related subjects: (1) machine organization and (2) programming in assembly language. While learning assembly language programming helps, the real benefits of learning assembly language come from learning machine organization at the same time. Few books have taught machine organization without also teaching assembly language programming. To rectify this problem, this book teaches machine organization independently of assembly language so you can learn to write great code without the excessive overhead of learning assembly language.

“So what is machine organization?” you’re probably wondering. Well, machine organization is a subset of computer architecture, and this book concentrates on those parts of computer architecture and machine organization that are visible to the programmer or are helpful for understanding why system architects chose a particular system design. The goal of learning machine organization is not to enable you to design your own CPU or computer system, but to teach you how to make the most efficient use of existing computer designs.

“Okay, so what is machine organization?” you’re probably still asking. Well, a quick glance at the table of contents will give you an idea of what this subject is all about. Let’s do a quick run-through of the book.

Chapter 2, Chapter 4, and Chapter 5 deal with basic computer data representation — how computers represent signed and unsigned integer values, characters, strings, character sets, real values, fractional values, and other numeric and nonnumeric quantities. If you do not have a solid understanding of how computers represent these various data types internally, it’s difficult to understand why some operations that use these data types are so inefficient. And if you don’t realize they’re inefficient, you’ll likely use them in an inappropriate fashion and the resulting code will not be great.
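As a small taste of why representation matters, the following sketch (an illustration for this overview, not an example taken from the later chapters) reinterprets one 8-bit pattern as both an unsigned and a signed value. The function name as_signed is hypothetical. It relies on the fact that C's int8_t type is defined as a two's complement type with no padding bits.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of an unsigned byte as a signed byte.
   memcpy copies the raw bit pattern without any value conversion,
   so the result shows what the same eight bits mean as an int8_t. */
int8_t as_signed(uint8_t bits)
{
    int8_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The pattern 0xFF reads as 255 when treated as unsigned, while as_signed(0xFF) yields -1. The bits are identical; only the interpretation changes.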

Chapter 3 discusses binary arithmetic and bit operations used by most modern computer systems. Because these operations are generally available in programming languages, Chapter 3 also offers several insights into how you can write better code by using arithmetic and logical operations in ways not normally taught in beginning programming courses. Learning standard “tricks” such as these is part of how you become a great programmer.
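To give a flavor of the kind of “trick” meant here, the sketch below shows two well-known bit manipulations in C. These particular functions (is_power_of_two and mod_pow2) are illustrative names chosen for this overview and are not necessarily the examples Chapter 3 uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when x has exactly one bit set, that is, when x is a power of two.
   The expression x & (x - 1) clears the lowest set bit of x, so the
   result is zero only if that bit was the only one set. */
bool is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Remainder of x divided by 2^k, computed with a bit mask instead of
   the % operator. Assumes k is less than 32. */
uint32_t mod_pow2(uint32_t x, unsigned k)
{
    return x & ((UINT32_C(1) << k) - 1);
}
```

For instance, is_power_of_two(64) is true and is_power_of_two(96) is false, while mod_pow2(77, 3) produces 5, the same result as 77 % 8. A good optimizing compiler may perform the second transformation for you when the divisor is a constant, which is one reason it helps to know what the compiler can and cannot see.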

Chapter 6 begins a discussion of one of the more important topics in this book: memory organization and access. Memory access is a common performance bottleneck in modern computer applications. Chapter 6 provides an introduction to memory, discussing how the computer accesses its memory, and describing the performance characteristics of memory. This chapter also describes various machine code addressing modes that CPUs use to access different types of data structures in memory. In modern applications, poor performance often occurs because the programmer does not understand the ramifications of memory access in their programs, and Chapter 6 addresses many of these ramifications.

Chapter 7 returns to the discussion of data types and representation by covering composite data types and memory objects. Unlike the earlier chapters, Chapter 7 discusses higher-level data types like pointers, arrays, records, structures, and unions. All too often programmers use large composite data structures without even considering the memory and performance issues of doing so. The low-level description of these high-level composite data types will make clear their inherent costs, enabling you to use them in your programs sparingly and wisely.

Chapter 8 discusses Boolean logic and digital design. This chapter provides the mathematical and logical background you’ll need to understand the design of CPUs and other computer system components. Although this particular chapter is more hardware oriented than the previous chapters, there are still some good ideas that you can incorporate into really great code. In particular, this chapter discusses how to optimize Boolean expressions, such as those found in common high-level programming language statements like if, while, and so on.
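As a minimal sketch of what optimizing a Boolean expression can look like, the C functions below express the same condition twice: once in a roundabout form and once simplified with De Morgan's law. The function names are hypothetical and chosen only for this illustration. The third function checks every input combination to confirm that the two forms agree.

```c
#include <stdbool.h>

/* The condition as it might first be written:
   "it is not the case that a is false or b is false". */
bool original_condition(bool a, bool b)
{
    return !(!a || !b);
}

/* The same condition after applying De Morgan's law:
   !(!a || !b) is equivalent to (a && b). */
bool simplified_condition(bool a, bool b)
{
    return a && b;
}

/* Exhaustively compares the two forms over all four input combinations. */
bool conditions_agree(void)
{
    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            if (original_condition(a != 0, b != 0) !=
                simplified_condition(a != 0, b != 0)) {
                return false;
            }
        }
    }
    return true;
}
```

With only two inputs, an exhaustive check like conditions_agree is cheap, and it is a handy way to convince yourself that a simplification inside an if or while condition has not changed the program's behavior.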

Continuing the hardware discussion begun in Chapter 8, Chapter 9 discusses CPU architecture. Although the goal of this book is not to teach you how to design your own CPU, a basic understanding of CPU design and operation is absolutely necessary if you want to write great code. By writing your code in a manner consistent with the way a CPU will execute that code, you’ll get much better performance using fewer system resources. By writing your applications at odds with the way CPUs execute code, you’ll wind up with slower, resource-hogging programs.

Chapter 10 discusses CPU instruction set architecture. Machine instructions are the primitive units of execution on any CPU, and the time spent during program execution is directly determined by the number and type of machine instructions the CPU executes. Understanding how computer architects design machine instructions can provide valuable insight into why certain operations take longer to execute than others. Once you understand the limitations of machine instructions and how the CPU interprets them, you can use this information to turn mediocre code sequences into great code sequences.

Chapter 11 returns to the subject of memory, covering memory architecture and organization. This chapter will probably be one of the most important to the individual wanting to write fast code. It describes the memory hierarchy and how to maximize the use of cache and other fast memory components. Great code avoids thrashing, a common source of performance problems in modern applications. By reading this chapter you will learn about thrashing and how to avoid low-performance memory access in your applications.
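The following sketch hints at why memory access order matters. It is an illustration written for this overview, with hypothetical names and array sizes, and makes no claim about the examples Chapter 11 itself uses. Both functions compute the same sum over a two-dimensional array. Because C stores arrays in row-major order, the first function touches memory sequentially, while the second jumps COLS elements between consecutive accesses.

```c
#include <stddef.h>

enum { ROWS = 512, COLS = 512 };   /* hypothetical sizes for illustration */

static int grid[ROWS][COLS];

/* Fill the array with simple, predictable values. */
void fill_grid(void)
{
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            grid[r][c] = (int)(r + c);
        }
    }
}

/* Visits elements in the order they are laid out in memory. */
long sum_row_major(void)
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            total += grid[r][c];
        }
    }
    return total;
}

/* Visits the same elements, but strides COLS ints between accesses. */
long sum_column_major(void)
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++) {
        for (size_t r = 0; r < ROWS; r++) {
            total += grid[r][c];
        }
    }
    return total;
}
```

After calling fill_grid, the two functions return the same value. On many systems the column-order version tends to run more slowly because it makes poorer use of the cache, though the size of the difference depends on the hardware, the array dimensions, and the compiler's optimizations, so it is worth timing on your own machine.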

Chapter 12 describes how computer systems communicate with the outside world. Many peripheral (input/output) devices operate at much lower speeds than the CPU and memory. You can write the fastest executing sequence of instructions possible, and still have your application run slowly because you don’t understand the limitations of the I/O devices in your system. Chapter 12 presents a discussion of generic I/O ports, system buses, buffering, handshaking, polling, and interrupts. It also discusses how to effectively use many popular PC peripheral devices, including keyboards, parallel (printer) ports, serial ports, disk drives, tape drives, flash storage, SCSI, IDE/ATA, USB, and sound cards. Understanding the impact of these devices on your applications can help you write great, efficient code.

1.3 Assumptions This Volume Makes

For the purposes of this book, you should be reasonably competent in at least one imperative (procedural) programming language. This includes C and C++, Pascal, BASIC, and assembly, as well as languages like Ada, Modula-2, FORTRAN, and the like. You should be capable, on your own, of taking a small problem description and working through the design and implementation of a software solution for that problem. A typical semester or quarter course at a college or university (or several months’ experience on your own) should be sufficient background for this book.

At the same time, this book is not language specific; its concepts transcend whatever programming language(s) you’re using. To help make the examples more accessible to readers, the programming examples in this book will rotate among several languages (such as C/C++, Pascal, BASIC, and assembly). Furthermore, this book does not assume that you use or know any particular language. When presenting examples, this book explains exactly how the code operates so that even if you are unfamiliar with the specific programming language, you will be able to understand its operation by reading the accompanying description.

This book uses the following languages and compilers in various examples:

  • C/C++: GCC, Microsoft’s Visual C++, Borland C++

  • Pascal: Borland’s Delphi/Kylix

  • Assembly language: Microsoft’s MASM, HLA (the High Level Assembler), Gas (on the PowerPC)

  • BASIC: Microsoft’s Visual Basic

You certainly don’t need to know all these languages or have all these compilers to read and understand the examples in this book. Often, the examples appear in multiple languages, so it’s usually safe to ignore a specific example if you don’t completely understand the syntax of the language the example uses.

1.4 Characteristics of Great Code

What do we mean by great code? Different programmers will have different definitions for great code, so it is impossible to provide an all-encompassing definition that will satisfy everyone. However, there are certain attributes of great code that nearly everyone will agree upon, and we’ll use some of these common characteristics to form our definition. For our purposes, here are some attributes of great code:

  • Uses the CPU efficiently (which means the code is fast)

  • Uses memory efficiently (which means the code is small)

  • Uses system resources efficiently

  • Is easy to read and maintain

  • Follows a consistent set of style guidelines

  • Uses an explicit design that follows established software engineering conventions

  • Is easy to enhance

  • Is well-tested and robust (meaning that it works)

  • Is well-documented

We could easily add dozens of items to this list. Some programmers, for example, may feel that great code must be portable, that it must follow a given set of programming style guidelines, or that it must be written in a certain language (or that it must not be written in a certain language). Some may feel that great code must be written as simply as possible, while others may feel that great code is written quickly. Still others may feel that great code is created on time and under budget. You can probably think of additional characteristics.

So what is great code? Here is a reasonable definition:

Great code is software that is written using a consistent and prioritized set of good software characteristics. In particular, great code follows a set of rules that guide the decisions a programmer makes when implementing an algorithm as source code.

Two different programs do not have to follow the same set of rules (that is, they need not possess the same set of characteristics) in order for both to be great programs. As long as they each consistently obey their particular set of rules, they can both be examples of great code. In one environment, a great program may be one that is portable across different CPUs and operating systems. In a different environment, efficiency (speed) may be the primary goal, and portability may not be an issue. Both could be shining examples of great code, even though their goals might be mutually exclusive. Clearly, neither program would be an example of great code when examined according to the rules of the other program; but as long as the software consistently follows the guidelines established for that particular program, you can argue that it is an example of great code.

1.5 The Environment for This Volume

Although this book presents generic information, parts of the discussion will necessarily be specific to a particular system. Because Intel Architecture PCs are, by far, the most common in use today, this book will use that platform when discussing specific system-dependent concepts. However, those concepts will still apply to other systems and CPUs (for example, the PowerPC CPU in the Power Macintosh or some other RISC CPU in a Unix box), though you may well need to research the solution for your specific platform when an example does not explicitly apply to your system.

Most examples appearing in this book run under both Windows and Linux. This book attempts to stick with standard library interfaces to the operating system (OS) wherever possible, and it makes OS-specific calls only when the alternative is to write “less than great” code.

Most of the specific examples in this book run on a late-model Intel Architecture (including AMD) CPU under Windows or Linux, with a reasonable amount of RAM and other system peripherals normally found on a late-model PC. The concepts, if not the software itself, will apply to Macs, Unix boxes, embedded systems, and even mainframes.

1.6 For More Information

No single book can completely cover everything about machine organization that you need to know in order to write great code. This book, therefore, concentrates on those aspects of machine organization that are most pertinent for writing great software, providing the 90 percent solution for those who are interested in writing the best possible code. To learn that last 10 percent of machine organization, you’re going to need additional resources.

  • Learn assembly language. Fluency in at least one assembly language will fill in many missing details that you just won’t get by learning machine organization alone. Unless you plan on using assembly language in your software systems, you don’t necessarily have to learn assembly language on the platform(s) you’re targeting. Probably your best bet, then, is to learn 80x86 assembly language on a PC. The Intel Architecture isn’t the best, but there are lots of great software tools for learning assembly language (for example, the High Level Assembler) that simply don’t exist on other platforms. The point of learning assembly language here is not so you can write assembly code, but rather to learn the assembly paradigm. If you know 80x86 assembly language, you’ll have a good idea of how other CPUs (such as the PowerPC or the IA-64 family) operate. Of course, if you need to write assembly code, you should learn the assembly language for the CPU you’ll be using. An excellent choice for learning assembly language is another book of mine, The Art of Assembly Language, available from No Starch Press.

  • Study advanced computer architecture. Machine organization is a subset of the study of computer architecture, but space limitations prevent covering machine organization and computer architecture in complete detail within this book. While you may not need to know how to design your own CPUs, studying computer architecture may teach you something you’ve missed in the presentation of machine organization in this book. Computer Architecture: A Quantitative Approach by Hennessy and Patterson is a well-respected textbook that covers this subject matter.

