Chapter Twenty-Seven

Coding

All computers execute machine code, but programming in machine code is like eating with a toothpick. The bites are so small and the process so laborious that dinner takes forever. Likewise, the bytes of machine code perform the tiniest and simplest imaginable computing tasks—loading a number from memory into the processor, adding it to another, storing the result back to memory—so that it’s difficult to imagine how they contribute to an entire meal.

We have at least progressed from that primitive era at the beginning of the previous chapter, when we were using switches on a control panel to enter binary data into memory. In that chapter, we discovered that we could write simple programs that let us use the keyboard and the video display to enter and examine hexadecimal bytes of machine code. This was certainly better, but it’s not the last word in improvements.

As you know, the bytes of machine code are associated with certain short mnemonics, such as MOV, ADD, JMP, and HLT, that let us refer to the machine code in something vaguely resembling English. These mnemonics are often written with operands that further indicate what the machine-code instruction does. For example, the 8080 machine-code byte 46h causes the microprocessor to move into register B the byte stored at the memory address referenced by the 16-bit value in the register pair HL. This is more concisely written as

MOV B,M

where the M stands for “memory.” The total collection of these mnemonics (with some additional features) is a programming language of a type called assembly language. It’s much easier to write programs in assembly than in machine code. The only problem is that the CPU can’t understand assembly language directly!
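The link between the mnemonic and the byte 46h can be sketched in a few lines of JavaScript (a language taken up later in this chapter). This is only an illustration: the function name encodeMov and the registerCodes table are made up for this example. It assumes the standard 8080 layout in which a MOV byte is the bits 01 followed by a 3-bit destination code and a 3-bit source code.

```javascript
// 3-bit register codes used inside 8080 MOV instructions.
// M is not a real register; it means "the byte in memory addressed by HL".
const registerCodes = { B: 0, C: 1, D: 2, E: 3, H: 4, L: 5, M: 6, A: 7 };

// Hypothetical helper: builds the byte 01 DDD SSS for MOV dest,src.
// (The combination MOV M,M does not exist; that byte, 76h, is HLT.)
function encodeMov(dest, src) {
  return 0b01000000 | (registerCodes[dest] << 3) | registerCodes[src];
}

const movBM = encodeMov("B", "M");
console.log(movBM.toString(16) + "h"); // 46h
```

Looking up one byte this way is easy. The tedious part of hand assembly is doing it for every line of a program.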

In the early days of working with such a primitive computer, you’d probably spend a lot of time writing assembly-language programs on paper. Only when you were satisfied that you had something that might work would you then hand-assemble it, which means that you’d convert the assembly-language statements to machine-code bytes by hand using a chart or other reference material, and then enter them into memory.

What makes hand assembling so hard are all the jumps and calls. To hand-assemble a JMP or CALL instruction, you have to know the exact binary address of the destination, and that is dependent on having all the other machine code instructions in place. It’s much better to have the computer do this conversion for you. But how would this be done?

You might first write a text editor, which is a program that allows you to type lines of text and save them as a file. (Unfortunately, you’d have to hand-assemble this program.) You could then create text files containing assembly-language instructions. You would also need to hand-assemble another program, called an assembler. This program would read a text file containing assembly-language instructions and convert those instructions into machine code, which would be saved in another file. The contents of that file could then be loaded into memory for execution.

If you were running the CP/M operating system on your 8080 computer, much of this work would already be done for you. You’d already have all the tools you need. The text editor is named ED.COM and lets you create and modify text files. (Simple modern-day text editors include Notepad in Windows, and TextEdit included in macOS on Apple computers.) Let’s suppose you create a text file with the name PROGRAM1.ASM. The ASM file type indicates that this file contains an assembly-language program. The file might look something like this:

      ORG 0100h
      LXI DE,Text
      MVI C,9
      CALL 5
      RET
Text: DB 'Hello!$'
      END

This file has a couple of statements we haven’t seen before. The first one is an ORG (for Origin) statement. This statement does not correspond to an 8080 instruction. Instead, it indicates that the address of the next statement is to begin at address 0100h, which is the address where CP/M loads programs into memory.

The next statement is an LXI (Load Extended Immediate) instruction, which loads a 16-bit value into the register pair DE. This is one of several Intel 8080 instructions that my CPU doesn’t implement. In this case, that 16-bit value is given as the label Text. That label is located near the bottom of the program in front of a DB (Data Byte) statement, something else we haven’t seen before. The DB statement can be followed by several bytes separated by commas or (as I do here) by some text in single quotation marks.

The MVI (Move Immediate) statement moves the value 9 into register C. The CALL 5 statement makes a call into the CP/M operating system, which looks at the value in register C and jumps to the appropriate function. That function displays a string of characters beginning at the address given by the DE register pair and stopping when a dollar sign is encountered. (You’ll notice that the text in the last line of the program ends with a dollar sign. The use of a dollar sign to signify the end of a character string is quite odd, but that’s the way CP/M happens to work.) The final RET statement ends the program and returns control to CP/M. (That’s actually one of several ways to end a CP/M program.) The END statement indicates the end of the assembly-language file.

So now you have a text file containing seven lines of text. The next step is to assemble it. CP/M includes a program named ASM.COM, which is the CP/M assembler. You run ASM.COM from the CP/M command line like this:

ASM PROGRAM1.ASM

The ASM program examines the file PROGRAM1.ASM and creates a new file, named PROGRAM1.COM, that contains the machine code corresponding to the assembly-language statements that we wrote. (Actually there’s another step in the process, but it’s not important in this account of what happens.)

The PROGRAM1.COM file contains the following 16 bytes:

11 09 01 0E 09 CD 05 00 C9 48 65 6C 6C 6F 21 24

The first 3 bytes are the LXI instruction, the next 2 are the MVI instruction, the next 3 are the CALL instruction, and the next is the RET instruction. The last 7 bytes are the ASCII characters for the five letters of “Hello,” the exclamation point, and the dollar sign. You can then run the PROGRAM1 program from the CP/M command line:

PROGRAM1

The operating system loads that program into memory and runs it. Appearing on the screen will be the greeting

Hello!
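To see how those 16 bytes produce the greeting, here is a toy emulator in JavaScript that understands only the four instructions in this one program. It is a sketch under heavy assumptions: the function name runCom is invented, there is no stack, and the call to address 0005h is intercepted and imitated rather than executed as real CP/M code.

```javascript
// The 16 bytes of PROGRAM1.COM as listed in the text.
const program = [
  0x11, 0x09, 0x01, 0x0e, 0x09, 0xcd, 0x05, 0x00, 0xc9,
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21, 0x24,
];

// Hypothetical toy emulator: handles LXI DE, MVI C, CALL, and RET only.
function runCom(bytes) {
  const memory = new Uint8Array(0x10000); // 64 KB of memory
  memory.set(bytes, 0x0100);              // CP/M loads programs at 0100h
  let pc = 0x0100;                        // program counter
  let de = 0;                             // register pair DE
  let c = 0;                              // register C
  let output = "";

  for (;;) {
    const opcode = memory[pc];
    if (opcode === 0x11) {                // LXI DE,nnnn (low byte first)
      de = memory[pc + 1] | (memory[pc + 2] << 8);
      pc += 3;
    } else if (opcode === 0x0e) {         // MVI C,nn
      c = memory[pc + 1];
      pc += 2;
    } else if (opcode === 0xcd) {         // CALL nnnn
      const target = memory[pc + 1] | (memory[pc + 2] << 8);
      if (target === 0x0005 && c === 9) {
        // Stand-in for CP/M function 9: collect characters until '$' (24h).
        for (let a = de; memory[a] !== 0x24; a++) {
          output += String.fromCharCode(memory[a]);
        }
      }
      pc += 3;                            // pretend the call has returned
    } else if (opcode === 0xc9) {         // RET: hand control back to CP/M
      return output;
    } else {
      throw new Error("Unsupported opcode " + opcode.toString(16));
    }
  }
}

console.log(runCom(program)); // Hello!
```

A real 8080 would push a return address on the stack for the CALL and pop one for the RET. The sketch skips that because nothing in this tiny program depends on it.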

An assembler such as ASM.COM reads an assembly-language program (often called a source-code file) and writes out to a file containing machine code—an executable file. In the grand scheme of things, assemblers are fairly simple programs because there’s a one-to-one correspondence between the assembly-language mnemonics and machine code. The assembler works by separating each line of text into mnemonics and arguments and then comparing these small words and letters with a list that the assembler maintains of all the possible mnemonics and arguments. This is a process called parsing, and it involves a lot of CMP instructions followed by conditional jumps. These comparisons reveal which machine-code instructions correspond to each statement.
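The splitting step can be sketched in JavaScript. This is not how ASM.COM is written (it would have been 8080 code full of CMP instructions); it is only a minimal illustration, and the names parseLine and knownMnemonics are made up for the example. It assumes each line has an optional label ending in a colon, a mnemonic, and operands that are either separated by commas or a single text string in single quotation marks.

```javascript
// Mnemonics and directives this sketch recognizes (only those in PROGRAM1.ASM).
const knownMnemonics = new Set(["ORG", "LXI", "MVI", "CALL", "RET", "DB", "END"]);

// Hypothetical helper: separates one line into label, mnemonic, and operands.
function parseLine(line) {
  const match = line.match(/^\s*(?:(\w+):)?\s*(\w+)?\s*(.*?)\s*$/);
  const [, label, mnemonic, rest] = match;

  let operands;
  if (rest === "") {
    operands = [];                       // e.g. RET has no operands
  } else if (rest.startsWith("'")) {
    operands = [rest.slice(1, -1)];      // quoted text stays in one piece
  } else {
    operands = rest.split(",").map((s) => s.trim());
  }

  return {
    label: label || null,
    mnemonic: mnemonic || null,
    operands,
    recognized: knownMnemonics.has(mnemonic),
  };
}

console.log(parseLine("      LXI DE,Text"));
console.log(parseLine("Text: DB 'Hello!$'"));
```

Once a line has been reduced to these pieces, choosing the machine-code bytes is mostly a matter of table lookup.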

The string of bytes contained in the PROGRAM1.COM file begins with 11h, which is the LXI instruction. This is followed by the bytes 09h and 01h, which constitute the 16-bit address 0109h. The assembler figures out this address for you: If the LXI instruction itself is located at 0100h (as it is when CP/M loads the program into memory to run), address 0109h is where the text string begins. Generally a programmer using an assembler doesn’t need to worry about the specific addresses associated with different parts of the program.
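One common way for an assembler to work out an address like 0109h is to make two passes: the first adds up instruction sizes to learn where each label lands, and the second emits the bytes. The JavaScript below is a sketch of that idea for this one program only. The statements are written out already separated into parts, the function names are invented, and nothing here is claimed to match the internals of ASM.COM.

```javascript
// PROGRAM1.ASM, already separated into labels, mnemonics, and operands.
const statements = [
  { mnemonic: "ORG", operands: [0x0100] },
  { mnemonic: "LXI", operands: ["DE", "Text"] },
  { mnemonic: "MVI", operands: ["C", 9] },
  { mnemonic: "CALL", operands: [5] },
  { mnemonic: "RET", operands: [] },
  { label: "Text", mnemonic: "DB", operands: ["Hello!$"] },
  { mnemonic: "END", operands: [] },
];

// Number of bytes each statement occupies (ORG and END occupy none).
function sizeOf(st) {
  switch (st.mnemonic) {
    case "LXI": return 3;
    case "MVI": return 2;
    case "CALL": return 3;
    case "RET": return 1;
    case "DB": return st.operands[0].length;
    default: return 0;
  }
}

function assemble(list) {
  // Pass 1: walk the statements, tracking the address, to place the labels.
  const labels = {};
  let address = 0;
  for (const st of list) {
    if (st.mnemonic === "ORG") address = st.operands[0];
    if (st.label) labels[st.label] = address;
    address += sizeOf(st);
  }

  // Pass 2: emit bytes, replacing label names with their addresses.
  const resolve = (v) => (typeof v === "string" ? labels[v] : v);
  const word = (v) => [resolve(v) & 0xff, (resolve(v) >> 8) & 0xff]; // low byte first
  const bytes = [];
  for (const st of list) {
    switch (st.mnemonic) {
      case "LXI": bytes.push(0x11, ...word(st.operands[1])); break; // LXI DE only
      case "MVI": bytes.push(0x0e, st.operands[1]); break;          // MVI C only
      case "CALL": bytes.push(0xcd, ...word(st.operands[0])); break;
      case "RET": bytes.push(0xc9); break;
      case "DB":
        for (const ch of st.operands[0]) bytes.push(ch.charCodeAt(0));
        break;
    }
  }
  return { labels, bytes };
}

const result = assemble(statements);
const hex = result.bytes
  .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
  .join(" ");
console.log(result.labels.Text.toString(16)); // 109
console.log(hex); // 11 09 01 0E 09 CD 05 00 C9 48 65 6C 6C 6F 21 24
```

The first pass is needed because the LXI instruction refers to Text before the assembler has reached the line where Text is defined.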

The first person to write the first assembler had to hand-assemble the program, of course. A person who writes a new (perhaps improved) assembler for the same computer can write it in assembly language and then use the first assembler to assemble it. Once the new assembler is assembled, it can assemble itself.

Every time a new microprocessor is developed, a new assembler is needed. The new assembler, however, can first be written on an existing computer using that computer’s assembler. This is called a cross-assembler. The assembler runs on Computer A but creates code that runs on Computer B.

An assembler eliminates the less creative aspects of assembly-language programming (the hand-assembling part), but assembly language still has two major problems. You’ve probably already surmised that the first problem is that programming in assembly language can be very tedious. You’re working down on the level of the CPU, and you have to worry about every little thing.

The second problem is that assembly language isn’t portable. If you were to write an assembly-language program for the Intel 8080, it would not run on the Motorola 6800. You must rewrite the program in 6800 assembly language. This probably won’t be as difficult as writing the original program because you’ve already solved the major organizational and algorithmic problems. But it’s still a lot of work.

Much of what computers do is mathematical calculation, but the way that math is carried out in assembly language is clumsy and awkward. It would be much preferable to instead express mathematical operations using a time-honored algebraic notation, for example:

Angle = 27.5
Hypotenuse = 125.2
Height = Hypotenuse × Sine(Angle)

If this text were actually part of a computer program, each of the three lines would be known as a statement. In programming, as in algebra, names such as Angle, Hypotenuse, and Height are called variables because they can be set to different values. The equals sign indicates an assignment: The variable Angle is set to the value 27.5, and Hypotenuse is set to 125.2. Sine is a function. Somewhere is some code that calculates the trigonometric sine of an angle and returns that value.

Keep in mind also that these numbers are not the integers common in assembly language; these are numbers with decimal points and fractional parts. In computing lingo, they are known as floating-point numbers.
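For comparison, here is roughly how the same three statements might look in JavaScript, a real high-level language discussed later in this chapter. One assumption is labeled in the code: the angle is taken to be in degrees, which the statements above don't actually say. JavaScript's built-in Math.sin function expects radians, so the sketch converts first.

```javascript
let angle = 27.5;          // assumed to be in degrees
let hypotenuse = 125.2;

// Math.sin works in radians, so convert from degrees first.
let radians = angle * Math.PI / 180;
let height = hypotenuse * Math.sin(radians);

console.log(height.toFixed(2));
```

All three variables hold floating-point numbers, and the multiplication and the sine calculation are carried out without the programmer writing a single machine-code instruction.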

If such statements were in a text file, it should be possible to write an assembly-language program that reads the text file and converts the algebraic expressions to machine code to perform the calculation. Well, why not?

What you’re on the verge of creating here is known as a high-level programming language. Assembly language is considered a low-level language because it’s very close to the hardware of the computer. Although the term high-level is used to describe any programming language other than assembly language, some languages are higher level than others. If you were the president of a company and you could sit at your computer and type in (or better yet, just prop your feet up on the desk and dictate) “Calculate all the profits and losses for this year, write up an annual report, print off a couple of thousand copies, and send them to all our stockholders,” you would be working with a very high-level language indeed! In the real world, programming languages don’t come anywhere close to that ideal.

Human languages are the result of thousands of years of complex influences, random changes, and adaptations. Even artificial languages such as Esperanto betray their origins in real language. High-level computer languages, however, are more deliberate conceptions. The challenge of inventing a programming language is quite appealing to some people because the language defines how a person conveys instructions to the computer. When I wrote the first edition of this book, I found a 1993 estimate that there had been over 1000 high-level languages invented and implemented since the beginning of the 1950s. At year-end 2021, a website entitled the Online Historical Encyclopedia of Programming Languages (hopl.info) puts the total at 8,945.

Of course, it’s not enough to simply define a high-level language, which involves developing a syntax to express all the things you want to do with the language. You must also write a compiler, which is the program that converts the statements of your high-level language to machine code. Like an assembler, a compiler must read through a source-code file character by character and break it down into short words and symbols and numbers. A compiler, however, is much more complex than an assembler. An assembler is simplified somewhat because of the one-to-one correspondence between assembly-language statements and machine code. A compiler usually must translate a single statement of a high-level language into many machine-code instructions. Compilers aren’t easy to write. Whole books are devoted to their design and construction.
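The very first step, breaking a statement into short words, symbols, and numbers, can be sketched in a few lines of JavaScript. This is only an illustration of that one step, not a compiler. The function name tokenize is made up, and the sketch assumes that the only symbols are the equals sign, the four arithmetic operators (with an asterisk for multiplication), and parentheses.

```javascript
// Hypothetical helper: splits one statement into names, numbers, and symbols.
function tokenize(statement) {
  // A name, or a number with an optional fractional part, or a single symbol.
  return statement.match(/[A-Za-z_]\w*|\d+(?:\.\d+)?|[=+\-*\/()]/g) || [];
}

console.log(tokenize("Angle = 27.5"));
console.log(tokenize("Height = Hypotenuse * Sine(Angle)"));
```

Everything that makes a compiler hard comes after this step: working out what the pieces mean together and generating machine code that does the same thing.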

High-level languages have advantages and disadvantages. A primary advantage is that high-level languages are usually easier to learn and to program in than assembly languages. Programs written in high-level languages are often clearer and more concise. High-level languages are often portable—that is, they aren’t dependent on a particular processor, as are assembly languages. They allow programmers to work without knowing about the underlying structure of the machine on which the program will be running. Of course, if you need to run the program on more than one processor, you’ll need compilers that generate machine code for those processors. The actual executable files are still specific to individual CPUs.

On the other hand, it’s very often the case that a good assembly-language programmer can write faster and more efficient code than a compiler can. What this means is that an executable produced from a program written in a high-level language will be larger and slower than a functionally identical program written in assembly language. (In recent years, however, this has become less obvious as microprocessors have become more complex and compilers have also become more sophisticated in optimizing code.)

Although a high-level language generally makes a processor much easier to use, it doesn’t make it any more powerful. Some high-level languages don’t support operations that are common on CPUs, such as bit shifting and bit testing. These tasks might be more difficult using a high-level language.

In the early days of home computers, most application programs were written in assembly language. These days, however, assembly languages are rarely used except for special purposes. As hardware has been added to processors that implements pipelining—the progressive execution of several instruction codes simultaneously—assembly language has become trickier and more difficult. At the same time, compilers have become more sophisticated. The larger storage and memory capacity of today’s computers has also played a role in this trend: Programmers no longer feel the need to create code that runs in a small amount of memory and fits on a small diskette.

Designers of early computers attempted to formulate problems for them in algebraic notation, but the first real working compiler is generally considered to be Arithmetic Language version 0 (or A-0), created for the UNIVAC by Grace Murray Hopper (1906–1992) at Remington-Rand in 1952. Dr. Hopper also coined the term “compiler.” She got an early start with computers when she worked for Howard Aiken on the Mark I in 1944. In her eighties, she was still working in the computer industry doing public relations for Digital Equipment Corporation (DEC).

The oldest high-level language still in use today (although extensively revised over the years) is FORTRAN. Many early computer languages have made-up names that are written in uppercase because they’re acronyms of sorts. FORTRAN is a combination of the first three letters of FORmula and the first four letters of TRANslation. It was developed at IBM for the 704 series of computers in the mid-1950s. For many years, FORTRAN was considered the language of choice for scientists and engineers. It has very extensive floating-point support and even supports complex numbers, which are combinations of real and imaginary numbers.

COBOL—which stands for COmmon Business Oriented Language—is another old programming language that is still in use, primarily in financial institutions. COBOL was created by a committee of representatives from American industries and the US Department of Defense beginning in 1959, but it was influenced by Grace Hopper’s early compilers. In part, COBOL was designed so that managers, while probably not doing the actual coding, could at least read the program code and check that it was doing what it was supposed to be doing. (In real life, however, this rarely occurs.)

An extremely influential programming language that is not in use today (except possibly by hobbyists) is ALGOL. ALGOL stands for ALGOrithmic Language, but ALGOL also shares its name with the second brightest star in the constellation Perseus. Originally designed by an international committee in 1957 and 1958, ALGOL is the direct ancestor of many popular general-purpose languages of the past half century. It pioneered a concept eventually known as structured programming. Even today, sometimes people refer to “ALGOL-like” programming languages.

ALGOL established programming constructs that are now common to nearly all programming languages. These were associated with certain keywords, which are words within the programming language to indicate particular operations. Multiple statements were combined into blocks, which were executed under certain conditions or with a particular number of iterations.

The if statement executes a statement or block of statements based on a logical condition, such as whether the variable height is less than 55. The for statement executes a statement or block of statements multiple times, usually based on incrementing a variable. An array is a collection of values of the same type, such as the names of cities. Programs were organized into blocks and functions.
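These constructs survive almost unchanged in today's ALGOL-like languages. The sketch below shows them in JavaScript rather than ALGOL itself. The height value and the city names are arbitrary, chosen only for the example.

```javascript
let height = 48;                                  // arbitrary example value
let cities = ["New York", "London", "Tokyo"];     // an array of city names

// The if statement: choose between two blocks based on a condition.
let note;
if (height < 55) {
  note = "height is less than 55";
} else {
  note = "height is 55 or more";
}

// The for statement: repeat a block while incrementing the variable i.
let listing = "";
for (let i = 0; i < cities.length; i++) {
  listing = listing + (i + 1) + ". " + cities[i] + "\n";
}

console.log(note);
console.log(listing);
```

The curly braces mark the blocks: groups of statements that are executed together under a condition or a set number of times.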

Although versions of FORTRAN, COBOL, and ALGOL were available for home computers, none of them had quite the impact on small machines that BASIC did.

BASIC (Beginner’s All-purpose Symbolic Instruction Code) was developed in 1964 by John Kemeny and Thomas Kurtz, of the Dartmouth Mathematics department, in connection with Dartmouth’s time-sharing system. Most students at Dartmouth weren’t math or engineering majors and hence couldn’t be expected to mess around with the complexity of computers and difficult program syntax. A Dartmouth student sitting at a terminal could create a BASIC program by simply typing BASIC statements preceded by numbers. The numbers indicated the order of the statements in the program. The first BASIC program in the first published BASIC instruction manual was

10 LET X = (7 + 8) / 3
20 PRINT X
30 END

Many subsequent implementations of BASIC have been in the form of interpreters rather than compilers. While a compiler reads a source-code file and creates an executable file of machine code, an interpreter reads source code and executes it directly without creating an executable file. Interpreters are easier to write than compilers, but the execution time of the interpreted program tends to be slower than that of a compiled program. On home computers, BASIC got an early start when buddies Bill Gates (born 1955) and Paul Allen (born 1953) wrote a BASIC interpreter for the Altair 8800 in 1975 and jump-started their company, Microsoft Corporation.
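What "executes it directly" means can be shown with a toy interpreter written in JavaScript. It handles only the three kinds of statements in that first Dartmouth program (LET, PRINT, and END) plus simple arithmetic with parentheses. Every function name here is invented for the sketch, and it bears no resemblance to how any real BASIC was implemented.

```javascript
// Evaluates an arithmetic expression such as "(7 + 8) / 3".
function evaluate(text, variables) {
  const tokens = text.match(/\d+(?:\.\d+)?|[A-Z]+|[-+*\/()]/g) || [];
  let pos = 0;

  function primary() {                 // a number, a variable, or ( expression )
    const t = tokens[pos++];
    if (t === "(") {
      const value = expression();
      pos++;                           // skip the closing parenthesis
      return value;
    }
    if (/^\d/.test(t)) return Number(t);
    return variables[t];
  }

  function term() {                    // handles * and /
    let value = primary();
    while (tokens[pos] === "*" || tokens[pos] === "/") {
      const op = tokens[pos++];
      const right = primary();
      value = op === "*" ? value * right : value / right;
    }
    return value;
  }

  function expression() {              // handles + and -
    let value = term();
    while (tokens[pos] === "+" || tokens[pos] === "-") {
      const op = tokens[pos++];
      const right = term();
      value = op === "+" ? value + right : value - right;
    }
    return value;
  }

  return expression();
}

// Runs a program given as text; returns the lines that PRINT produced.
function run(source) {
  const variables = {};
  const output = [];
  const lines = source.trim().split("\n")
    .map((line) => {
      const m = line.trim().match(/^(\d+)\s+(.*)$/);
      return { number: Number(m[1]), text: m[2] };
    })
    .sort((a, b) => a.number - b.number);   // line numbers give the order

  for (const { text } of lines) {
    let m;
    if ((m = text.match(/^LET\s+([A-Z]+)\s*=\s*(.*)$/))) {
      variables[m[1]] = evaluate(m[2], variables);
    } else if ((m = text.match(/^PRINT\s+(.*)$/))) {
      output.push(String(evaluate(m[1], variables)));
    } else if (text === "END") {
      break;
    }
  }
  return output;
}

const basicProgram = `
10 LET X = (7 + 8) / 3
20 PRINT X
30 END
`;
console.log(run(basicProgram).join("\n")); // 5
```

No machine code for the BASIC program is ever produced. The interpreter reads each statement and carries it out on the spot, which is also why interpreted programs tend to run more slowly.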

The Pascal programming language inherited much of its structure from ALGOL but included features from COBOL. Pascal was designed in the late 1960s by Swiss computer science professor Niklaus Wirth (born 1934). It was quite popular with early IBM PC programmers, but in a very specific form: the product Turbo Pascal, introduced by Borland International in 1983 for the bargain price of $49.95. Turbo Pascal was written by Danish student Anders Hejlsberg (born 1960) and came complete with an integrated development environment (or IDE). The text editor and the compiler were combined in a single program that facilitated very fast programming. Integrated development environments had been popular on large mainframe computers, but Turbo Pascal heralded their arrival on small machines.

Pascal was also a major influence on Ada, a language developed for use by the United States Department of Defense. The language was named after Augusta Ada Byron, who appeared in Chapter 15 as the chronicler of Charles Babbage’s Analytical Engine.

And then there’s C, a much-beloved programming language created between 1969 and 1973 largely by Dennis M. Ritchie at Bell Telephone Laboratories. People often ask why the language is called C. The simple answer is that it was derived from an early language called B, which was a simplified version of BCPL (Basic CPL), which was derived from CPL (Combined Programming Language).

Most programming languages seek to eliminate remnants of assembly language such as memory addresses. But C does not. C includes a feature called the pointer, which is basically a memory address. Pointers were very convenient for programmers who knew how to use them, but dangerous for nearly everyone else. By their ability to write over important areas of memory, pointers were a common source of bugs. Programmer Allen I. Holub wrote a book about C entitled Enough Rope to Shoot Yourself in the Foot.

C became the grandparent for a series of languages that were safer than C and added the facility to work with objects, which are programming entities that combine code and data in a very structured way. The most famous of these languages are C++, created by Danish computer scientist Bjarne Stroustrup (born 1950) in 1985; Java, designed by James Gosling (born 1955) at Sun Microsystems (later acquired by the Oracle Corporation) in 1995; and C#, originally designed by Anders Hejlsberg at Microsoft in 2000. At the time of this writing, one of the most used programming languages is another C-influenced language called Python, originally designed by Dutch programmer Guido van Rossum (born 1956) in 1991. But if you’re reading this book in the 2030s or 2040s, you might be familiar with languages that haven’t even been invented yet!

Different high-level programming languages compel the programmer to think in different ways. For example, some newer programming languages focus on manipulating functions rather than variables. These are referred to as functional programming languages, and for a programmer accustomed to working with conventional procedural languages, they can initially seem quite strange. Yet they offer alternative solutions that can inspire programmers to entirely reorient their way of approaching problems. Regardless of the language, however, the CPU still executes the same old machine code.

Yet there are ways in which software can smooth over the differences among various CPUs and their native machine codes. Software can emulate various CPUs, allowing people to run old software and ancient computer games on modern computers. (This is nothing new: When Bill Gates and Paul Allen decided to write a BASIC interpreter for the Altair 8800, they tested it on an Intel 8080 emulator program that they wrote on a DEC PDP-10 mainframe computer at Harvard University.) Java and C# can be compiled into machine-code-like intermediate code that is then converted into machine code when the program is executed. A project called LLVM is intended to provide a virtual link between any high-level programming language and any set of instructions implemented by a CPU.

This is the magic of software. With sufficient memory and speed, any digital computer can do anything that any other digital computer can do. This is the implication of Alan Turing’s work on computability in the 1930s.

Yet what Turing also demonstrated is that there are certain algorithmic problems that will forever be out of reach of the digital computer, and one of these problems has startling implications: You can’t write a computer program that determines if another computer program is working correctly! This means that we can never be assured that our programs are working the way they should.

This is a sobering thought, and it’s why extensive testing and debugging are so important a part of the process of developing software.

One of the most successful C-influenced languages is JavaScript, originally designed by Brendan Eich (born 1961) at Netscape and first appearing in 1995. JavaScript is the language that webpages use to provide interactive capabilities that go beyond the simple presentation of text and bitmaps managed by HTML, the Hypertext Markup Language. As of this writing, almost 98% of the top 10 million websites use at least some JavaScript.

All web browsers in common use today understand JavaScript, which means that you can begin writing JavaScript programs on a desktop or laptop computer without downloading or installing any additional programming tools.

So… would you like to experiment with some JavaScript yourself?

All you need do is create an HTML file that contains some JavaScript using the Windows Notepad or macOS TextEdit program. You save it to a file and then load it into your favorite web browser, such as Edge, Chrome, or Safari.

On Windows, run the Notepad program. (You might need to find it using the Search facility on the Start menu.) It’s ready for you to type in some text.

On macOS, run the TextEdit program. (You might need to locate it using Spotlight Search.) On the first screen that comes up, click the New Document button. TextEdit is designed to create a rich-text file that contains text formatting information. You don’t want that. You want a plain-text file, so in the Format menu, select Make Plain Text. Also, in the Edit menu’s Spelling and Grammar section, deselect the options to check and correct your spelling.

Now type in the following:

<html>
    <head>
        <title>My JavaScript</title>
    </head>
    <body>
        <p id="result">Program results go here!</p>
        <script>
            // JavaScript programs go here
        </script>
    </body>
</html>

This is HTML, which is based around tags that surround various sections of the file. The whole file begins with an <html> tag and ends with an </html> tag; together they enclose everything else. Within that, the <head> section encloses a <title> that will appear at the top of the webpage. The <body> section encloses a <p> (“paragraph”) with the text “Program results go here!”

The <body> section also encloses a <script> section. That’s where your JavaScript programs will reside. There’s already a little program there that consists solely of a line that begins with two slashes. Those two slashes indicate that this line is a comment. Everything following the two slashes to the end of the line is for the benefit of humans reading the program. It is ignored when the JavaScript is executed.

As you type these lines into Notepad or TextEdit, you don’t need to indent everything as I’ve done. You can even put much of it on the same line. But for sanity’s sake, put the <script> and </script> tags on separate lines.

Now save that file someplace: In either Notepad or TextEdit, select Save from the File menu. Select a location to save the file; the computer’s Desktop is convenient. Name the file MyJavaScriptExperiment.html or something similar. The filename extension following the period is very important. Make sure that it’s html. TextEdit will ask you to verify that’s what you really want. You do!

After you save the file, don’t close Notepad or TextEdit just yet. Keep it open so you can make additional changes to the file.

Now find that file you just saved and double-click it. Windows or macOS should load that file into your default web browser. The title of the webpage should be “My JavaScript” and the upper-left corner of the webpage should say “Program results go here!” If not, check that everything has been typed into the file without errors.

Here’s the process for experimenting with JavaScript: In Notepad or TextEdit you enter some JavaScript between the <script> and </script> tags and then save the file again. Now go to the web browser and refresh the page, probably by clicking a circular arrow icon. In this way, you can run a different JavaScript program or a variation of some program with two steps: Save the new version of the file; then refresh the page in the web browser.

Here’s a reasonable first program that you can type in the area between the <script> and </script> tags:

let message = "Hello from my JavaScript program!";
document.getElementById("result").innerHTML = message;

This program contains two statements, each occupying a different line and ending with a semicolon.

In the first statement, the word let is a JavaScript keyword (meaning that it’s a special word that has meaning within JavaScript), and message is a variable. You can use the let keyword to set that variable to something, and you can later set it to something else. You don’t need to use the word message. You can use msg or anything else that begins with a letter and doesn’t contain spaces or punctuation. In this program, the variable message is set to a string of characters that begins and ends with quotation marks. You can put whatever message you want between those quotation marks.

The second statement is definitely more obscure and complex, but it is required to allow the JavaScript to interact with the HTML. The name document refers to an object that represents the webpage. Within the webpage, getElementById searches for an HTML element whose id is “result.” That’s the <p> element, and setting innerHTML puts the contents of the message variable between the <p> and </p> tags as if you had originally typed it there.

This second statement is long and messy because JavaScript must be capable of accessing or altering anything on the webpage, so it must be flexible enough to do that.

Compilers and interpreters are fussier about spelling than old-fashioned English teachers, so be sure to type that second statement as shown! JavaScript is a case-sensitive language, which means that it differentiates between uppercase and lowercase. Make sure you’ve typed innerHTML correctly; the words InnerHTML or innerHtml won’t work! That’s why you want to turn off spelling correction in the macOS TextEdit program. Otherwise, TextEdit will change let to Let, and that won’t work.

When you save this new version of the file and refresh the page in the web browser, you’ll see that message in the upper-left corner. If you don’t, check your work!

Let’s try another simple program using the same file. If you don’t want to delete the program that you already wrote, put it between these two special sequences of symbols:

/*
let message = "Hello from my JavaScript program!";
document.getElementById("result").innerHTML = message;
*/

To JavaScript, anything between /* and */ is treated as a comment and ignored. Like many languages influenced by C, JavaScript has two kinds of comments: multiline comments using /* and */, and single-line comments using //.

The next program does some arithmetic:

let a = 535.43;
let b = 289.771;
let c = a * b;
document.getElementById("result").innerHTML = c;

As in many programming languages, multiplication is specified by an asterisk rather than a times sign because the standard multiplication sign is not part of the ASCII character set.

Notice that the last statement is the same as the previous program except now the inner HTML between the <p> tags is being set to the variable c, which is the product of the two numbers. JavaScript doesn’t care if you set the inner HTML to a string of characters or to a number. It’ll do what’s necessary to display the result.

One of the most important features in high-level languages is the loop. You’ve seen how loops are done in assembly language with the JMP instruction and conditional jumps. Some high-level languages include a statement called goto that is very similar to a jump. But goto statements are discouraged except for special purposes. A program that requires many jumps soon becomes very difficult to manage. The technical term is spaghetti code, because the jumps seem to get all tangled up with each other. For this reason, JavaScript doesn’t even implement a goto.

Modern high-level programming languages manage loops without jumping all over the place. For example, suppose you want to add all the numbers between 1 and 100. Here’s one way to write that program with a JavaScript loop:

let total = 0;
let number = 1;

while (number <= 100)
{
    total = total + number;
    number = number + 1;
}

document.getElementById("result").innerHTML = total;

Don’t worry about the blank lines. I use those to separate various parts of the program for clarity. It begins with an initialization section where two variables are set to initial values. The loop consists of the while statement and the block of code between the curly braces. If the number variable is less than or equal to 100, the block of code is executed. This adds number to total and increases number by 1. When number becomes greater than 100, the program continues with the statement following the right curly bracket. That statement displays the result.

You might be puzzled if you encounter an algebra problem with these two statements:

total = total + number;
number = number + 1;

How can total equal total plus number? Doesn’t that mean that number is zero? And how can number equal number plus one?

In JavaScript, the equals sign doesn’t symbolize equality. It is instead an assignment operator. The variable on the left of the equals sign is set to the value calculated on the right of the equals sign. In other words, the value on the right of the equals sign “goes into” the variable on the left. In JavaScript (as in C), testing whether two variables are equal involves two equals signs (==).
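Here is a minimal sketch of the difference between assigning and comparing. It assumes you run it in a browser console or in Node.js, so it uses console.log rather than the result paragraph of the web page. The triple equals sign (===) is not discussed above; it is JavaScript's stricter comparison, which also requires the two types to match.

```javascript
let number = 5;             // assignment: 5 goes into number
number = number + 1;        // assignment again: number is now 6

let isSix = (number == 6);            // comparison: true
let looseMatch = (number == "6");     // == converts the string first: true
let strictMatch = (number === "6");   // === also compares types: false

console.log(number, isSix, looseMatch, strictMatch);
```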

For these two statements, JavaScript implements a couple of shortcuts that it has borrowed from C. These two statements can be abbreviated like this:

total += number;
number += 1;

The combination of the plus sign and the equals sign means to add what’s on the right to the variable on the left.

It is very common for variables to be incremented by 1, as number is here, so the statement that increments number can be abbreviated like this:

number++;

Moreover, the two statements can be combined into one:

total += number++;

The value of number is added to total and then number is increased by 1! But that might be a bit obscure and confusing to people who are not quite as skilled at programming as you are, so you might want to avoid it.
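If you want to convince yourself of the order in which things happen, this small sketch starts with arbitrary values (10 and 3 are just illustrative) and prints both variables afterward using console.log:

```javascript
let total = 10;
let number = 3;

total += number++;     // adds the old value of number (3), then increments it

console.log(total);    // 13
console.log(number);   // 4
```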

Another common way to write this program is with a loop based on the keyword for:

let total = 0;

for (let number = 1; number <= 100; number++)
{
    total += number;
}
document.getElementById("result").innerHTML = total;

The for statement contains three clauses separated by semicolons: The first initializes the number variable to 1. The block of code within the curly brackets is executed only if the second clause is true—that is, if number is less than or equal to 100. After that block of code is executed, number is incremented. Moreover, because the block of code contains only one statement, the curly braces can be removed.

Here’s a little program that loops through numbers from 1 to 100 and displays the square roots of those numbers:

for (let number = 1; number <= 100; number++)
{
    document.getElementById("result").innerHTML +=
        "The square root of " + number + " is " +
            Math.sqrt(number) + "<br />";
}

The block of code executed within the loop is only one statement, but the statement is so long that I’ve written it on three lines. Notice that the first of these three lines ends with +=, meaning that what follows is added to the inner HTML of the <p> tag, creating more text with each iteration of the loop. What’s added to the inner HTML is a combination of text and numbers. Notice in particular Math.sqrt, which is a JavaScript function that calculates the square root. It’s part of the JavaScript language. (Such a function is sometimes called a builtin function.) Also notice the <br /> tag, which is an HTML line break.

When the program is finished, you’ll see a long list of text. You’ll probably need to scroll the page to see all of it!

The next program that I’ll show you here implements a famous algorithm for finding prime numbers, called the sieve of Eratosthenes. Eratosthenes (c. 276–194 BCE) was the librarian of the legendary library at Alexandria and is also remembered for accurately calculating the circumference of the earth.

Prime numbers are those whole numbers that are evenly divisible only by themselves and 1. The first prime number is 2 (the only even prime number), and the primes continue with 3, 5, 7, 11, 13, 17, 19, 23, 29, and so forth.

Eratosthenes’s technique begins with a list of the positive whole numbers beginning with 2. Because 2 is a prime number, cross out all the numbers that are multiples of 2. (That’s all the even numbers except 2.) Those numbers aren’t primes. Because 3 is a prime number, cross out all the numbers that are multiples of 3. We already know 4 isn’t a prime number because it has been crossed out. The next prime is 5, so cross out all the multiples of 5. Continue in this way. What you have left are the prime numbers.

This JavaScript program implementing this algorithm uses a common programming entity called an array. An array is much like a variable in that it has a name, but the array stores multiple items, each of which is referenced by an index enclosed in square brackets following the array name.

The array in this program is named primes, and it contains 10,000 Boolean values. In JavaScript, Boolean values are either true or false, which are JavaScript keywords. (You’ve been familiar with this concept since Chapter 6!)

Here’s how the program creates the array named primes and how it initially sets all the values of that array to true:

let primes = [];

for (let index = 0; index < 10000; index++)
{
    primes.push(true);
}

There’s a much shorter way to do this, but it’s a little more obscure:

let primes = new Array(10000).fill(true);

The main calculation involves two for loops, one inside the other. (The second for loop is said to be nested in the first.) Two variables for indexing the array are required, and instead of using variations of the word index, I’ve used the much shorter i1 and i2. Variable names can include numbers, but the names must begin with letters:

for (let i1 = 2; i1 <= 100; i1++)
{
    if (primes[i1])
    {
        for (let i2 = 2; i2 < 10000 / i1; i2++)
        {
            primes[i1 * i2] = false;
        }
    }
}

The first for loop increments the i1 variable from 2 to 100, which is the square root of 10,000. The if statement executes the next part only if that array element is true, indicating that it’s a prime. That second loop begins increasing the i2 variable from 2. The product of i1 and i2 is therefore 2 times i1, 3 times i1, 4 times i1, and so forth, and those numbers are not prime, so the array element is set to false.

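It might seem odd to increase i1 only up to 100, and i2 only up to 10,000 divided by i1, but that’s all that’s necessary to encompass all the primes up to 10,000. Any number below 10,000 that isn’t prime has at least one factor no larger than 100, so it has already been crossed out by the time i1 reaches 100.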

The final part of the program displays the results:

for (let index = 2; index < 10000; index++)
{
    if (primes[index])
    {
        document.getElementById("result").innerHTML +=
            index + " ";
    }
}
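The three pieces of the program can also be gathered into one function, which makes the sieve easy to try with different limits. This is only a sketch: the function name sieve, its limit parameter, and the found array are my own inventions for illustration, and it returns the primes in an array rather than writing them into the web page.

```javascript
// Hypothetical helper: returns an array of all primes below limit.
function sieve(limit)
{
    let primes = new Array(limit).fill(true);

    // The outer loop only needs to reach the square root of limit.
    for (let i1 = 2; i1 * i1 <= limit; i1++)
    {
        if (primes[i1])
        {
            // Cross out 2 times i1, 3 times i1, and so forth.
            for (let i2 = 2; i1 * i2 < limit; i2++)
            {
                primes[i1 * i2] = false;
            }
        }
    }

    // Collect the indices that are still marked true.
    let found = [];
    for (let index = 2; index < limit; index++)
    {
        if (primes[index])
        {
            found.push(index);
        }
    }
    return found;
}

let result = sieve(10000);
console.log(result.length);                  // 1229
console.log(result.slice(0, 10).join(" "));  // 2 3 5 7 11 13 17 19 23 29
```

With a limit of 10,000, the two loop conditions work out to the same bounds as in the program above: i1 stops at 100, and i2 stops before i1 times i2 reaches 10,000.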

If programming in JavaScript interests you, please do not continue using Notepad or TextEdit! There are much better tools available that will let you know when you’ve spelled something incorrectly or blundered in some other way.

If you’d like to examine some simple JavaScript programs that have been heavily commented to provide a type of tutorial, see this chapter section on CodeHiddenLanguage.com.

Sometimes people squabble over whether programming is an art or a science. On the one hand, college curricula are called Computer Science, but on the other hand, you have books such as Donald Knuth’s famous The Art of Computer Programming series. Programming has elements of both science and art, but it’s really something else. “Rather,” wrote physicist Richard Feynman, “computer science is like engineering—it is all about getting something to do something.”

Quite often, this is an uphill battle. As you might have discovered, it is very easy to make errors in computer programs and to spend much time tracking down those errors. Debugging is an art (or a science, or a feat of engineering) in itself.

What you’ve seen is just the proverbial tip of the iceberg in JavaScript programming. But history tells us to be cautious around icebergs! Sometimes, the computer itself does something unexpected. For example, try this little JavaScript program:

let a = 55.2;
let b = 27.8;
let c = a * b;
document.getElementById("result").innerHTML = c;

What this program displays is 1534.5600000000002, which doesn’t look right, and it’s not right. The correct result is simply 1534.56.

What happened?

Floating-point numbers are exceptionally important in computing, so a standard was established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE), and also recognized by the American National Standards Institute (ANSI). ANSI/IEEE Std 754-1985 is called the IEEE Standard for Binary Floating-Point Arithmetic. It is not very lengthy as standards go—just 18 pages—but it spells out the details of encoding floating-point numbers in a convenient manner. It’s one of the most important standards in all computing and is used by virtually all contemporary computers and computer programs that you’re likely to encounter.

The IEEE floating-point standard defines two basic formats: single precision, which requires 4 bytes per number, and double precision, which requires 8 bytes per number. Some programming languages give you a choice which to use; JavaScript uses double precision exclusively.

The IEEE standard is based on the representation of numbers in scientific notation, where a number is in two parts: a significand or mantissa is multiplied by 10 to an integer power called the exponent:

$42{,}705.7846 = 4.27057846 \times 10^{4}$

This particular representation is referred to as a normalized format because the mantissa has only one digit to the left of the decimal point.

The IEEE standard represents floating-point numbers in the same way, but in binary. All of the binary numbers that you’ve seen in this book so far have been integers, but it’s also possible to use binary notation for fractional numbers. For example, consider this binary number:

101.1101

Don’t call that period a “decimal point”! Because this is a binary number, that period is more properly called a binary point. The digits to the left of the binary point compose the integer part, and the digits to the right of the binary point compose the fractional part.

When converting binary to decimal in Chapter 10, you saw how digits correspond to powers of 2. Digits to the right of the binary point are similar except they correspond to negative powers of 2. The binary number 101.1101 can be converted to decimal by multiplying the bits by the corresponding positive and negative powers of 2 from left to right:

$1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} + 1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3} + 1 \times 2^{-4}$

Those negative powers of two can be calculated by starting at 1 and repeatedly dividing by 2:

1×4+0×2+1×1+1×0.5+1×0.25+0×0.125+1×0.0625

By this calculation, the decimal equivalent of 101.1101 is 5.8125.
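The same conversion can be sketched in JavaScript. The function name binaryToDecimal and its variable names are hypothetical, chosen just for this example, and the function assumes that the string it receives contains only 0s, 1s, and at most one binary point.

```javascript
// Hypothetical helper: converts a binary string such as "101.1101" to a number.
function binaryToDecimal(text)
{
    let parts = text.split(".");
    let integerBits = parts[0];
    let fractionBits = parts.length > 1 ? parts[1] : "";
    let value = 0;

    // Integer part: doubling the running value for each new bit
    // gives every bit its proper positive power of 2.
    for (let i = 0; i < integerBits.length; i++)
    {
        value = value * 2 + Number(integerBits[i]);
    }

    // Fractional part: the weights are 0.5, 0.25, 0.125, and so on.
    let weight = 0.5;
    for (let i = 0; i < fractionBits.length; i++)
    {
        value += Number(fractionBits[i]) * weight;
        weight /= 2;
    }
    return value;
}

console.log(binaryToDecimal("101.1101"));   // 5.8125
```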

In the normalized form of decimal scientific notation, the significand has only one digit to the left of the decimal point. Similarly, in binary scientific notation, the normalized significand also has only one digit to the left of the binary point. The number 101.1101 is expressed as

$1.011101 \times 2^{2}$

One implication of this rule is that a normalized binary floating-point number always has a 1 and nothing else at the left of the binary point.

The IEEE standard for a double-precision floating-point number requires 8 bytes. The 64 bits are allocated like this:

[Figure: 1 sign bit (s), followed by 11 exponent bits (e), followed by 52 significand fraction bits (f)]

Because the significand of a normalized binary floating-point number always has a 1 to the left of the binary point, that bit is not included in the storage of floating-point numbers in the IEEE format. The 52-bit fractional part of the significand is the only part stored. So even though only 52 bits are used to store the significand, the precision is said to be 53 bits. You’ll get a feel for what 53-bit precision means in a moment.

The 11-bit exponent part can range from 0 through 2047. This is called a biased exponent, because a number called the bias must be subtracted from the exponent for the signed exponent that actually applies. For double-precision floating-point numbers, this bias is 1023.

The number represented by these values of s (the sign bit), e (the exponent), and f (the significand fraction) is

$(-1)^{s} \times 1.f \times 2^{e-1023}$

That negative 1 to the s power is a mathematician’s annoyingly clever way of saying, “If s is 0, the number is positive (because anything to the 0 power equals 1); and if s is 1, the number is negative (because −1 to the 1 power is −1).”

The next part of the expression is 1.f, which means a 1 followed by a binary point, followed by the 52 bits of the significand fraction. This is multiplied by 2 to a power. The exponent is the 11-bit biased exponent stored in memory minus 1023.

I’m glossing over a few details. For example, with what I’ve described, there’s no way to represent zero! This is a special case, but the IEEE standard can also accommodate negative zero (to represent very small negative numbers), positive and negative infinity, and a value known as NaN, which stands for “Not a Number.” These special cases are an important part of the floating-point standard.
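You can coax JavaScript into showing some of these special values. This is just a sketch that uses console.log along with two functions that are part of the language, Object.is and Number.isNaN; the variable names are arbitrary.

```javascript
let positiveZero = 0;
let negativeZero = -0;

console.log(positiveZero === negativeZero);           // true
console.log(Object.is(positiveZero, negativeZero));   // false
console.log(1 / positiveZero);                        // Infinity
console.log(1 / negativeZero);                        // -Infinity
console.log(0 / 0);                                   // NaN
console.log(NaN === NaN);                             // false
console.log(Number.isNaN(0 / 0));                     // true
```

The ordinary comparison treats the two zeros as equal, which is why Object.is is needed to tell them apart. NaN is the only value that isn't equal to itself.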

The number 101.1101 that I used for an example earlier is stored with a 52-bit mantissa of

0111 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

I’ve put spaces every four digits to make it more readable. The biased exponent is 1025, so the number is

$1.011101 \times 2^{1025-1023} = 1.011101 \times 2^{2}$
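If you'd like to see these three fields for yourself, here is one possible way to pull them out of a JavaScript number. The function name decodeDouble is hypothetical; DataView, ArrayBuffer, and BigInt values (the numbers with an n suffix) are standard parts of modern JavaScript, so this sketch assumes a reasonably recent browser or Node.js.

```javascript
// Hypothetical helper: splits a number into the three IEEE double-precision fields.
function decodeDouble(value)
{
    let view = new DataView(new ArrayBuffer(8));
    view.setFloat64(0, value);          // store the 8 bytes of the number
    let bits = view.getBigUint64(0);    // read the same 64 bits as an integer

    let sign = Number(bits >> 63n);                          // 1 bit
    let exponent = Number((bits >> 52n) & 0x7FFn);           // 11 bits, biased
    let fractionMask = (1n << 52n) - 1n;                     // 52 one bits
    let fraction = (bits & fractionMask).toString(2).padStart(52, "0");

    return { sign: sign, exponent: exponent, fraction: fraction };
}

let fields = decodeDouble(5.8125);
console.log(fields.sign);        // 0
console.log(fields.exponent);    // 1025
console.log(fields.fraction);    // 011101 followed by 46 zeros
```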

Aside from zero, the smallest positive or negative double-precision floating-point number is

$1.0000000000000000000000000000000000000000000000000000 \times 2^{-1022}$

That’s 52 zeros following the binary point. The largest is

$1.1111111111111111111111111111111111111111111111111111 \times 2^{1023}$

The range in decimal is approximately $2.2250738585072014 \times 10^{-308}$ to $1.7976931348623158 \times 10^{308}$. Ten to the 308th power is a very big number. It’s 1 followed by 308 decimal zeros.
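Both ends of this range can be calculated in JavaScript. In this sketch the variable names are mine; Math.pow and Number.MAX_VALUE are part of the language. The largest number is written as 2 minus a tiny amount, because a 1 followed by 52 fractional 1 bits is just short of binary 10.

```javascript
let smallestNormal = Math.pow(2, -1022);
let largest = (2 - Math.pow(2, -52)) * Math.pow(2, 1023);

console.log(smallestNormal);                 // 2.2250738585072014e-308
console.log(largest);                        // 1.7976931348623157e+308
console.log(largest === Number.MAX_VALUE);   // true
console.log(largest * 2);                    // Infinity
```

JavaScript displays the shortest string of digits that identifies the number, which is why the last digit of the largest value shows up as 7 rather than the rounded 8. Going beyond the largest number gives you infinity.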

The 53 bits of the significand (including the 1 bit that’s not included) is a resolution approximately equivalent to 16 decimal digits, but it does have limits. For example, the two numbers 140,737,488,355,328.00 and 140,737,488,355,328.01 are stored exactly the same. In your computer programs, these two numbers are identical.
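It's easy to check this claim. The number 140,737,488,355,328 is 2 to the 47th power, and at that size the gap between adjacent double-precision values is larger than 0.01. A short sketch, with arbitrary variable names:

```javascript
let first = 140737488355328.00;
let second = 140737488355328.01;

console.log(first === second);             // true
console.log(first === Math.pow(2, 47));    // true

// Above 2 to the 53rd power, not even every whole number can be stored.
console.log(Math.pow(2, 53) + 1 === Math.pow(2, 53));   // true
```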

Another problem is that the vast majority of decimal fractions are not stored exactly. For example, consider the decimal number 1.1. This is stored with a 52-bit mantissa of

0001100110011001100110011001100110011001100110011010

That’s the fractional part to the right of the binary point. The complete binary number for decimal 1.1 is this:

1.0001100110011001100110011001100110011001100110011010

If you start converting this number to decimal, you’ll start like this:

$1 + 2^{-4} + 2^{-5} + 2^{-8} + 2^{-9} + 2^{-12} + \cdots$

This is equivalent to

$1 + 0.0625 + 0.03125 + 0.00390625 + 0.001953125 + 0.000244140625 + \cdots$

And eventually you’ll find that it doesn’t equal decimal 1.1 but instead equals

1.10000000000000008881

And once you start performing arithmetic operations on numbers that are not represented exactly, you’ll also get results that are not exact. And that’s why JavaScript indicates that multiplying 55.2 and 27.8 results in 1534.5600000000002.
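You can see the stored value of 1.1 by asking JavaScript for more digits than it normally displays, and a common way to live with inexact results is to compare numbers with a small tolerance rather than with an equals test. The sketch below shows both. The nearlyEqual function and its default tolerance are hypothetical, one approach among several rather than a standard part of the language.

```javascript
console.log((1.1).toFixed(20));     // 1.10000000000000008882
console.log(0.1 + 0.2);             // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3);     // false

// Hypothetical helper: treats two numbers as equal if they differ
// by no more than a small fraction of the larger one.
function nearlyEqual(x, y, tolerance = 1e-12)
{
    return Math.abs(x - y) <= tolerance * Math.max(Math.abs(x), Math.abs(y));
}

console.log(nearlyEqual(0.1 + 0.2, 0.3));         // true
console.log(nearlyEqual(55.2 * 27.8, 1534.56));   // true
```

The right tolerance depends on the calculation. The value used here is only a placeholder.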

We are accustomed to thinking about numbers as existing in a continuum without any gaps. By necessity, however, computers store discrete values. The study of discrete mathematics provides some theoretical support to the mathematics of digital computers.

Another layer of complexity in floating-point arithmetic involves the calculation of fun stuff such as roots and exponents and logarithms and trigonometric functions. But all these jobs can be done with the four basic floating-point operations: addition, subtraction, multiplication, and division.

For example, the trigonometric sine can be calculated with a series expansion, like this:

$\sin(x) = x - \frac{x^{3}}{3!} + \frac{x^{5}}{5!} - \frac{x^{7}}{7!} + \cdots$

The x argument must be in radians, of which there are 2π in 360 degrees. The exclamation point is a factorial sign. It means to multiply together all the integers from 1 through the indicated number. For example, 5! equals 1 × 2 × 3 × 4 × 5. That’s just a multiplication. The exponent in each term is also a multiplication. The rest is just division, addition, and subtraction. The only really scary part is the ellipsis at the end, which means to continue the calculations forever. In reality, however, if you restrict yourself to the range 0 through π/2 (from which all other sine values can be derived), you don’t have to go anywhere close to forever. After about a dozen terms, you’re accurate to the 53-bit resolution of double-precision numbers.
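Here is a sketch of how that series might be summed using nothing but the four basic operations. The function name sineSeries and the default of 12 terms are my assumptions for illustration; real math libraries use more refined techniques. Each term is calculated from the previous one, so no separate power or factorial routine is needed.

```javascript
// Hypothetical helper: sums the first termCount terms of the sine series.
// Assumes x is in radians, ideally between 0 and Math.PI / 2.
function sineSeries(x, termCount = 12)
{
    let term = x;    // the first term is x to the 1st power over 1!
    let sum = 0;

    for (let n = 0; n < termCount; n++)
    {
        sum += term;

        // Next term: flip the sign, multiply by x squared, and divide
        // by the next two factors of the factorial.
        term = -term * x * x / ((2 * n + 2) * (2 * n + 3));
    }
    return sum;
}

console.log(sineSeries(Math.PI / 6));   // very close to 0.5
console.log(Math.sin(Math.PI / 6));     // the built-in function, for comparison
```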

Of course, computers are supposed to make things easy for people, so the chore of writing a bunch of routines to do floating-point arithmetic seems at odds with that goal. That’s the beauty of software, though. Once somebody writes the floating-point routines for a particular machine, other people can use them. Floating-point arithmetic is so important to scientific and engineering applications that it’s traditionally been given a very high priority. In the early days of computers, writing floating-point routines was always one of the first software jobs after building a new type of computer. Programming languages usually contain whole libraries of math functions. You’ve already seen the JavaScript Math.sqrt function.

It also makes sense to design special hardware that does floating-point calculations directly. The first commercial computer that included optional floating-point hardware was the IBM 704 in 1954. The 704 stored all numbers as 36-bit values. For floating-point numbers, that broke down to a 27-bit significand, an 8-bit exponent, and a sign bit. The floating-point hardware could do addition, subtraction, multiplication, and division. Other floating-point functions had to be implemented in software.

Hardware floating-point arithmetic came to the desktop in 1980, when Intel released the 8087 Numeric Data Coprocessor chip, a type of integrated circuit usually referred to these days as a math coprocessor or a floating-point unit (FPU). The 8087 is called a coprocessor because it couldn’t be used by itself. It could be used only in conjunction with the 8086 and 8088, Intel’s first 16-bit microprocessors. At the time, the 8087 was considered to be the most sophisticated integrated circuit ever made, but eventually math coprocessors were included in the CPU itself.

Today’s programmers use floating-point numbers as if they were simply part of the computer, as they indeed are.
