10
APPLICATION AND SYSTEM PROGRAMMING

Image

Chapter 9 covered how web browsers work. You learned that browsers are complex application programs that provide software-implemented “computers” that support very high-level “instructions.” In this chapter, we’ll write a program that runs in a browser, followed by a similar program that doesn’t use the browser. The structure of the two programs is shown in Figure 10-1.

The operating system hides much of the I/O device complexity from user programs. In a similar manner, a complex user program such as a browser hides much of the complexity of dealing with operating systems from application programs that are built on top of them. This is fine if you’re going to limit yourself to being a high-level application writer. But you need to know more if you’re going to be a system programmer.

Image

Figure 10-1: Two program scenarios

This chapter includes lengthier JavaScript and C code examples than you’ve seen before. Don’t worry if you’re not fluent in these languages—you don’t need to know all the details to follow along.

Let’s look at a game in which the computer asks the user a series of questions to try to guess an animal. New animals and questions that distinguish them are added to the program as needed. The program “learns” by constructing a binary tree of knowledge.

The interaction between the computer (the literal text) and the user (the literal bold text) looks something like this:

Think of an animal.
Does it bark?
Yes
Is it a dog?
Yes
I knew it!
Let's play again.
Think of an animal.
Does it bark?
Yes
Is it a dog?
No
I give up. What is it?
giant purple snorklewhacker
What's a question that I could use to tell a giant purple snorklewhacker from a dog?
Does it live in an anxiety closet?
Thanks. I'll remember that.
Let's play again.
Think of an animal.
Does it bark?
Yes
Is it a dog?
No
Does it live in an anxiety closet?
Yes
Is it a giant purple snorklewhacker?
Yes
I knew it!
Let's play again.

Figure 10-2 shows the implementation plan.

Image

Figure 10-2: Guess the Animal flowchart

As you can see, we ask questions that guide our descent through the tree of knowledge. We congratulate ourselves when we guess correctly. Otherwise, we ask the user to supply the answer and a question, add them to the tree, and start over.

The program follows a path down the tree of knowledge on the left side. When it reaches the end of the path on the right, it either brags or adds to the knowledge base.

Guess the Animal Version 1: HTML and JavaScript

On to the program. We’ll go about this in a way that, although convenient, will upset some of my colleagues. This is a clever hack—something that works but is a bit twisted and ugly. As you saw in the previous chapter, the DOM is a tree that is a subset of a DAG—same with a binary tree. We’re going to build the binary tree of knowledge in the DOM as a set of nested, invisible <div>s. We could create a data structure in JavaScript, but the browser already has something easy that works. As Figure 10-3 shows, our program starts off with an initial question and two answers in the knowledge tree.

Image

Figure 10-3: Initial knowledge tree

Let’s play the game. We answer yes in response to Does it bark? and when the program guesses Is it a dog? we answer no. The program then asks What is it? and we respond with giant purple snorklewhacker. The program then asks us what question would distinguish a giant purple snorklewhacker from a dog and uses our response of Does it live in an anxiety closet? to modify the knowledge tree, as shown in Figure 10-4.

Image

Figure 10-4: Modified knowledge tree

Application-Level Skeleton

Listing 10-1 shows the web page skeleton into which we’ll add the code. Purists would be very upset at this because it combines HTML, CSS, and JavaScript into a single file. But we’re building a simple program, not a website, so it’s convenient to have everything in one place.

 1 <html>
 2   <head>
 3     <!-- include jQuery -->
 4     <script type="text/javascript" src="https://code.jquery.com/jquery-3.1.1.min.js"> </script>
 5
 6     <title>Web Page Skeleton</title>
 7
 8     <style>
 9       <!-- CSS goes here -->
10     </style>
11
12     <script type="text/javascript">
13
14       <!-- JavaScript goes here -->
15
16       $(function() {
17         <!-- JavaScript to run when document ready -->
18       });
19
20     </script>
21   </head>
22
23   <body>
24     <!-- HTML goes here -->
25   </body>
26 </html>

Listing 10-1: Web page skeleton

You can change the title to something like Guess the Animal yourself.

You learned about web browser components in the last chapter (see Figure 9-9). Now we’ll put some of them to use.

Web Page Body

Let’s start by looking at the <body> of the program in Listing 10-2. This replaces the <!-- HTML goes here --> from line 24 of Listing 10-1.

 1 <!-- This is the knowledge tree that is never visible -->
 2
 3 <div id="root" class="invisible">
 4   <div string="Does it bark">
 5     <div string="dog"></div>
 6     <div string="cat"></div>
 7  </div>
 8 </div>
 9
10 <div id="dialog">
11   <!-- The conversation will go here -->
12 </div>
13
14 <!-- Get new animal name dialog -->
15
16 <div id="what-is-it" class="start-hidden">
17   <input id="what" type="text"/>
18   <button id="done-what">Done</button>
19 </div>
20
21 <!-- Get new animal question dialog -->
22
23 <div id="new-question" class="start-hidden">
24   What's a good question that I could use to tell a
25   <span id="new"></span> from a <span id="old"></span>?
26   <input id="question" type="text"/>
27   <button id="done-question">Done</button>
28 </div>
29
30 <!-- Yes and no buttons -->
31
32 <div id="yesno" class="start-hidden">
33   <button id="yes">Yes</button>
34   <button id="no">No</button>
35 </div>

Listing 10-2: Guess the Animal HTML

You can see in lines 3 through 8 that the knowledge tree is preloaded with an initial question and answers. The string attribute is the question, except for leaf nodes where it is the animal name. The question contains two <div>s, the first being for the yes answer and the second for the no. The tree is wrapped in a <div> styled so that it’s never visible.

The dialog in lines 10 through 12 holds the conversation between the computer and the player. Then what-is-it (lines 16–19) contains a text field for the name of a new animal and a button the player presses when done. After that, new-question (lines 23–28) contains a text field for the new question and a button the player presses when done. The yes and no buttons are in yesno (lines 32–35). The three user input <div>s (lines 16, 23, and 32) have a start-hidden class that is used to make these values invisible at the beginning of a game.

The JavaScript

Let’s move on to the actual JavaScript. The first part is shown in Listing 10-3.

The first thing we do is declare the variable node where the skeleton says <!-- JavaScript goes here --> on line 14 of Listing 10-1. Although it could go inside the document ready function, putting it outside makes it easier to access using the browser developer console. We also declare two functions outside of the document ready function since they don’t rely on the page being loaded.

 1 var node; // current position in tree of knowledge
 2
 3 // Append the supplied html to the dialog. Bail if the new node has
 4 // no children because there is no question to ask. Otherwise, make
 5 // the new node the current node and ask a question using the string
 6 // attribute of the node. Turn the animal name into a question if a
 7 // leaf node. Returns true if the new node is a leaf node.
 8
 9 function
10 question(new_node, html)
11 {
12    $('#dialog').append(html);     // add the html to the dialog
13
14    if ($(new_node).length == 0) { // no question if no children
15      return (true);
16    }
17    else {
18      node = new_node;             // descend to new node
19
20      if ($(node).children().length == 0)
21        $('#dialog').append('Is it a ' + $(node).attr('string') + '?');
22      else
23        $('#dialog').append($(node).attr('string') + '?');
24
25      return (false);
26    }
27 }
28
29 // Restarts the game. Hides all buttons and text fields, clears
30 // the text fields, sets the initial node and greeting, asks the
31 // first question, displays the yes/no buttons.
32
33 function
34 restart()
35 {
36    $('.start-hidden').hide();
37    $('#question,#what').val('');
38    question($('#root>div'), '<div><b>Think of an animal.</b></div>');
39    $('#yesno').show();
40 }

Listing 10-3: Guess the Animal JavaScript variable and functions

Next, the <!-- JavaScript to run when document ready --> from line 17 of Listing 10-1 gets the five things shown in Listing 10-4.

 1 restart(); // Sets everything up the first time through.
 2
 3 // The user has entered a new question. Make a node with that
 4 // question and put the old no-node into it. Then, make a node
 5 // with the new animal and put it into the new question node ahead
 6 // of the old no-node so that it becomes the yes choice. Start over.
 7
 8 $('#done-question').click(function() {
 9   $(node).wrap('<div string="' + $('#question').val() + '"></div>');
10   $(node).parent().prepend('<div string="' + $(what).val() + '"></div>');
11   $('#dialog').append("<div>Thanks! I'll remember that.</div><p>");
12   restart();
13 });
14
15 // The user has entered a new animal name and clicked done. Hide
16 // those items and make the new-question text field and done button
17 // visible. Plug the old and new animal names into the query.
18
19 $('#done-what').click(function() {
20    $('#what-is-it').hide();
21    $('#new').text($('#what').val());
22    $('#old').text($(node).attr('string'));
23    $('#new-question').show();
24    $('#dialog div:last').append(' <i>' + $('#what').val() + '</i>');
25 });
26
27 // The user clicked yes in answer to a question. Descend the tree
28 // unless we hit bottom in which case we boast and start over.
29
30 $('#yes').click(function() {
31    if (question($(node).children(':first-child'), ' <i>yes</i><br>')) {
32      $('#dialog').append("<div>I knew it! I'm so smart!</div><p>");
33      restart();
34    }
35 });
36
37 // The user clicked no in answer to a question. Descend the tree
38 // unless we hit bottom, in which case we hide the yes/no buttons
39 // and make the what-is-it text field and done button visible.
40
41 $('#no').click(function() {
42    if (question($(node).children(':last-child'), ' <i>no</i><br>')) {
43      $('#yesno').hide();
44      $('#dialog').append('<div>I give up. What is it?</div>');
45      $('#what-is-it').show();
46    }
47 });

Listing 10-4: Guess the Animal document ready function JavaScript

We invoke the restart function (line 1) to start the game. The other four things are event handlers, the JavaScript equivalent of the interrupt handlers introduced in Chapter 5. There is one event handler for each of the four button elements. Each handler calls an anonymous function (an inline function that doesn’t have a name) when the associated button is pressed.

Practice your text-editing skills by typing in the program. Save the results in a file named something like gta.html and then open the file in your browser. Play the game. Open up the developer tools in your browser and find the HTML inspector; this allows you to look at the HTML that makes up the web page. Watch the tree of knowledge get built as you play.

The CSS

As we touched on in Chapter 9, classes give us a way to label elements so that they can be easily selected. CSS is primarily used for static declarations of properties; it becomes dynamic mostly via programmatic manipulation. The HTML in Listing 10-2 has two CSS classes: start-hidden is dynamic, and invisible is static.

The class attribute is used to make several of the HTML elements in Listing 10-5 members of the start-hidden class. This isn’t just to make our program classy; it’s to give us a way to locate all of these elements with a simple selector. These elements are made invisible whenever the program is started or restarted. They’re made visible as the program runs, and start-hidden allows us to reset everything simply.

The element with the invisible class is always invisible, as it’s the tree of knowledge. Thus, the CSS shown in Listing 10-5 replaces the <!-- CSS goes here --> in line 9 of Listing 10-1.

1 invisible {
2   display: none; /* elements with this class are not displayed */
3 }

Listing 10-5: Guess the Animal CSS

Note that you can use inline style for simple CSS instead, because of course there has to be more than one way to do things in a browser. Writing line 3 of Listing 10-2 as <div id="root" style="display: none"> would have the same effect.

Guess the Animal Version 2: C

As I’ve mentioned, browsers are high-level virtual machines—all their functionality is implemented in software. This enables us to quickly and easily construct our program in part by hiding some of the important underpinnings. Let’s rewrite the program in C so that more of the primitive actions that browsers hide are exposed. This discussion assumes a UNIX-derived operating system.

Terminals and the Command Line

Our C program is going to be extremely retro in that it’s not going to have any fancy buttons or graphics. It will use the command line in a manner similar to the ancient game of Adventure. This is a great opportunity to learn more about how input and output work rather than relying on the fancy widgets built into browsers.

What do I mean by “retro” and “command line”? As Chapter 1 mentions, human language likely started as sounds and gestures, with writing being invented much later. Computer language is the opposite. While interaction did start with pushing buttons and flipping switches when computers still had front panels, it quickly evolved to written language, with gesture and sound recognition coming later. Humans would type and computers would “type back” on terminals (see “Terminals” on page 176).

You probably use a graphical user interface (GUI) to communicate with your computer. It’s actually pretty Stone Age if you think about it. “Ugh! Look! Button! Press! Friend! Cat video! Like! Tweet Tweet Tweet!” GUIs mostly use gestural language, which works well for casual computer users because it doesn’t rely too much on users’ memories—or at least it didn’t in the days before all the icons were changed to be universally unrecognizable.

Most computer systems still support a written command line interface behind all the fancy graphics. Terminals are now implemented in software instead of being a piece of hardware external to the computer. You’ll get a command prompt if you open up the terminal application on your computer; you can type in it, and it will respond.

Instead of using buttons for yes and no, the C version of our program expects the player to type y or n into the terminal program, followed by the ENTER, RETURN, or ↵ key (depending on keyboard). The player similarly types in new animal names and questions. The program also accepts q to quit.

Building the Program

Because C is a compiled language, we can’t just “run” the source code like we could with the interpreted JavaScript version. We have to convert it into machine language first. We can do this pretty easily using the command line. If the source is in a file named, for example, gta.c, you can generate a machine language file called gta by typing the command shown in Figure 10-5 into your terminal.

Image

Figure 10-5: Building the program

Once you have the output file, you can typically just type its name to run it.

Terminals and Device Drivers

A terminal is an I/O device, and—as mentioned in “System and User Space” on page 133—user programs don’t talk to I/O devices directly; the operating system mediates, as shown in Figure 10-6.

Image

Figure 10-6: I/O device mediation

Back when terminals were separate devices, the computer and the terminal were connected through an RS-232 serial connection (see “Serial Communication” on page 152). There were physical wires connecting terminals and computers. Operating systems still pretend that this type of connection exists today, mimicking it in software so that legacy programs continue to work unmodified.

Context Switching

The device driver is more complicated than it seems because a primary reason we have operating systems is so that more than one user program can run at the same time. Because the computer has only one set of registers, the OS must save and restore their contents when switching between user programs. There’s actually a lot of stuff that needs to be saved and restored other than the CPU registers, including the MMU registers and state of any I/O. The whole pile is called the process context, or just context. We don’t want to do context switching frivolously because the size of the context makes it comparatively expensive. The system call process is shown in Figure 10-7.

Image

Figure 10-7: Context switching

As you can see, a lot of work happens behind the scenes when a system call is made. And, as mentioned back in “Relative Addressing” on page 128, sometimes the OS will sleep a user program, even when it can fulfill a request, in order to give another user program a chance to run.

We don’t want to do a context switch every time a user presses a key. One way to minimize context switching in this case is to realize we usually don’t care what the user is typing until they hit ENTER. The user program uses a system call to indicate that it wants to read from the terminal. This puts the user program to sleep, because it can’t do anything while it’s waiting, which allows the OS to perform some other operation, such as switching to run another program. The device driver that handles the idiosyncrasies of the physical device can save characters from the terminal in a buffer and wake up the user program only when the user hits ENTER instead of on every keypress.

What’s a buffer? We saw one back in Figure 6-25; it’s a first-in, first-out (FIFO) data structure, at least in software land. (In hardware land, a buffer is often a circuit used to protect delicate components from buffoons.) Figure 10-8 depicts a FIFO, also known as a queue, which is similar to being in line at the grocery store. As with stacks, a FIFO can overflow by running out of space and underflow by fetching from an empty queue.

Image

Figure 10-8: Dog in queue

Terminals usually operate in full-duplex mode (see “Serial Communication” on page 152), which means there is no direct connection between the keyboard and the display; the keyboard sends data to the computer, and the display receives data from the computer. Originally, as mentioned earlier, there were separate physical wires for each direction. It’s not enough, then, for the terminal device driver to buffer up the input because the user will get confused unless what they type is echoed so they can see it. And terminals are often slower than programs that write to them, so an output buffer is used in addition to the input buffer. A program is put to sleep if it tries to write to a full output buffer. The driver might provide the user some feedback, such as beeping if the input buffer becomes full. The part of the driver that we’ve been discussing looks like Figure 10-9.

Image

Figure 10-9: Terminal device driver buffering and echoing

Real device drivers are more complicated. Additional system calls are used to modify the driver settings. Echoing can be turned on and off. Buffering can be turned off, which is known as raw mode, whereas turning it on is known, of course, as cooked mode. The key(s) that wake up the user program can be set, along with much more, such as which key erases characters (usually BACKSPACE or DELETE).

Standard I/O

Buffering in the device driver solves only part of the problem. User programs have similar issues. It doesn’t do any good to have the device driver buffer up input just to have a user program make a system call for each character. The output buffer doesn’t help too much if the user program makes a system call to write each character. This is a common enough situation that it prompted the creation of the standard input/output library (stdio), which contains buffered I/O functions for user programs.

The stdio library supports buffered input, in which as much input as possible is read from the device driver in a single system call and placed into a buffer. The user program gets characters from the buffer until it’s empty, then tries to get more. On the output side, characters are buffered until either the buffer is full or an important character such as a newline occurs. Together it looks like Figure 10-10.

Image

Figure 10-10: User program with stdio buffering

Seems like a lot of work just to make things run efficiently! And we’re not done yet. How does the user program get connected to the terminal device driver?

It’s way easier to reference someone by their name than it is to provide their complete description, and operating systems take a similar approach to access files. The open system call converts a filename into a handle or file descriptor that can be used to reference the file until it is closed via the close system call. This is akin to getting a claim ticket when you check your backpack in a museum. The stdio library includes analogous fopen and fclose functions that use the system calls but also set up and tear down the buffering system. Because the UNIX abstractions treat devices just like files, you can open a special file such as /dev/tty to access a terminal device.

Circular Buffers

Earlier I said queues are like being in line at a grocery store. Although they do have that outward appearance, that’s not how buffers such as the stdio output buffer in Figure 10-10 are actually implemented.

Think about what happens in a grocery line. When the person in front is done, everybody else in line must move forward one position. Let’s queue up a frog, as shown in Figure 10-11. As you can see, we need to keep track of the end of the line so we know where to insert things.

Image

Figure 10-11: Inserting into a queue

Now let’s look at what happens when the frog is removed from the queue (Figure 10-12).

Image

Figure 10-12: Removing from a queue

As you can see, a lot of work is involved. When the f is removed, the r must be copied to where the f was, then the o to where the r was, and so on. Let’s try a different approach. Rather than everyone in the line moving, let’s have the checker get some exercise in Figure 10-13.

Image

Figure 10-13: Removing from a queue by moving the checker

This is a lot less work, except for the checker. But it causes a new problem. At some point, the line backs up to the door even though there’s space at the front. Nobody else can get in line.

What we need is some way to funnel new people into the space at the front of the line. We can do this by bending the line so that it’s circular, as shown in Figure 10-14.

Image

Figure 10-14: Circular buffer

As you can see, data can be added to the queue as long as the in arrow is clockwise from the out arrow. Likewise, data in the queue can be removed as long as the out arrow is counterclockwise from the in arrow. A bit of arithmetic is needed to wrap around from the end of the buffer to the beginning. The next location is the current one plus 1, modulo the buffer size.

These structures have many names, including circular buffers, circular queues, and ring buffers. They’re a pretty standard approach, and not just in stdio or device drivers.

Better Code Through Good Abstractions

Every time we play the Guess the Animal game, we start over from scratch with a program that knows only about cats and dogs. It would be nice if we could remember our game and continue where we left off. That’s easy to do in our C program; it’s a side benefit that results from the file abstraction.

Adding such a feature to the JavaScript version is much more difficult. Figure 10-15 illustrates why.

Image

Figure 10-15: Browser and operating system interfaces

You can see that the OS has a single interface that works for both devices and files. This interface is used both by the browser on the left and by the C version of the program on the right. That means the C program, like the browser, can use the same code to read input from a file as it does to read user input from a device. But the browser doesn’t pass this abstraction on to the JavaScript programmer. Instead, a completely separate piece of code using a completely different interface would be needed to add the new feature there. The choice of interface can have a big impact on both the ease of programming and the clarity of the result.

Some Mechanics

Back to our C program. Getting a C program ready to run requires compiling it and then linking it to other code that it uses, such as the stdio library. The section “Running Programs” on page 137 mentions that a runtime library is also included; the C version is often named crt0. It’s responsible for tasks like setting up the stack and the heap so they’re ready to use. It also opens up a pair of files that are connected to the terminal device driver by default, one for input and one for output.

The stdio library maps the system file descriptors into file pointers, addresses that reference the data structures that it uses for buffering and bookkeeping. It starts with three: stdin (standard input), stdout (standard output), and stderr (standard error). The intent is for things that are important to go to stderr instead of stdout; they both go to the same place, but stderr is unbuffered and stdout is buffered. If you use stdout for error messages, they get buffered, and you may never see them if your program crashes. The file pointers stdout and stderr share the same file descriptor, as shown in Figure 10-16, unless changed.

Image

Figure 10-16: The file pointers stdin, stdout, and stderr

Invention is often sparked by strange events. According to Steve Johnson, stderr was not part of the original stdio library; it was added as a side effect of the development of the first computer typesetting software (troff, written by Joseph Ossanna, 1928–1977) for the C/A/T photoypesetter. You take laser and inkjet printing for granted, but this beast projected images onto silver photographic paper, which then had to be developed. That became very expensive when the Hunt brothers cornered the silver market, and folks were asked to cut down on phototypesetter use. It was not uncommon to send a job to the typesetter only to get back a beautifully formatted page containing a cannot open file error message. The stderr file pointer was born so that error messages could go to the terminal instead of to the typesetter in order to save money.

Buffer Overflow

As long as we’re on the subject of stdio, let’s talk about a class of very serious system programming errors called buffer overflow. When stdio was originally written, it included a function called gets that read a string up to the next newline character from stdin into a user-supplied buffer. We could use it as shown in Listing 10-6 to read the y, n, or q response; there’s room in buffer for the character and a NUL terminator.

1 char buffer[2];
2
3 gets(buffer);

Listing 10-6: Using gets to read input

Why might this be a problem? Because gets doesn’t check to make sure that the input doesn’t run off the end of the buffer. Say we have a more serious program that also has a variable named launch_missiles, which just happens to be the next thing in memory (Figure 10-17).

Image

Figure 10-17: Buffer overflow in memory

A malicious user might discover that answering yyy would store a y in launch_missiles, which for all intents and purposes is the same as the nonexistent buffer[2]. That could get really ugly. As a matter of fact, it has. A very large number of discovered security issues result from exactly this sort of buffer overflow bug. This was fixed in stdio by the addition of an fgets function that checks bounds. But be careful—there are many, many ways in which buffer overflow bugs can occur. Never, ever assume that buffer sizes are big enough! There’s more detail about buffer overflows in Chapter 13.

The C Program

There are many C libraries in addition to stdio. The string library, for example, includes functions for comparing and copying strings, and the catchall standard library stdlib includes functions for memory management.

Listing 10-7 shows the C program for our game’s prologue. The first part brings in the library information we need (lines 1–3). Next, a node structure is declared (lines 5–9) that contains pointers to two leaves and a placeholder for the question or animal string. Note that we didn’t have to do something like this in our JavaScript version because we took advantage of the existing HTML <div>; had we not done that, there would have been a JavaScript equivalent. Notice that the node structure is defined such that we can allocate the node and string together, as in “More Efficient Memory Allocation” on page 196.

1 #include <stdio.h>  // standard I/O library
2 #include <stdlib.h> // standard library for exit and malloc
3 #include <string.h> // string library
4
5 struct node {
6   struct node *no;  // references no answer node
7   struct node *yes; // references yes answer node
8   char string[1];   // question or animal
9 };

Listing 10-7: Guess the Animal in C: prologue

Next, we define a function to help with memory allocation (Listing 10-8). Although memory allocation is no big deal, we need to do it in several places, and it gets tedious to check for errors each time. More recent languages include exception-handling constructs that make this sort of thing simpler.

Since the only time that we need to allocate memory is when making a new node, we use a function that takes the string to install in the node. In addition to allocating memory, the string is copied into the node, and the yes and no pointers are initialized.

10 struct  node    *
11 make_node(char *string)
12 {
13     struct  node    *memory;        // newly allocated memory
14
15     if ((memory = (struct node *)malloc(sizeof (struct node) + strlen(string))) == (struct node *)0) {
16         (void)fprintf(stderr, "gta: out of memory. ");
17         exit(-1);
18     }
19
20     (void)strcpy(memory->string, string);
21     memory->yes = memory->no = (struct node *)0;
22
23     return (memory);
24 }

Listing 10-8: Guess the Animal in C: memory allocator

We use the fprintf function in stdio for our error message because, as discussed earlier, things sent to stderr are unbuffered, which gives us a better chance of seeing the message if the program fails unexpectedly.

Note that the cast operator is used to cast the fprintf as void on line 16. When fprintf returns a value that we’re ignoring, the cast tells the compiler that we’re doing it deliberately, instead of forgetting to check something, so that it doesn’t generate warning messages. It also informs someone reading the code that the return value is being deliberately ignored, so it’s not a mistake. Recent changes to some compilers eliminate these warnings unless explicitly requested.

The call to exit on line 17 terminates the program. That’s the only reasonable option when there isn’t enough memory available to continue running the program.

The printf (print formatted) function is part of stdio and has made its way into many other languages. The first argument is a format string that determines the interpretation of the remainder of the arguments. A % followed by a code means “replace me with the next argument according to the code.” In this case, %s means “treat the next argument as a string.”

The rest of the program is shown in Listing 10-9.

 25 int
 26 main(int argc, char *argv[])
 27 {
 28     char            animal[50];     // new animal name buffer
 29     char            buffer[3];      // user input buffer
 30     int             c;              // current character from buffer
 31     struct  node    **current;      // current tree traversal node
 32     FILE            *in;            // input file for training data or typing
 33     struct  node    *new;           // newly created node
 34     FILE            *out;           // output file for saving training data
 35     char            *p;             // newline removal pointer
 36     char            question[100];  // new question buffer
 37     struct  node    *root;          // root of the tree of knowledge
 38
 39     //  Process the command line arguments.
 40
 41     in = out = (FILE *)0;
 42
 43     for (argc--, argv++; argc > 1 && argc % 2 == 0; argc -= 2, argv += 2) {
 44         if (strcmp(argv[0], "-i") == 0 && in == (FILE *)0) {
 45             if ((in = fopen(argv[1], "r")) == (FILE *)0) {
 46                 (void)fprintf(stderr, "gta: can't open input file `%s'. ", argv[1]);
 47                 exit(-1);
 48             }
 49         }
 50
 51         else if (strcmp(argv[0], "-o") == 0 && out == (FILE *)0) {
 52             if ((out = fopen(argv[1], "w")) == (FILE *)0) {
 53                 (void)fprintf(stderr, "gta: can't open output file `%s'. ", argv[1]);
 54                 exit(-1);
 55             }
 56         }
 57
 58         else
 59             break;
 60     }
 61
 62     if (argc > 0) {
 63         (void)fprintf(stderr, "usage: gta [-i input-file-name] [-o output-file-name] ");
 64         exit(-1);
 65     }
 66
 67     //  Read from standard input if no input file was specified on the command line.
 68
 69     if (in == (FILE *)0)
 70         in = stdin;
 71
 72     //  Create the initial tree of knowledge.
 73
 74     root = make_node("Does it bark");
 75     root->yes = make_node("dog");
 76     root->no = make_node("cat");
 77
 78     for (;;) {      // play games until the user quits.
 79
 80         if (in == stdin)
 81             (void)printf("Think of an animal. ");
 82
 83         current = &root;    //  start at the top
 84
 85         for (;;) {          // play a game
 86
 87             for (;;) {      // get valid user input
 88                 if (in == stdin) {
 89                     if ((*current)->yes == (struct node *)0)
 90                         (void)printf("Is it a ");
 91
 92                     (void)printf("%s?[ynq] ", (*current)->string);
 93                 }
 94
 95                 if (fgets(buffer, sizeof (buffer), in) == (char *)0 || strcmp(buffer, "q ") == 0) {
 96                     if (in != stdin) {
 97                         (void)fclose(in);
 98                         in = stdin;
 99                     }
100                     else {
101                         if (in == stdin)
102                             (void)printf(" Thanks for playing.  Bye. ");
103                         exit(0);
104                     }
105                 }
106                 else if (strcmp(buffer, "y ") == 0) {
107                     if (out != (FILE *)0)
108                         fputs("y ", out);
109
110                     current = &((*current)->yes);
111
112                     if (*current == (struct node *)0) {
113                         (void)printf("I knew it! ");
114                         break;
115                     }
116                 }
117                 else if (strcmp(buffer, "n ") == 0) {
118                     if (out != (FILE *)0)
119                         fputs("n ", out);
120
121                     if ((*current)->no == (struct node *)0) {
122                         if (in == stdin)
123                             (void)printf("I give up.  What is it? ");
124
125                         fgets(animal, sizeof (animal), in);
126
127                         if (out != (FILE *)0)
128                             fputs(animal, out);
129
130                         if ((p = strchr(animal, ' ')) != (char *)0)
131                             *p = '';
132
133                         if (in == stdin)
134                             (void)printf(
135                              "What's a good question that I could use to tell a %s from a %s? ",
136                               animal, (*current)->string);
137                         fgets(question, sizeof (question), in);
138
139                         if (out != (FILE *)0)
140                             fputs(question, out);
141
142                         if ((p = strchr(question, ' ')) != (char *)0)
143                             *p = '';
144
145                         new = make_node(question);
146                         new->yes = make_node(animal);
147                         new->no = *current;
148                         *current = new;
149
150                         if (in == stdin)
151                             (void)printf("Thanks!  I'll remember that. ");
152
153                         break;
154                     }
155
156                     else
157                         current = &((*current)->no);
158                 }
159                 else {
160                     if (in == stdin)
161                         (void)printf("Huh?  Please answer y for yes, n for no, or q for quit. ");
162
163                     while ((c = getc(in)) != ' ' && c != EOF)
164                         ;
165                 }
166             }
167
168             break;
169         }
170
171         if (in == stdin)
172             (void)printf("Let's play again. ");
173     }
174 }

Listing 10-9: Guess the Animal in C: mainline

There’s nothing particularly interesting about this code except the memory management, as the program does pretty much the same thing as the JavaScript version. Lines 28 through 37 declare variables. Lines 74 through 76 create the initial nodes depicted in Figure 10-18. Note that all the strings are NUL-terminated ('').

Image

Figure 10-18: Guess the Animal in C: initial nodes

Let’s play the game as we did earlier in “Guess the Animal Version 1: HTML and JavaScript” on page 262. After the player supplies a new question, a new node is allocated for it. There are a couple of points of interest here. Be careful getting the length of a string using the strlen (string length) function. It returns the actual length of the string, not the amount of memory used, which is 1 byte more to account for the NUL terminator. But notice that we don’t add 1 when allocating memory for strings because of the way we’re allocating memory for the node, which already includes the extra byte.

Whenever we descend the tree in response to a yes or no answer, we keep a current pointer to make it easy to insert the new question node. We need to detach either the yes or no, which we do by having current point to whatever node pointer is being replaced. Because current points to a node pointer, it’s a pointer to a pointer. When we say *current = new; we’re dereferencing the pointer and saying “replace whatever the pointer is pointing to.” In Figure 10-19, the no pointer in the new node is set to current, which is the old answer, and current points to the yes pointer in the root node, which gets replaced with the pointer to the new node.

Image

Figure 10-19: Guess the Animal in C: adding new nodes

Training

Recall that our C program can be run with command line options for reading and writing training data. We can run the program as follows:

prompt> gta -o training
Think of an animal.
Does it bark?
n
Is it a dog?
n
I give up. What is it?
giant purple snorklewhacker
What's a question that I could use to tell a giant purple snorklewhacker from a dog?
Does it live in an anxiety closet?
Thanks. I'll remember that.
Let's play again.
Think of an animal.
Does it bark?
q
Thanks for playing. Bye.

Now, if you look in the training file, you’ll see that it contains exactly what you typed:

n
n
giant purple snorklewhacker
Does it live in an anxiety closet?

If we rerun the program as:

prompt> gta -i training

the contents of the training file will get read in so that the program starts where we left off.

Way back in “What Is Computer Programming?” on page xxix, I mentioned that you need to know a lot about everything in order to be a good programmer. Our program isn’t very good grammatically. It works fine if the animal is a dog, because it will ask Is it a dog?. But what if it’s an elephant? It’s not grammatically correct to ask Is it a elephant?. What are the rules for making sure the grammar is correct? Can you modify the code to make it grammatically more better?

Summary

In this chapter, you’ve seen a program written in two ways: once as a high-level application and once as a lower-level system program. On one hand, writing high-level application programs can be easier because many small details are handled automatically. On the other hand, some features, such as recording and playback, are much more difficult to implement in environments that don’t include uniform interfaces.

Furthermore, using very complex application environments for simple applications increases the likelihood of bugs. The probability of bugs is the sum of your application code and the code for the environment in which it runs. How many times has your browser begun running very slowly and needed to be restarted, usually due to internal memory management errors? How often has your browser just crashed?

You’ve seen that system programming involves much more attention to detail, such as the management of strings, memory, and buffers. But these details are important when the goal is to craft code that is concise and secure. In the next chapter, we’ll look at a different type of detail: structuring problems so that they’re easier to solve.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.128.199.162