1. Program Structure

Martin Kalin¹

(1)

Chicago, IL, USA

1.1 Overview

This chapter focuses on how C programs are built out of functions, which are a construct in just about all modern programming languages. The chapter uses short code segments and full programs to explain topics such as these:

Functions as program modules
Control flow within a program
The special function named main
Passing arguments to a function
Returning a value from a function
Writing functions that take a variable number of arguments

C distinguishes between function declarations, which show how a function is to be called, and function definitions, which provide the implementation detail. This chapter introduces the all-important distinction, and later chapters put the distinction to use in a variety of examples. The chapter also compares C functions with assembly-language blocks, which is helpful in clarifying how C source code compiles into machine-executable code.

Every general-purpose programming language has control structures such as tests and loops. Once again, short code examples introduce the basics of C’s principal control structures; later code examples expand and refine this first look at control structures.

1.2 The Function

A C program consists of one or more functions, where a function is a program module that takes zero or more arguments and can return a value. To declare a function is to describe how the function should be invoked, whereas to define a function is to implement it by providing the statements that make up the function’s body. A function’s body provides the operational details for whatever task the function performs. A declaration is a function’s interface, whereas a definition is a function’s implementation. The following is an example of the declaration and the definition for a very simple function that takes two integer values as arguments and returns their sum.

int add2(int, int);          /* declaration ends with semicolon, no body */

int add2(int n1, int n2) {   /* definition: the body is enclosed in braces */
  int sum = n1 + n2;         /* could avoid this step, here for clarity */
  return sum;                /* could just return n1 + n2 */
}                            /* end of block that implements the function */

Listing 1-1

Declaring and defining a function

The add2 example (see Listing 1-1) contrasts a function’s declaration with its definition. The declaration has no body of statements enclosed in braces, but the definition must have such a body. In a contrived example, the body could be empty, but the braces still would be required in the definition and absent from the declaration.

If some other function, such as main, calls add2, then the declaration of add2 must be visible to main. If the two functions are in the same file, this requirement can be met by declaring add2 above main. There is, however, a shortcut. If add2 is defined above main in the same file, then this definition doubles as a declaration (see Listing 1-2).

int add2(int n1, int n2) {   /* definition: the body is enclosed in braces */
  int sum = n1 + n2;         /* could avoid this step, here for clarity */
  return sum;                /* could just return n1 + n2 */
}                            /* end of block that implements the function */

int main() {
  return add2(123, 987);     /* ok: add2 is visible to main */
}

Listing 1-2

More on declaring and defining a function

Program structure may require that a function be declared and defined separately. For instance, if a program’s functions are divided among various source files, then a function defined in a given file would have to be declared in another file to be visible there. Examples are forthcoming.

As noted, a function’s body is enclosed in braces, and each statement within the body ends with a semicolon. Indentation makes source code easier to read but is otherwise insignificant—as is the placement of the braces. My habit is to put the opening brace after the argument list and the closing brace on its own line.

In a program, each function must be defined exactly once and with its own name, which rules out the name overloading (popular in languages such as Java) in which different functions share a name but differ in how they are invoked. A function can be declared as often as needed. As promised, an easy way of handling declared functions is forthcoming.

In the current example, the declaration shows that function add2 takes two integer (int) arguments and returns an integer value (likewise an int). The definition of function add2 provides the familiar details, and this definition could be shortened to a single statement:

return n1 + n2;

If a C function does not return a value, then void is used in place of a return data type. The term void, which is shorthand for no value, is not an ordinary data type in C: it is an incomplete type, and so there are no variables of type void. By contrast, int is an ordinary data type. An int variable holds a signed integer value and so is able to represent negative and nonnegative values alike; the underlying implementation is almost certainly 32 bits in size and almost certainly uses the 2’s complement representation, which is clarified later.

There are various C standards, which relax some rules of what might be called orthodox C. Furthermore, some C compilers are more forgiving than others. In orthodox C, for example, there are no nested function definitions: one function cannot be defined inside another. Also, later standardizations of C extend the comment syntax from the slash-star opening and star-slash closing illustrated in Listing 1-1, and an until-end-of-line comment may be introduced with a double slash. To make compilation as simple as possible, my examples stick with orthodox C, avoiding constructs such as nested functions and double slashes for comments.

1.3 The Function main

In style, C is a procedural or imperative language, not an object-oriented or functional one. The program modules in a C program are functions, which have global scope or visibility by default. There is a way to restrict a function’s scope to the file in which the function is defined, as a later chapter explains. The functions in a C program can be distributed among arbitrarily many different source files, and a given source file can contain as many functions as desired.

A C program’s entry point is the function main, in that program execution begins with the first statement in main. In a given program, regardless of how many source files there are, the function main (like any function) should be defined exactly once. If a collection of C functions does not include the appropriate main function, then these functions compile into an object module, which can be part of an executable program, but do not, without main, constitute an executable program.

#include <stdio.h>

/* This definition of add2, occurring as it does _above_ main, doubles as the
   function's declaration: main calls add2 and so the declaration of add2 must
   be visible above the call. If function add2 were _defined_ below main, then
   the function should be declared here above main to avoid compiler warnings. */
int add2(int n1, int n2) {  /* definition: the body is enclosed in the braces */
  int sum = n1 + n2;        /* could avoid this step, kept here for verbosity */
  return sum;               /* we could just return n1 + n2 */
}

int main() {
  int k = -26, m = 44;
  int sum = add2(k, m);     /* call the add2 function, save the returned value */
  /* %i means: format as an integer */
  printf("%i + %i = %i\n", k, m, sum);  /* output: -26 + 44 = 18 */
  return 0;                 /* 0 signals normal termination, nonzero signals some error */
}

Listing 1-3

An executable program with main and another function

The revised add2 example (see Listing 1-3) can be compiled and then run at the command line as shown in the following, assuming that the file with the two functions is named add2.c. These commands are issued in the very directory that holds the source file add2.c. My comments begin with ##:

% gcc -o add2 add2.c   ## alternative: % gcc add2.c -o add2
% ./add2               ## On Windows, drop the ./

The flag -o stands for output. Were this flag omitted, the executable would be named a.out (a.exe on Windows) by default. On some systems, the C compiler may be invoked as cc instead of gcc. If both commands are available, then cc likely invokes a native compiler, that is, a compiler designed specifically for that system. On Unix-like systems, this command typically is a shortcut:

% make add2   ## expands to something like: cc add2.c -o add2

The add2 program begins with an include directive. Here is the line:

#include <stdio.h>

This directive is used during the compilation process, with details to follow. The file stdio.h, with h for header, is an interface file that declares input/output functions such as printf, with the f for formatted. The angle brackets signal that stdio.h is located somewhere along the compiler’s search path (on Unix-like systems, in a directory such as /usr/include or /usr/local/include). The implementation of a standard function such as printf resides in a binary library (on Unix-like systems, in a directory such as /usr/lib or /usr/local/lib), which is linked to the program during the full compilation process.

Header Files For Function Declarations

Header files are the natural way to handle function declarations—but not function definitions. A header file such as stdio.h can be included wherever needed, and even multiple includes of the same header file, although inefficient, will work. However, if a header file contains function definitions, there is a danger. If such a file were included more than once in a program’s source files, this would break the rule that every function must be defined exactly once in a program. The sound practice is to use header files for function declarations, but never for function definitions.

What is the point of having the main function return an int value? Which function gets the integer value that main returns? When the user enters

% ./add2

at the command-line prompt and then hits the Return key, a system function in the exec family (e.g., execv) executes. This exec function then calls the main function in the add2 program, and main returns 0 to the exec function to signal normal termination (EXIT_SUCCESS). Were the add2 program to terminate abnormally, the main function might return a nonzero value such as EXIT_FAILURE, which is typically 1. The symbolic constants EXIT_SUCCESS and EXIT_FAILURE are clarified later.

Is There Easy-To-Find Documentation on Library Functions?

On Unix-like systems, or Windows with Cygwin installed (https://cygwin.com), there is a command-line utility man (short for manual) that contains documentation for the standard library functions and for utilities that often have the same name as a function: googling for man pages is a good start.

1.4 C Functions and Assembly Callable Blocks

The function construct is familiar to any programmer working in a modern language. In object-oriented languages, functions come in special forms such as the constructor and the method. Many languages now include anonymous or unnamed functions, such as the lambdas added to Java and C# but available in Lisp since the 1950s. C functions are named.

Most languages follow the basic C syntax for functions, with some innovations along the way. The Go language, for example, allows a function to return multiple values explicitly. Functions are straightforward with respect to flow of control: one function calls another, and the called function normally returns to its caller. Information can be sent from the caller to the callee through arguments passed to the callee; information can be sent from the callee back to the caller through a return value. Even in C, which allows only a single return value at the syntax level, multiple values can be returned by packaging them in a structure or by returning a pointer to an array or other aggregate. Additional tactics for returning multiple values are available, as shown later.

Assembly languages do not have functions in the C sense, although it is now common to talk about assembly language functions. The assembly counterpart to the function is the callable block, a routine with a label as its identifier; this label is the counterpart of a function’s name. Information is passed to a called routine in various ways, but with CPU registers and the stack as the usual way. This section uses the traditional Hello, world! program in a first look at (Intel) assembly code.

#include <stdio.h>

int main() {
  /* msg is a pointer to a char, the H in Hello, world! */
  char* msg = "Hello, world!";  /* the string is implemented as an array of characters */
  printf("%s\n", msg);          /* %s formats the argument as a string */
  return 0;                     /* main must return an int value */
}

Listing 1-4

The traditional greeting program in C

The hi program (see Listing 1-4) has three points of interest for comparing C and assembly code. First, the program initializes a variable msg whose data type is char*, which is a pointer to a character. The star could be flush against the data type, in between char and msg, or flush against msg:

char* msg = ...;   /* my preferred style, some limitations */
char * msg = ...;  /* ok, but unusual */
char *msg = ...;   /* perhaps the most common style */

A string in C is implemented as an array of char values, with the 1-byte, nonprinting character \0 terminating the array:

       +---+---+---+---+---+   +---+---+----+
msg--->| H | e | l | l | o |...| d | ! | \0 |   ## \0 is a char
       +---+---+---+---+---+   +---+---+----+

The backslash before the 0 in \0 identifies an 8-bit (1-byte) representation of zero. A zero without the backslash (0) would be an int constant, which is typically 32 bits in size. In C, character literals such as \0 are enclosed in single quotes:

char big_A = 'A';  /* 65 in ASCII (and Unicode) */
char nt = '\0';    /* non-printing 0, null terminator for strings */

In the array to which msg points, the last character is called the null terminator because its role is to mark where the string ends. As a nonprinting character, the null terminator is perfect for the job. Of interest now is how the assembly code represents a string literal.

The second point of interest is the call to the printf function. In this version of printf, two arguments are passed to the function: the first argument is a format string, which specifies string (%s) as the formatting type; the second argument is the pointer variable msg, which points to the greeting by pointing to the first character H. The third and final point of interest is the value 0 (EXIT_SUCCESS) that main returns to its caller, some function in the exec family.

The C code for the hi program can be translated into assembly. In this example, the following command was used:

% gcc -O1 -S hi.c ## -O1 = Optimize level 1, -S = save assembly code

The flag -O1 consists of capital letter O for optimize followed by 1, which is the lowest level at which optimization is enabled (the default, -O0, performs essentially none). This command produces the output file hi.s, which contains the corresponding assembly code. The file hi.s could be compiled in the usual way:

% gcc -o hi hi.s ## produces same output as compiling hi.c

        .file   "hi.c"             ## C source file
.LC0:                              ## .LC0 is the string's label (address)
        .string "Hello, world!"    ## string literal
        .text                      ## text (program) area: code, not data
        .globl  main               ## main is globally visible
        .type   main, @function    ## main is a function, not a variable (data)
main:                              ## label for main, the entry point
        .cfi_startproc             ## Call Frame Information: metadata
        subq    $8, %rsp           ## grow the stack by 8 bytes
        .cfi_def_cfa_offset 16     ## more metadata
        movl    $.LC0, %edi        ## copy (pointer to) the string into register %edi
        call    puts               ## call puts, which expects its argument in %edi
        movl    $0, %eax           ## copy 0 into register %eax, which holds return value
        addq    $8, %rsp           ## shrink the stack by 8 bytes
        .cfi_def_cfa_offset 8      ## more metadata
        ret                        ## return to caller
        .cfi_endproc               ## all done (metadata)

Listing 1-5

The hi program in assembly code

The hi program in assembly code (see Listing 1-5) uses AT&T syntax. There are alternatives, including so-called Intel syntax. The AT&T version has advantages, which are explained in the forthcoming discussions. In the example, the ## symbols introduce my comments.

To begin, some points about syntax should be helpful:

Identifiers that begin with a period (e.g., .file) are directives that guide the assembler in translating the assembly code into machine-executable code.
Identifiers that end with a colon (with or without a starting period) are labels, which serve as pointers (addresses) to relevant parts of the code. For example, the label main: points to the start of the callable code block that, in assembly, corresponds to the main function in C.
CPU registers begin with a percentage sign %. In a register name such as %eax, the e is for extended, which means 32 bits in Intel. On a 64-bit machine, the register %eax comprises the lower 32 bits of the 64-bit register %rax. In general, register names that start with the e are the lower 32 bits of the corresponding registers whose names start with r: %eax and %rax are one example, and %edi and %rdi are another example. A 32-bit machine would have only e registers.
In instructions such as movl, the l is for longword, which is 32 bits in Intel. In instruction addq, the q is for quadword, which is 64 bits. By the way, the various mov instructions are actually copy instructions: the contents of the source are copied to the destination, but the source remains unchanged.

The essentials of this assembly code example begin with two labels. The first, .LC0:, locates the string greeting “Hello, world!”. This label thus serves the same purpose as the pointer variable msg in the C program. The label main: locates the program’s entry point and, in this way, the callable code block that makes up the body of the main: routine.

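Two other parts of the main: routine deserve a look. The first is the call to the library routine puts, where the s indicates a string. (The C source calls printf, but the format string there is just %s followed by a newline, and so gcc, with optimization enabled, substitutes the simpler puts, which appends the newline itself.) In C code, the call would look like this: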

puts("This is a string."); /* C code (puts adds a newline) */

In C, puts would be called with a single argument. In assembly code, however, puts is called without an explicit argument. Instead, the expected argument (the address of the string to print) is copied to the register %edi, which comprises the lower 32 bits of the 64-bit register %rdi. For review, here is the code segment:

movl $.LC0, %edi   ## copy (pointer to) the string into %edi
call puts          ## call puts, which expects argument in %edi

A second interesting point about the main: routine is the integer value returned to its invoker, again some routine in the exec family. The 32-bit register %eax (the lower 32 bits of the 64-bit %rax) is sometimes used as a general-purpose scratchpad, but in this case it serves a special purpose: to hold the value returned from the main: routine. The assembly code thus puts 0 in the register immediately before cleaning up the stack and returning:

movl $0, %eax ## copy 0 into %eax, which holds return value

Although assembly-language programs are made up of callable routines rather than functions in the C sense, it is common and, indeed, convenient to talk about assembly functions. For the most part, the machine-language library routines originate as C functions that have been translated first into assembly language and then into machine code (see the sidebar).

How are C Programs Compiled?

The compilation of a C program is a staged process, with four stages:

+----------+    +-------+    +--------+    +----+
|preprocess|--->|compile|--->|assemble|--->|link|
+----------+    +-------+    +--------+    +----+

There are flags for the gcc utility, as well as separately named utilities (e.g., cpp for preprocess only), for carrying out the process only to a particular stage. The preprocess stage handles directives such as #include, which start with a sharp sign. The compile stage generates assembly language code, which the assemble stage then translates into machine code. The link stage connects the machine code to the appropriate libraries. The command

% gcc --save-temps net.c

would compile the code but also save the temporary files: net.i (text, from preprocess stage), net.s (text, from compile stage), and net.o (binary, from assemble stage).

1.4.1 A Simpler Program in Assembly Code

A simpler program in assembly language shows that many assembler directives can be omitted; the remaining directives make the code easier to read. Also, no explicit stack manipulation is needed in the forthcoming example, which is written from scratch rather than generated from C source code.

## hello program
        .data                   # data versus code section
        .globl hello            # global scope for label hello
hello:                          # label == symbolic address
        .string "Hello, world!" # a character string

        .text                   # text == code section
        .global main            # global scope for main subroutine
main:                           # start of main
        movq   $hello, %rdi     # copy address of the greeting to %rdi
        call   puts             # call library routine puts
        movq   $0, %rax         # copy 0 to %rax (return value)
        ret                     # return control to routine's caller

Listing 1-6

A bare-bones program in assembly language

The hiAssem program (see Listing 1-6) prints the traditional greeting, but using assembly code rather than C. The program can be compiled and executed in the usual way except for the added flag -static:

% gcc -o hiAssem -static hiAssem.s
% ./hiAssem   ## on Windows, drop the ./

The program structure is straightforward:

1. Identify a string greeting with a label, in this case hello:.
2. Identify the entry point with a label, in this case main:.
3. Copy the greeting’s address hello: into register %rdi, where the library routine puts expects this address.
4. Call puts.
5. Copy zero into register %rax, which holds a called routine’s return value.
6. Return to the caller.

Even the short examples in this section illustrate the basics of C programs: functions in C correspond to callable blocks (routines) in assembly language, and in the normal flow of control, a called function returns to its caller. With respect to called functions, the system provides scratchpad storage, for local variables and parameters, with CPU registers and the stack as backup.

1.5 Passing Command-Line Arguments to main

The main function seen so far returns an int value and takes no arguments. The declaration is

int main(); /* one version */

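Some compilers also accept a main function that returns no value, although the C standard specifies int as the return type for programs that run under an operating system:

void main(); /* another version, returns nothing; not portable */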

The function main also can take arguments from the command line:

int main(int argc, char* argv[ ]); /* with two arguments, also could return void */

The two arguments in the last declaration of main are named, by tradition, argc (c for count) and argv (v for values). Here is a summary of the information in each argument:

The first argument to the main function is argc, a count of the command-line arguments. This count is one or more because the name of the executable program is, again by tradition, the first command-line argument. If the program hi is invoked from the command line as follows:

% ./hi

then argc would have a value of one. If the same program were invoked as follows:

% ./hi one two three

then argc would have a value of four. A program is not obligated to use the command-line arguments passed to it.
The second argument (argv) passed to main is trickier to explain. All of the command-line arguments, including the program’s name, are strings. Recall that a string in C is an array of characters with a null terminator. Because there may be multiple command-line arguments, these are stored in a list (a C array), each of whose elements holds the address of the first character in a command-line string. For example, in the invocation of program hi, the first element in the argv array points to the h in hi; the second element in this array points to the o in one; and so forth.
The empty square brackets in argv[ ] indicate an array of unspecified length, as the array’s length is given in argc; the char* (pointer to character) data type indicates that each array element is a pointer to the first character in each command-line string. The argv argument is thus a pointer to an array of pointers to char; hence, the argv argument is sometimes written as char** argv, which means literally that argv is a pointer to pointer(s) to characters.

The details about arrays are covered thoroughly in Chapter 3, but the preceding sketch should be enough to clarify how command-line arguments work in C.

#include <stdio.h>

int main(int argc, char* argv[ ]) {
  if (argc < 2) {
    puts("Usage: cline <one or more cmd-line args>");
    return -1;                  /** nonzero signals failure **/
  }

  puts(argv[0]);                /* executable program's name */
  int i;
  for (i = 1; i < argc; i++)
    puts(argv[i]);              /* additional command-line arguments */

  return 0;                     /** 0 is EXIT_SUCCESS **/
}

Listing 1-7

Command-line arguments for main

The cline program (see Listing 1-7) first checks whether there are at least two command-line arguments, that is, at least one in addition to the program’s name. If not, the usage section introduced by the if clause explains how the program should be run. Otherwise, the program uses the library function puts (put string) to print the program’s name (argv[0]) and the other command-line argument(s). (The for loop used in the program is clarified in the next section.) Here is a sample run:

% ./cline A 1 B2
./cline
A
1
B2

Later examples put the command-line arguments to use. The point for now is that even main can have arguments passed to it. Both of the control structures used in this program, the if test and the for loop, now need clarification.

1.6 Control Structures

A block is a group of expressions (e.g., integer values to initialize an array) or statements (e.g., the body of a loop). In either case, a block starts with the left curly brace { and ends with a matching right curly brace }. Blocks can be nested to any level, and the body of a function—its definition—is a block. Within a block of statements, the default flow of control is straight-line execution.

#include <stdio.h>

int main() {
  int n = 27;                              /** 1 **/
  int k = 43;                              /** 2 **/
  printf("%i * %i = %i\n", n, k, n * k);   /** 3 **/
  return 0;                                /** 4 **/
}

Listing 1-8

Default flow of control

The straight-line program (see Listing 1-8) consists of the single function main, whose body has four statements, labeled in the comments for reference. There are no tests, loops, or function calls that interfere with the straight-line execution: first statement 1, then statement 2, then statement 3, and then statement 4. The last statement exits main and thereby effectively ends the program’s execution. Straight-line execution is fast, but program logic typically requires a more nuanced flow of control.

C has various flavors of the expected control structures, which can be grouped for convenience into three categories: tests, loops, and (function) calls. This section covers the first two, tests and loops; the following section expands on flow of control in function calls.

#include <stdio.h>

int main() {
  int n = 111, k = 98;
  int r = (n > k) ? k + 1 : n - 1;   /* conditional operator */
  printf("r's value is %i\n", r);    /* 99 */

  if (n < k) puts("if");
  else if (r > k) puts("else if");   /** prints **/
  else puts("else");

  r = 0;                             /* reset r to zero */
  switch (r) {
  case 0:
    puts("case 0");                  /** prints **/
  case 1:
    puts("case 1");                  /** prints **/
    break;                           /** break out of switch construct **/
  case 2:
    puts("case 2");
    break;
  case 3:
    puts("case 3");
    break;
  default:
    puts("none of the above");
  }                                  /* end of switch */

  return 0;
}

Listing 1-9

Various ways to test in C

The tests program (see Listing 1-9) shows three ways in which to test in a C program. The first way uses the conditional operator in an assignment statement. The conditional expression has three parts:

(test) ? if-test-is-true : if-test-is-false ## true is non-zero, false is zero

In this example, the conditional expression is used as source in an assignment:

int r = (n > k) ? k + 1 : n - 1; /* n is 111, k is 98 */

A conditional expression consists of a test, which yields one of two values: one value if the test is true and another if the test is false. The test evaluates to true (nonzero in C; a comparison such as n > k yields exactly 1 when true) because n is 111 and k is 98, making the expression (n > k) true; hence, variable r is assigned the value of the expression immediately to the right of the question mark, k + 1 or 99. Otherwise, variable r would be assigned the value of the expression immediately to the right of the colon, in this case 110. The expressions after the question mark and the colon could themselves be conditional expressions, but readability quickly suffers.

The conditional operator is convenient and is used commonly to assign a value to a variable or to return a value from a function. This operator also highlights a common practice in C syntax: tests are enclosed in parentheses, in this example, (n > k). In a conditional expression, the parentheses around the test are optional, although they aid readability; in if-tests and loop-tests, the parentheses are required. Parentheses always can be used to enhance readability, as later examples emphasize.

The middle part of the tests program introduces the syntax for if-else constructs, which can be nested to any level. For instance, the body of an else clause could itself contain an if-else construct. In an if and an else if clause, the test is enclosed in parentheses. There can be an if without either an else if or an else, but any else clause must be tied to a prior if or else if, and every else if must be tied to an if. In this example, the conditions and results (in this case, puts calls) are on the same line. Here is a more readable version:

if (n < k)
  puts("if");
else if (r > k)
  puts("else if");   /** prints **/
else
  puts("else");

In this example, the body of the if, the else if, and the else is a single statement; hence, braces are not needed. The bodies are indented for readability, but indentation has no impact on flow of control. If a body has more than one statement, the body must be enclosed in braces:

if (n < k) {   /* braces needed here */
  puts("if");
  puts("just demoing");
}

Using braces to enclose even a single body statement is admirable but rare.

The last section of the tests program introduces the switch construct, which should be used with caution. The switch expression, in this case the value of variable r, is enclosed as usual in parentheses. The value of r now determines the flow of control. Four case clauses are listed, together with an optional default at the end. The value of r is zero, which means control moves to case 0 and the puts statement is executed. However, there is no break statement after this puts statement, and so control continues through the next case, in this example case 1; hence, the second puts statement executes. If the value of r happened to be 2, only one puts statement would execute because the case 2 body consists of the puts statement followed by a break statement.

The body of a case statement can consist of arbitrarily many statements. The critical point is this: once control enters a case construct, the flow is sequential until either a break is encountered or the switch construct itself is exited. In effect, the case expressions are targets for a high-level goto, and control continues straight line until there is a break or the end of the switch.

The break statement can be used to break out of a switch construct, or out of a loop. The discussion now turns to loops.

C has three looping constructs: while, do while, and for. Any one of the three looping constructs is sufficient to implement program logic, but each type of loop has its natural uses. For instance, a counted loop that needs to iterate a specified number of times could be implemented as a while loop, but a for loop readily fits this bill. A conditional loop that iterates until a specified condition fails to hold is implemented naturally as a while or a do while loop.

The general form of a while loop is

    while (<condition>) {
        /* body */
    }

If the condition is true (nonzero), the body executes, after which the condition is tested again. If the condition is false (zero), control jumps to the first statement beyond the loop’s body. (If the loop’s body consists of a single statement, the body need not be enclosed in braces.) The do while construct is similar, except that the loop condition occurs at the end rather than at the beginning of a loop; hence, the body of a do while loop executes at least once. The general form is

    do {
        /* body */
    } while (<condition>);

The break statement in C breaks out of a single loop. Consider this code segment:

    while (someCondition) { /* loop 1 */
        while (anotherCondition) { /* loop 2 */
            /* ... */
            if (thisHappens) break; /* breaks out of loop 2, but not loop 1 */
        }
        /* ... */
    }

The break statement in loop 2 breaks out of this loop only, and control resumes within loop 1. C does have a goto statement whose target is a label, but this control construct should be mentioned just once and avoided thereafter.

    #include <stdio.h>

    int main() {
        int n = -1;
        while (1) { /* 1 == true */
            printf("A non-negative integer, please: ");
            scanf("%i", &n);
            if (n >= 0) break; /* break out of the loop: zero is non-negative, too */
        }
        printf("n is %i\n", n);

        n = -1;
        do {
            printf("A non-negative integer, please: ");
            scanf("%i", &n);
        } while (n < 0);
        printf("n is %i\n", n);

        return 0;
    }

Listing 1-10

The while and do while loops

The whiling program (see Listing 1-10) prompts the user for a nonnegative integer and then prints its value. The program does not otherwise validate the input but rather assumes that only decimal numerals and, perhaps, the minus sign are entered. The focus is on contrasting a while and a do while for the same task.

The condition for the while loop is 1, the default value for true:

while (1) { /* 1 == true */

This loop might be an infinite one except that there is a break statement, which exits the loop: if the user enters a nonnegative integer, the break executes.

The do while loop is better suited for the task at hand: first, the user enters a value, and only then does the loop condition test whether the value is negative; if it is not, the loop exits. In both loops, the scanf function is used to read user input. The details about scanf and its close relatives can wait until later.

Among the looping constructs, the for loop has the most complicated syntax. Its general form is

    for (<init>; <condition>; <post-body>) {
        /* body */
    }

A common example is

    for (i = 0; i < limit; i = i + 1) { /* int i, limit = 100; from above */
        /* body */
    }

The init section executes exactly once, before anything else. Then the condition is evaluated: if true, the loop’s body is executed; otherwise, control goes to the first statement beyond the loop’s body. The post-body expression is evaluated per iteration after the body executes; then the condition is evaluated again; and so on. Any part of the for loop can be empty. The construct

for (;;) { /* huh? */ }

is an obfuscated version of a potentially infinite loop. As shown earlier, a more readable way to write such a loop is

while (1) { /** clearer **/ }

1.7 Normal Flow of Control in Function Calls

A called function usually returns to its caller. If a called function returns a value, the function has a return statement that both returns the value and marks the end of the function’s execution: control returns to the caller at the point immediately beyond the call. A function with void instead of a return type might contain a return statement, but without a value; if not, the function returns after executing the last statement in the block that makes up the function’s body.

The normal return-to-caller behavior takes advantage of how modern systems provide scratchpad storage for called functions. This scratchpad is a mix of general-purpose CPU registers and stack storage. As functions are called, the call frames on the stack are allocated automatically; as functions return, these call frames can be freed up for future use. The underlying system bookkeeping is simple, and the mechanism itself is efficient in that registers and stack call frames are reused across consecutive function calls.

    #include <stdio.h>
    #include <stdlib.h> /* rand() */

    int g() {
        return rand() % 100; /* % is modulus; hence, a number 0 through 99 */
    }

    int f(int multiplier) {
        int t = g();
        return t * multiplier;
    }

    int main() {
        int n = 72;
        int r = f(n);
        printf("Calling f with %i resulted in %i.\n", n, r); /* 5976 on sample run */
        return r; /* not usual, but all that's required is a returned int */
    }

Example 1-1

Normal calls and returns for functions

The calling program (see Example 1-1) illustrates the basics of normal return-to-caller behavior. When the calling program is launched from the command line, recall that a system function in the exec family invokes the calling program’s main function. In this example, main then calls function f with an int argument, which function f uses as a multiplier. The number to be multiplied comes from function g, which f calls. Function g, in turn, invokes the library function rand, which returns a pseudorandomly generated integer value. Here is a summary of the calls and returns, which seem so natural in modern programming languages:

                  calls         calls         calls      calls
    exec-function------->main()------->f(int)------->g()------->rand()
    exec-function<-------main()<-------f(int)<-------g()<-------rand()
                 returns       returns       returns    returns

Further examples flesh out the details in the return-to-caller pattern. One such example analyzes the assembly code in the pattern. A later example looks at abnormal flow of control through signals, which can interrupt an executing program and thereby disrupt the normal pattern.

1.8 Functions with a Variable Number of Arguments

The by-now-familiar printf function takes a variable number of arguments. Here is its declaration:

int printf(const char* format, ...); /* returns number of characters printed */

The first argument is the format string, and the optional remaining arguments, represented by the ellipsis, are the values to be formatted. The printf function requires the first argument, but the number of additional arguments depends on the number of values to be formatted. There are many other library functions that take a variable number of arguments, and programmer-defined functions can do the same. Two examples illustrate.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main() {
        /* 0755: owner has read/write/execute permissions, others read/execute permissions */
        int perms = 0755; /* 0 indicates base-8, octal */
        int status = syscall(SYS_chmod, "/usr/local/website", perms);
        if (-1 == status) perror(NULL);
        return 0;
    }

Example 1-2

The library function syscall

The sysCall program (see Example 1-2) invokes the library function syscall, which takes a variable number of arguments; the first argument, in this case the symbolic constant SYS_chmod, is required. SYS_chmod is clarified shortly.

The syscall function is an indirect way to make system calls, that is, to invoke functions that execute in kernel space, the address space reserved for those privileged operating system routines that manage shared system resources: processors, memory, and input/output devices. This indirect approach allows for fine-tuning that the direct approach might not provide. This example is contrived in that the function chmod (change mode) could be called directly with the same effect. The mode refers to various permissions (e.g., read and write permissions) on the target, in this case a directory on the local file system.

As noted, the first argument to syscall is required. The argument is an integer value that identifies the system function to call. In this case, the argument is SYS_chmod, which is defined as 90 in the header file syscall.h (on x86-64 Linux; the value differs on other architectures) and identifies the system function chmod. The variable arguments to function syscall are as follows:

The path to the file whose mode is to be changed, in this case /usr/local/website. The path is given as a string. (The directory /usr/local/website must exist for the program to work, and this directory must be accessible to whoever runs the program.)
The file permissions, in this case 0755 (base-8): the owner can read/write/execute, and everyone else can read/execute.

The header file stdarg.h has a data type va_list (list of variable arguments) together with utilities to help programmers write functions with a variable number of arguments. These utilities allocate and deallocate storage for the variable arguments, support iteration over these arguments, and convert each argument to whatever data type is appropriate. The utilities are well designed and worth using. As a popular illustration of a function with a variable number of arguments, the next code example sums up and then averages the arguments. In the example, the required argument and the others happen to be of the same data type, in the current case int, but this is not a requirement. Recall again the printf function, whose first argument is a string but whose optional, variable arguments all could be of different types.

    #include <stdio.h>
    #include <stdarg.h> /* va_list type, va_start va_arg va_end utilities */

    double avg(int count, ...) { /* count is how many, ellipses are the other args */
        double sum = 0.0;
        va_list args;
        va_start(args, count); /* allocate storage for the additional args */
        int i;
        for (i = 0; i < count; i++) sum += va_arg(args, int); /* compute the running sum */
        va_end(args); /* deallocate the storage for the list */
        if (count > 0) return sum / count; /* compiler promotes count to double */
        else return 0;
    }

    int main() { /* main returns int */
        printf("%f\n", avg(4, 1, 2, 3, 4));
        printf("%f\n", avg(9, 9, 8, 7, 6, 5, 4, 3, 2, 1));
        printf("%f\n", avg(0));
        return 0;
    }

Example 1-3

A function with a variable number of arguments

The varArgs program (see Example 1-3) defines a function avg with one named argument count and then an ellipsis that represents the variable number of other arguments. In this example, the int parameter count is a placeholder for the required argument, which specifies how many other arguments there are. In the first call from main to the function avg, the first 4 in the list becomes count, and the remaining four values make up the variable arguments.

In the function avg, local variable args is declared to be of type va_list. The utility va_start is called with args as its first argument and count as its second. The effect is to provide storage for the variable arguments. The later call to va_end signals that this storage no longer is needed. Between the two calls, the va_arg utility is used to extract from the list one int value at a time. The programmer needs to specify, in the second argument to va_arg, the data type of the variable arguments. In this example, the type is the same throughout: int. In a richer example, however, the type could vary from one argument to the next. Finally, function main makes three calls to function avg, including a call that has no arguments other than the required one, which is 0.

1.9 What’s Next?

C has basic or primitive data types such as char (8 bits), int (typically 32 bits), float (typically 32 bits), and double (typically 64 bits) together with mechanisms to create arbitrarily rich, programmer-defined types such as Employee and digital_certificate. Names for the primitive types are in lowercase. Data type names, like identifiers in general, start with a letter or an underscore, and the names can contain any mix of uppercase and lowercase characters together with decimal numerals. Most modern languages have naming conventions similar to those in C. The basic types in C deliberately match the ones on the underlying system, which is one way that C serves as a portable assembly language. The next chapter focuses on data types, built-in and programmer-defined.
