2. Basic Data Types

Martin Kalin¹

(1)

Chicago, IL, USA

2.1 Overview

C requires explicit data typing for variables, arguments passed to a function, and a value returned from a function. The names for C data types occur in many other languages as well: int for signed integers, float for floating-point numbers, char for numeric values that serve as character codes, and so on. C programmers can define arbitrarily rich data types of their own such as Employee and Movie, which reduce ultimately to primitive types such as int and float. C’s built-in data types deliberately mirror machine-level types such as integers and floating-point numbers of various sizes.

At a technical level, a data type such as int, float, char, or Employee determines

The amount of memory required to store values of the type (e.g., the int value -3232, a pointer to the string “ABC”)
The operations allowed on values of the type (e.g., an int value can be shifted left or right, whereas a float value should not be shifted at all)

The sizeof operator gives the size in bytes for any data type or value of that type. Here is a code segment to illustrate:

printf("%zu\n", sizeof(char)); /* 1 (%zu is the formatter for size_t, the type that sizeof yields) */
printf("%zu %zu\n", sizeof(float), sizeof(99)); /* 4, 4 */

The sizeof(char) is required to be 1, which accommodates 7-bit and 8-bit character encodings such as ASCII and Latin-1, respectively. C also has a wchar_t type (w for wide), which is designed for multibyte character codes such as Unicode; its size is platform dependent, typically 4 bytes on Linux and macOS systems and 2 bytes on Windows. Types other than char, such as int and float, must be at least sizeof(char) but typically are greater. On a modern handheld device or desktop computer, for example, sizeof(int) and sizeof(float) are 4 bytes apiece.

#include <stdio.h>
#include <wchar.h> /* wchar_t type */

int main(void) {
    printf("char size: %zu\n", sizeof(char)); /* 1 (%zu formats a size_t value) */
    printf("wchar_t size: %zu\n", sizeof(wchar_t)); /* 4 (2 on some platforms) */

    /* Signed and unsigned variants of each type are of same size. */
    printf("short size: %zu\n", sizeof(short)); /* 2 */
    printf("int size: %zu\n", sizeof(int)); /* 4 */
    printf("long size: %zu\n", sizeof(long)); /* 8 */
    printf("long long size: %zu\n", sizeof(long long)); /* 8, maybe more */

    /* floating point types are all signed */
    printf("float size: %zu\n", sizeof(float)); /* 4 */
    printf("double size: %zu\n", sizeof(double)); /* 8 */
    printf("long double size: %zu\n", sizeof(long double)); /* 16 */
    return 0;
}

Listing 2-1

The sizeof various basic data types

The dataTypes program (see Listing 2-1) prints the byte sizes for the basic C data types. These sizes are the usual ones on modern devices. The following sections focus on C’s built-in data types and built-in operations on these types. Technical matters such as the 2’s complement representation of integers and the IEEE 754 standard for floating-point formats are covered in detail.

2.2 Integer Types

All of C’s integer types come in signed and unsigned flavors. The unsigned types have a one-field implementation: all of the bits are magnitude bits. By contrast, signed types have a two-field implementation:

The most significant (by convention, the leftmost) bit is the sign bit, with 0 for nonnegative and 1 for negative.
The remaining bits are magnitude bits.

The signed and unsigned integer types come in various sizes.

Table 2-1 lists the basic integer types in C, which have the very same bit sizes as their machine-level counterparts. C also has a long long type, which must be at least 8 bytes in size and typically is the same size as long: 8.

Table 2-1

Basic integer data types

Type	Byte size	Range
unsigned char	1	0 to 255
signed char	1	-128 to 127
unsigned short	2	0 to 65,535
signed short	2	-32,768 to 32,767
unsigned int	4	0 to 4,294,967,295
signed int	4	-2,147,483,648 to 2,147,483,647
unsigned long	8	0 to 18,446,744,073,709,551,615
signed long	8	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

C originally had no distinct boolean type and instead uses integer values to represent true and false: 0 represents false, and any nonzero value (e.g., -999 and 403) represents true. (Since C99, the header file stdbool.h provides bool, true, and false, but these remain integer values underneath.) The default value for true is 1. For example, a potentially infinite loop might start out like this:

while (1) { /** 1 is true in boolean context **/

In C source code, an integer constant such as 22 defaults to type int, where int is shorthand for signed int. The constant 22L or 22l is of type long. Here are some quick examples of data type shorthands:

int n; /* short for: signed int n; */

signed m; /* short for: signed int m; */

unsigned k; /* short for: unsigned int k; */

short s; /* short for: signed short s; */

signed short t; /* the full type written out */

As the examples indicate, unsigned must be used explicitly if unsigned is the desired variant.

The type of a variable does not restrict the bits that can be stored in it, which means that even everyday C can be obfuscating. An example may be useful here.

#include <stdio.h>
#include <limits.h> /* includes convenient min/max values for integer types */

int main(void) {
    unsigned int n = -1, m = UINT_MAX; /* In 2's complement, -1 is all 1s */
    signed int k = 0xffffffff; /* 0x or 0X for hex: f = 4 1s in hex */

    if (n == m) printf("m and n have the same value\n"); /* prints */
    if (k == m) printf("m and k have the same value\n"); /* prints */

    printf("n as signed == %i, n as unsigned == %u\n",
           (int) n, n); /* -1, 4294967295 */

    signed int small = -1; /* signed converts to unsigned in mixed comparisons */
    unsigned int big = 98765; /* comparing big and small is a mixed comparison */
    if (small > big) printf("yep, something's up...\n"); /** small value is UINT_MAX **/
    return 0;
}

Listing 2-2

Data types and bits

The obfusc program (see Listing 2-2) is a cautionary tale on the distinction between internal (machine-level) and external (human-level) representation. The example’s important points can be summed up as follows:

The data type of a variable does not restrict the bits that can be assigned to it. For example, the compiler does not warn against assigning the negative value -1 to the unsigned variable n. For the compiler, the decimal value -1 is, in the 2’s complement representation now common across computing devices, all 1s in binary. Accordingly, the variable n holds 32 1s when -1 is assigned to this variable. (Further details of the 2’s complement representation are covered shortly.)
The equality operator ==, when applied to integer values, in effect checks for identical bit patterns once both operands have been converted to a common type. If the left and the right side expressions (in this example, the values of two variables) have identical bit patterns, the comparison is true; otherwise, false. The variables n, m, and k all store 32 1s in binary; hence, they are all equal in value by the equality operator ==.
In print statements, the internal representation of a value (the bit string) can be formatted to yield different external representations. For example, the 32 1s stored in variable n can be printed as a negative decimal value using the formatter %i (integer) or %d (decimal). Recall that in 2’s complement, a value is negative if its high-order (leftmost) bit is a 1; hence, the %i formatter for signed values treats the 32 1s as the negative value -1: the high-order bit is the sign bit 1 (negative), and the remaining bits are the magnitude bits. By contrast, the %u formatter for unsigned treats all of the bits as magnitude bits, which yields the value of the symbolic constant UINT_MAX (4,294,967,295) in decimal.
Comparing expressions of mixed data types is risky because the compiler coerces one of the types to the other, following rules that may not be obvious. In this example, the value -1 stored in the signed variable small is converted to unsigned so that the comparison is apple to apple rather than apple to orange. As noted earlier, -1 is all 1s in binary; hence, as unsigned, this value is UINT_MAX, far greater than the 98,765 stored in big.

In mixed integer comparisons, the compiler follows two general rules:

Signed values are converted to unsigned ones.
Smaller value types are converted to larger ones. For example, if a 2-byte short is compared to a 4-byte int, then the short value is converted to an int value for the comparison.

When floating-point values occur in expressions with integer values, the compiler converts the integer values into floating-point ones.

In assembly code, an instruction such as cmpl would be used to compare two integer values. The l in cmpl determines the number of bits compared: in this case, 32 because l is for longword, a 32-bit word in the Intel architecture. Were two 64-bit values being compared, then the instruction would be cmpq instead, as the q stands for quadword, a 64-bit word in this same architecture. At the assembly level, as at the machine level, the size of a data type is built into the instruction’s opcode, in this example cmpl.

An earlier example showed that C’s signed char and unsigned char are likewise integer types. As the name char indicates, the char type is designed to store single-byte character codes (e.g., ASCII and Latin-1); the more recent wchar_t type also is an integer type, but one designed for multibyte character codes (e.g., Unicode). For historical reasons, the char type is shorthand for either signed char or unsigned char, but which is platform dependent. For the remaining types, this is not the case. For example, short is definitely an abbreviation for signed short.

2.2.1 A Caution on the 2’s Complement Representation

The 2’s complement representation of signed integers has a surprising but well-publicized peculiarity. The header file limits.h provides the constant INT_MIN, the minimum value for a 4-byte signed int value. The binary representation, with the most significant bits on the left, is

10000000 00000000 00000000 00000000 /* INT_MIN in binary */

For readability, the binary representation has been broken into four 8-bit chunks. The rightmost (least significant) bit is a 0, which makes the value (-2,147,483,648) even rather than odd. The leftmost (most significant) bit is the sign bit: 1 for negative as in this case and 0 for nonnegative. There are similar constants for other integer types (for instance, SHRT_MIN and LONG_MIN).

There is a straightforward algorithm for computing the absolute value of a negative 2’s complement value. For example, recall that the -1 in binary, under the 2’s complement representation, is all 1s: 1111…1. Here is the recipe for computing the absolute value in binary:

1. Invert the 1s in -1, which yields all 0s: 00000…000.
2. Add 1, which yields 00000…001 or 1 in binary and decimal, the absolute value of -1 in decimal.

The same recipe yields -1 from 1: invert the bits in 1 (yielding 11111…0) and then add 1 (yielding 11111…1), which again is all 1s in binary and -1 in decimal.

In the case of INT_MIN, the peculiarity becomes obvious:

1. Invert the bits, which transforms INT_MIN to 01111111 11111111 11111111 11111111.
2. Add 1 to yield 10000000 00000000 00000000 00000000, which is INT_MIN again.

In C, the unary minus operator is shorthand for (a) inverting the bits and (b) adding 1. This code segment illustrates:

int n = 7;

int k = -n; /* unary-minus operator */

int m = ~n + 1; /* complement operator and addition by 1 */

The value of k and of m is the same: -7. In the case of INT_MIN, however, the peculiarity is that

INT_MIN == -INT_MIN

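A modern C compiler does issue a warning when encountering the expression -INT_MIN, cautioning that the expression causes an overflow because of the addition operation. Strictly speaking, the C standard leaves the result of signed integer overflow undefined, so portable code should not rely on this equality. By the way, no other nonzero int value is equal to its negation under the 2’s complement representation.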

2.2.2 Integer Overflow

A programmer who uses any of the primitive C types needs to stay alert when it comes to sizeof and the potential for overflow. The next code example illustrates with the int type.

#include <stdio.h>
#include <limits.h> /* INT_MAX */

int main(void) {
    printf("Max int in %zu bytes %i.\n", sizeof(int), INT_MAX); /* 4 bytes 2,147,483,647 */

    int n = 81;
    while (n > 0) {
        printf("%12i %12x\n", n, (unsigned) n);
        n *= n; /* n = n * n */
    }
    printf("%12i %12x\n", n, (unsigned) n); /* -501334399 e21e3e81 */
    return 0;
}

/*        81           51
        6561         19a1
    43046721      290d741
  -501334399     e21e3e81 ## e is 1110 in binary */

Listing 2-3

Integer overflow

The overflow program (see Listing 2-3) initializes int variable n to 81 and then loops. In each loop iteration, n is multiplied by itself as long as the resulting value is greater than zero. The trace shows that the loop iterates three times, and on the third iteration, the new value of n becomes negative. As the hex output shows, the leftmost (most significant) four bits are hex digit e, which is 1110 in binary: the leftmost 1 is now the sign bit for negative. In this example, the overflow could be delayed, but not prevented, by using a long instead of an int.

There is no compiler warning in the overflow program that overflow may result. It is up to the programmer to safeguard against this possibility. There are libraries that support arbitrary-precision arithmetic in C, including the GMP library (GNU Multiple Precision Arithmetic Library at https://gmplib.org). A later code example uses embedded assembly code to check for overflow.

2.3 Floating-Point Types

C has the floating-point types appropriate in a modern, general-purpose language. Computers as a rule implement the IEEE 754 specification (https://standards.ieee.org/ieee/754/6210/) in their floating-point hardware, so C implementations follow this specification as well.

Table 2-2 lists C’s basic floating-point types. Floating-point types are signed only, and their values have a three-field representation under IEEE 754: a sign bit, exponent bits, and significand (magnitude) bits. A floating-point constant such as 3.1 is of type double in C, whereas 3.1F and 3.1f are of type float. Recall that a double is 8 bytes in size, but a float is only 4 bytes in size.

Table 2-2

Basic floating-point data types

Type	Byte size	Range	Precision
float	4	1.2E-38 to 3.4E+38	6 places
double	8	2.3E-308 to 1.7E+308	15 places
long double	16	3.4E-4932 to 1.1E+4932	19 places

2.3.1 Floating-Point Challenges

Floating-point types pose challenges that make these types unsuitable for certain applications. For instance, there are decimal values such as 0.1 that have no exact binary representation, as this short code segment shows:

float n = 0.1f;

printf("%.24f ", n); /* 0.100000001490116119384766 */

In the printf statement, the formatter %.24f specifies a precision of 24 decimal places. As a later example illustrates, unexpected rounding up can occur when a particular decimal value does not have an exact binary representation. Even this short code segment underscores that floating-point types should not be used in financial, engineering, and other applications that require exactness and precision. In such applications, there are libraries such as GMP (http://gmplib.org), mentioned earlier, to support arbitrary-precision arithmetic.

What’s a Macro?

A macro is a code fragment with a name and is created with a #define directive. The macro expands into its definition during the preprocessing stage of compilation. Here is a macro for pi from the math.h header file:

#define M_PI 3.14159265358979323846 /* the # need not be flush against the define */

Although macros are often named in uppercase, this is convention only. Here are two parameterized macros for computing the max and min of two integer arguments:

#define min(x, y) ((y) ^ (((x) ^ (y)) & -((x) < (y)))) /* details of bitwise operators later */
#define max(x, y) ((x) ^ (((x) ^ (y)) & -((x) < (y)))) /* ^ bitwise xor, & bitwise and */

These macros look like functions, but the compiler does no type-checking on the arguments. Here are two sample uses:

int n = min(-127, 44); /* -127 */

n = max(373, 1404); /* 1404 */

Another example underscores the problem of comparing floating-point values, in particular for equality. Imagine a company in which sales people earn a bonus if they sell 83% of their quota by the end of the third quarter. The company assumes that the remaining 17% of the quota, and probably more, will be met in the last quarter. In this company, 83% is defined in the official spreadsheet as the value 5.0 / 6.0. (On my handheld calculator, 5.0 / 6.0 evaluates to 0.833333333.) However, a legacy program computes 83% as (1.0 / 3.0) × 2.5. At issue, then, is whether (1.0 / 3.0) × 2.5 = 5.0 / 6.0. Here is a segment of C code that makes the comparison, using double values:

if (((1.0 / 3.0) * 2.5) == (5.0 / 6.0)) /* equal? */

printf("Equal ");

else

printf("Not equal "); /** prints **/

A look at the hex values for the two expressions confirms that they are not equal:

3f ea aa aa aa aa aa aa /* (1.0 / 3.0) x 2.5 */

3f ea aa aa aa aa aa ab /* 5.0 / 6.0 */

The two differ in the least significant digit: hex a is 1010 in binary, whereas hex b is 1011 in binary. The two values differ ever so slightly, in the least significant (rightmost) bit of their binary representations. In close-to-the-metal C, the equality operator compares bits; at the bit level, the two expressions differ.

High-level languages provide a way to make approximate comparisons where appropriate. In particular, the header file float.h defines the macro FLT_EPSILON, which represents the difference between 1.0f and the smallest 32-bit floating-point value greater than 1.0f. The value of FLT_EPSILON should be no greater than 1.0e-5f. On my desktop computer:

FLT_EPSILON == 1.192092895508e-07 /** e or E for scientific notation **/

C has similar constants for other floating-point types (e.g., DBL_EPSILON).

float f1 = 5.0f / 6.0f;

float f2 = (1.0f / 3.0f) * 2.5f;

if (fabs(f1 - f2) < FLT_EPSILON) /* fabs for floating-point absolute value */

printf("fabs(f1 - f2) < FLT_EPSILON "); /* prints */

Listing 2-4

Approximate equality

The comp code segment (see Listing 2-4) shows how a comparison can be made using FLT_EPSILON. The library function fabs returns the absolute value of the difference between f1 and f2. This value is less than FLT_EPSILON; hence, the two values might be considered equal because their difference is less than FLT_EPSILON.

The next two examples reinforce the risks that come with floating-point types. The goal is to show various familiar programming contexts in which floating-point issues arise. Following each example is a short discussion.

/* 1.010000
   2.020000
   ...
   7.070001 ;; rounding up now evident
   ...
   10.100001 */

float incr = 1.01f;
float num = incr;
int i = 0;
while (i++ < 10) { /* i++ is the post-increment operator */
    printf("%12f\n", num); /* %12f is field width, not precision */
    num += incr;
}

Listing 2-5

Issues with floating-point data types

The rounding program (see Listing 2-5) initializes a variable to 1.01 and then increments this variable by that amount in a loop that iterates ten times. The rounding up becomes evident in the seventh loop iteration: the expected value is 7.070000, but the printed value is 7.070001. Note that the formatter is %12f rather than %.12f. In the latter case, the printouts would show 12 decimal places but here show the default number of places, which happens to be six. Instead, the 12 in %12f sets the field width, which right-justifies the output to make it more readable.

What’s the Difference Between the Pre-Increment and Post-Increment Operators?

The rounding program uses the post-increment operator on loop counter i to check, in the while condition, whether the loop counter is less than ten. C also has a pre-increment operator and both pre- and post-decrement operators. Each operator involves an evaluation and an update. Here is a code segment to illustrate the difference:

int i = 1;

printf("%i ", i++); /* 1 (evaluate, then increment) */

printf("%i ", i); /* 2 (i has been incremented above) */

printf("%i ", ++i); /* 3 (increment, then evaluate) */

#include <stdio.h>
#include <math.h> /* pi and e as macros, M_PI and M_E, respectively */

int main(void) {
    printf("%0.50f\n", 10.12); /* 10.11999999999999921840299066388979554176330566406250 */

    /* On my handheld calculator: 2.2 * 1234.5678 = 2716.04916 */
    double d1 = 2.2, d2 = 1234.5678;
    double d3 = d1 * d2;
    if (2716.04916 == d3) printf("As expected.\n"); /* does not print */
    else printf("Not as expected: %.16f\n", d3); /* 2716.0491600000004837 */
    printf("\n");

    /* Expected price: $84.83 */
    float price = 4.99f;
    int quantity = 17;
    float total = price * quantity; /* compiler converts quantity to a float value */
    printf("The total price is $%f.\n", total); /* The total price is $84.829994. */

    /* e and pi */
    double ans = pow(M_E, M_PI) - M_PI; /* e and pi, respectively */
    printf("%lf\n", ans); /* 19.999100 prints: expected is 19.99909997 */
    return 0;
}

Listing 2-6

More examples of decimal-to-binary conversion

The d2bconvert program (see Listing 2-6) shows yet again how information may be lost in converting from decimal to binary. In these isolated examples, of course, no harm is done; but these cases underscore that floating-point types such as float and double are not suited for applications involving, for instance, currency.

2.3.2 IEEE 754 Floating-Point Types

This section digs into the details of the IEEE 754 binary floating-point specification (https://standards.ieee.org/standard/754-2019.html), using 32-bit floating-point values as the working example. The specification also covers 16-bit and 64-bit binary representations and decimal representations as well. Here is the layout of a 32-bit (single-precision) binary floating-point value under IEEE 754:

+-+--------+-----------------------+
|s|exponent|       magnitude       |  32 bits
+-+--------+-----------------------+
 1     8              23

For reference, the written exponent comprises the 8 bits depicted previously. In the discussion that follows, the written exponent is contrasted with the actual exponent. Also, the written magnitude comprises the 23 bits shown previously and is contrasted with the actual magnitude.

The IEEE 754 specification categorizes floating-point values as either normalized or denormalized or special. The category depends on the value of the 8-bit exponent:

If the written exponent field contains a mix of 0s and 1s, the value is normalized.
If the written exponent field contains only 0s, the value is denormalized.
If the written exponent field contains only 1s, the value is special.

As the name suggests, normalized values are typical or expected ones such as -118.625, which is -1110110.101 in binary. A normalized value has an implicit leading 1, which means the written magnitude is the fractional part of the actual magnitude:

1.??????...??? ## the question marks ? are the written magnitude

For the sample value -1110110.101 (-118.625 in decimal), the implicit leading 1 is obtained by moving the binary point six places to the left, which yields -1.110110101 × 2⁶. The written magnitude is then the fractional part 110110101.

In the example, the actual exponent is 6, as shown in the expression -1.110110101 × 2⁶. However, the written exponent of 133 (10000101 in binary) is biased, with a bias of 127 for the 32-bit case. The bias is subtracted from the written exponent to get the actual exponent:

actual exponent = written exponent - 127 ## 133 - 127 = 6

In summary, the decimal value -118.625 has a written exponent of 133 in IEEE 754, but an actual exponent of 6.

Finally, the sample value is negative, which means the most significant (leftmost) bit is a 1. The 32-bit representation for the decimal value -118.625 is

1 10000101 11011010100000000000000 ## 14 zeros pad to make 23 bits

The middle field alone, the 8-bit exponent, indicates that this value is indeed normalized: the written exponent contains a mix of 0s and 1s.

Denormalized values cover two representations of zero and evenly spaced values in the vicinity of zero. Zero can be represented as either a negative or a nonnegative value under the IEEE specification, which the C compiler honors:

if (-0.0F == 0.0F) puts("yes!"); /* prints */

The IEEE representation of zero is intuitive in that every bit—except, perhaps, the sign bit—is a 0. A denormalized value does not have an implicit leading 1, and the actual exponent has a fixed value of -126 in the 32-bit case. The written exponent is always all 0s.

What motivates the denormalized category beyond the two representations of zero? Consider the three values in Table 2-3, in particular the binary column. In the first row, the value has a single 1—the least significant bit of the written exponent. Yet this exponent still contains a mix of 0s and 1s and so is normalized: it is the smallest positive normalized value in 32 bits.

Table 2-3

Positive denormalized and normalized values

Binary	Decimal
0 00000001 00000000000000000000000	1.175494350822e-38
0 00000000 11111111111111111111111	1.175494210692e-38
0 00000000 00000000000000000000001	1.401298464325e-45

The value in the middle row has all 0s in the exponent, which makes the value denormalized. This value is the largest denormalized value in 32 bits, but this value is still smaller than the very small normalized value above it. The smallest denormalized value, the bottom row in the table, has a single 1 as the least significant bit: all the rest are 0s. Between the smallest and the largest denormalized values are many more, all differing in the bit pattern of the written magnitude. Although the denormalized values shown so far are positive, there are negative ones as well: the sign bit is 1 for such values.

In summary, denormalized values cover the two representations of zero, as well as evenly spaced values that are close to zero. The preceding examples show that the gap between the smallest positive normalized value and positive zero is considerable and filled with denormalized values.

The third IEEE category covers special values, three in particular: NaN (Not a Number), positive infinity, and negative infinity. A written exponent of all 1s signals a special value. If the written magnitude contains all 0s, then the value is either negative or positive infinity, with the sign bit determining the difference. If the written magnitude contains at least one 1, the value is NaN. A short code segment clarifies.

#include <stdio.h>
#include <math.h>

/** gcc -o specVal specVal.c -lm **/
int main(void) {
    printf("Sqrt of -1: %f\n", sqrt(-1.0F)); /* 1 11111111 10000000000000000000000 */
    printf("Neg. infinity: %f\n", 1.0F / -0.0F); /* 1 11111111 00000000000000000000000 */
    printf("Pos. infinity: %f\n", 1.0F / 0.0F); /* 0 11111111 00000000000000000000000 */
    return 0;
}

Listing 2-7

Special values under the IEEE 754 specification

The specVal program (see Listing 2-7) has the following output, with comments introduced by ##:

Sqrt of -1: -nan ## minus sign because -1.0F is negative

Neg. infinity: -inf ## negative zero as divisor

Pos. infinity: inf ## non-negative zero as divisor

The floating-point units (FPUs) of modern computers commonly follow the IEEE specification; modern languages, including C, do so in any case. There are heated discussions within the computing community on the merits of the IEEE specification, but there is little doubt that this specification is now a de facto standard across programming languages and systems.

How Does Linking Work in the Compilation Process?

Compiling the specVal program into an executable requires an explicit link flag:

% gcc -o specVal specVal.c -lm

In the flag -lm (lowercase L followed by m), the -l stands for link, and the m identifies the standard mathematics library libm, which resides in a file such as libm.so on the compiler/linker search path (e.g., in a directory such as /usr/lib or /usr/local/lib). Note that the prefix lib and the file extension so fall away in a link specification, leaving only the m for the mathematics library.

The linking is needed because the specVal program calls the sqrt function from the mathematics library. A compilation command may contain several explicit link flags in the same style shown previously: -l followed by the name of the library without the prefix lib and without the library extension such as so.

During compilation, libraries such as the standard C library and the input/output library are linked in automatically. Other libraries, such as the mathematics and cryptography libraries, must be linked in explicitly. In Chapter 8, the section on building libraries goes into more detail on linking.

2.4 Arithmetic, Bitwise, and Boolean Operators

C has the usual arithmetic, bitwise, and boolean (relational) operators. Recall that even the character types char and wchar_t, and the makeshift-boolean type (zero for false, nonzero for true), are fundamentally arithmetic types. However, some operators are ill-suited for some types. For example, floating-point values should not be bit-shifted, left or right.

Recall the layout for a 32-bit floating-point value under IEEE 754:

+-+--------+-----------------------+
|s|exponent|       magnitude       |
+-+--------+-----------------------+
 1     8              23

Bit-shifting a floating-point type, either left or right, would cause one or more bits to change fields. On a 2-bit left shift, for instance, magnitude bits would become exponent bits, and an exponent bit would become the sign bit. The following code segment illustrates the peril of shifting floating-point values:

float f = 123.456f;

f = (int) f << 2; /* ERROR without the cast operation (int) */

printf("%f ", f); /* 492.000000 */

The second line uses a cast operation, which is an explicit type-conversion operation; in this case, the floating-point value of variable f is converted to an int value so that the compiler does not complain. (The syntax of casts is covered in the following sidebar.) In the shift operation, << represents a left shift, and >> represents a right shift. To the left of the shift operator is the value (in this case, variable f) to be shifted, and to the right is the number of bit places to shift. On left shifts, the vacated positions are filled with 0s.

If the preceding example were to omit the cast operation, the compiler would complain, with an error rather than just a warning, that the left operand to << should be an int, not a float. To get by the compiler, the code segment thus includes the cast operation.

It should be emphasized that a cast operation is not an assignment operation. In this example, the cast (int) f does not itself change what is stored in variable f; the cast yields a new int value, 123, which is then shifted left to yield 492, and only the assignment stores this result back in f. The salient point is that floating-point values, in general, should not be shifted at all. The shift operation is intended for integer values only, and even then caution is in order, as later examples illustrate.

How Do Cast Operations Work?

A cast operation consists of a data type enclosed in parentheses immediately to the left of a value:

int n = (int) 1234.5678f; /* cast float value to int value, which is assigned to n */

float f = (float) n; /* compiler would do the conversion in any case */

n = (int) 1234.5678F << 2; /* cast required: float values should not be shifted */

A cast is not an assignment: in the second example shown previously, the cast (float) does not change what is stored in n but rather creates a new value then assigned to variable f. A cast is thus an explicit conversion of one type to another. The compiler regularly does such conversions automatically:

int n = 1234.567f; /* compiler assigns 1234 to n: automatic conversion */

For convenience, the following subsections divide the operators into the traditional categories of arithmetic, bitwise, and boolean (relational). Miscellaneous operators such as sizeof and the cast will continue to be clarified as needed.

2.4.1 Arithmetic Operators

C has the usual unary and binary arithmetic operators, and C uses the standard symbols to represent these operators. For operations such as exponentiation and square roots, C relies upon library routines, in this case the pow and sqrt functions, respectively. Table 2-4 clarifies the binary arithmetic operators with sample expressions.

Table 2-4

Binary arithmetic operators

Operation	C	Example
Addition	+	12 + 3
Subtraction	-	12 - 3
Multiplication	*	12 * 3
Division	/	12 / 3
Modulus	%	12 % 3
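The division and modulus rows in Table 2-4 deserve a short sketch because their results depend on the operand types. The function names here are illustrative only. With int operands, division discards any fractional part and the modulus operator yields the remainder; with floating-point operands, division keeps the fraction.

```c
/* Integer division: any fractional part of the result is discarded. */
int quotient(int a, int b) {
    return a / b;
}

/* Modulus: the remainder left over from the integer division a / b. */
int remainder_of(int a, int b) {
    return a % b;
}

/* Floating-point division keeps the fractional part. */
double real_quotient(double a, double b) {
    return a / b;
}
```

For the operands 14 and 3, the integer quotient is 4 and the remainder is 2, so that 4 * 3 + 2 recovers 14.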

The plus and minus signs also designate the unary plus and unary minus operators, respectively:

int k = 5;

printf("%i %i ", +k, -k); /* 5 -5 */

The binary arithmetic operators associate left to right, with multiplication, division, and modulus having a higher precedence than addition and subtraction. For example, the expression

8 + 2 * 3

evaluates to 14 rather than 30. Of course, parentheses can be used to ensure the desired association and precedence—and to make the arithmetic expressions easier to read.

#include <stdio.h>

int main(void) {

int n1 = 4, n2 = 11, n3 = 7;

printf("%i ", n1 + n2 * n3); /* 81 */

printf("%i ", (n1 + n2) * n3); /* 105 */

printf("%i ", n3 * n2 % n1); /* 1 */

printf("%i ", n3 * (n2 % n1)); /* 21 */

}

Listing 2-8

Operator association and precedence

The assoc program (see Listing 2-8) shows how expressions can be parenthesized in order to get the desired association when mixed operations are in play. The use of parentheses seems easier than trying to recall precedence details, and parenthesized expressions are, in any case, easier to read.

C has variants of the assignment operator (=) that mix in arithmetic and bitwise operators. A few examples should clarify the syntax:

int n = 3;

n += 4; /* n = n + 4 */

n /= 2; /* n = n / 2 */

n <<= 1; /* n = n << 1 */
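To make the effect of these operators concrete, the following sketch applies the same three compound assignments in sequence inside a function. The name apply_compound is an assumption made for illustration.

```c
/* Applies the three compound assignments from the text, in order. */
int apply_compound(int n) {
    n += 4;   /* same as n = n + 4 */
    n /= 2;   /* same as n = n / 2, which is integer division here */
    n <<= 1;  /* same as n = n << 1 */
    return n;
}
```

Starting from 3, the value of n becomes 7, then 3 (the fraction from 7 / 2 is discarded), and finally 6.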

2.4.2 Boolean Operators

The boolean or relational operators are so named because the expressions in which they occur evaluate to the boolean values true or false. Although any integer value other than zero is true in C, true boolean expressions in C evaluate to the default value for true, 1. Here are some sample expressions to illustrate the boolean operators:

/** equals and not-equals **/

2 == (16 - 14) /* true: == is 'equals' */

2 != (16 / 8) /* false: != is 'not equals' */

/** greater, lesser **/

!(2 < 3) /* false: ! is 'negation' */

3 > 2 /* true: > is 'greater than' */

3 >= 3 /* true: >= is 'greater than or equal to' */

3 < 2 /* false: < is 'less than' */

3 <= 3 /* true: <= is 'less than or equal to' */

/** logical-and, logical-or **/

(2 < 3) && (4 < 5) /* true: && is logical-and */

(2 < 3) || (5 < 4) /* true: || is logical-or */

A few cautionary notes are in order. Note that the operators for equality (==) and inequality (!=) both have two symbols in them. The equality operator can be tricky because it is so close to the assignment operator (=). Consider this code segment, the stuff of legend among C programmers whose code has gone awry because of some variation of the problem:

int n = 2;

if (n = 1)

printf("yep "); /** prints: presumably meant n == 1 **/

An assignment in C is an expression and so has a value—the value of the expression on the right-hand side of the = operator. Accordingly, the if test both assigns 1 to n and evaluates to 1, true; hence, the printf statement executes. Whenever a constant is to be compared against a variable, it is best to put the constant on the left. If the assignment operator = is then typed by mistake instead of the equality operator ==, the compiler catches the problem:

if (1 = n) /** won't compile **/

The logical and and logical or operators are efficient because they short-circuit. For example, in the expression

(3 < 2) && (4 > 2) /* only (3 < 2), the 1st conjunct, is evaluated */

the second conjunct (4 > 2) is not evaluated: a conjunction is true only if each of its conjuncts is true, and the first conjunct (3 < 2) is false, thereby making the entire expression false.
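Short-circuiting becomes visible when the second operand has a side effect. The sketch below is an illustration under that assumption: the counter calls and the functions counted_true, and_calls, and or_calls are hypothetical names, not part of any library.

```c
int calls = 0;  /* counts how many times the second operand is evaluated */

/* Hypothetical second operand with a visible side effect. */
int counted_true(void) {
    calls += 1;
    return 1;
}

/* Evaluates first && counted_true() and reports how often counted_true ran. */
int and_calls(int first) {
    calls = 0;
    (void) (first && counted_true());  /* skipped when first is false */
    return calls;
}

/* Evaluates first || counted_true() and reports how often counted_true ran. */
int or_calls(int first) {
    calls = 0;
    (void) (first || counted_true());  /* skipped when first is true */
    return calls;
}
```

A call such as and_calls(3 < 2) returns 0 because the false first conjunct settles the result, whereas and_calls(2 < 3) returns 1. The logical-or case is the mirror image.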

The boolean operators occur regularly in loop and other tests. Simple examples have been seen already:

int i = 0;

while (i < 10) { /* loop while i is less than 10 */

/* ... */

i += 1; /* increment loop counter: i++ or ++i would work, too */

}

Richer examples are yet to come.

2.4.3 Bitwise Operators

As the name suggests, the bitwise operators work on the underlying bit-string representation of data. These operators thus deserve caution, as it may be hard to visualize the outcome of bit manipulation. Bitwise operations are fast, usually requiring but a single clock tick to execute. For example, an optimizing compiler might transform a source-code expression such as

n = n * 2; /* n is an unsigned int variable: double n arithmetically */

to a left shift, shown here at the source level:

n = n << 1; /* double n by left-shifting one place */

Here are some more examples of the bitwise operators in expressions, using 4-bit values for readability:

~(0101) == 1010 /* invert bits: complement */

(0101 & 1110) == 0100 /* bitwise-and */

(0101 | 1110) == 1111 /* bitwise-inclusive-or */

(0101 ^ 1110) == 1011 /* bitwise-exclusive-or */

(0111 << 2) == 1100 /* left shift */

(0111 >> 2) == 0001 /* right shift */
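The 4-bit notation above is shorthand rather than compilable C. A rough equivalent can be written with hex constants and a mask that keeps only the low-order 4 bits; the macro NIBBLE and the function names below are assumptions made for this sketch.

```c
#define NIBBLE 0xFu  /* mask that keeps only the low-order 4 bits */

/* 4-bit complement: invert all bits, then discard everything above bit 3. */
unsigned complement4(unsigned a) {
    return ~a & NIBBLE;
}

/* 4-bit left shift: bits shifted past bit 3 are discarded by the mask. */
unsigned shift_left4(unsigned a, unsigned places) {
    return (a << places) & NIBBLE;
}

/* The binary operators need no mask when both operands already fit in 4 bits. */
unsigned and4(unsigned a, unsigned b) { return a & b; }
unsigned or4(unsigned a, unsigned b)  { return a | b; }
unsigned xor4(unsigned a, unsigned b) { return a ^ b; }
```

Here 0x5 stands for 0101, 0xE for 1110, and 0x7 for 0111. For instance, complement4(0x5) yields 0xA (1010), and shift_left4(0x7, 2) yields 0xC (1100).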

The complement or bit inversion operator is tied to the unary minus operator considered earlier. Given an underlying 2’s complement representation of signed integers, recall that the unary minus operator can be viewed as a combination of two operations: complement and increment by 1. Another example illustrates:

int n = 5;

if (-n == (~n + 1))

printf("yep "); /* prints */

The shift operators require caution because overshifting in either direction is a misstep. As noted earlier, the compiler intervenes in case floating-point values are shifted left or right. At issue now are shifts of integer values. With signed integer values, left shifts can be risky because they may change the sign. Consider this example:

int n = 0x70000000; /* 7 in binary is 0111 */

printf("%i %i ", n, n << 1); /* 1879048192 -536870912 */

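The bit-level representation of n starts out 01110..., with the leftmost bit as the sign bit 0 for nonnegative. The 1-bit left shift moves a 1 into the sign position, which accounts for the change in sign from 1,879,048,192 to -536,870,912. (The C standard leaves the result of such a shift undefined because it does not fit in an int; the output shown is what typical 2’s complement platforms produce.) Recall that, in left shifts, the vacated bit positions are filled with 0s.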

Right shifts can be even trickier. Consider the signed integer value 0xffffffff in hex, which is all 1s in binary; in decimal, this is -1. Even in a 1-bit right shift, the sign bit could change to 0 if the shift is logical, that is, if the vacated bit is filled with a 0. If the shift is sign preserving, it is an arithmetic shift: the sign bit becomes the filler for the vacated positions. Whether a right shift of a negative value is logical or arithmetic is platform dependent. In general, it is best to shift only unsigned integer values. Even in this case, of course, overshifting is possible; but at least the issue of sign preservation does not arise.
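The contrast between the two kinds of right shift can be sketched as follows. The function names are hypothetical. The first function relies on the guarantee that right shifts of unsigned values fill the vacated positions with 0s; the second merely reports what the current platform does with a negative value, and nothing should be assumed about its answer.

```c
/* Right-shifting an unsigned value always fills vacated high-order bits with 0s. */
unsigned logical_shift_right(unsigned n, unsigned places) {
    return n >> places;
}

/* Reports whether this platform's right shift of a negative int preserves
   the sign (1) or not (0). The outcome is platform dependent. */
int right_shift_is_arithmetic(void) {
    int n = -1;
    return (n >> 1) < 0;
}
```

Shifting an all-1s unsigned value right by one place clears only the leftmost bit, which halves the value (rounding down).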

unsigned int endian_reverse32(unsigned int n) { /* designed for 32 bits, or 4 bytes */

return (n >> 24) | /* leftmost byte becomes rightmost */

((n << 8) & 0x00FF0000) | /* swap the two inner bytes */

((n >> 8) & 0x0000FF00) | /* ditto */

(n << 24); /* rightmost byte becomes leftmost */

}

Listing 2-9

Reversing the endian-ness of a multibyte data item

The endian code segment (see Listing 2-9) uses bitwise operators in a utility function that reverses the endian-ness of a 4-byte integer. Modern machines are still byte addressable in that an address is that of a single byte. For multibyte entities such as a 4-byte integer, an address thus points to a byte at one end or the other in the sequence of 4 bytes. Given this 4-byte integer

+----+----+----+----+

| B1 | B2 | B3 | B4 | ## B1 is high-order byte, B4 is low-order byte

+----+----+----+----+

the integer’s address would be either that of B1 (high-order byte) or that of B4 (low-order byte). Standard network protocols are big endian, with the integer’s address that of the big (high-order) byte B1; Intel machines are little endian, with the integer’s address that of the little (low-order) byte B4. (ARM machines are little endian by default but can be configured, as needed, to be big endian.) Given the preceding depiction, the endian program would reverse the byte order to yield:

+----+----+----+----+

| B4 | B3 | B2 | B1 | ## B4 is high-order byte, B1 is low-order byte

+----+----+----+----+

A short code example illustrates, with integer n initialized to a hex value for clarity:

unsigned n = 0x1234abcd;

printf("%x %x ", n, endian_reverse32(n)); /* 1234abcd cdab3412 */

Recall that each hex digit is 4 bits. Accordingly, the leftmost byte in variable n is 12, and the rightmost is cd.
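Two further points can be checked in code: reversing twice restores the original value, and the host's own byte order can be detected at run time by inspecting the byte at an integer's address. The sketch below repeats the reversal function so that it stands alone; host_is_little_endian is a hypothetical name, and unsigned int is assumed to be 32 bits.

```c
/* Same logic as the reversal function in the text, repeated so that the
   sketch is self-contained; assumes unsigned int is 32 bits. */
unsigned int endian_reverse32(unsigned int n) {
    return (n >> 24) |                 /* leftmost byte becomes rightmost */
           ((n << 8) & 0x00FF0000) |   /* swap the two inner bytes */
           ((n >> 8) & 0x0000FF00) |   /* ditto */
           (n << 24);                  /* rightmost byte becomes leftmost */
}

/* Returns 1 if the byte at the integer's address is the low-order byte
   (little endian), 0 otherwise. */
int host_is_little_endian(void) {
    unsigned int probe = 1;                  /* low-order byte is 1, the rest 0 */
    return *(unsigned char*) &probe == 1;    /* look at the byte at the address */
}
```

On an Intel machine, host_is_little_endian would be expected to return 1, in line with the description above.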

Many systems, including Linux with glibc, provide a header file endian.h that declares various functions for transforming little-endian formats to big-endian formats, and vice versa; this header is not part of the C standard. These functions specify the bit sizes on which they work: 16 (2 bytes), 32 (4 bytes), and 64 (8 bytes).

What is an lvalue and an rvalue?

An rvalue is one that does not persist. For example, in the statement

printf("%i ", 444); /* 444 does not persist, and is thus an rvalue */

the rvalue 444 does not persist beyond the printf statement. By contrast, an lvalue does persist as the target of an assignment:

int n = 444; /* 444 persists in n beyond the assignment */

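The variable n is the symbolic name of a memory location or CPU register. Strictly speaking, the term lvalue applies to the expression n, which designates that storage and so can appear on the left side of an assignment; the constant 444 on the right side is an rvalue.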

2.5 What’s Next?

The examples so far have focused mostly on scalar variables: there is an identifier for a single variable, not a collection of variables. A typical example is

int n = -1234; /* n identifies a single variable */

C also supports aggregates, a collection of variables under a single name. Here is one example:

char* str = "abcd"; /* string literal abcd is a null-terminated array of chars */

printf("%c ", str[0]); /* str[0] = 1st of 5 variables, %c for character */

The pointer variable str identifies a collection (in this case, an array) of five characters: the ones shown and the null terminator. The expression str[0] refers to the first of the variables that hold a character, lowercase a in this example. Pointer str thus identifies an aggregate rather than just a single variable.

Arrays and structures are the primary aggregates in C. Pointers also deserve a closer look because they dominate in efficient, production-grade programming. The next chapter focuses on aggregates and pointers.

What’s the Relationship Between C and C++?

C is a small, strictly procedural or imperative language. C++ is a large language that can be used in procedural style but also includes object-oriented features (e.g., classes, inheritance, and polymorphism) not found in C. C++, unlike C, has generic collection types. A C++ program can include orthodox C code, but much depends on the compiler; further, header files and the corresponding libraries may differ in name and location between the two languages. The two languages share history and features but are distinct.
