Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 1
Introduction to Block Ciphers

1.1 Block Cipher in Cryptology

1.1.1 Introduction

Information includes our private data that we desire to protect from unwilling leakage depending on the application. Cryptology is a field of research that offers appropriate solutions for the data protection by exploring how to construct a secure communication for fair information exchange. Modern cryptology often deals with digitalized data rather than analog data that cannot be expressed simply with a series of 0s and 1s. In our daily life, information is exchanged by digital devices such as radio frequency identification (RFID) tags, smart cards, and smart phones, where a computational resource is limited. Therefore, it is one of the most important challenges in cryptology to realize an efficient implementation of cryptosystems.

c1f001 — **Figure 1.1** Basic model for a symmetric-key cryptosystem

1.1.2 Symmetric-Key Ciphers

There are various ways to realize encryption that is a kind of computational process for information to be protected. In a symmetric-key cipher, information is encrypted with a secret key, and it is expected that the owner of the secret key can decrypt the encrypted information correctly. For instance, let us see the situation, where Alice would like to send a message to Bob in a secure way. If the secret key, K, is shared only with Alice and Bob, only Bob can decrypt the message from the encrypted message. The original and the encrypted messages are called plaintext and ciphertext, respectively. Figure 1.1 illustrates the encryption and decryption processes.

The encryption by Alice can be written as

1.1

The ciphertext is decrypted by Bob as

1.2

Only Bob can decrypt and read the message, and Eve, who does not own the secret key, cannot decrypt it.

Alice and Bob need to compute the cryptographic operations based on the functions, $c1-math-0004$ and $c1-math-0005$ . The simpler the functions are, the more efficiently they can compute. For instance, Vernam cipher, invented in 1917, uses just XOR operations as

1.3

to convert plaintext and ciphertext. The XOR operation is explained in Section 1.2.1.

However, in order to guarantee the security, that is, in order that Eve cannot obtain any information of message from C, the secret key needs to be refreshed with a random number for each encryption/decryption. In other words, in order to communicate securely with the Vernam cipher, a very long key, which is the same size as M, is required. This is significantly inefficient. In general, encryption anddecryption processes are based on the trade-offs between cost, performance, and security.

1.1.3 Efficient Block Cipher Design

The fundamental idea to achieve an efficient encryption scheme is designing a fixed-input size encryption scheme, and iteratively applying this scheme to encrypt arbitrary length messages. Such a fixed-input size encryption scheme is called block cipher, and the group of bits with the fixed-input size is called block. If the unit of operation is small enough, for example, 1 bit or 1 byte, such a symmetric-key cipher is called stream cipher. As block ciphers are expected to compute encryption and decryption efficiently, they have an iterated structure, and repeat the same function several times. Such a function is called round function. The iterated structure contributes to achieving a small program code in software and implementing a compact circuit design in hardware.

Modern block ciphers are mainly categorized into two kinds: Feistel structure and substitution-permutation network (SPN) structure. Feistel structure was employed in data encryption standard (DES) block cipher proposed in 1977. Including FEAL and Camellia, the Feistel structure has been employed by many block ciphers.

On the contrary, Advanced Encryption Standard (AES) employed SPN structure. AES is the main target of this book as it is one of the most widely used block ciphers, and it contains fundamental ideas of SPN structure. The basic mathematics to understand SPN structure and AES specification will be explained later in this chapter.

1.2 Boolean Function and Galois Field

Boolean functions are used in most of the block ciphers including AES. A Boolean function, f, is described as

1.4

where $c1-math-0011$ is called Boolean domain and $c1-math-0012$ is the set of all n-tuples $c1-math-0014$ , where $c1-math-0015$ are all in Boolean domain.¹

1.2.1 INV, OR, AND, and XOR Operators

The most simple Boolean function is inversion or the INV operation that is a bit complement. It operates as

1.5a, 1.5b equation

where $c1-math-0018$ is used for representing the INV operation. Alternatively, the logic symbol, $c1-math-0019$ , is also used for INV. In this book, we allow both usage, that is, $c1-math-0020$ .

For the case of $c1-math-0021$ , representative Boolean functions are OR, AND, and XOR. OR is defined as

1.6a, 1.6b equation

Likewise, AND and XOR are defined, respectively, as

1.7a, 1.7b equation

1.8a, 1.8b equation

“ $c1-math-0024$ ,” “ $c1-math-0025$ ,” and “ $c1-math-0026$ ” are used for representing OR, AND, and XOR operations.

The truth table for OR, AND, and XOR is described in Table 1.1.

Table 1.1 Truth table for basic operators

x	y	$c1-math-0029$	$c1-math-0030$	$c1-math-0031$
0	0	0	0	0
0	1	1	0	1
1	0	1	0	1
1	1	1	1	0

1.2.2 Galois Field

Finite filed or Galois field deals with a finite number of elements. Over a Galois filed, addition, subtraction, multiplication, and division are defined. Galois field with the smallest order is called a binary field or $c1-math-0032$ . For instance, addition, multiplication, additive inverse, and multiplicative inverse over $c1-math-0033$ are defined in Table 1.2.

Table 1.2 Operations over $c1-math-0034$

x	y	$c1-math-0037$	$c1-math-0038$	$c1-math-0039$	$c1-math-0040$
0	0	0	0	0	–
0	1	1	0	0	–
1	0	1	0	1	1
1	1	0	1	1	1

As can be found from Tables 1.1 and 1.2, addition and multiplication over $c1-math-0041$ are realized, respectively, with XOR and AND.

Table 1.3 Operations over $c1-math-0043$

x	y	$c1-math-0046$	$c1-math-0047$	$c1-math-0048$	$c1-math-0049$
0	0
0	1
0	2
0	3
0	4
1	0
1	1
1	2
1	3
1	4
2	0
2	1
2	2
2	3
2	4
3	0
3	1
3	2
3	3
3	4
4	0
4	1
4	2
4	3
4	4

1.2.3 Extended Binary Field and Representation of Elements

Binary field, $c1-math-0050$ , can be extended to a large field size called extended binary field, $c1-math-0051$ , where n is a positive integer. Especially, in the case of AES, operations in $c1-math-0053$ are of special interest. The number of elements of $c1-math-0054$ is $c1-math-0055$ . There are several different representations for the elements, which affect the cost and speed performance of software and hardware implementations.

1.2.3.1 Polynomial Basis Representation

As the number of elements of $c1-math-0056$ is a power of 2, each bit of the binary representation can be used for each coefficient of a polynomial whose degree is $c1-math-0057$ . Any element in $c1-math-0058$ can be expressed with the so-called polynomial basis as

1.9

where $c1-math-0060$ . For instance, 16 elements in $c1-math-0061$ can be expressed with the binary representation, $c1-math-0062$ . By assigning each bit to the coefficient of a polynomial of x, we have $c1-math-0064$ . Addition of two field elements, for example, $c1-math-0065$ , can be calculated as

1.10

as $c1-math-0067$ over $c1-math-0068$ .

Multiplication of the two field elements, for example, $c1-math-0069$ , needs modular reduction with an irreducible polynomial, for example, $c1-math-0070$ , which specifies the field.² Therefore, the multiplication result becomes as

1.11

1.2.3.2 Normal Basis Representation

Alternatively, elements in $c1-math-0142$ are described using normal basis as

1.12

where $c1-math-0144$ and $c1-math-0145$ are roots of an irreducible polynomial, $c1-math-0146$ , that is,

1.13

Furthermore,

1.14

This can be confirmed by Fermat little theorem.

For the case of $c1-math-0149$ , suppose that $c1-math-0150$ , that is, $c1-math-0151$ . Addition in the normal basis representation of $c1-math-0152$ can be calculated simply by XORing each coefficient of two elements in the form of Equation ((1.12) ). That is,

1.15

where the normal basis representations of $c1-math-0154$ and $c1-math-0155$ can be found in Table 1.4.

**Table 1.4** Representations of elements for irreducible polynomial $c1-math-0073$ in $c1-math-0074$

This is correct as $c1-math-0156$ . By using the fact of $c1-math-0157$ , multiplication in $c1-math-0158$ , for example, $c1-math-0159$ is calculated as

1.16

The most advantageous point to use the normal basis representation lies in the fact that squaring is easy to compute in $c1-math-0161$ . As can be found in Table 1.4, squaring for $c1-math-0162$ is $c1-math-0163$ . More precisely, in squaring, the elements in the normal basis representation are derived as

1.17

1.18

1.19

This merit is often used in both software and hardware implementations. However, in general, implementing modular multiplication in the normal basis requires more computation than that in the polynomial basis. Hereafter, we mainly use polynomial basis representation.

1.3 Linear and Nonlinear Functions in Boolean Algebra

1.3.1 Linear Functions

Addition and multiplication by a constant are linear functions in $c1-math-0167$ . Suppose that $c1-math-0168$ and $c1-math-0169$ , where $c1-math-0170$ . Addition of $c1-math-0171$ and $c1-math-0172$ is

1.20

From the fact that $c1-math-0174$ , it is confirmed that addition in $c1-math-0175$ is a linear function.

For multiplication by a constant B, there exist $c1-math-0177$ such that

1.21

Therefore, we know that such multiplication in $c1-math-0179$ is also a linear function. It can be easily understood considering the fact that multiplication by a constant can be computed with multiple additions of $c1-math-0180$ in $c1-math-0181$ .

1.3.2 Nonlinear Functions

On the contrary, (normal) modular multiplication and multiplicative inverse in $c1-math-0193$ are nonlinear functions. The AES block cipher uses a nonlinear function in a part of the design that is based on modular multiplicative inversion in $c1-math-0194$ . The multiplicative inverse computation can be done with Fermat's (little) theorem as

1.22

for $c1-math-0196$ . In AES, multiplicative inverse of 0 is mapped to 0.

One of the most optimal ways to compute the inversion is to find addition chain. On the basis of the Itoh–Tsujii algorithm, the computation can be performed with four multiplications and seven modular squarings as

1.23a, 1.23b, 1.23c, 1.23d, 1.23d, 1.23e, 1.23f equation

Itoh–Tsujii algorithm utilizes the relationship of

1.24

c1f002 — **Figure 1.2** Block cipher design strategy. Nonlinear operations and linear operations are alternately applied

1.4 Linear and Nonlinear Functions in Block Cipher

As discussed in Section 1.3, logical operations are classified into linear operations and nonlinear operations. Composition of linear operations is also linear. Hence, if all the cipher's operations are linear, the resulting cipher is also linear, which is insecure. In order to break the linearity of the cipher, nonlinear operations need to be introduced. However, in general, the cost of implementing nonlinear operations is more expensive than the one for linear operations.

The strategy of the block cipher design is alternately applying nonlinear and linear operations several times. To avoid the heavy cost, nonlinear operation is designed to be weak but its cost is small. In many cases, a nonlinear operation is designed to be operated on a smaller size than the block size, and the operation is applied in parallel to all the data. Then, in order to compensate the weak nonlinear computations, a linear operation mixes the entire block. The strategy is depicted in Figure 1.2. In the following, each of the nonlinear layer and linear layer is further detailed.

1.4.1 Nonlinear Layer

In order to reduce the implementation cost, a nonlinear operation is designed to work on a fraction of the data. Typical choices of the size are 64 bits, 32 bits, 8 bits (called byte), 4 bits (called nibble), and 1 bit. The size of the nonlinear operation is determined depending on the following two aspects.

type of nonlinear operation
target platform in which the cipher is implemented.

1.4.1.1 Modular Operation

When the cipher is designed for being used in high-end CPUs, the implementation cost is not a big issue but the operation should be optimized for instructions adopted in such a CPU. Currently, many CPUs operate on 64 or 32 bits, thus the size of the nonlinear operation is also adjusted to 64 or 32 bits. The high-end CPUs can perform the modular addition or subtraction efficiently. The nonlinearity is often introduced by addition or subtraction on modulo $c1-math-0199$ or $c1-math-0200$ .

1.4.1.2 Substitution Table (S-box)

When the cipher is designed for more resource-constrained hardwares such as micro-controllers, the balance of the implementation cost and the computation efficiency is important. When the CPU register size is smaller than 32 bits, the 32- or 64-bit modular addition cannot be performed efficiently. The hardware implementation also faces some problems for those operations. Typical choices of the size of the nonlinear operation are 8 or 4 bits. Because the size is small, using the substitution table is a popular approach to introduce the nonlinearity. The substitution table, or S-box, is a pre-specified mapping from the input values to the output values. An example of 4-bit to 4-bit S-box is given in Table 1.5.

Table 1.5 An example of 4-bit to 4-bit S-box, $c1-math-0201$

Input x 0 1 2 3 4 5 6 7 8 9 a b c d e f

Output $c1-math-0203$ c 0 f a 2 b 9 5 8 3 d 7 1 e 6 4

All values are described in the hexadecimal format.

In this S-box, the input value 5 is transformed to b according to the table. A 4-bit to 4-bit S-box is implemented only with $c1-math-0210$ bits of memory, which is very small. An 8-bit to 8-bit S-box is implemented only with $c1-math-0211$ bits of memory, which is bigger than the 4-bit to 4-bit S-box but can mix the data faster than the 4-bit to 4-bit S-box.

1.4.1.3 Boolean Function

A Boolean function is the smallest tool to introduce the nonlinearity. By using an AND or OR operation, the nonlinearity is introduced in 1 bit. When the cipher is designed to be a very resource constraint environment such as RFID, a Boolean function is a typical choice as a source of the nonlinearity. A Boolean function can also fit the high-end CPUs. Thirty two-bit CPUs can operate bit-wise for each of the 32 bits in parallel. If this is combined with modular additions (not bit-wise), the nonlinearity can be introduced quickly.

It is also a popular approach to specify the input and output correspondence of some Boolean functions as an S-box. If the cipher is implemented with some memory, the S-box can be implemented, and the nonlinearity of several bits can be introduced with 1 table look-up. If the cipher is implemented with small hardware, the logic of the Boolean function is implemented to minimize the implementation cost.

1.4.1.4 Balanced Choice

Unfortunately, there is no obvious choice that shows the overwhelming performance in any implementation environment. When the cipher is designed in multi-platforms, that is, both the high- and low-end environment, an S-box maybe chosen as the source of nonlinearity that shows a relatively good performance in both the environments. The popular choices of the nonlinear operations are summarized in Figure 1.3.

c1f003 — **Figure 1.3** Substitution-permutation network. Popular choices of size and type of nonlinear operations

Note that the data is mixed by alternately applying a nonlinear operation and a linear operation. The choice of the nonlinear operation also depends on the choice of the linear operation.

1.4.2 Linear Layer

The purpose of the linear layer is mixing all the output data from the nonlinear layer in which the data is updated in a small part independently. The linear layer is required to be performed efficiently and implemented lightly.

One of the simplest linear operations is XOR. A part of the nonlinear layer output is XORed to another part to mix the data from different parts. The XOR operation can be performed several times between different parts to mix the data more.

The bit-rotation and bit-shift are also simple linear operations. For example, by applying the 1-bit rotation to the entire data, 1-bit from each part will be moved to the next part. The XOR, bit-shift, and bit-rotation can be implemented efficiently in various platforms, thus they are suitable for the block cipher design.

Another important example is a multiplication over a finite field or modular multiplication. Suppose that the size of the nonlinear operation is n bits and each bit of n-bit value represents each coefficient of a polynomial whose degree is $c1-math-0214$ . As explained in Section 1.3, multiplication over a finite field with some irreducible polynomial $c1-math-0215$ is a linear function. Suppose that the entire data consists of m parts of n-bit data, that is, its size is $c1-math-0218$ bits. The purpose of the linear function is mixing m independent outputs from the nonlinear layer. In order to mix all the m outputs, $c1-math-0221$ matrices are often used.

For instance, when $c1-math-0222$ , four n-bit values $c1-math-0224$ , $c1-math-0225$ , $c1-math-0226$ , $c1-math-0227$ are updated to four n-bit values $c1-math-0229$ , $c1-math-0230$ , $c1-math-0231$ , $c1-math-0232$ by the following matrix operation:

1.25

where each $c1-math-0234$ is a constant number.

Any combination of linear operations is a linear operation. A popular design approach is combining different types of light linear operations to introduce a strong mixing effect. An example of the linear layer is depicted in Figure 1.4.

c1f004 — **Figure 1.4** An example of linear layer consisting of three linear operations. Nonlinear layer is supposed to update data in eight parts independently

1.4.2.1 Maximum Distance Separable Matrix (MDS Matrix)

A maximum distance separable matrix (in short MDS matrix) is a matrix with some special property useful for block cipher's design. Considering the usage in block cipher AES, only the case with the same input and output size is discussed here. Let x be the m-component input to the matrix, M, and y be the m-component output from the matrix, that is, $c1-math-0240$ . The matrix M is called MDS if no distinct input-output pairs $c1-math-0242$ collide in m or more components.

For the application to cryptology, the fact that at least $c1-math-0244$ components differ in distinct pairs of $c1-math-0245$ is important. In other words, the MDS matrix guarantees a certain amount of change in different input and output values. For instance, suppose that the value of x is slightly modified to $c1-math-0247$ , which differs only 1 bit from x, and the corresponding output value $c1-math-0249$ is computed. The multiplication by the MDS matrix can guarantee that all the m components of the outputs y and $c1-math-0252$ have different values, meaning that the 1-bit change of the input always changes all the m components of the output.

1.4.3 Substitution-Permutation Network (SPN)

Substitution-permutation network, which is often called SPN, is a design approach to mix a fixed-length input data. SPN is a special form of the iterative application of nonlinear and linear computations.

The substitution layer (or S-layer), which applies a nonlinear operation, is supposed to be an S-box application in a small size. The permutation layer (or P-layer) applies a linear operation to mix the results of the S-layer efficiently.

The SPN structure is adopted in many block ciphers. AES, which is a main target of this book, also adopts the SPN structure.

1.5 Advanced Encryption Standard (AES)

AES is the most widely used block cipher in present time in both governmental and commercial purposes. AES is standardized internationally, and a lot of academic researches and industrial developments have been proposed about AES. This section explains the specification of AES.

The block cipher AES supports three different key sizes: 128 bits, 192 bits, and 256 bits. The corresponding AES algorithms are called AES-128, AES-192, and AES-256, respectively. AES supports a fixed block size: 128 bits. That is to say, when the key is determined, AES provides a bijective map from 128-bit plaintext to 128-bit ciphertext, that is, for a key K, AES-128 $c1-math-0255$ , AES-192 $c1-math-0256$ , AES-256 $c1-math-0257$ : $c1-math-0258$ (Figure 1.5).

c1f005 — **Figure 1.5** Three algorithms of AES

1.5.1 Specification of AES-128 Encryption

In high level, the 128-bit key K is expanded to eleven 128-bit subkeys $c1-math-0260$ according to the key schedule function, or KSF.

The 128-bit key K is set to the first 128-bit subkey $c1-math-0262$ .
The KSF is computed to update 128-bit subkey $c1-math-0263$ to another 128-bit subkey $c1-math-0264$ .
Similarly, the KSF is iterated nine times. In each time, 128-bit subkey $c1-math-0265$ is updated to another 128-bit subkey $c1-math-0266$ for $c1-math-0267$ .

Then, a plaintext is encrypted to a ciphertext as follows:

An XOR of the plaintext and the first subkey $c1-math-0268$ is computed, and this value is set to a 128-bit internal state value $c1-math-0269$ . This operation is often called whitening.
The 128-bit internal state value $c1-math-0270$ is updated to $c1-math-0271$ by computing a round function, which also takes as input subkey $c1-math-0272$ . This operation is called round 1 or the first round.
The round function is iterated nine times to update the internal state value $c1-math-0273$ to $c1-math-0274$ . In round i, where $c1-math-0276$ , the round function takes as input $c1-math-0277$ and outputs $c1-math-0278$ . Note that the round function in the last round is slightly different from the other rounds. The last state that is $c1-math-0279$ is the ciphertext.

The computation structure of AES-128 in a function level is described in Figure 1.6.

c1f006 — **Figure 1.6** High-level computation structure of the encryption of AES-128. $c1-math-0280$ and $c1-math-0281$ denote the round function and KSF, respectively. $c1-math-0282$ is the last round function, which is different from the other rounds

In practice, it is not necessary to compute all the 11 subkeys at the very beginning. For example, the last subkey will not be used until the very end of the encryption process. Thus, generating the last subkey and keeping it in a register is a waste of computation resource. In order to minimize the computation resource, the KSF and the round function updates are computed in parallel round by round. The AES-128 encryption algorithm in the function level can be described as Algorithm 1.1.

Algorithm 1.1 AES-128 Encryption Algorithm in the Function Level

Input: Plaintext , 128-bit key , round function , the last round function , key
        schedule function KSF
Output: Ciphertext 
 1: ;
 2: ;
 3: for  do
 4:     ;
 5:     ;
 6: end for
 7: ;
 8: return ;

1.5.1.1 Preliminaries to Describe Computation Details

In AES, byte represents 8-bit values. AES is a byte-oriented cipher. All operations are defined at byte level. Let v be a byte value and $c1-math-0296$ be its bit-wise representation, of which the corresponding vector representation is $c1-math-0297$ . In AES, each bit of a byte represents coefficients of polynomial of $c1-math-0298$ :

1.26

A byte value can be represented in hexadecimal. For example, the byte $c1-math-0300$ represents the polynomial $c1-math-0301$ .

Addition

Addition of two bytes, $c1-math-0302$ and $c1-math-0303$ , returns

1.27

Multiplication

Multiplication in $c1-math-0305$ corresponds with multiplication of polynomials modulo, an irreducible binary polynomial of degree 8. The irreducible polynomial of AES is defined as

1.28

Because the multiplication by $c1-math-0307$ and $c1-math-0308$ is later introduced inside the round function, more details of the operation $c1-math-0309$ are explained here. $c1-math-0310$ is written as

1.29

1.30

When $c1-math-0313$ , the result is $c1-math-0314$ according to the definition of byte. When $c1-math-0315$ , the irreducible polynomial $c1-math-0316$ is subtracted from the result. Subtraction is the inverseof the addition. Because the addition is the XOR, the subtraction is also a simple application of the XOR operations. Hence, the result is

1.31

1.32

According to the definition of byte, the result is $c1-math-0319$ .

1.5.1.2 S-box

AES uses a substitution-box (S-box) to mix the data. The S-box is used in both of the round function and the KSF, and thus is defined here. The S-box used in AES is a pre-determined bijective mapping from an 8-bit value to an 8-bit value. The definition of the AES S-box is shown in Table 1.6. Hereafter, the S-box transformation is described as $c1-math-0320$ . For example, $c1-math-0321$ returns 2f, and $c1-math-0322$ returns 03.

Table 1.6 AES S-box

Lower four digits

0 1 2 3 4 5 6 7 8 9 a b c d e f

0 63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76

1 ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0

2 b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15

3 04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75

4 09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84

5 53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf

6 d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8

Upper 7 51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2

four digits 8 cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73

9 60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db

a e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79

b e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08

c ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a

d 70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e

e e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df

f 8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

¹ All the numbers in this table are in hexadecimal.

Note that the S-box and the inverse S-box transformations are not identical. As explained later, AES decryption algorithm requires the look-up table for the inverse of $c1-math-0323$ , that is $c1-math-0324$ .

1.5.1.3 State

The block size of AES is 128 bits. In AES, 128-bit data is called state. The 128-bit state consists of 16 bytes, and is represented as a $c1-math-0325$ two-dimensional array as depicted in Figure 1.7.

c1f007 — **Figure 1.7** AES state. Each cell denotes a byte

1.5.1.4 Key Schedule Function (KSF)

The 128-bit key K is loaded into a 128-bit subkey $c1-math-0327$ . Then, $c1-math-0328$ is computed for $c1-math-0329$ . The input $c1-math-0330$ is represented as a state. The state is further divided into four columns: $c1-math-0331$ , $c1-math-0332$ , $c1-math-0333$ , and $c1-math-0334$ . The output $c1-math-0335$ is computed column by column. At first, a temporary 4-byte value tmp is generated by using the value of $c1-math-0336$ .

$c1-math-0337$ .
Apply the S-box defined in Table 1.6 to each of the 4 bytes in tmp.
Rotate tmp by 1 byte. Precisely, let $c1-math-0338$ be the 4 bytes of tmp. Then, tmp is updated to $c1-math-0339$ .
XOR the pre-specified 1-byte constant $c1-math-0340$ to the first byte of tmp.

Then, by using the 4-byte value tmp, the next subkey $c1-math-0341$ is generated as follows.

$c1-math-0342$ .
$c1-math-0343$ .
$c1-math-0344$ .
$c1-math-0345$ .

The key schedule procedure for AES-128 is depicted in Figure 1.8.

c1f008 — **Figure 1.8** Key schedule function of AES-128. The key schedule function is iterated for $c1-math-0346$

**Figure 1.8** Key schedule function of AES-128. The key schedule function is iterated for $c1-math-0346$

1.5.1.5 Round Function (RF)

The round function takes as input the previous 128-bit state $c1-math-0347$ and subkey $c1-math-0348$ , and generates the next 128-bit state $c1-math-0349$ . The round function consists of four transformations called SubBytes, ShiftRows, MixColumns, and AddRoundKey. It updates the state by following Algorithm 1.2.

SubBytes $c1-math-0360$

SubBytes is a byte-wise operation. It updates the state by applying the S-box defined in Table 1.6 to each of the 16 bytes of the state. It is worth noting that the $c1-math-0361$ operation is the only nonlinear one in the AES round function.

ShiftRows $c1-math-0362$

ShiftRows is a row-wise byte positions swap. The state consists of four rows: row 0, row 1, row 2, and row 3. Each row of the state consists of 4 bytes. The ShiftRows operation applies a left cyclic shift by i bytes to the 4 bytes of row i. The ShiftRows operation is depicted in Figure 1.9.

c1f009 — **Figure 1.9** ShiftRows operation

MixColumns $c1-math-0365$

MixColumns is a column-wise data mixing operation. It takes as input 4 bytes in a column and computes another 4-byte value. The same computation is applied to all of the four columns. Let $c1-math-0366$ and $c1-math-0367$ be the 4-byte input and 4-byte output, respectively. The $c1-math-0368$ is computed by the following matrix operation:

1.33

Each element in the matrix is written in hexadecimal.

The $c1-math-0370$ operation was designed to satisfy the MDS property explained in Section 1.4.2.1. The impact of modifying 1 input byte always expands to all the 4 output bytes. More generally, the sum of the number of modified input bytes and the number of modified output bytes is always greater than or equal to 5.

AddRoundKey $c1-math-0371$

AddRoundKey updates the state by XORing the subkey $c1-math-0372$ to the state.

Last Round Function $c1-math-0373$

In the last round (Round 10 for AES-128), the round function is different from the middle rounds. The $c1-math-0374$ operation is not computed that is, only the $c1-math-0375$ , and $c1-math-0376$ operations are performed.

1.5.2 AES-128 Decryption

To decrypt ciphertext C to P, the round function is applied in reverse order. The KSF is exactly the same. Eleven subkeys $c1-math-0385$ are generated from K. Different from the encryption algorithm, $c1-math-0387$ is firstly used, and then the decryption is processed with $c1-math-0388$ , and finally with $c1-math-0389$ .

Inside the round function, four operations are computed in reverse order. The inverse of the $c1-math-0390$ operation is exactly the same as the original $c1-math-0391$ operation because the XOR operation is involution.

For the inverse of the $c1-math-0392$ operation, the inversion matrix is required. Let $c1-math-0393$ and $c1-math-0394$ be the 4-byte input and 4-byte output to the inverse $c1-math-0395$ operation, respectively. The $c1-math-0396$ is computed by the following matrix operation:

1.34

Each element in the matrix is again written in hexadecimal.

The inverse of the $c1-math-0398$ operation is relatively simple. It applies a right cyclic shift by i bytes to the 4 bytes of row i.

The inverse of the $c1-math-0401$ operation requires another table to substitute each byte value. The inverse S-box, denoted by $c1-math-0402$ , is defined in Table 1.7.

Table 1.7 AES inverse S-box

Lower four digits

0 1 2 3 4 5 6 7 8 9 a b c d e f

0 52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb

1 7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb

2 54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e

3 08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25

4 72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92

5 6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84

6 90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06

Upper 7 d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b

four digits 8 3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73

9 96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e

a 47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b

b fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4

c 1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f

d 60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef

e a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61

f 17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

All the numbers in this table are in hexadecimal.

1.5.3 Specification of AES-192 and AES-256

AES supports not only the 128-bit key but also the 192-bit and the 256-bit keys. For all the key sizes, round function is identical. The differences are the number of rounds computed and the KSF.

AES-128 generates eleven 128-bit subkeys $c1-math-0403$ from 128-bit K, and the number of rounds is 10.
AES-192 generates thirteen 128-bit subkeys $c1-math-0405$ from 192-bit K, and the number of rounds is 12.
AES-256 generates fifteen 128-bit subkeys $c1-math-0407$ from 256-bit K, and the number of rounds is 14.

1.5.3.1 The Key Schedule Function for AES-192

The 192-bit key K is loaded into a $c1-math-0410$ array of bytes, which is denoted by $c1-math-0411$ . Then, $c1-math-0412$ is computed for $c1-math-0413$ . The state is further divided into six columns: $c1-math-0414$ , $c1-math-0415$ , $c1-math-0416$ , $c1-math-0417$ , $c1-math-0418$ , and $c1-math-0419$ . The output $c1-math-0420$ is computed column by column. At first, a temporary 4-byte value tmp is generated by using the value of $c1-math-0421$ .

$c1-math-0422$ .
Apply the S-box defined in Table 1.6 to each of the 4 bytes in tmp.
Rotate tmp by 1 byte. Precisely, let $c1-math-0423$ be the 4 bytes of tmp. Then, tmp is updated to $c1-math-0424$ .
XOR the pre-specified 1-byte constant $c1-math-0425$ to the first byte of tmp.

Then, by using the 4-byte value tmp, the next subkey $c1-math-0426$ is generated as follows.

$c1-math-0427$ .
$c1-math-0428$ .
$c1-math-0429$ .
$c1-math-0430$ .
$c1-math-0431$ .
$c1-math-0432$ .

Among the 192-bit of the $c1-math-0433$ , the first four columns (128 bits) are set to $c1-math-0434$ , and the remaining two columns (64 bits) are set to the left half of $c1-math-0435$ . Among the 192-bit of the $c1-math-0436$ , the first two columns (64 bits) are set to the right half of $c1-math-0437$ , and the remaining four columns (128 bits) are set to $c1-math-0438$ . Similarly, $c1-math-0439$ are obtained.

Note that $c1-math-0440$ is the last four columns of $c1-math-0441$ , and then $c1-math-0442$ is the first four columns of $c1-math-0443$ . The last two columns of $c1-math-0444$ are never used. Thus, in order to omit the redundant computations, the KSF should be processed up to the first four columns of $c1-math-0445$ .

The key schedule procedure for AES-192 is depicted in Figure 1.10.

c1f010 — **Figure 1.10** Key schedule function of AES-192. The key schedule function is iterated until 13 subkeys are generated

1.5.3.2 The Key Schedule Function for AES-256

The KSF for AES-256 can be similarly defined. The size of the key state is 256 bits consisting of $c1-math-0446$ -byte array. Each key state produces two subkeys, and 15 subkeys $c1-math-0447$ are generated.

The update computation is very similar to the ones for AES-128 and AES-192. In order to mix the data quickly, another S-box layer is introduced between columns 3 and 4. The detailed procedure is omitted. The key schedule procedure for AES-256 is depicted in Figure 1.11.

c1f011 — **Figure 1.11** Key schedule function of AES-256. The key schedule function is iterated until 15 subkeys are generated

Note that $c1-math-0448$ is the first four columns of $c1-math-0449$ . The last four columns of $c1-math-0450$ are never used. Thus, in order to omit the redundant computations, the KSF should be processed up to the first four columns of $c1-math-0451$ .

c1f012 — **Figure 1.12** Notations for each state of AES-128

1.5.4 Notations to Describe AES-128

The computation of AES-128 with all the operations is described in Figure 1.12. The state after the first XOR of the plaintext and $c1-math-0452$ is denoted by $c1-math-0453$ . Similarly in round i, where $c1-math-0455$ ,

the state at the beginning of the round is denoted by $c1-math-0456$ ;
the state after the $c1-math-0457$ operation is denoted by $c1-math-0458$ ;
the state after the $c1-math-0459$ operation is denoted by $c1-math-0460$ ;
the state after the $c1-math-0461$ operation is denoted by $c1-math-0462$ ;
the state after the $c1-math-0463$ operation is denoted by $c1-math-0464$ , which is equivalent to $c1-math-0465$ .

As explained before, the state is represented by a $c1-math-0466$ -byte array. Using two subindices often causes misunderstanding, and thus each byte position is also denoted by a single sequence $c1-math-0467$ . For state S, the byte $c1-math-0469$ , where $c1-math-0470$ is converted to the byte $c1-math-0471$ . The byte positions in the single sequence are shown in Figure 1.13.

c1f013 — **Figure 1.13** Notations for inside AES state

Byte values of state S in several different byte positions $c1-math-0473$ are often denoted by $c1-math-0474$ . For example, the 4-byte value in the column 0 of state S is denoted by $c1-math-0476$ .

The first column, or column 0, of state S is denoted by $c1-math-0478$ , which is equivalent to $c1-math-0479$ .
The second column, or column 1, of state S is denoted by $c1-math-0481$ , which is equivalent to $c1-math-0482$ .
The third column, or column 2, of state S is denoted by $c1-math-0484$ , which is equivalent to $c1-math-0485$ .
The fourth column, or column 3, of state S is denoted by $c1-math-0487$ , which is equivalent to $c1-math-0488$ .
The first row, or row 0, of state S is denoted by $c1-math-0490$ , which is equivalent to $c1-math-0491$ .
The second row, or row 1, of state S is denoted by $c1-math-0493$ , which is equivalent to $c1-math-0494$ .
The third row, or row 2, of state S is denoted by $c1-math-0496$ , which is equivalent to $c1-math-0497$ .
The fourth row, or row 3, of state S is denoted by $c1-math-0499$ , which is equivalent to $c1-math-0500$ .

State $c1-math-0501$ becomes state $c1-math-0502$ after the $c1-math-0503$ operation. During this process, 4 bytes in $c1-math-0504$ are moved to different byte positions in $c1-math-0505$ . The moved positions are denoted by $c1-math-0506$ .

For state S, 4 bytes of $c1-math-0508$ are equivalent to $c1-math-0509$ .
For state S, 4 bytes of $c1-math-0511$ are equivalent to $c1-math-0512$ .
For state S, 4 bytes of $c1-math-0514$ are equivalent to $c1-math-0515$ .
For state S, 4 bytes of $c1-math-0517$ are equivalent to $c1-math-0518$ .

Those 4-byte positions are called diagonal.

Similarly 4 bytes in $c1-math-0519$ are moved to different byte positions in $c1-math-0520$ through the inverse of the $c1-math-0521$ operation. The moved positions are denoted by $c1-math-0522$ .

For state S, 4 bytes of $c1-math-0524$ are equivalent to $c1-math-0525$ .
For state S, 4 bytes of $c1-math-0527$ are equivalent to $c1-math-0528$ .
For state S, 4 bytes of $c1-math-0530$ are equivalent to $c1-math-0531$ .
For state S, 4 bytes of $c1-math-0533$ are equivalent to $c1-math-0534$ .

Those 4-byte positions are called inverse diagonal.

Binary $c1-math-0075$	Bit concatenation	Hex.	Polynomial basis	Power of $c1-math-0076$	Normal basis $c1-math-0077$
$c1-math-0078$	$c1-math-0079$	`0`	$c1-math-0080$	$c1-math-0081$	(0, 0, 0, 0)
$c1-math-0082$	$c1-math-0083$	`1`	$c1-math-0084$	$c1-math-0085$	(1, 1, 1, 1)
$c1-math-0086$	$c1-math-0087$	`2`	x	$c1-math-0089$	(0, 0, 0, 1)
$c1-math-0090$	$c1-math-0091$	`4`	$c1-math-0092$	$c1-math-0093$	(0, 0, 1, 0)
$c1-math-0094$	$c1-math-0095$	`8`	$c1-math-0096$	$c1-math-0097$	(1, 0, 1, 1)
$c1-math-0098$	$c1-math-0099$	`9`	$c1-math-0100$	$c1-math-0101$	(0, 1, 0, 0)
$c1-math-0102$	$c1-math-0103$	`b`	$c1-math-0104$	$c1-math-0105$	(0, 1, 0, 1)
$c1-math-0106$	$c1-math-0107$	`f`	$c1-math-0108$	$c1-math-0109$	(0, 1, 1, 1)
$c1-math-0110$	$c1-math-0111$	`7`	$c1-math-0112$	$c1-math-0113$	(1, 1, 0, 0)
$c1-math-0114$	$c1-math-0115$	`e`	$c1-math-0116$	$c1-math-0117$	(1, 0, 0, 0)
$c1-math-0118$	$c1-math-0119$	`5`	$c1-math-0120$	$c1-math-0121$	(1, 1, 0, 1)
$c1-math-0122$	$c1-math-0123$	`a`	$c1-math-0124$	$c1-math-0125$	(1, 0, 1, 0)
$c1-math-0126$	$c1-math-0127$	`d`	$c1-math-0128$	$c1-math-0129$	(0, 1, 1, 0)
$c1-math-0130$	$c1-math-0131$	`3`	$c1-math-0132$	$c1-math-0133$	(1, 1, 1, 0)
$c1-math-0134$	$c1-math-0135$	`6`	$c1-math-0136$	$c1-math-0137$	(0, 0, 1, 1)
$c1-math-0138$	$c1-math-0139$	`c`	$c1-math-0140$	$c1-math-0141$	(1, 0, 0, 1)