Ext2 Disk Data Structures

The first block in any Ext2 partition is never managed by the Ext2 filesystem, since it is reserved for the partition boot sector (see Appendix A). The rest of the Ext2 partition is split into block groups, each of which has the layout shown in Figure 17-1. As you will notice from the figure, some data structures must fit in exactly one block, while others may require more than one block. All the block groups in the filesystem have the same size and are stored sequentially, so the kernel can derive the location of a block group on a disk simply from its integer index.

Figure 17-1. Layouts of an Ext2 partition and of an Ext2 block group

Block groups reduce file fragmentation, since the kernel tries to keep the data blocks belonging to a file in the same block group, if possible. Each block in a block group contains one of the following pieces of information:

  • A copy of the filesystem’s superblock

  • A copy of the group of block group descriptors

  • A data block bitmap

  • A group of inodes

  • An inode bitmap

  • A chunk of data that belongs to a file; i.e., a data block

If a block does not contain any meaningful information, it is said to be free.

As can be seen from Figure 17-1, both the superblock and the group descriptors are duplicated in each block group. Only the superblock and the group descriptors included in block group 0 are used by the kernel, while the remaining superblocks and group descriptors are left unchanged; in fact, the kernel doesn’t even look at them. When the e2fsck program executes a consistency check on the filesystem status, it refers to the superblock and the group descriptors stored in block group 0, and then copies them into all other block groups. If data corruption occurs and the main superblock or the main group descriptors in block group 0 become invalid, the system administrator can instruct e2fsck to refer to the old copies of the superblock and the group descriptors stored in a block group other than the first. Usually, the redundant copies store enough information to allow e2fsck to bring the Ext2 partition back to a consistent state.

How many block groups are there? Well, that depends both on the partition size and the block size. The main constraint is that the block bitmap, which is used to identify the blocks that are used and free inside a group, must be stored in a single block. Therefore, in each block group, there can be at most 8×b blocks, where b is the block size in bytes. Thus, the total number of block groups is roughly s/(8×b), where s is the partition size in blocks.

For example, let’s consider an 8 GB Ext2 partition with a 4-KB block size. In this case, each 4-KB block bitmap describes 32K data blocks — that is, 128 MB. Therefore, at most 64 block groups are needed. Clearly, the smaller the block size, the larger the number of block groups.
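The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the s/(8×b) estimate; the function names are hypothetical, and the sketch ignores the boot block reserved at the start of the partition.

```python
def max_blocks_per_group(block_size: int) -> int:
    """A one-block bitmap holds 8 bits per byte, one bit per block."""
    return 8 * block_size


def block_group_count(partition_bytes: int, block_size: int) -> int:
    """Approximate number of block groups for a partition of the given size."""
    total_blocks = partition_bytes // block_size
    per_group = max_blocks_per_group(block_size)
    # Ceiling division: a final, partially filled group still counts.
    return -(-total_blocks // per_group)


if __name__ == "__main__":
    eight_gb = 8 * 2**30
    for size in (1024, 2048, 4096):
        print(size, block_group_count(eight_gb, size))
```

With an 8 GB partition, the 4-KB case yields the 64 groups computed in the text, while 1-KB blocks yield many more groups.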

Superblock

An Ext2 disk superblock is stored in an ext2_super_block structure, whose fields are listed in Table 17-1. The __u8, __u16, and __u32 data types denote unsigned numbers of length 8, 16, and 32 bits respectively, while the __s8, __s16, and __s32 data types denote signed numbers of length 8, 16, and 32 bits.

Table 17-1. The fields of the Ext2 superblock

Type          Field                      Description
__u32         s_inodes_count             Total number of inodes
__u32         s_blocks_count             Filesystem size in blocks
__u32         s_r_blocks_count           Number of reserved blocks
__u32         s_free_blocks_count        Free blocks counter
__u32         s_free_inodes_count        Free inodes counter
__u32         s_first_data_block         Number of first useful block (always 1)
__u32         s_log_block_size           Block size
__s32         s_log_frag_size            Fragment size
__u32         s_blocks_per_group         Number of blocks per group
__u32         s_frags_per_group          Number of fragments per group
__u32         s_inodes_per_group         Number of inodes per group
__u32         s_mtime                    Time of last mount operation
__u32         s_wtime                    Time of last write operation
__u16         s_mnt_count                Mount operations counter
__u16         s_max_mnt_count            Number of mount operations before check
__u16         s_magic                    Magic signature
__u16         s_state                    Status flag
__u16         s_errors                   Behavior when detecting errors
__u16         s_minor_rev_level          Minor revision level
__u32         s_lastcheck                Time of last check
__u32         s_checkinterval            Time between checks
__u32         s_creator_os               OS where filesystem was created
__u32         s_rev_level                Revision level
__u16         s_def_resuid               Default UID for reserved blocks
__u16         s_def_resgid               Default GID for reserved blocks
__u32         s_first_ino                Number of first nonreserved inode
__u16         s_inode_size               Size of on-disk inode structure
__u16         s_block_group_nr           Block group number of this superblock
__u32         s_feature_compat           Compatible features bitmap
__u32         s_feature_incompat         Incompatible features bitmap
__u32         s_feature_ro_compat        Read-only compatible features bitmap
__u8 [16]     s_uuid                     128-bit filesystem identifier
char [16]     s_volume_name              Volume name
char [64]     s_last_mounted             Pathname of last mount point
__u32         s_algorithm_usage_bitmap   Used for compression
__u8          s_prealloc_blocks          Number of blocks to preallocate
__u8          s_prealloc_dir_blocks      Number of blocks to preallocate for directories
__u16         s_padding1                 Alignment to word
__u32 [204]   s_reserved                 Nulls to pad out 1,024 bytes

The s_inodes_count field stores the number of inodes, while the s_blocks_count field stores the number of blocks in the Ext2 filesystem.

The s_log_block_size field expresses the block size as a power of 2, using 1,024 bytes as the unit. Thus, 0 denotes 1,024-byte blocks, 1 denotes 2,048-byte blocks, and so on. The s_log_frag_size field is currently equal to s_log_block_size, since block fragmentation is not yet implemented.

The s_blocks_per_group, s_frags_per_group, and s_inodes_per_group fields store the number of blocks, fragments, and inodes in each block group, respectively.

Some disk blocks are reserved to the superuser (or to some other user or group of users selected by the s_def_resuid and s_def_resgid fields). These blocks allow the system administrator to continue to use the filesystem even when no more free blocks are available for normal users.

The s_mnt_count, s_max_mnt_count, s_lastcheck, and s_checkinterval fields set up the Ext2 filesystem to be checked automatically at boot time. These fields cause e2fsck to run after a predefined number of mount operations has been performed, or when a predefined amount of time has elapsed since the last consistency check. (Both kinds of checks can be used together.) The consistency check is also enforced at boot time if the filesystem has not been cleanly unmounted (for instance, after a system crash) or when the kernel discovers some errors in it. The s_state field stores the value 0 if the filesystem is mounted or was not cleanly unmounted, 1 if it was cleanly unmounted, and 2 if it contains errors.
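The decision described above can be summarized as a small predicate. This is a sketch of the rules as stated in the text, not the logic of e2fsck itself; the function name is hypothetical, and treating a zero s_max_mnt_count or s_checkinterval as "criterion disabled" is an assumption of the sketch.

```python
STATE_CLEAN = 1  # value the text gives for "cleanly unmounted"


def needs_check(s_state: int, s_mnt_count: int, s_max_mnt_count: int,
                s_lastcheck: int, s_checkinterval: int, now: int) -> bool:
    """Return True if any of the conditions described in the text holds."""
    if s_state != STATE_CLEAN:
        # 0: mounted or not cleanly unmounted; 2: errors were detected.
        return True
    # Assumption: a zero limit means the criterion is not in use.
    if s_max_mnt_count > 0 and s_mnt_count >= s_max_mnt_count:
        return True
    if s_checkinterval > 0 and now - s_lastcheck >= s_checkinterval:
        return True
    return False
```

Here now, s_lastcheck, and s_checkinterval are all expressed in seconds, so the two time-based fields can be compared directly.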

Group Descriptor and Bitmap

Each block group has its own group descriptor, an ext2_group_desc structure whose fields are illustrated in Table 17-2.

Table 17-2. The fields of the Ext2 group descriptor

Type          Field                  Description
__u32         bg_block_bitmap        Block number of block bitmap
__u32         bg_inode_bitmap        Block number of inode bitmap
__u32         bg_inode_table         Block number of first inode table block
__u16         bg_free_blocks_count   Number of free blocks in the group
__u16         bg_free_inodes_count   Number of free inodes in the group
__u16         bg_used_dirs_count     Number of directories in the group
__u16         bg_pad                 Alignment to word
__u32 [3]     bg_reserved            Nulls to pad out 32 bytes

The bg_free_blocks_count, bg_free_inodes_count, and bg_used_dirs_count fields are used when allocating new inodes and data blocks. These fields determine the most suitable block in which to allocate each data structure. The bitmaps are sequences of bits, where the value 0 specifies that the corresponding inode or data block is free and the value 1 specifies that it is used. Since each bitmap must be stored inside a single block and since the block size can be 1,024, 2,048, or 4,096 bytes, a single bitmap describes the state of 8,192, 16,384, or 32,768 blocks.
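As an illustration of how such a bitmap is read, the following sketch tests and counts bits in a block-sized byte string. It assumes that bit 0 of byte 0 describes the first inode or block of the group (least significant bit first); the function names are hypothetical.

```python
def is_used(bitmap: bytes, index: int) -> bool:
    """True if the bit for the given inode or block is 1 (in use)."""
    return bool((bitmap[index // 8] >> (index % 8)) & 1)


def count_free(bitmap: bytes, nbits: int) -> int:
    """Number of 0 bits among the first nbits entries."""
    return sum(1 for i in range(nbits) if not is_used(bitmap, i))


def first_free(bitmap: bytes, nbits: int) -> int:
    """Index of the first free entry, or -1 if every entry is in use."""
    for i in range(nbits):
        if not is_used(bitmap, i):
            return i
    return -1


if __name__ == "__main__":
    # One bitmap block covers 8 bits per byte of block size.
    for size in (1024, 2048, 4096):
        print(size, 8 * size)
```

Counting the zero bits of a group's block bitmap should agree with the bg_free_blocks_count field of its descriptor when the filesystem is consistent.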

Inode Table

The inode table consists of a series of consecutive blocks, each of which contains a predefined number of inodes. The block number of the first block of the inode table is stored in the bg_inode_table field of the group descriptor.

All inodes have the same size: 128 bytes. A 1,024-byte block contains 8 inodes, while a 4,096-byte block contains 32 inodes. To figure out how many blocks are occupied by the inode table, divide the total number of inodes in a group (stored in the s_inodes_per_group field of the superblock) by the number of inodes per block.
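The division just described looks like this in a short Python sketch. The function names are hypothetical, and the rounding up covers the case where the inode count is not an exact multiple of the inodes per block.

```python
INODE_SIZE = 128  # bytes, as stated in the text


def inodes_per_block(block_size: int) -> int:
    """How many 128-byte inodes fit in one block."""
    return block_size // INODE_SIZE


def inode_table_blocks(s_inodes_per_group: int, block_size: int) -> int:
    """Blocks occupied by the inode table of one block group."""
    per_block = inodes_per_block(block_size)
    # Ceiling division, in case the last block is only partly filled.
    return -(-s_inodes_per_group // per_block)
```

For instance, a group with 4,096 inodes needs 128 inode table blocks when the block size is 4,096 bytes, and 512 blocks when it is 1,024 bytes.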

Each Ext2 inode is an ext2_inode structure whose fields are illustrated in Table 17-3.

Table 17-3. The fields of an Ext2 disk inode

Type                   Field          Description
__u16                  i_mode         File type and access rights
__u16                  i_uid          Owner identifier
__u32                  i_size         File length in bytes
__u32                  i_atime        Time of last file access
__u32                  i_ctime        Time that inode last changed
__u32                  i_mtime        Time that file contents last changed
__u32                  i_dtime        Time of file deletion
__u16                  i_gid          Group identifier
__u16                  i_links_count  Hard links counter
__u32                  i_blocks       Number of data blocks of the file
__u32                  i_flags        File flags
union                  osd1           Specific operating system information
__u32 [EXT2_N_BLOCKS]  i_block        Pointers to data blocks
__u32                  i_generation   File version (used when the file is accessed by a network filesystem)
__u32                  i_file_acl     File access control list
__u32                  i_dir_acl      Directory access control list
__u32                  i_faddr        Fragment address
union                  osd2           Specific operating system information

Many fields related to POSIX specifications are similar to the corresponding fields of the VFS’s inode object and have already been discussed in Section 12.2.2. The remaining ones refer to the Ext2-specific implementation and deal mostly with block allocation.

In particular, the i_size field stores the effective length of the file in bytes, while the i_blocks field stores the number of data blocks (in units of 512 bytes) that have been allocated to the file.

The values of i_size and i_blocks are not necessarily related. Since a file is always stored in an integer number of blocks, a nonempty file receives at least one data block (since fragmentation is not yet implemented) and i_size may be smaller than 512 × i_blocks. On the other hand, as we shall see in Section 17.6.4 later in this chapter, a file may contain holes. In that case, i_size may be greater than 512 × i_blocks.
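The two cases can be told apart with a simple comparison. The sketch below only restates the inequalities from the text; the function names are hypothetical.

```python
SECTOR = 512  # i_blocks is expressed in 512-byte units


def allocated_bytes(i_blocks: int) -> int:
    """Disk space accounted to the file, in bytes."""
    return SECTOR * i_blocks


def has_holes(i_size: int, i_blocks: int) -> bool:
    """True when the file is longer than the space allocated to it."""
    return i_size > allocated_bytes(i_blocks)
```

A 1-byte file on a filesystem with 4,096-byte blocks has i_blocks equal to 8, so i_size (1) is smaller than the 4,096 allocated bytes. A file of 1 MB with the same i_blocks value must contain holes.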

The i_block field is an array of EXT2_N_BLOCKS (usually 15) pointers to blocks used to identify the data blocks allocated to the file (see Section 17.6.3 later in this chapter).

The 32 bits reserved for the i_size field limit the file size to 4 GB. Actually, the highest-order bit of the i_size field is not used, so the maximum file size is limited to 2 GB. However, the Ext2 filesystem includes a “dirty trick” that allows larger files on 64-bit architectures like Hewlett-Packard’s Alpha. Essentially, the i_dir_acl field of the inode, which is not used for regular files, represents a 32-bit extension of the i_size field. Therefore, the file size is stored in the inode as a 64-bit integer. The 64-bit version of the Ext2 filesystem is somewhat compatible with the 32-bit version because an Ext2 filesystem created on a 64-bit architecture may be mounted on a 32-bit architecture, and vice versa. On a 32-bit architecture, a large file cannot be accessed unless it is opened with the O_LARGEFILE flag set (see Section 12.6.1).
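The size extension can be pictured as follows. This sketch assumes that i_dir_acl supplies the high-order 32 bits and i_size the low-order 32 bits of the 64-bit length; the function name and the is_regular parameter are hypothetical.

```python
def file_size(i_size: int, i_dir_acl: int, is_regular: bool) -> int:
    """Combine the two 32-bit fields into one file length in bytes."""
    if is_regular:
        # For regular files, i_dir_acl extends i_size to 64 bits.
        return (i_dir_acl << 32) | i_size
    # For other file types, i_dir_acl keeps its original meaning.
    return i_size
```

With i_dir_acl equal to 1 and i_size equal to 0, a regular file is exactly 4 GB long, which could not be represented by i_size alone.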

Recall that the VFS model requires each file to have a different inode number. In Ext2, there is no need to store on disk a mapping between an inode number and the corresponding block number because the latter value can be derived from the block group number and the relative position inside the inode table. For example, suppose that each block group contains 4,096 inodes and that we want to know the address on disk of inode 13,021. Inode numbers start from 1, and 13,021 − 1 = 3 × 4,096 + 732. In this case, the inode belongs to block group 3 (the fourth group, since groups are numbered from 0) and its descriptor is the 733rd entry of the corresponding inode table. As you can see, the inode number is just a key used by the Ext2 routines to retrieve the proper inode descriptor on disk quickly.
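The lookup in this example can be reproduced with two divisions. The sketch assumes that inode numbers start from 1 and that block groups and table entries are indexed from 0; the function name and the returned tuple layout are hypothetical.

```python
def locate_inode(ino: int, inodes_per_group: int,
                 block_size: int, inode_size: int = 128):
    """Return (group, index, table_block, byte_offset) for an inode number.

    group       -- zero-based block group index
    index       -- zero-based position inside that group's inode table
    table_block -- block of the inode table, relative to bg_inode_table
    byte_offset -- offset of the inode inside that block
    """
    group, index = divmod(ino - 1, inodes_per_group)
    per_block = block_size // inode_size
    table_block, slot = divmod(index, per_block)
    return group, index, table_block, slot * inode_size


if __name__ == "__main__":
    print(locate_inode(13021, 4096, 4096))
```

For inode 13,021 with 4,096 inodes per group, the sketch yields group index 3 and table index 732, that is, the 733rd entry of that group's inode table.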

How Various File Types Use Disk Blocks

The different types of files recognized by Ext2 (regular files, pipes, etc.) use data blocks in different ways. Some files store no data and therefore need no data blocks at all. This section discusses the storage requirements for each type, which are listed in Table 17-4.

Table 17-4. Ext2 file types

File_type   Description
0           Unknown
1           Regular file
2           Directory
3           Character device
4           Block device
5           Named pipe
6           Socket
7           Symbolic link

Regular file

Regular files are the most common case and receive almost all the attention in this chapter. But a regular file needs data blocks only when it starts to have data. When first created, a regular file is empty and needs no data blocks; it can also be emptied by the truncate( ) or open( ) system calls. Both situations are common; for instance, when you issue a shell command that includes the string >filename, the shell creates an empty file or truncates an existing one.

Directory

Ext2 implements directories as a special kind of file whose data blocks store filenames together with the corresponding inode numbers. In particular, such data blocks contain structures of type ext2_dir_entry_2. The fields of that structure are shown in Table 17-5. The structure has a variable length, since its last field, name, is a variable-length array of up to EXT2_NAME_LEN characters (usually 255). Moreover, for reasons of efficiency, the length of a directory entry is always a multiple of 4 and, therefore, null characters (\0) are added for padding at the end of the filename, if necessary. The name_len field stores the actual filename length (see Figure 17-2).

Table 17-5. The fields of an Ext2 directory entry

Type                   Field       Description
__u32                  inode       Inode number
__u16                  rec_len     Directory entry length
__u8                   name_len    Filename length
__u8                   file_type   File type
char [EXT2_NAME_LEN]   name        Filename

The file_type field stores a value that specifies the file type (see Table 17-4). The rec_len field may be interpreted as a pointer to the next valid directory entry: it is the offset to be added to the starting address of the directory entry to get the starting address of the next valid directory entry. To delete a directory entry, it is sufficient to set its inode field to 0 and suitably increment the value of the rec_len field of the previous valid entry. Read the rec_len field of Figure 17-2 carefully; you’ll see that the oldfile entry was deleted because the rec_len field of usr is set to 12+16 (the lengths of the usr and oldfile entries).

Figure 17-2. An example of the EXT2 directory

Symbolic link

As stated before, if the pathname of the symbolic link has up to 60 characters, it is stored in the i_block field of the inode, which consists of an array of 15 4-byte integers; no data block is therefore required. If the pathname is longer than 60 characters, however, a single data block is required.

Device file, pipe, and socket

No data blocks are required for these kinds of files. All the necessary information is stored in the inode.
