Filesystem Mounting

Each filesystem has its own root directory. The filesystem whose root directory is the root of the system’s directory tree is called the root filesystem. Other filesystems can be mounted on the system’s directory tree; the directories on which they are inserted are called mount points. A mounted filesystem is the child of the mounted filesystem to which the mount point directory belongs. For instance, the /proc virtual filesystem is a child of the root filesystem (and the root filesystem is the parent of /proc).

In most traditional Unix-like kernels, each filesystem can be mounted only once. Suppose that an Ext2 filesystem stored in the /dev/fd0 floppy disk is mounted on /flp by issuing the command:

mount -t ext2 /dev/fd0 /flp

Until the filesystem is unmounted by issuing a umount command, any other mount command acting on /dev/fd0 fails.

However, Linux 2.4 is different: it is possible to mount the same filesystem several times. For instance, issuing the following command right after the previous one will likely succeed in Linux:

mount -t ext2 -o ro /dev/fd0 /flp-ro

As a result, the Ext2 filesystem stored in the floppy disk is mounted both on /flp and on /flp-ro; therefore, its files can be accessed through both /flp and /flp-ro (in this example, accesses through /flp-ro are read-only).

Of course, if a filesystem is mounted n times, its root directory can be accessed through n mount points, one per mount operation. Although the same filesystem can be accessed by several paths, it is really unique. Thus, there is just one superblock object for all of them, no matter how many times it has been mounted.

Mounted filesystems form a hierarchy: the mount point of a filesystem might be a directory of a second filesystem, which in turn is already mounted over a third filesystem, and so on.[86]

It is also possible to stack multiple mounts on a single mount point. Each new mount on the same mount point hides the previously mounted filesystem, although processes already using the files and directories under the old mount can continue to do so. When the topmost mount is removed, the next lower mount becomes visible again.
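The visibility rule for stacked mounts can be made concrete with a small user-space model. It assumes, as one simple way to obtain the behavior just described, that a new mount on an occupied mount point is attached to the root directory of the topmost mount already there, so resolving the mount point means descending through the chain of mounts. The types and the helpers find_child( ), visible_mount( ), mount_on( ), and umount_top( ) are hypothetical; this is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for dentry and vfsmount objects (hypothetical and heavily
 * simplified; the kernel's structures have many more fields). */
struct dentry { const char *name; };

struct mount {
    const char *fs;             /* label of the mounted filesystem */
    struct mount *parent;       /* plays the role of mnt_parent */
    struct dentry *mountpoint;  /* plays the role of mnt_mountpoint */
    struct dentry root;         /* root directory of this filesystem */
    int mounted;                /* 1 while attached to the tree */
};

#define MAX_MOUNTS 8
static struct mount *mounts[MAX_MOUNTS];   /* every mount ever attached */
static int nmounts;

/* Return the mount attached to directory mp of filesystem parent, if any. */
static struct mount *find_child(struct mount *parent, struct dentry *mp)
{
    for (int i = 0; i < nmounts; i++)
        if (mounts[i]->mounted && mounts[i]->parent == parent &&
            mounts[i]->mountpoint == mp)
            return mounts[i];
    return NULL;
}

/* Descend through the stack of mounts covering (mnt, d): the result is
 * the filesystem that a pathname lookup of d actually reaches. */
static struct mount *visible_mount(struct mount *mnt, struct dentry *d)
{
    struct mount *child;
    while ((child = find_child(mnt, d)) != NULL) {
        mnt = child;
        d = &child->root;
    }
    return mnt;
}

/* Mount m on directory d of filesystem mnt. If something is already
 * mounted there, m is attached to the root of the topmost mount and
 * therefore hides it. */
static void mount_on(struct mount *m, struct mount *mnt, struct dentry *d)
{
    struct mount *top = visible_mount(mnt, d);
    assert(nmounts < MAX_MOUNTS);
    m->parent = top;
    m->mountpoint = (top == mnt) ? d : &top->root;
    m->mounted = 1;
    mounts[nmounts++] = m;
}

/* Detach m; the mount underneath becomes visible again. */
static void umount_top(struct mount *m) { m->mounted = 0; }
```

Mounting A and then B on the same directory makes B visible; detaching B makes A visible again, and A itself is never modified in the process.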

As you can imagine, keeping track of mounted filesystems can quickly become a nightmare. For each mount operation, the kernel must save in memory the mount point and the mount flags, as well as the relationships between the filesystem to be mounted and the other mounted filesystems. Such information is stored in data structures named mounted filesystem descriptors; each descriptor is a data structure of type vfsmount, whose fields are shown in Table 12-11.

Table 12-11. The fields of the vfsmount data structure

Type                   Field            Description
struct list_head       mnt_hash         Pointers for the hash table list
struct vfsmount *      mnt_parent       Points to the parent filesystem on which this filesystem is mounted
struct dentry *        mnt_mountpoint   Points to the dentry of the mount point directory of this filesystem
struct dentry *        mnt_root         Points to the dentry of the root directory of this filesystem
struct super_block *   mnt_sb           Points to the superblock object of this filesystem
struct list_head       mnt_mounts       Head of the list of child mounted filesystem descriptors (relative to this filesystem)
struct list_head       mnt_child        Pointers for the list of child descriptors (relative to the parent filesystem)
atomic_t               mnt_count        Usage counter
int                    mnt_flags        Flags
char *                 mnt_devname      Device file name
struct list_head       mnt_list         Pointers for the global list of descriptors

The vfsmount data structures are kept in several doubly linked circular lists:

  • A circular doubly linked “global” list including the descriptors of all mounted filesystems. The head of the list is a dummy element represented by the vfsmntlist variable. The mnt_list field of the descriptor contains the pointers to adjacent elements in the list.

  • A hash table indexed by the address of the vfsmount descriptor of the parent filesystem and the address of the dentry object of the mount point directory. The hash table is stored in the mount_hashtable array, whose size depends on the amount of RAM in the system. Each item of the table is the head of a circular doubly linked list storing all descriptors that have the same hash value. The mnt_hash field of the descriptor contains the pointers to adjacent elements in this list.

  • For each mounted filesystem, a circular doubly linked list including all child mounted filesystems. The head of each list is stored in the mnt_mounts field of the mounted filesystem descriptor; moreover, the mnt_child field of the descriptor stores the pointers to the adjacent elements in the list.
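Two of these kinds of linkage (the global list and the per-parent children list) can be sketched in user space with a re-implementation of a tiny subset of the kernel's list_head idiom. The vfsmount below keeps only the fields from Table 12-11 that those lists need; attach( ), init_mnt( ), and count( ) are hypothetical helpers, locking is ignored, and the hash table linkage is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list modeled on the kernel's list_head
 * idiom (re-implemented here; this is not the kernel header). */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

/* Insert entry just before head, that is, at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Subset of the vfsmount fields of Table 12-11. */
struct vfsmount {
    struct vfsmount *mnt_parent;
    struct list_head mnt_mounts;   /* head of the list of children */
    struct list_head mnt_child;    /* links in the parent's mnt_mounts */
    struct list_head mnt_list;     /* links in the global list */
    const char *mnt_devname;
};

/* Dummy head of the global list, playing the role of vfsmntlist. */
static struct list_head vfsmntlist = LIST_HEAD_INIT(vfsmntlist);

static void init_mnt(struct vfsmount *m, const char *dev)
{
    m->mnt_parent = m;             /* no parent yet: points to itself */
    m->mnt_devname = dev;
    INIT_LIST_HEAD(&m->mnt_mounts);
    INIT_LIST_HEAD(&m->mnt_child);
    INIT_LIST_HEAD(&m->mnt_list);
}

/* Link child below parent and append it to the global list. */
static void attach(struct vfsmount *child, struct vfsmount *parent)
{
    assert(child != parent);
    child->mnt_parent = parent;
    list_add_tail(&child->mnt_child, &parent->mnt_mounts);
    list_add_tail(&child->mnt_list, &vfsmntlist);
}

/* Number of elements in a circular list, not counting its head. */
static int count(const struct list_head *head)
{
    int n = 0;
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Because the list_head nodes are embedded in the descriptor, list_entry( ) recovers the enclosing vfsmount from a pointer to its mnt_child field.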

The mount_sem semaphore protects the lists of mounted filesystem objects from concurrent accesses.

The mnt_flags field of the descriptor stores the value of several flags that specify how some kinds of files in the mounted filesystem are handled. The flags are listed in Table 12-12.

Table 12-12. Mounted filesystem flags

Name          Description
MNT_NOSUID    Forbid setuid and setgid flags in the mounted filesystem
MNT_NODEV     Forbid access to device files in the mounted filesystem
MNT_NOEXEC    Disallow program execution in the mounted filesystem

The following functions handle the mounted filesystem descriptors:

alloc_vfsmnt( )

Allocates and initializes a mounted filesystem descriptor

free_vfsmnt(mnt)

Frees the mounted filesystem descriptor pointed to by mnt

lookup_mnt(parent, mountpoint)

Looks up a descriptor in the hash table and returns its address
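A sketch of how a lookup_mnt( )-style search can work on a hash table keyed by the pair (parent descriptor, mount point dentry). For brevity, the bucket chains are singly linked through a hypothetical hash_next field rather than being circular doubly linked lists through mnt_hash, and the mixing function is a made-up stand-in, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dentry { const char *name; };

struct vfsmount {
    struct vfsmount *mnt_parent;      /* parent mounted filesystem */
    struct dentry *mnt_mountpoint;    /* mount point directory */
    struct vfsmount *hash_next;       /* hypothetical bucket link */
};

#define HASH_BITS 4
#define HASH_SIZE (1u << HASH_BITS)
static struct vfsmount *mount_hashtable[HASH_SIZE];

/* Made-up mixing of the two addresses into a bucket index. */
static unsigned hash(const struct vfsmount *mnt, const struct dentry *d)
{
    uintptr_t tmp = (uintptr_t)mnt / sizeof(void *);
    tmp += (uintptr_t)d / sizeof(void *);
    tmp += tmp >> HASH_BITS;
    return (unsigned)(tmp & (HASH_SIZE - 1));
}

/* Insert a descriptor whose mnt_parent and mnt_mountpoint are set. */
static void hash_insert(struct vfsmount *m)
{
    assert(m->mnt_parent && m->mnt_mountpoint);
    unsigned h = hash(m->mnt_parent, m->mnt_mountpoint);
    m->hash_next = mount_hashtable[h];
    mount_hashtable[h] = m;
}

/* Return the descriptor mounted on directory mp of filesystem parent,
 * or NULL if nothing is mounted there. */
static struct vfsmount *lookup_mnt(struct vfsmount *parent, struct dentry *mp)
{
    for (struct vfsmount *p = mount_hashtable[hash(parent, mp)]; p; p = p->hash_next)
        if (p->mnt_parent == parent && p->mnt_mountpoint == mp)
            return p;
    return NULL;
}
```

Both the parent descriptor and the dentry take part in the key because the same directory can be reached through different mounts of the same filesystem, as explained at the beginning of this section.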

Mounting the Root Filesystem

Mounting the root filesystem is a crucial part of system initialization. It is a fairly complex procedure because the Linux kernel allows the root filesystem to be stored in many different places, such as a hard disk partition, a floppy disk, a remote filesystem shared via NFS, or even a fictitious block device kept in RAM.

To keep the description simple, let’s assume that the root filesystem is stored in a partition of a hard disk (the most common case, after all). While the system boots, the kernel finds the device number (major and minor numbers) of the disk partition that contains the root filesystem in the ROOT_DEV variable. The root filesystem can be specified as a device file in the /dev directory either when compiling the kernel or by passing a suitable “root” option to the initial bootstrap loader. Similarly, the mount flags of the root filesystem are stored in the root_mountflags variable. The user specifies these flags either by using the rdev external program on a compiled kernel image or by passing a suitable rootflags option to the initial bootstrap loader (see Appendix A).

Mounting the root filesystem is a two-stage procedure, shown in the following list.

  1. The kernel mounts the special rootfs filesystem, which just provides an empty directory that serves as initial mount point.

  2. The kernel mounts the real root filesystem over the empty directory.

Why does the kernel bother to mount the rootfs filesystem before the real one? Well, the rootfs filesystem allows the kernel to easily change the real root filesystem. In fact, in some cases, the kernel mounts and unmounts several root filesystems, one after the other. For instance, the initial bootstrap floppy disk of a distribution might load into RAM a kernel with a minimal set of drivers, which mounts as root a minimal filesystem stored in a RAM disk. Next, the programs in this initial root filesystem probe the hardware of the system (for instance, they determine whether the hard disk is EIDE, SCSI, or whatever), load all needed kernel modules, and remount the root filesystem from a physical block device.

The first stage is performed by the init_mount_tree( ) function, which is executed during system initialization:

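struct file_system_type root_fs_type;                       /* descriptor of the rootfs type */
root_fs_type.name = "rootfs";                               /* name searched by get_fs_type( ) */
root_fs_type.read_super = rootfs_read_super;                /* method that fills the superblock */
root_fs_type.fs_flags = FS_NOMOUNT;                         /* cannot be mounted from user mode */
register_filesystem(&root_fs_type);                         /* add to the list of filesystem types */
root_vfsmnt = do_kern_mount("rootfs", 0, "rootfs", NULL);   /* becomes the root of the mount tree */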

The root_fs_type variable stores the descriptor object of the rootfs special filesystem; its fields are initialized, and then it is passed to the register_filesystem( ) function (see Section 12.3.2 earlier in this chapter). The do_kern_mount( ) function mounts the special filesystem and returns the address of a new mounted filesystem object; this address is saved by init_mount_tree( ) in the root_vfsmnt variable. From now on, root_vfsmnt represents the root of the tree of the mounted filesystems.
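The bookkeeping behind register_filesystem( ) and get_fs_type( ) can be mirrored in user space with a singly linked list of file_system_type descriptors. This sketch assumes the list head is called file_systems, as in the 2.4 sources, and omits locking and the on-demand loading of filesystem modules; returning -EBUSY for a duplicate name is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct file_system_type {
    const char *name;
    int fs_flags;
    struct file_system_type *next;    /* next registered type */
};

static struct file_system_type *file_systems;   /* list head */

/* Append fs to the list of registered types; refuse duplicate names. */
static int register_filesystem(struct file_system_type *fs)
{
    assert(fs && fs->name);
    struct file_system_type **p = &file_systems;
    for (; *p; p = &(*p)->next)
        if (strcmp((*p)->name, fs->name) == 0)
            return -EBUSY;
    fs->next = NULL;
    *p = fs;
    return 0;
}

/* Find a registered type by name (no module autoloading here). */
static struct file_system_type *get_fs_type(const char *name)
{
    for (struct file_system_type *p = file_systems; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

Once "rootfs" is registered, do_kern_mount("rootfs", ...) can find its descriptor by name, which is exactly what Step 2 of the procedure below relies on.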

The do_kern_mount( ) function receives the following parameters:

type

The type of filesystem to be mounted

flags

The mount flags (see Table 12-13 in Section 12.4.2, later in this chapter)

name

The device file name of the block device storing the filesystem (or the filesystem type name for special filesystems)

data

Pointer to additional data to be passed to the read_super method of the filesystem

The function takes care of the actual mount operation by performing the following operations:

  1. Checks whether the current process has the privileges for the mount operation (the check always succeeds when the function is invoked by init_mount_tree( ) because system initialization is carried out by a process owned by root).

  2. Invokes get_fs_type( ) to search the list of filesystem types for the name stored in the type parameter; get_fs_type( ) returns the address of the corresponding file_system_type descriptor.

  3. Invokes alloc_vfsmnt( ) to allocate a new mounted filesystem descriptor and stores its address in the mnt local variable.

  4. Initializes the mnt->mnt_devname field with the content of the name parameter.

  5. Allocates a new superblock and initializes it. do_kern_mount( ) checks the flags in the file_system_type descriptor to determine how to do this:

    1. If FS_REQUIRES_DEV is on, invokes get_sb_bdev( ) (see Section 12.4.2 later in this chapter)

    2. If FS_SINGLE is on, invokes get_sb_single( ) (see Section 12.4.2 later in this chapter)

    3. Otherwise, invokes get_sb_nodev( )

  6. If the FS_NOMOUNT flag in the file_system_type descriptor is on, sets the MS_NOUSER flag in the superblock object.

  7. Initializes the mnt->mnt_sb field with the address of the new superblock object.

  8. Initializes the mnt->mnt_root and mnt->mnt_mountpoint fields with the address of the dentry object corresponding to the root directory of the filesystem.

  9. Initializes the mnt->mnt_parent field with the value in mnt (the newly mounted filesystem has no parent).

  10. Releases the s_umount semaphore of the superblock object (it was acquired when the object was allocated in Step 5).

  11. Returns the address mnt of the mounted filesystem object.
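The control flow of Steps 3 through 9 and Step 11 can be traced with a stripped-down model. Privilege checks, the type lookup, and semaphore handling are omitted; kern_mount_sketch( ) and get_sb_stub( ) are hypothetical names, the latter a placeholder for the three get_sb_*( ) functions, and the flag values are assumptions chosen only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed flag values, chosen for this sketch only. */
#define FS_REQUIRES_DEV 0x1
#define FS_SINGLE       0x2
#define FS_NOMOUNT      0x4
#define MS_NOUSER       0x80000000ul

enum sb_source { VIA_BDEV, VIA_SINGLE, VIA_NODEV };

struct dentry { int dummy; };
struct super_block {
    unsigned long s_flags;
    enum sb_source how;          /* which get_sb_*() variant was used */
    struct dentry s_root;        /* root directory of the filesystem */
};
struct file_system_type { const char *name; int fs_flags; };
struct vfsmount {
    struct vfsmount *mnt_parent;
    struct dentry *mnt_mountpoint, *mnt_root;
    struct super_block *mnt_sb;
    const char *mnt_devname;
};

/* Placeholder for get_sb_bdev(), get_sb_single() and get_sb_nodev(). */
static struct super_block *get_sb_stub(enum sb_source how)
{
    struct super_block *s = calloc(1, sizeof *s);
    if (s)
        s->how = how;
    return s;
}

static struct vfsmount *kern_mount_sketch(struct file_system_type *type,
                                          const char *name)
{
    assert(type != NULL);
    struct vfsmount *mnt = calloc(1, sizeof *mnt);          /* step 3 */
    if (!mnt)
        return NULL;
    mnt->mnt_devname = name;                                /* step 4 */

    struct super_block *sb;                                 /* step 5 */
    if (type->fs_flags & FS_REQUIRES_DEV)
        sb = get_sb_stub(VIA_BDEV);
    else if (type->fs_flags & FS_SINGLE)
        sb = get_sb_stub(VIA_SINGLE);
    else
        sb = get_sb_stub(VIA_NODEV);
    if (!sb) {
        free(mnt);
        return NULL;
    }
    if (type->fs_flags & FS_NOMOUNT)                        /* step 6 */
        sb->s_flags |= MS_NOUSER;

    mnt->mnt_sb = sb;                                       /* step 7 */
    mnt->mnt_root = mnt->mnt_mountpoint = &sb->s_root;      /* step 8 */
    mnt->mnt_parent = mnt;                                  /* step 9 */
    return mnt;                                             /* step 11 */
}
```

Mounting a type flagged like rootfs takes the get_sb_nodev( ) path and marks the superblock MS_NOUSER, while an FS_REQUIRES_DEV type takes the get_sb_bdev( ) path; in both cases the new descriptor is its own parent.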

When the do_kern_mount( ) function is invoked by init_mount_tree( ) to mount the rootfs special filesystem, neither the FS_REQUIRES_DEV flag nor the FS_SINGLE flag is set, so the function uses get_sb_nodev( ) to allocate the superblock object. This function executes the following steps:

  1. Invokes get_unnamed_dev( ) to allocate a new fictitious block device identifier (see Section 12.3.1 earlier in this chapter).

  2. Invokes the read_super( ) function, passing to it the filesystem type object, the mount flags, and the fictitious block device identifier. In turn, this function performs the following actions:

    1. Allocates a new superblock object and puts its address in the local variable s.

    2. Initializes the s->s_dev field with the block device identifier.

    3. Initializes the s->s_flags field with the mount flags (see Table 12-13).

    4. Acquires the sb_lock spin lock.

    5. Initializes the s->s_type field with the filesystem type descriptor of the filesystem.

    6. Inserts the superblock in the global circular list whose head is super_blocks.

    7. Inserts the superblock in the filesystem type list whose head is s->s_type->fs_supers.

    8. Releases the sb_lock spin lock.

    9. Acquires for writing the s->s_umount read/write semaphore.

    10. Acquires the s->s_lock semaphore.

    11. Invokes the read_super method of the filesystem type.

    12. Sets the MS_ACTIVE flag in s->s_flags.

    13. Releases the s->s_lock semaphore.

    14. Returns the address s of the superblock.

  3. If the filesystem type is implemented by a kernel module, increments its usage counter.

  4. Returns the address of the new superblock.
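The list manipulation performed by read_super( ) can be sketched as follows, with all locking (Steps 4, 8, 9, 10, and 13) left out. The value of MS_ACTIVE, the dummy_read_super( ) method, read_super_sketch( ), and the cleanup on failure are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Insert e right after head. */
static void list_add(struct list_head *e, struct list_head *head)
{
    e->next = head->next;
    e->prev = head;
    head->next->prev = e;
    head->next = e;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

#define MS_ACTIVE (1ul << 30)   /* assumed value */

struct super_block;
struct file_system_type {
    const char *name;
    int (*read_super)(struct super_block *s, void *data);  /* 0 = success */
    struct list_head fs_supers;  /* superblocks of this type */
};

struct super_block {
    unsigned s_dev;
    unsigned long s_flags;
    struct file_system_type *s_type;
    struct list_head s_list;       /* links in the global super_blocks list */
    struct list_head s_instances;  /* links in s_type->fs_supers */
};

static struct list_head super_blocks = { &super_blocks, &super_blocks };

/* Hypothetical read_super method that always succeeds. */
static int dummy_read_super(struct super_block *s, void *data)
{
    (void)s;
    (void)data;
    return 0;
}

static struct super_block *read_super_sketch(struct file_system_type *type,
                                             unsigned dev, unsigned long flags,
                                             void *data)
{
    assert(type && type->read_super);
    struct super_block *s = calloc(1, sizeof *s);   /* step 1 */
    if (!s)
        return NULL;
    s->s_dev = dev;                                  /* step 2 */
    s->s_flags = flags;                              /* step 3 */
    s->s_type = type;                                /* step 5 */
    list_add(&s->s_list, &super_blocks);             /* step 6 */
    list_add(&s->s_instances, &type->fs_supers);     /* step 7 */
    if (type->read_super(s, data) != 0) {            /* step 11 */
        list_del(&s->s_list);                        /* assumed cleanup */
        list_del(&s->s_instances);
        free(s);
        return NULL;
    }
    s->s_flags |= MS_ACTIVE;                         /* step 12 */
    return s;                                        /* step 14 */
}
```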

The second stage of the mount operation for the root filesystem is performed by the mount_root( ) function near the end of the system initialization. For the sake of brevity, we consider the case of a disk-based filesystem whose device files are handled in the traditional way (we briefly discuss in Chapter 13 how the devfs virtual filesystem offers an alternative way to handle device files). In this case, the function performs the following operations:

  1. Allocates a buffer and fills it with a list of filesystem type names. This list is either passed to the kernel in the rootfstype boot parameter or is built by scanning the elements in the singly linked list of filesystem types.

  2. Invokes the bdget( ) and blkdev_get( ) functions to check whether the ROOT_DEV root device exists and is properly working.

  3. Invokes get_super( ) to search for a superblock object associated with the ROOT_DEV device in the super_blocks list. Usually none is found because the root filesystem is still to be mounted. The check is made, however, because it is possible to remount a previously mounted filesystem. Usually the root filesystem is mounted twice during the system boot: the first time as a read-only filesystem so that its integrity can be safely checked; the second time for reading and writing so that normal operations can start. We’ll suppose that no superblock object associated with the ROOT_DEV device is found in the super_blocks list.

  4. Scans the list of filesystem type names built in Step 1. For each name, invokes get_fs_type( ) to get the corresponding file_system_type object, and invokes read_super( ) to attempt to read the corresponding superblock from disk. As described earlier, this function allocates a new superblock object and attempts to fill it by using the method to which the read_super field of the file_system_type object points. Since each filesystem-specific method uses unique magic numbers, all read_super( ) invocations will fail except the one that attempts to fill the superblock by using the method of the filesystem really used on the root device. The read_super( ) method also creates an inode object and a dentry object for the root directory; the dentry object maps to the inode object.

  5. Allocates a new mounted filesystem object and initializes its fields with the ROOT_DEV block device name, the address of the superblock object, and the address of the dentry object of the root directory.

  6. Invokes the graft_tree( ) function, which inserts the new mounted filesystem object in the children list of root_vfsmnt, in the global list of mounted filesystem objects, and in the mount_hashtable hash table.

  7. Sets the root and pwd fields of the fs_struct table of current (the init process) to the dentry object of the root directory.
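Step 4, probing each filesystem type until one recognizes the on-disk data, can be modeled on an in-memory disk image. The Ext2 check uses the real Ext2 layout (the 16-bit little-endian magic number 0xEF53 stored at byte 56 of the superblock, which begins 1024 bytes into the partition); toyfs and its "TOYFS" signature are hypothetical, as are probe_root( ) and the other names in the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IMG_SIZE 2048   /* enough to hold an Ext2 superblock */

struct fs_probe {
    const char *name;
    int (*read_super)(const uint8_t *img);   /* 0 if the data is recognized */
};

/* Ext2: 16-bit little-endian magic 0xEF53 at offset 56 of the superblock,
 * which starts 1024 bytes into the partition. */
static int ext2_read_super(const uint8_t *img)
{
    unsigned magic = img[1024 + 56] | (img[1024 + 57] << 8);
    return magic == 0xEF53 ? 0 : -1;
}

/* Hypothetical format identified by a "TOYFS" signature at offset 0. */
static int toyfs_read_super(const uint8_t *img)
{
    return memcmp(img, "TOYFS", 5) == 0 ? 0 : -1;
}

/* Candidate types, in the order they would be tried. */
static const struct fs_probe fs_types[] = {
    { "toyfs", toyfs_read_super },
    { "ext2",  ext2_read_super  },
};

/* Return the name of the first type whose read_super accepts the image. */
static const char *probe_root(const uint8_t *img)
{
    assert(img != NULL);
    for (size_t i = 0; i < sizeof fs_types / sizeof fs_types[0]; i++)
        if (fs_types[i].read_super(img) == 0)
            return fs_types[i].name;
    return NULL;
}
```

Since every method looks for its own magic number, at most one candidate normally succeeds, and the order of the list only matters for pathological images that carry several signatures.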

Mounting a Generic Filesystem

Once the root filesystem is initialized, additional filesystems may be mounted. Each must have its own mount point, which is just an already existing directory in the system’s directory tree.

The mount( ) system call is used to mount a filesystem; its sys_mount( ) service routine acts on the following parameters:

  • The pathname of a device file containing the filesystem, or NULL if it is not required (for instance, when the filesystem to be mounted is network-based)

  • The pathname of the directory on which the filesystem will be mounted (the mount point)

  • The filesystem type, which must be the name of a registered filesystem

  • The mount flags (permitted values are listed in Table 12-13)

  • A pointer to a filesystem-dependent data structure (which may be NULL)

Table 12-13. Mount flags

Macro             Description
MS_RDONLY         Files can only be read
MS_NOSUID         Forbid setuid and setgid flags
MS_NODEV          Forbid access to device files
MS_NOEXEC         Disallow program execution
MS_SYNCHRONOUS    Write operations are immediate
MS_REMOUNT        Remount the filesystem, changing the mount flags
MS_MANDLOCK       Mandatory locking allowed
MS_NOATIME        Do not update file access time
MS_NODIRATIME     Do not update directory access time
MS_BIND           Create a “bind mount,” which makes a file or directory visible at another point of the system directory tree
MS_MOVE           Atomically move a mounted filesystem to another mount point
MS_REC            Recursively create “bind mounts” for a directory subtree (still unfinished in 2.4.18)
MS_VERBOSE        Generate kernel messages on mount errors
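In user space, the same macro names are available from glibc's <sys/mount.h>. The sketch below converts a comma-separated option string, using option names from mount(8), into a flag word suitable for the mountflags argument of mount(2). The option table and parse_mount_opts( ) are illustrative only; the real mount(8) program handles many more options and passes filesystem-specific ones through the data argument.

```c
#include <assert.h>
#include <string.h>
#include <sys/mount.h>

struct opt { const char *name; unsigned long flag; };

/* A few mount(8) option names and the Table 12-13 flags they map to. */
static const struct opt opts[] = {
    { "ro",         MS_RDONLY },
    { "nosuid",     MS_NOSUID },
    { "nodev",      MS_NODEV },
    { "noexec",     MS_NOEXEC },
    { "sync",       MS_SYNCHRONOUS },
    { "mand",       MS_MANDLOCK },
    { "noatime",    MS_NOATIME },
    { "nodiratime", MS_NODIRATIME },
    { "remount",    MS_REMOUNT },
    { "bind",       MS_BIND },
};

/* Parse "ro,nosuid,..." into MS_* bits; return -1 on an unknown option. */
static int parse_mount_opts(const char *s, unsigned long *flags)
{
    assert(s != NULL && flags != NULL);
    char buf[256];
    if (strlen(s) >= sizeof buf)
        return -1;
    strcpy(buf, s);
    *flags = 0;
    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        size_t i;
        for (i = 0; i < sizeof opts / sizeof opts[0]; i++)
            if (strcmp(tok, opts[i].name) == 0) {
                *flags |= opts[i].flag;
                break;
            }
        if (i == sizeof opts / sizeof opts[0])
            return -1;
    }
    return 0;
}
```

With flags obtained from "ro", a call such as mount("/dev/fd0", "/flp-ro", "ext2", flags, NULL) would correspond to the read-only mount shown at the beginning of this section; it requires root privileges.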

The sys_mount( ) function copies the value of the parameters into temporary kernel buffers, acquires the big kernel lock, and invokes the do_mount( ) function. Once do_mount( ) returns, the service routine releases the big kernel lock and frees the temporary kernel buffers.

The do_mount( ) function takes care of the actual mount operation by performing the following operations:

  1. Checks whether the sixteen highest-order bits of the mount flags are set to the “magic” value 0xc0ed; if so, they are cleared. This is a legacy hack that allows the sys_mount( ) service routine to be used with old C libraries that do not handle the highest-order flags.

  2. If any of the MS_NOSUID, MS_NODEV, or MS_NOEXEC flags passed as a parameter are set, clears them and sets the corresponding flag (MNT_NOSUID, MNT_NODEV, MNT_NOEXEC) in the mounted filesystem object.

  3. Looks up the pathname of the mount point by invoking path_init( ) and path_walk( ) (see Section 12.5 later in this chapter).

  4. Examines the mount flags to determine what has to be done. In particular:

    1. If the MS_REMOUNT flag is specified, the purpose is usually to change the mount flags in the s_flags field of the superblock object and the mounted filesystem flags in the mnt_flags field of the mounted filesystem object. The do_remount( ) function performs these changes.

    2. Otherwise, checks the MS_BIND flag. If it is specified, the user is asking to make a file or directory visible at another point of the system directory tree. Usually, this is done when mounting a filesystem stored in a regular file instead of a physical disk partition (loopback). The do_loopback( ) function accomplishes this task.

    3. Otherwise, checks the MS_MOVE flag. If it is specified, the user is asking to change the mount point of an already mounted filesystem. The do_move_mount( ) function does this atomically.

    4. Otherwise, invokes do_add_mount( ). This is the most common case. It is triggered when the user asks to mount either a special filesystem or a regular filesystem stored in a disk partition. do_add_mount( ) performs the following actions:

      1. Invokes do_kern_mount( ), passing to it the filesystem type, the mount flags, and the block device name. As already described in Section 12.4.1, do_kern_mount( ) takes care of the actual mount operation.

      2. Acquires the mount_sem semaphore.

      3. Initializes the flags in the mnt_flags field of the new mounted filesystem object allocated by do_kern_mount( ).

      4. Invokes graft_tree( ) to insert the new mounted filesystem object in the global list, in the hash table, and in the children list of the parent mounted filesystem.

      5. Releases the mount_sem semaphore.

  5. Invokes path_release( ) to terminate the pathname lookup of the mount point (see Section 12.5 later in this chapter).

The core of the mount operation is the do_kern_mount( ) function, which we already described in Section 12.4.1. Recall that this function checks the filesystem type flags to determine how the mount operation is to be done. For a regular disk-based filesystem, the FS_REQUIRES_DEV flag is set, so do_kern_mount( ) invokes the get_sb_bdev( ) function, which performs the following actions:
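Steps 1, 2, and 4 of do_mount( ) amount to bit manipulation on the flags word, which the sketch below isolates. The constants are redefined locally with the values they are believed to have in the Linux 2.4 headers (the magic value MS_MGC_VAL is 0xC0ED0000, tested under the mask MS_MGC_MSK); classify_mount( ) and the action enumeration are hypothetical.

```c
#include <assert.h>

/* Values assumed to match the Linux 2.4 headers; redefined here so the
 * sketch is self-contained. */
#define MS_NOSUID   2ul
#define MS_NODEV    4ul
#define MS_NOEXEC   8ul
#define MS_REMOUNT  32ul
#define MS_BIND     4096ul
#define MS_MOVE     8192ul
#define MS_MGC_VAL  0xC0ED0000ul
#define MS_MGC_MSK  0xffff0000ul

#define MNT_NOSUID  1
#define MNT_NODEV   2
#define MNT_NOEXEC  4

enum action { DO_REMOUNT, DO_LOOPBACK, DO_MOVE_MOUNT, DO_ADD_MOUNT };

/* Strip the legacy magic, move the per-mount flags into mnt_flags, and
 * pick the function that handles the request. */
static enum action classify_mount(unsigned long *flags, int *mnt_flags)
{
    assert(flags && mnt_flags);
    if ((*flags & MS_MGC_MSK) == MS_MGC_VAL)              /* step 1 */
        *flags &= ~MS_MGC_MSK;

    *mnt_flags = 0;                                       /* step 2 */
    if (*flags & MS_NOSUID) *mnt_flags |= MNT_NOSUID;
    if (*flags & MS_NODEV)  *mnt_flags |= MNT_NODEV;
    if (*flags & MS_NOEXEC) *mnt_flags |= MNT_NOEXEC;
    *flags &= ~(MS_NOSUID | MS_NODEV | MS_NOEXEC);

    if (*flags & MS_REMOUNT) return DO_REMOUNT;           /* step 4 */
    if (*flags & MS_BIND)    return DO_LOOPBACK;
    if (*flags & MS_MOVE)    return DO_MOVE_MOUNT;
    return DO_ADD_MOUNT;
}
```

The order of the tests follows the text: MS_REMOUNT takes precedence over MS_BIND, which takes precedence over MS_MOVE, and every other request becomes a new mount.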

  1. Invokes path_init( ) and path_walk( ) to look up the pathname of the block device (see Section 12.5).

  2. Invokes blkdev_get( ) to open the block device storing the regular filesystem.

  3. Searches the list of superblock objects; if a superblock relative to the block device is already present, returns its address. This means that the filesystem is already mounted and will be mounted again.

  4. Otherwise, allocates a new superblock object, initializes its s_dev, s_bdev, s_flags, and s_type fields, and inserts it into the global list of superblocks and the superblock list of the filesystem type descriptor.

  5. Acquires the s_lock semaphore of the superblock.

  6. Invokes the read_super method of the filesystem type to access the superblock information on disk and fill the other fields of the new superblock object.

  7. Sets the MS_ACTIVE flag of the superblock.

  8. Releases the s_lock semaphore of the superblock.

  9. If the filesystem type is implemented by a kernel module, increments its usage counter.

  10. Invokes path_release( ) to terminate the lookup of the block device pathname started in Step 1.

  11. Returns the address of the new superblock object.
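Step 3 is what makes a filesystem mounted several times share a single superblock object. A simplified model of that search follows; get_sb_for_dev( ) is a hypothetical helper, the super_blocks list is singly linked for brevity, and s_count is used here only as a share counter.

```c
#include <assert.h>
#include <stdlib.h>

struct super_block {
    unsigned s_dev;              /* device number (major << 8 | minor) */
    int s_count;                 /* mounts sharing this object (simplified) */
    struct super_block *next;    /* simplified super_blocks list */
};

static struct super_block *super_blocks;

/* Reuse the superblock of an already mounted device, otherwise allocate
 * a new one and link it into the list. */
static struct super_block *get_sb_for_dev(unsigned dev)
{
    assert(dev != 0);            /* 0 means "no device" */
    for (struct super_block *s = super_blocks; s; s = s->next)
        if (s->s_dev == dev) {
            s->s_count++;        /* filesystem already mounted */
            return s;
        }
    struct super_block *fresh = calloc(1, sizeof *fresh);
    if (!fresh)
        return NULL;
    fresh->s_dev = dev;
    fresh->s_count = 1;
    /* the kernel would now invoke the filesystem's read_super method */
    fresh->next = super_blocks;
    super_blocks = fresh;
    return fresh;
}
```

Two mounts of device 0x0200 (major 2, minor 0, conventionally /dev/fd0) return the same object, while a different device gets its own.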

Unmounting a Filesystem

The umount( ) system call is used to unmount a filesystem. The corresponding sys_umount( ) service routine acts on two parameters: a filename (either a mount point directory or a block device filename) and a set of flags. It performs the following actions:

  1. Invokes path_init( ) and path_walk( ) to look up the mount point pathname (see the next section). Once finished, the lookup yields the address d of the dentry object corresponding to the pathname and the address mnt of the mounted filesystem object to which that dentry belongs.

  2. If the resulting directory is not the mount point of a filesystem, returns the -EINVAL error code. This check is done by verifying that mnt->mnt_root contains the address of the dentry object d.

  3. If the filesystem to be unmounted has not been mounted on the system directory tree, returns the -EINVAL error code. (Recall that some special filesystems have no mount point.) This check is done by invoking the check_mnt( ) function on mnt.

  4. If the user does not have the privileges required to unmount the filesystem (the CAP_SYS_ADMIN capability), returns the -EPERM error code.

  5. Invokes do_umount( ), which performs the following operations:

    1. Retrieves the address of the superblock object from the mnt_sb field of the mounted filesystem object.

    2. If the user asked to force the unmount operation, interrupts any ongoing operation on the filesystem by invoking the umount_begin superblock operation.

    3. If the filesystem to be unmounted is the root filesystem and the user didn’t ask to actually detach it, invokes do_remount_sb( ) to remount the root filesystem read-only and terminates.

    4. Acquires the mount_sem semaphore for writing and the dcache_lock dentry spin lock.

    5. If the mounted filesystem does not include mount points for any child mounted filesystem, or if the user asked to forcibly detach the filesystem, invokes umount_tree( ) to unmount the filesystem (together with all children).

    6. Releases mount_sem and dcache_lock.
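The checks above can be summarized in a compact user-space sketch that returns the same error codes. It replaces check_mnt( ) with an attached field and the capability test with an is_admin argument, and it uses a hypothetical UMOUNT_DETACH flag for the forced detach; umount_sketch( ) is an illustrative name, and the -EBUSY result when child mounts remain is an assumption about what the caller sees when umount_tree( ) is not invoked.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dentry { int dummy; };

struct vfsmount {
    struct dentry *mnt_root;   /* root dentry of the mounted filesystem */
    int attached;              /* stands in for check_mnt() */
    int nchildren;             /* child mounts on directories of this fs */
    int mounted;               /* cleared by the simulated umount_tree() */
};

#define UMOUNT_DETACH 1        /* hypothetical "forcibly detach" flag */

/* d and mnt are what the pathname lookup returned. */
static int umount_sketch(struct dentry *d, struct vfsmount *mnt,
                         int is_admin, int flags)
{
    assert(d && mnt);
    if (d != mnt->mnt_root)
        return -EINVAL;                 /* not a mount point */
    if (!mnt->attached)
        return -EINVAL;                 /* not in the mount tree */
    if (!is_admin)
        return -EPERM;                  /* lacks privileges */
    if (mnt->nchildren && !(flags & UMOUNT_DETACH))
        return -EBUSY;                  /* child mounts still present */
    mnt->mounted = 0;                   /* umount_tree() */
    return 0;
}
```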



[86] Quite surprisingly, the mount point of a filesystem might be a directory of the same filesystem, provided that it was already mounted before. For instance:

mount -t ext2 /dev/fd0 /flp; touch /flp/foo
mkdir /flp/mnt; mount -t ext2 /dev/fd0 /flp/mnt

Now, the empty foo file on the floppy filesystem can be accessed both as /flp/foo and as /flp/mnt/foo.
