The Fast File System

OpenBSD’s filesystem, FFS, is an improved version of the filesystem shipped with 4.4BSD. FFS is sometimes called UFS (for Unix File System), and many system utilities still use the name UFS.^[18]

FFS is designed to be fast, reliable, and able to handle the most common situations effectively while still supporting weird configurations. By default, OpenBSD tunes FFS for general use, but you can optimize it to fit your needs—whether you need to hold trillions of tiny files or a half dozen 30GB files. You don’t need to know much about FFS internals, but you should at least understand blocks, fragments, and inodes.

FFS Versions

The original FFS was written in the 1980s and included hard-coded limits that were ample for the day. Filesystems could have up to 2^31-1 blocks, or just under a terabyte (TB). In 1983, a 1TB filesystem was unthinkable. In 2013, 1TB drives are cheap.
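As a rough check on that limit, here is the arithmetic as a small Python sketch. The 512-byte unit is my assumption (it matches the sector size that newfs reports later in this chapter); the text above gives only the count of 2^31-1.

```python
# Back-of-the-envelope check of the original FFS size limit.
# ASSUMPTION: each of the 2^31-1 addressable units is a 512-byte sector.
SECTOR_BYTES = 512
MAX_UNITS = 2**31 - 1

max_bytes = MAX_UNITS * SECTOR_BYTES
one_tb = 2**40  # 1TB counted in binary units

print(f"limit: {max_bytes} bytes")
print(f"short of 1TB by: {one_tb - max_bytes} bytes")
```

Under that assumption, the limit falls one sector short of a full terabyte, which is where "just under a terabyte" comes from.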

For larger file systems, we have FFS version 2. FFS2 can support filesystems up to 8 zettabytes—unthinkable by 2013 standards. (FFS2 is likely to reach other limits before hitting the filesystem size limit, mind you.) OpenBSD supports both FFS and FFS2.

The i386 and amd64 boot floppies support only FFS, not FFS2. The installation CD, however, supports both. Most machines that need to boot from floppy don’t need FFS2, and probably don’t have a BIOS that can support 2TB drives anyway. The filesystem creation program newfs(8) is smart enough to use FFS2 on filesystems large enough to need it, so for most installations, you don’t need to worry about the difference between FFS and FFS2.

Note

In the exceedingly unlikely event that you actually require FFS2 on a machine that must be installed via floppy, be sure to format the critical system partitions of root (/), /var, and /usr as FFS, not FFS2. Use FFS2 only for partitions that are not critical to the system. Otherwise, you won’t be able to use the installation disk for upgrades or emergency repairs.

Blocks, Fragments, and Inodes

Both FFS and FFS2 are managed through blocks, fragments, and inodes. This arrangement isn’t unique to FFS and FFS2; filesystems such as NTFS use data blocks and index nodes, too. The indexing system used by each filesystem is largely unique, however.

Blocks

Blocks are sections of disk that contain data. Files are placed in one or more blocks. OpenBSD’s FFS uses a default block size of 16KB, or eight times the fragment size, whichever is smaller. Not all files are even multiples of 16KB, so leftover bits go in fragments. A fragment is one-eighth of the block size, or 2KB by default. A 20KB file fills one block and two fragments.
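The block-and-fragment split is easy to work out by hand, but a short Python sketch makes the rule explicit. The function name and its parameters are hypothetical, and this is a simplification: it counts only the space for the file's data and ignores inodes and any other bookkeeping the filesystem needs.

```python
def blocks_and_fragments(file_bytes, block_bytes=16 * 1024, frags_per_block=8):
    """Return (full_blocks, fragments) needed to hold file_bytes of data.

    Simplified model: whole blocks first, then the leftover tail is
    rounded up to whole fragments.
    """
    frag_bytes = block_bytes // frags_per_block
    full_blocks, leftover = divmod(file_bytes, block_bytes)
    fragments = -(-leftover // frag_bytes)  # ceiling division
    if fragments == frags_per_block:
        # A tail that needs every fragment of a block is just another block.
        full_blocks += 1
        fragments = 0
    return full_blocks, fragments


print(blocks_and_fragments(20 * 1024))  # the 20KB file from the text
```

With the defaults, the 20KB file comes out as one block and two fragments, matching the example above.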

Inodes

Inodes, or index nodes, contain basic data about files, such as the file’s size, permissions, and the list of blocks that contain the file. Collectively, the data in an inode is known as metadata, or data about data.

Superblocks

You’ll also see references to superblocks, which are blocks that contain vital information about the filesystem’s size and specifications. Superblocks are so important that FFS makes many backup copies of them. If you need to meddle with superblocks, you’ve probably done something wrong or your filesystem is FUBAR.

Creating FFS Filesystems

Use newfs(8) to create FFS and FFS2 filesystems. Before you run it, make sure that the disk has a disklabel. The newfs command takes one argument: the partition device node.

# newfs wd1a
/dev/rwd1a: 16383.9MB in 33554304 sectors of 512 bytes
81 cylinder groups of 202.47MB, 12958 blocks, 25984 inodes each
super-block backups (for fsck -b #) at:
 32, 414688, 829344, 1244000, 1658656, 2073312, 2487968, 2902624, 3317280, 3731936,
 …

You’ll see details about the filesystem size, how many blocks it includes, and so on. The location of each superblock backup is printed as newfs proceeds. (When computers and disks were much slower, this told the operator that the computer was actually doing something and hadn’t seized up.)
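If you're curious how the numbers in that output relate to each other, this Python sketch redoes the arithmetic. It assumes that the superblock backup locations are sector offsets and that there is one backup per cylinder group; both are my reading of the output, not something newfs states.

```python
import math

SECTOR_BYTES = 512
MB = 1024 * 1024

total_sectors = 33554304        # from the first line of the newfs output
backups = [32, 414688, 829344]  # first three superblock backup locations

# ASSUMPTION: backups sit at the same offset in each cylinder group, so
# the gap between two neighbors is the size of one group in sectors.
group_sectors = backups[1] - backups[0]

size_mb = total_sectors * SECTOR_BYTES / MB
group_mb = group_sectors * SECTOR_BYTES / MB
groups = math.ceil(total_sectors / group_sectors)

print(f"{size_mb:.1f}MB total")
print(f"{group_mb:.2f}MB per cylinder group")
print(f"{groups} cylinder groups")
```

The results line up with the 16383.9MB, 202.47MB, and 81 cylinder groups that newfs printed, with the last group only partly filled.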

The partition size determines which filesystem newfs uses. Partitions smaller than 1TB are formatted with FFS; larger partitions with FFS2. If you want to specify a particular filesystem format (yes, you can even specify the old-fashioned 4.3BSD format if you like), use the -O flag. It makes no sense to demand an FFS filesystem on a large partition, but you might have a reason to use FFS2 on a small partition.

# newfs -O 2 wd1a

If you think you need to specify which filesystem format to use on a new filesystem, you’re probably wrong.

FFS Mount Options

OpenBSD can handle FFS partitions in several special ways, controlling what sorts of changes the filesystem supports and what sorts of files may exist. These are called mount options. You can specify mount options either when you mount partitions on the command line, as we’ll discuss in Mounting and Unmounting Partitions, or in /etc/fstab.

Mount Options and /etc/fstab

Specify a filesystem’s mount options in a comma-separated list in the fourth field of the filesystem’s /etc/fstab entry. For example, here’s an /etc/fstab entry for the partition that contains my /home directory:

244f6d3acd6374ad.k /home ffs rw,nodev,nosuid,softdep 1 2

I’ve specified the options rw (read-write), nodev (device nodes forbidden), nosuid (setuid programs forbidden), and softdep (soft updates). I’ll cover these and other common mount options, and explain why you might want to use them.
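To make the field layout concrete, here's a minimal Python sketch that splits that entry apart. The class and function names are hypothetical. The names for the last two fields (dump frequency and fsck pass number) come from general fstab knowledge rather than from the text above, and treating a missing value as 0 is a choice made for this sketch.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]
    dump_freq: int
    fsck_pass: int


def parse_fstab_line(line):
    """Split one whitespace-separated fstab entry into named fields."""
    fields = line.split()
    device, mount_point, fs_type, options = fields[:4]
    # The last two fields may be omitted; this sketch treats them as 0.
    dump_freq = int(fields[4]) if len(fields) > 4 else 0
    fsck_pass = int(fields[5]) if len(fields) > 5 else 0
    return FstabEntry(device, mount_point, fs_type,
                      options.split(","), dump_freq, fsck_pass)


entry = parse_fstab_line(
    "244f6d3acd6374ad.k /home ffs rw,nodev,nosuid,softdep 1 2")
print(entry.mount_point, entry.options)
```

The fourth field becomes a plain list, so checking for a particular option is a simple membership test such as "softdep" in entry.options.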

Read-Only Mounts

If you only want to read the contents of a partition, and never write to it, you can mount the partition as read-only. In most cases, this is the safest way to mount a disk because you cannot alter the data on the disk or write any new data. If a filesystem should never change, mounting it read-only might make sense.

Read-only mounts are especially valuable when a particular filesystem is damaged. While OpenBSD won’t let you perform a standard read-write mount on a damaged or dirty filesystem, it can often mount those filesystems read-only. This gives you a chance to recover some data from the partition. (Not a large chance, but a chance.)

To mount a filesystem read-only, use the option rdonly or ro.

Read-Write Mounts

If you want to both read from and write to the disk, you’ll want to mount the partition as read-write. By default, OpenBSD mounts all partitions as read-write.

Use the option rw to explicitly configure read-write mounts.

On modern hardware, I recommend using soft updates in conjunction with read-write mounts.

Synchronous Mounts

Using a synchronous mount is the safest way to mount a filesystem. OpenBSD can read data from a synchronous-mounted partition as fast as the hardware permits. Whenever you write to the disk, however, the kernel feeds a chunk of data to the disk, waits to receive confirmation that the disk has accepted the data and written it to disk, and then tells the program that requested the write that the data is now on disk.

You should know that even if you’re using a synchronous mount, most hard drives lie about whether they have actually written the data to disk. These drives perform write caching, where writes are cached in a small flash or RAM buffer on the disk itself before the drive actually writes the data. This raises the question: Is a synchronous mount really synchronous? Hard drive vendors usually claim that in the event of a power failure, these disks retain just enough power to write the cache to disk.

Although they provide the greatest data integrity in the case of a crash, synchronous mounts are slow. You might use synchronous mounts when data integrity is crucial, but in most cases they’re overkill, and you have little ability to verify that the mount is truly synchronous.

Activate synchronous mounts with the sync keyword.

Asynchronous Mounts

To write data quickly, but with a higher risk of data loss, mount partitions asynchronously. When using asynchronous mounts, the kernel informs software that all disk writes are successful before the disk confirms that the data was written. This is fast, but a system failure can leave inconsistent data on your disk.

Asynchronous mounts are useful when restoring a filesystem from backup, because if you get a power failure halfway through the restore procedure, you’ll need to start over anyway. Don’t use asynchronous mounts in production if you care about your data or would object to re-creating the filesystem.

Activate asynchronous mounts with the async keyword.

Soft Update Mounts

Soft update mounts organize and arrange disk writes so that filesystem metadata remains consistent at all times. This gives performance similar to that of an asynchronous mount with the reliability of a synchronous mount. While that doesn’t mean that all data will be written to disk—a power failure at the wrong moment will result in lost data—using soft updates prevents a lot of filesystem integrity problems caused by that lost data. It’s not the default because some older, smaller hardware doesn’t have enough memory to support it, but if you’re using modern i386 and amd64 hardware, I recommend enabling soft updates for all FFS partitions.

To mount a filesystem with soft updates, use the softdep option.

“Don’t Track Access Time” Mounts

FFS records the last time a file was read, executed, or otherwise viewed. Updating these access times consumes a small but measurable amount of disk I/O and performance. You can use the noatime mount option to tell OpenBSD to not update the access time on any file.

Using noatime makes sense on laptops, where minimizing power usage is critical. If you’re tempted to use this option on your server to squeeze out a little extra performance, you should buy a faster disk instead. Some software, such as the Mutt mail client, will break if run on filesystems mounted noatime.

No Device Nodes Permitted Mount

By using the nodev mount option, you can tell OpenBSD to not interpret any device nodes on any given filesystem. Intruders can try to create “rogue” device nodes and use them to write files or attack the network, but if the kernel won’t recognize those device nodes, it cuts off this whole category of attacks.

This type of mount is also useful if you have hard drives from multiple operating systems on your computer. For example, if you dual-boot OpenBSD and Linux on your computer, but you don’t want to accidentally access a Linux device node when using OpenBSD, the nodev option will prevent you from doing so. (You might think you would notice that you had typed /linux/dev/hda rather than /dev/wd1, but never underestimate your ability to screw up.) In most cases, the partition containing /dev is the only one that should contain device nodes.

Execution Forbidden Mounts

The noexec mount option prevents any binaries on the partition from being executed. Mounting /home with the noexec option helps prevent users from installing and running their own programs, but for it to be effective, you’ll need to make sure users can’t install binaries in any shared areas, such as /tmp and /var/tmp.

Note that forbidding execution of binaries doesn’t prevent users from running interpreted scripts from that partition. Maybe the users can’t run a compiled C program, but if they can run perl $HOME/rootkit.pl, then noexec won’t slow them down very much.

setuid Forbidden

The nosuid option disallows setuid behavior from programs on this filesystem. Many partitions should not have setuid files, and setting this option is an easy way to neutralize any that show up there. OpenBSD sets this on partitions such as /home and /tmp by default. You must carefully place this option on all user-writable filesystems for it to prevent undesired behavior.

Do Not Automatically Mount This Filesystem

noauto isn’t actually a mount option, but rather a way of telling OpenBSD to not mount a given partition listed in /etc/fstab at boot. I frequently make /etc/fstab entries for removable media drives, but the system should not try to mount these at boot. The boot will hang if a partition required by /etc/fstab is not available, and I don’t want my computer to refuse to boot just because I unplugged a flash drive.
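As an illustration of the effect, this Python sketch lists which entries in a sample /etc/fstab would be mounted at boot. It models only the noauto check, not everything the boot process does. The function name is hypothetical, and the flash-drive entry in the sample is invented for the example.

```python
def mounted_at_boot(fstab_text):
    """Return the mount points of entries that don't carry noauto."""
    mount_points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry for this sketch's purposes
        if "noauto" not in fields[3].split(","):
            mount_points.append(fields[1])
    return mount_points


# The second entry is a made-up removable drive, for illustration only.
SAMPLE_FSTAB = """\
244f6d3acd6374ad.k /home ffs rw,nodev,nosuid,softdep 1 2
/dev/sd1i /mnt/usb msdos rw,noauto 0 0
"""

print(mounted_at_boot(SAMPLE_FSTAB))
```

Only /home is returned; the flash drive stays in /etc/fstab for convenience but doesn't hold up the boot when it's unplugged.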

Filesystem Integrity

Both versions of FFS go to a great deal of trouble to ensure that the data on disk is correct and intact. The blocks that contain a file should be recorded in an inode, the inodes should all be referenced by directory entries, and so on. When you remove a file, all references to that file should be removed.

After a system failure, however, data might not be consistent. Metadata might reference blocks that were previously erased; a file might be in a different location than the inode record specifies; and the filesystem might have all kinds of references pointing to things that have moved, changed, or disappeared. These inconsistent, or dirty, filesystems cannot be trusted and must be rationalized, or cleaned, before you can mount them read-write. If you mount a dirty filesystem read-only, it might only panic your system, but if you force OpenBSD to mount a dirty filesystem read-write, you will damage the dirty filesystem even more.

At boot, OpenBSD performs a minimal inspection and cleaning, or preening, of the filesystems and will automatically correct any minor problems found. If preening cannot fully clean the filesystem, the boot will hang until you intervene.

When confronted with a dirty filesystem, you have a few options: use the filesystem checking tool fsck(8), debug the filesystem with fsdb(8) and clri(8), or throw the filesystem away and run newfs(8). Most of the time, you’ll attempt to repair the filesystem with fsck. Using fsdb successfully requires more knowledge about FFS innards than I possess, so I recommend it to only those who really want to develop an in-depth knowledge of FFS and have a whole bunch of time to devote to it. Rebuilding the filesystem with newfs destroys everything on the filesystem, but it’s a decent choice for partitions that contain only ephemeral data, such as /usr/obj.

You can use dump(8) to copy the damaged filesystem before trying any of the repairs. This gives you the option to fall back to the current state if attempts at repairing the disk fail. (If you have to do this, though, you should probably reevaluate your backup strategy.)

Running fsck

If you try to mount a dirty filesystem either at boot time or during routine operation, you’ll see a message that looks like this:

/dev/rwd1a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY

The fsck(8) program is a frontend for several filesystem-specific integrity-checking programs. When you run it, fsck identifies the type of filesystem and calls the correct integrity checker for you. Run fsck by giving it the device name of the filesystem you want to check:

# fsck /dev/wd1a

You can use either the raw or cooked device name; fsck is smart enough to use the raw node even if you give the cooked device name.

Examining the filesystem can take quite a while, so be patient.

When run on a dirty filesystem, fsck will probably find a number of problems: blocks that have become disassociated from their inodes, inodes that reference empty blocks, and so on. It can often make a good guess as to how everything fits together.

When fsck finds a problem that it isn’t absolutely sure about, it will suggest a fix and ask if you want to make the change. If you answer y, fsck makes the change. If you answer n, fsck leaves the filesystem unchanged. If you tell fsck not to make the change it suggests, the filesystem will still be dirty, and you’ll need to fire up fsdb or clri and make the change you think more appropriate.

Sometimes, fsck can’t identify the name or directory of a file recovered from a damaged filesystem. These files go into the partition’s lost+found directory (for example, /usr/lost+found). You’ll need to use programs such as grep and strings to try to identify these files by their contents.

Blindly Trusting fsck

Those of us who lack the skills to debug a filesystem find ourselves in a difficult situation, where we can either accept that fsck(8) knows what’s best or just restore from backup. If your filesystem was performing a lot of disk I/O just before system failure, fsck might need to make dozens or hundreds of changes. You could spend an hour sitting at the console pressing y repeatedly.

If you decide to trust fsck and hope it’s right, run fsck -y. This means “answer y to every question.” You might wind up with the entire contents of the filesystem in the lost+found directory, or you might lose every file on the filesystem. But unless you’re intimately familiar with the innards of FFS, you would need to restore from backup anyway.

If you run fsck and realize partway through that you would like to answer y to all the questions that follow, enter F. That tells fsck to answer y to all remaining questions.

At the end of the procedure, you’ve either recovered your system or need to restore from backup.
