5. Advanced Setup


As mentioned previously, you can assign a hot spare disk to your pool. If the ZFS pool loses a disk, the spare will be automatically attached and the resilvering process will be started.

Let’s consider a mirrored pool consisting of two vdevs with two drives each, four hard drives in total. The drives are grouped in pairs and each pair mirrors its contents internally. If we have drives A, B, C, and D, then drives A and B form one mirrored pair and drives C and D form the second mirrored pair:

trochej@ubuntuzfs:~$ sudo zpool status

  pool: datapool
 state: ONLINE
  scan: none requested
 config:


    NAME        STATE     READ WRITE CKSUM
    datapool    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sde     ONLINE       0     0     0


errors: No known data errors

You add a hot spare device by running the zpool add spare command:

trochej@ubuntuzfs:~$ sudo zpool add datapool -f spare /dev/sdf

Next, confirm the disk has been added by querying the pool’s status:

trochej@ubuntuzfs:~$ sudo zpool status datapool

  pool: datapool
 state: ONLINE
  scan: none requested
 config:


    NAME        STATE     READ WRITE CKSUM
    datapool    ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
      mirror-1  ONLINE       0     0     0
        sdd     ONLINE       0     0     0
        sde     ONLINE       0     0     0
    spares
      sdf       AVAIL


errors: No known data errors

If you want to remove the spare from the pool, use the zpool remove command:

trochej@ubuntuzfs:~$ sudo zpool remove datapool /dev/sdf

You can use zpool status here too to confirm the change.

A hot spare can also be shared among more than one pool. For example, you could create a mirrored pool that hosts very important data or data that needs to be served very quickly, and a second, RAIDZ pool that needs more space but is less critical (still redundant, but able to survive only a single disk failure). You can then assign the same hot spare to both pools. The pool that suffers a failure claims the spare device, and that device is not usable by the other pool until it is freed again.
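As a sketch, assuming a second RAIDZ pool named backuppool exists alongside datapool (the pool name here is illustrative only), the same disk can be registered as a spare in both pools:

sudo zpool add datapool spare /dev/sdf
sudo zpool add backuppool spare /dev/sdf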

Note

Using hot spares comes with one important caveat. If you lay out the drives in the pool so as to minimize the impact of a hardware failure, the hot spare may not be positioned in a way that preserves that property. This is especially true for shared hot spares. Many real-life installations I have seen did use spare drives, but they were placed in the chassis to ensure the best hardware fault resiliency in most cases; when a drive in a pool failed, the system administrator would get an alert from the monitoring system and replace the drive manually.

ZIL Device

ZIL stands for ZFS Intent Log. It is a set of blocks that persistently stores the write cache. Normally, ZFS allocates these blocks from the storage pool itself. However, with the pool being busy and typically backed by spinning disks, the resulting performance may not be satisfactory.

To better accommodate performance requirements, the ZIL (also called a SLOG) can be moved to a separate device. That device must be persistent across power loss, so that a sudden power failure does not mean the loss of transaction data. RAM-based devices must therefore be battery- or capacitor-backed. You can also use an SSD.

The ZFS Admin Guide suggests that the ZIL be no smaller than 64 MB (the hard minimum for any device used by ZFS) and at most half of the available RAM. So for 32 GB of RAM, a 16 GB ZIL device could be used. In reality, I have rarely seen anything bigger than 32 GB, and 8 or 16 GB is the most common size. The reason is that this is a write buffer. Writes that will eventually be flushed to the hard drives are grouped in the ZIL, allowing for fewer physical operations and less fragmentation. Once a threshold is met, those grouped changes are written to the physical drives. Giving the ZIL a fast device, ideally a RAM-based one, makes those operations very fast and speeds up writes considerably. It also diverts the I/O of writing to the ZIL away from the pool, giving the pool itself some extra bandwidth.

To add a ZIL device, first confirm that your pool is healthy. Checking the status will also remind you which drives are part of the ZFS pool:

root@xubuntu:~# zpool status
 pool: data
state: ONLINE
 scan: none requested
config:


       NAME        STATE     READ WRITE CKSUM
       data        ONLINE       0     0     0
         mirror-0  ONLINE       0     0     0
           sdb     ONLINE       0     0     0
           sdc     ONLINE       0     0     0
         mirror-1  ONLINE       0     0     0
           sdd     ONLINE       0     0     0
           sde     ONLINE       0     0     0
         mirror-2  ONLINE       0     0     0
           sdf     ONLINE       0     0     0
           sdg     ONLINE       0     0     0


errors: No known data errors

 pool: rpool
state: ONLINE
 scan: none requested
config:


       NAME          STATE     READ WRITE CKSUM
       rpool         ONLINE       0     0     0
         root_crypt  ONLINE       0     0     0


errors: No known data errors

Add the /dev/sdh and /dev/sdi drives as mirrored log devices:

root@xubuntu:~# zpool add -f data log mirror /dev/sdh /dev/sdi

While the contents of the L2ARC (described in the next section) are not critical, the ZIL holds information about how your data changes on the disks. Losing the ZIL will not corrupt the ZFS file system, but it may cause some recent changes to be lost. Hence the mirroring.

Confirm that the change is in effect by running zpool status:

root@xubuntu:~# zpool status data
 pool: data
state: ONLINE
 scan: none requested
config:


       NAME        STATE     READ WRITE CKSUM
       data        ONLINE       0     0     0
         mirror-0  ONLINE       0     0     0
           sdb     ONLINE       0     0     0
           sdc     ONLINE       0     0     0
         mirror-1  ONLINE       0     0     0
           sdd     ONLINE       0     0     0
           sde     ONLINE       0     0     0
         mirror-2  ONLINE       0     0     0
           sdf     ONLINE       0     0     0
           sdg     ONLINE       0     0     0
       logs
         mirror-3  ONLINE       0     0     0
           sdh     ONLINE       0     0     0
           sdi     ONLINE       0     0     0


errors: No known data errors

Your new log device is mirror-3.
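Should you ever need to take the log device out again, for example to repurpose the SSDs, the whole mirrored log can be removed by its vdev name. A minimal sketch, using the mirror-3 name shown in the status above:

sudo zpool remove data mirror-3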

L2ARC Device (Cache)

ZFS employs a caching technique called the Adaptive Replacement Cache (ARC). In short, it builds on the Least Recently Used (LRU) algorithm, which keeps track of the access time of each cached page and orders the pages from most recently used to least recently used. The tail of the list is evicted as a new head is added.

ARC improves on this algorithm by tracking pages on two lists: most recently used and most frequently used. The technical details are not important here, but suffice it to say that the efficiency of ARC-based caches is usually much better than plain LRU.

The ARC always lives in the operating system’s memory while the pool is imported. As a side note, if you monitor your RAM and see that most of it is being used, do not panic. There is a saying: “unused RAM is wasted RAM.” Your operating system tries to cram as much as possible into memory in order to reduce disk operations. As you know, disks are the slowest parts of a computer, even modern SSDs. What you should pay attention to is how much of the utilized RAM is cache and buffers and how much is consumed by running processes.

With very busy servers, it makes a lot of sense to load as much data from the drives into memory as possible, as it can speed up operations considerably.

Reading data from RAM is at least ten times faster than reading it from a hard drive. What happens, however, if you have limited memory and still want to cache as much as possible?

Put some SSDs into your server and use them as L2ARC devices. The L2ARC is a level-2 ARC. It holds pages that would normally be evicted from the cache because RAM is too small; since there is still a very high chance of them being requested again, they can be kept in an intermediate area on fast SSD drives.

For this reason, placing the L2ARC on fast SSDs makes a lot of sense. Unlike the ZIL, the cache does not need to be mirrored: if an L2ARC device fails, its contents are simply read from the pool again.

To put /dev/sdi as a cache device into your pool, run the following:

root@xubuntu:~# zpool add -f data cache /dev/sdi

Confirm it worked:

root@xubuntu:~# zpool status data
 pool: data
state: ONLINE
 scan: none requested
config:


       NAME        STATE     READ WRITE CKSUM
       data        ONLINE       0     0     0
         mirror-0  ONLINE       0     0     0
           sdb     ONLINE       0     0     0
           sdc     ONLINE       0     0     0
         mirror-1  ONLINE       0     0     0
           sdd     ONLINE       0     0     0
           sde     ONLINE       0     0     0
         mirror-2  ONLINE       0     0     0
           sdf     ONLINE       0     0     0
           sdg     ONLINE       0     0     0
       logs
         sdh       ONLINE       0     0     0
       cache
         sdi       ONLINE       0     0     0


errors: No known data errors
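Because cache contents can always be re-read from the pool, a cache device can also be removed at any time without risk to your data. If you later want to take /dev/sdi out again, a plain removal is enough:

sudo zpool remove data sdi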

Quotas and Reservations

In normal operation, every file system in a pool can freely consume free space, up to the pool’s full capacity, until it runs out. The only limitation is the space taken by the other file systems. In that regard, with ZFS you should not think in terms of individual file system capacities, but in terms of the total pool space.

There are, however, situations when you need to emulate traditional file system behavior, where file systems are limited to a certain amount of space or guaranteed to have it for their own use.

Let’s consider a traditional file system created on top of a normal disk partition. If the partition was created as 3 GB, the file system will have no less and no more than 3 GB for itself. If you mount it as, say, /var/log, then the logs in your system will have all 3 GB of space for themselves and no more than that. They will also be separate from other file systems. Thus, logs filling the /var/log directory will not make your root partition full, because they live in a separate space.

Not so with ZFS! Consider a root directory mounted on a ZFS file system. Let’s say the pool has 16 GB of space in total, shared by the file systems for /home, /var, and /var/log. After the installation of the system, suppose you’re left with 11 GB of free space. Each file system can consume this space. If, for some reason, the logs go wild (maybe some application switched to debug mode and you forgot about it), they may fill this whole 11 GB, starving all other file systems. In the worst case, you won’t be able to log in as root.

There are two possible actions that you can take, depending on how you wish to approach this problem: using quotas and using reservations.

Quotas are like traditional Linux quotas, except they are set against file systems and not system users. By setting up a quota, you prevent the given file system from growing beyond the set limit. So if you want /var/log to never exceed 3 GB, you will set a quota on it.

Reservations, on the other hand, are guarantees given to a file system. By setting a 3 GB reservation, you guarantee that the given file system will have at least 3 GB of space for itself; other file systems in the pool are prevented from claiming that space.
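As a sketch, assuming the logs live on a dataset named rpool/var/log (the dataset name is an assumption for illustration), the two approaches from the example above translate into:

sudo zfs set quota=3G rpool/var/log
sudo zfs set reservation=3G rpool/var/log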

To make matters a little bit more complicated, there are two versions of each: quotas and refquotas, and reservations and refreservations. The difference is quite important, as my experience taught me.

Quotas and reservations account for the storage used by both the file system and its descendants. This means that the 3 GB limit covers the file system together with its snapshots and clones. Refquotas, on the other hand, only track the space used by the file system itself. This opens the way for interesting scenarios, where you can separately set limits for the file system and for its snapshots. But the quota comes with an important twist: snapshots grow as you change the data. You must pay attention to the size of your snapshots and the rate at which they grow, or you may hit the quota before you expect it.

The same distinction applies to reservations and refreservations. A reservation guarantees space for the file system and its descendants, while a refreservation keeps the space only for the file system itself. Again, pay attention, as the end result of your settings may be neither what you wished for nor what you expected.

Let’s work through some examples based on a real-life scenario.

The server you are running has a pool named data. The total capacity of this pool is 30 TB. The pool will be shared by finances, engineering, and marketing. There will also be a shared space that people can use to exchange documents and silly cat pictures.

All three departments have given you the size to which they expect their directories to grow. Finances and marketing said it is going to be approximately 5 TB each, and engineering said they expect theirs to grow up to 10 TB. Together, that gives 20 TB, leaving you with 10 TB of free space for other things.

Now, 30 TB of space may look like a great number and for most small organizations, it probably is. On the other hand, engineering data or raw pictures and videos (in graphic studios, for example) can outgrow it quickly.

Snapshots are the subject of the next section, but let’s introduce them briefly here. A snapshot of a file system can be compared to a still image of the file system at a given time, namely the time the snapshot was taken. In ZFS, it can be treated like any other file system, except that it is read-only. Looking into it, you will see files and directories exactly as they were at the moment the zfs snapshot command was run. No matter what happens to those files afterward, you can always retrieve their previous state from the snapshot.

The amount of space a snapshot consumes is equal to the size of the changes introduced to the data since it was taken. That sounds complicated, so let’s demystify it. Engineering has a big CAD file, 5 GB in size. It is an imported project that will be worked on. After it was copied over to ZFS, a snapshot was taken, just in case. An engineer opened the file and changed a few things. After saving, most of the file stays the same, but some places are different. The summed size of those differences is 300 MB, and that is the size of the snapshot. If someone deleted the file, the snapshot would grow to 5 GB, because that is now the difference between the actual file system and the snapshotted moment. The mechanism behind this is explained in the next section. For now, just take it as a fact.
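You can observe this accounting yourself. A minimal sketch, assuming the file lives on a data/engineering dataset: take a snapshot before the edit and list the snapshots afterward; the space each snapshot holds shows up in its USED column.

sudo zfs snapshot data/engineering@before-edit
sudo zfs list -r -t snapshot data/engineering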

This space consumption by snapshots plays an important role when setting up both reservations and quotas. Look back at the engineering department’s file system. The department estimated that the amount of data they will store will reach 10 TB, but they only estimated the “raw” data: the files themselves, not their snapshots. Assume the daily amount of changes introduced to project files adds up to 5 GB. That is the amount of space ONE snapshot will take each day, unless it is destroyed. For simplicity, assume there is only going to be one snapshot and it will be held forever. Within a year this amounts to almost 2 TB of space taken from the pool! Now assume you create a reservation for the engineering file system and give them 10 TB. You also add a quota of 11 TB, so that they have some breathing space but will not starve other users. As projected, their space consumption nears 9 TB within a year and suddenly, a whole 2 TB short of the target, they get an out-of-space error when trying to write anything. To quickly resolve the situation, they delete some old files known to have been last edited a long time ago and present in several backups. Apparently they have freed 3 TB of space, yet they keep getting the out-of-space error. At some point they cannot even delete files, because of this error!

This is the quota kicking in. The first part of the problem is that the snapshot quietly consumes space counted against the quota as it grows. That only becomes evident once you analyze space consumption using the zfs list -o space command (explained elsewhere). The other part of the problem, the counterintuitive out-of-space error when deleting things, comes from the nature of the snapshot itself. When you remove files from the file system, their blocks are added to the snapshot. The only way to free this space is to destroy the snapshot using this command:

zfs destroy pool/engineering@snapshot
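Before destroying anything, it is worth checking where the space has actually gone. The zfs list -o space command prints, among other columns, USEDSNAP (space held by snapshots) and USEDDS (space used by the dataset itself):

sudo zfs list -o space data/engineering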

Now let’s consider the other departments. If you put a quota on them and they edit their files enough, they may soon reach their quotas due to file system snapshots. Also, most often there is more than one snapshot. It is entirely up to the policy maker, but typically there are monthly, weekly, and daily snapshots, and sometimes hourly ones, depending on how much the data changes during the day.

Now come back to the difference between quotas and reservations and their ref counterparts. The former track the whole usage, including snapshots; the latter only the file system itself. For the engineering department, you could set the refquota to 11 TB and the quota to, say, 13 TB. This would leave room for the snapshots to grow as files are deleted, allowing for a temporary solution. Nothing beats space utilization monitoring, though.
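Translated into the zfs set commands covered just below, that arrangement would look roughly like this (sizes as discussed above):

sudo zfs set refquota=11T data/engineering
sudo zfs set quota=13T data/engineering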

Quotas, reservations, refquotas, and refreservations are file system properties. It means they are set and checked using the zfs set and zfs get commands.

root@xubuntu:~# zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
data                179K  2.86G    19K  /data
data/engineering     19K  2.86G    19K  /data/engineering

To check the current values of quota, refquota, reservation, and refreservation on the data/engineering file system, run the following:

root@xubuntu:~# zfs get quota,refquota,reservation,refreservation data/engineering
NAME              PROPERTY        VALUE      SOURCE
data/engineering  quota           none       default
data/engineering  refquota        none       default
data/engineering  reservation     none       default
data/engineering  refreservation  none       default

They are not set by default, as you can see. Since my test pool is much smaller than the scenario considered above, let’s set the quota to 1.5 GB, the refquota to 1 GB, and the reservation to 800 MB:

root@xubuntu:~# zfs set quota=1.5G data/engineering
root@xubuntu:~# zfs set refquota=1G data/engineering
root@xubuntu:~# zfs set reservation=800M data/engineering
root@xubuntu:~# zfs get quota,refquota,reservation data/engineering
NAME              PROPERTY     VALUE     SOURCE
data/engineering  quota        1.50G     local
data/engineering  refquota     1G        local
data/engineering  reservation  800M      local

Snapshots and Clones

Here we come to snapshots and clones, two powerful features of ZFS. They were already discussed briefly, so now it is time for a detailed explanation.

As explained, snapshots are a way of “freezing” the file system contents at a given time. Due to the copy-on-write nature of ZFS, creating a snapshot is fast (it usually takes a fraction of a second) and requires very little processing power. It is thus common to create snapshots as a basis for long-running jobs that require the contents to be static, such as backup jobs. A backup job run against a large, live file system may archive files at different times; run off a snapshot, it is guaranteed to capture all files at exactly the same moment, even if the backup runs for hours. Additionally, if the backed-up files belong to an application that needs to be shut down for the duration of the backup, the downtime of that application can be reduced to mere fractions of a second.

One additional property of a snapshot is the ability to roll the current file system back to it. This means the administrator can rewind all the files to the moment of snapshot creation.

ZFS writes changed blocks to a new location in the pool, leaving the old blocks untouched unless the pool fills up and the old space needs to be reclaimed. Snapshots are automatically mounted in the .zfs/snapshot subdirectory of the snapshotted file system. For example, for the data/documents ZFS file system, if there is a snapshot data/documents@initial, its contents can be accessed by looking into /data/documents/.zfs/snapshot/initial.

Snapshot contents can be accessed either by looking into the directory above or by running a rollback, which effectively rewinds the file system to the moment of snapshot creation. The process is very fast; it only takes as much time as updating some metadata. The administrator needs to exercise some caution, though: once rolled back, the file system cannot be fast-forwarded back to its current state.
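A minimal sketch, assuming the data/documents file system mentioned above: take a snapshot, browse it through the hidden .zfs directory, and roll back if needed (zfs rollback refuses to cross newer snapshots unless you add -r):

sudo zfs snapshot data/documents@initial
ls /data/documents/.zfs/snapshot/initial
sudo zfs rollback data/documents@initial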

There are situations where a read-only snapshot is not enough and it might be useful to use it as a normal file system. ZFS has such a feature, and it is called a clone. A clone is a read-write copy of a snapshot. Initially, the clone and the snapshot refer to the same set of blocks, so the clone does not consume any disk space. As changes are introduced to the clone’s contents, it starts to take up space.

A clone and the snapshot it was created from are related in a parent-child manner. As long as the clone is in use, the snapshot cannot be destroyed.
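Creating a clone is a single command. If the clone is to outlive its origin snapshot, it can be promoted, which reverses the parent-child relationship. A short sketch, reusing the data/documents@initial snapshot from above (the clone name is illustrative):

sudo zfs clone data/documents@initial data/documents_work
sudo zfs promote data/documents_work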

Why are snapshots useful? They can guard against file corruption by faulty software or accidental deletion. They also provide a means of looking at a file as it was before an edit. And they can serve as a frozen image of a file system prepared to be backed up.

Why are clones useful? One interesting use is to create a clone before an important operating system update. Long known in the illumos and FreeBSD world, boot environments are root file system clones that can be booted into. This allows a quick reboot into a known-working operating system after a broken upgrade. Clones have also been used as a means of cloning containers and virtual machines. The uses are limited only by imagination.

Now, after this introduction, onto the usage itself.

ZFS ACLs

Linux is an operating system from the Unix tradition. The Unix operating systems are multi-user systems, allowing many users to operate the same computer. This brought a standard model of file and directory permissions control. In this model, there are three types of actions and three types of actors. The actions are read, write, and execute and the actors are owner, group, and all others. Both can be combined, giving a simple, yet quite effective way of restricting and granting access to certain directories and files in Linux. This model is known as discretionary access control (DAC) .

DAC allows for flexible control of who can use certain system areas and how. However, the more users there are and the more complex the organizational structure, the more difficult it becomes to express it using the DAC model. At some point, it becomes impossible. Thus, a new way of representing access control was invented.

Linux adopted POSIX ACLs. ACL stands for access control list, and it is exactly that: a list of access control entries that can express much more fine-grained policies about who can read, write, or execute a given file.

ZFS on its original operating system, illumos, supports a separate set of ACLs, conformant with NTFS ACLs. They are set and listed by extended ls and chmod commands. Unfortunately, those commands differ from their Linux counterparts, and thus native ZFS ACLs are unsupported on Linux. This means that if system administrators want to go beyond the DAC model, they have to use POSIX ACLs and the standard commands: setfacl for setting the list and getfacl for listing it. The upside is that every other major Linux file system uses these commands, so you only need to learn them once. The downside is that if you ever import a pool from illumos or FreeBSD, its ACLs may go missing.

DAC Model

Before I explain POSIX ACLs, I first need to explain the DAC model using a simple scenario.

Assume there’s a server that has three users: Alice, John, and Mickey. Alice is a project manager, John is a programmer, and Mickey works in accounting. There are three directories on the server that are accessible to users:

  • Code: It contains what the name says: the source code for the project that Alice manages and John writes. Company policy says that both Alice and John should be able to access the contents of this directory, but only John can add new files or edit existing ones. Mickey should not see the contents of this directory.

  • Documents: This directory contains typical project documentation: architecture analysis, project overview, milestones, customer signoffs, etc. Company policy says Mickey and John should be able to read these files but not edit them, and Alice should be able to both read and edit them.

  • Accounts: This directory contains financial data: time accounting from John and Alice, invoices for customers and from contractors related to the project, budget, etc. Mickey has full control over these files. Alice should be able to read them all but edit only some, and John should be able to do neither.

This, obviously, doesn’t reflect a real-life programming project, but it is sufficient for our purposes. The traditional DAC model gives us these tools:

  • System users and groups

  • Directory and file access controls

  • Each directory and file has an owner (a system user) and an owning group (a system group)

These three tools allow us to do quite a lot of access management in this small scenario.

Let’s start by creating ZFS file systems for each of these directories. Assume the pool is called data. For better data availability, the pool is mirrored.

$ sudo zpool create data mirror /dev/sdb1 /dev/sdc1

Now that we have a pool, we create file systems for the three directories:

$ sudo zfs create data/Code
$ sudo zfs create data/Documents
$ sudo zfs create data/Accounts

Assume that system users for Alice, John, and Mickey already exist and their logins are, surprise, alice, john, and mickey, respectively. Additionally, three groups have been defined: projmgmt for project managers, devel for developers, and accnt for accounting. Before we set up permissions, let’s create a table that describes exactly who should be able to do what. When setting up a file server structure, it is good practice to prepare such a matrix; it helps tidy up and visualize things.
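If those accounts and groups do not exist yet, they can be created with the standard Linux user management tools. A sketch (your distribution may prefer adduser and addgroup):

$ sudo groupadd projmgmt
$ sudo groupadd devel
$ sudo groupadd accnt
$ sudo useradd -m -G projmgmt alice
$ sudo useradd -m -G devel john
$ sudo useradd -m -G accnt mickey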

Access control uses three letters to denote the rights assigned to a user or group:

  • r – read

  • w – write

  • x – execute. When this bit is set on a directory, it means the user or group can access its contents. You cannot actually execute a directory, so to differentiate between execute (on files) and access (on directories), x is used for the former and X for the latter in the tables below.

Table 1-1 quickly makes it obvious that the groups have the same rights as the users who belong to them. It may then seem like overkill to duplicate access rights for both. At this point in time it certainly is, but we should always plan for the future. It is not a lot of work to manage both group and user rights, and each directory needs to have its owning group specified anyway. And if, in the future, any of those groups gains another user, granting them privileges will be as easy as adding them to the group to which they should belong.

Table 1-1 Project Directories, Users, Groups, and Access Rights

User/Group/Directory  Alice  John  Mickey  projmgmt  devel  accnt
Code                  rX     rwX   ---     rX        rwX    ---
Documents             rwX    rX    rX      rwX       rX     rX
Accounts              rX     ---   rwX     rX        ---    rwX

This doesn’t account for separate users who run backup daemons and who should at least be able to read all directories in order to back up their contents, and maybe write to them, to recreate them if need be. In this example, backups can be done by snapshotting the directories and using zfs send | zfs recv to store them on a separate pool, from which dedicated daemons can put them on tape.

For now, the following commands will be sufficient if we want to apply just the user and owning group rights:

$ sudo chown -R alice:projmgmt /data/Documents
$ sudo chown -R john:devel /data/Code
$ sudo chown -R mickey:accnt /data/Accounts
$ sudo chmod -R =0770 /data/Documents
$ sudo chmod -R =0770 /data/Code
$ sudo chmod -R =0770 /data/Accounts

The =0770 is the octal way of setting permissions. The equals sign means we want to set the permissions exactly as given in the string, the leading zero is of no interest at this point, and the second, third, and fourth digits are the permissions for the owner, the owning group, and all others, respectively. The permissions are represented by numbers: 4 for read, 2 for write, and 1 for execute. Any sum of those creates a unique number: 5 means read and execute, 6 means read and write, and 7 means all of the above. The octal mode is a very convenient way of setting all the bits at once. If we wanted to use the named (symbolic) mode, with user, group, or others, we would have to run the command once for each class:

$ sudo chmod -R ug=rwX /data/Documents
$ sudo chmod -R o-rwX /data/Documents

These commands create the set of permissions reflected in Table 1-2.

Table 1-2 Project Directories, Users, Groups, and Access Rights After First Commands

User/Group/Directory  Alice  John  Mickey  projmgmt  devel  accnt
Code                  ---    rwX   ---     ---       rwX    ---
Documents             rwX    ---   ---     rwX       ---    ---
Accounts              ---    ---   rwX     ---       ---    rwX

Obviously, this is not the set we wanted to achieve. One way to tackle it is to change the owning group to the one that needs read access:

$ sudo chown -R john:projmgmt /data/Code
$ sudo chmod -R =0750 /data/Code

This gives Alice access to read the Code directory; however, it doesn’t solve the problem of another person joining the project management or accounting teams. Let’s assume that Susan joins the PM team and needs the same set of permissions as Alice. With the current model, this is impossible to achieve. This is where ACLs come into play.

ACLs Explained

ZFS doesn’t allow the use of Linux ACLs (or rather, POSIX ACLs) out of the box. It needs to be told to do so. The command to run is:

$ sudo zfs set acltype=posixacl data

This command turns on POSIX ACLs for the given file system. The property is inherited by all child file systems by default, so if it is set on the root of a ZFS pool, it will be propagated all the way down. You can verify it by running the zfs get command:

$ sudo zfs get acltype data
NAME  PROPERTY  VALUE     SOURCE
data  acltype   posixacl  local
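A related, commonly recommended tweak on ZFS on Linux (a suggestion, not something this example strictly requires) is to store extended attributes, where POSIX ACLs live, as system attributes for better performance:

$ sudo zfs set xattr=sa data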

How do ACLs help solve the problem above? It is simple: they allow you to store more than the three DAC entries used previously. It is possible to have a separate permission set for each additional user or group. There are two tools used to administer ACLs: setfacl and getfacl.

$ setfacl -m g:projmgmt:rX /data/Code
$ setfacl -m g:devel:rX /data/Documents
$ setfacl -m g:accnt:rX /data/Documents
$ setfacl -m g:projmgmt:rX /data/Accounts

Remember that ACL commands operate on directories, not on ZFS file systems!

These commands give the additional groups exactly the rights from Table 1-1, just as expected. We can confirm that by running getfacl for each directory, as follows:

$ getfacl /data/Documents
getfacl: Removing leading '/' from absolute path names
# file: data/Documents
# owner: alice
# group: projmgmt
user::rwx
group::rwx
group:devel:r-x
group:accnt:r-x
mask::r-x
other::r-x

The syntax for setfacl when modifying entries is as follows:

setfacl -m u|g:name:permissions directory[/file]

The setfacl command works in two modes: adding an ACL entry or removing one, using the -m and -x switches, respectively. In the previous example, the -m switch was used to add ACL entries for specific groups. To remove an entry, you need to run the command with the -x switch:

$ setfacl -x g:devel /data/Documents

This removes all ACL entries for the devel group from the /data/Documents directory.
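One more trick worth knowing: entries added with -m apply to the directory itself, but files created later do not automatically receive them. Default ACL entries, set with the -d switch on a directory, are inherited by newly created files and subdirectories. A sketch, granting the devel group read access to future files under /data/Documents:

$ setfacl -d -m g:devel:rX /data/Documents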

Replacing a Drive

There are many scenarios in which you may need to replace drives. The most common is drive failure: either your monitoring system warned you about an upcoming drive failure or the drive has already failed. Either way, you need to add a new drive to the pool and remove the old one.

There is another reason for replacing a drive. It is one method, slow and cumbersome, of growing a ZFS pool without adding drives: replacing the old disks, one by one, with larger ones.

Consider the first scenario. Your pool is reported as healthy in the zpool status output, but you know one of the drives is going to fail soon. Assume that in the pool printed below the drive about to fail is sdb:

NAME         STATE    READ   WRITE   CKSUM
tank         ONLINE      0       0       0
  mirror-0   ONLINE      0       0       0
    sdb      ONLINE      0       0       0
    sdc      ONLINE      0       0       0

Assume you have added a new drive, sdd, to the system. You can either run the zpool replace command:

sudo zpool replace tank sdb sdd

which attaches sdd to sdb, forming a mirror for a short time, and then removes sdb from the pool. Or you can do it in two steps: first attach sdd to sdb manually, wait until the resilver is complete, and then remove sdb yourself:

sudo zpool attach tank sdb sdd

sudo zpool status
  pool: tank
state: ONLINE
  scan: resilvered 114K in 0h0m with 0 errors on Tue Nov 7 21:35:58 2017
config:


NAME         STATE    READ   WRITE   CKSUM
tank         ONLINE      0       0       0
  mirror-0   ONLINE      0       0       0
    sdb      ONLINE      0       0       0
    sdc      ONLINE      0       0       0
    sdd      ONLINE      0       0       0

You can see this has effectively turned mirror-0 into a three-way mirror. Monitor the resilver process and when it is done, issue:

sudo zpool detach tank sdb

which removes the sdb device from your pool.

In the case where the drive has already failed, the steps are similar, except that you will see something like this:

NAME         STATE      READ   WRITE   CKSUM
tank         DEGRADED      0       0       0
  mirror-0   DEGRADED      0       0       0
    sdb      UNAVAIL       0       0       0
    sdc      ONLINE        0       0       0

Follow the same steps as previously:

sudo zpool replace tank sdb sdd

This replaces the failed drive with the new one.

Growing the pool without adding new drives means replacing every disk in the pool, one by one, with a bigger one. Assume you want to grow the pool tank beyond its current 2 GB:

sudo zpool list
NAME SIZE  ALLOC FREE  EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 1.98G 152K  1.98G -        0%   0%  1.00x ONLINE -

The steps are as follows:

  1. Add the new drives to the chassis. The replacement drives should all be the same size, larger than the old ones.

  2. Attach a new, bigger drive to the mirror and wait until the resilver process finishes.

  3. Remove old drive.

  4. Attach the next bigger drive, wait for the resilver, and remove the next old drive. Repeat until all drives are replaced.

Instead of attaching and removing, you can run the replace command, which performs all of the above steps for you:

sudo zpool replace tank sdb sdd

If your pool is built of more than one vdev, you can run the replace command for each vdev in parallel. This will speed things up a bit.
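One detail to remember: even after the last old disk has been replaced, the pool does not grow by itself. The additional capacity becomes available only when the autoexpand property is turned on, or after each new device is expanded manually with zpool online -e (here sdd stands for one of the new, larger drives):

sudo zpool set autoexpand=on tank
sudo zpool online -e tank sdd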
