Chapter 15. Operating a Swift Cluster

Martin Lanner

In this chapter we move from installation to the everyday operations of a Swift cluster. We’ll cover best practices for conducting day-to-day operational tasks, such as planning capacity additions and monitoring—whether you choose to do these in Swift or through SwiftStack. The recommendations and best practices in this chapter are based on our experiences building and operating both large and small clusters for a variety of workloads. By the end of this chapter, you’ll have a good understanding of how to operate and monitor a Swift cluster and how SwiftStack automates many of these processes through the SwiftStack Controller.

Operational Considerations

Because Swift is a distributed system that is controlled by software, does not rely on RAID, and writes multiple copies of each object (file), operating a Swift cluster is fundamentally different from operating traditional storage systems such as storage area network (SAN) or network-attached storage (NAS) equipment.

When dealing with SANs and NASes, if a disk dies, the operator should make sure the bad disk is replaced right away to ensure that the RAID is rebuilt and is returned to full parity in the shortest time possible. With Swift, if a disk or even an entire node goes bad, it usually isn’t a huge problem. The only time losing a drive or node can present issues is when a Swift cluster is small and very full. For example, if you have a three-node cluster that is 90% full, losing a few large disks or a node could cause problems because Swift might not have enough space to relocate data within the cluster.

How Swift Distributes Data

To understand how best to add and remove capacity in a Swift cluster, it is important to remember how Swift places data (discussed in more detail in Locating the Data).

Swift distributes data across disks in the cluster based on an as-unique-as-possible model. This data placement algorithm prefers locations that are in different regions, zones, nodes, and disks in order to minimize the risk of all replicas being lost. All data stored in Swift also has handoff locations defined, which are alternative data placement locations in the cluster, should one of the replicas not be available due to hardware or other failure.

When you add more capacity to a Swift cluster, data is redistributed evenly across cluster resources, including the newly added drives. For instance, if the cluster is 80% full and you add a new node or a new rack without any data, the Swift cluster will reorganize itself so that all drives hold a roughly equal percentage of data, based on each disk’s weight. For a large cluster, this additional replication traffic might not be noticeable, but for a small cluster, the traffic needs to be managed so that it does not negatively affect the performance of the cluster. It is therefore important to let data flow into the new capacity at a moderate rate, rather than as fast as the network will allow.

Keeping Track of the Rings and Builder Files

As we discussed earlier in the book (see Chapter 3), keeping the rings the same on all nodes is critical for Swift to work properly. Every time capacity is adjusted in a cluster, the rings need to be updated on every node. With only a few nodes, keeping the rings in sync manually is manageable; with more than a handful of nodes, it can easily become a time-consuming task. All changes affecting Swift’s rings need to be tracked and kept in sync across the cluster.

Rings can be generated from any server with Swift installed. It is best, however, to dedicate one server on which all ring generation is done. This server can be a node in the cluster or a standalone machine that does not participate in the cluster. For tracking, scalability, and backup reasons, keeping the builder files and rings on a dedicated, standalone machine is considered best practice. As the rings are generated and updated throughout the lifecycle of a Swift cluster, they must be distributed to the Swift nodes every time they change.

Losing your builder files or rings would be a huge problem, so make sure you have them backed up properly.
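How you back them up and distribute them is up to you; the following is a minimal sketch using standard tools, where the /backup path and the node hostnames are placeholders for your own environment:

# Archive the builder and ring files after every change
cd /etc/swift
tar czf /backup/swift-rings-$(date +%Y%m%d-%H%M).tar.gz *.builder *.ring.gz

# Push the updated rings to every node in the cluster
for node in swift01 swift02 swift03; do
    scp /etc/swift/*.ring.gz root@$node:/etc/swift/
done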

Managing Capacity

Data is always growing and disk capacity is always increasing. Sometimes disks go bad and need to be replaced. When storage capacity is running low, new disks or nodes will need to be added, and occasionally nodes might need to be removed due to obsolescence or relocation of capacity to another cluster. These are all events that will trigger Swift ring changes. Capacity adjustments and ring changes go hand in hand and are integral parts of the lifecycle of a Swift cluster.

To keep track of the size of disks and to be able to gradually add or remove capacity in the cluster, we recommend that you create a convention for setting the disk weight. The disk weight is a number that defines the amount of data to be stored on a disk relative to other disks in the system. Although the disk weight is an arbitrary number, it is essential that the weight is kept to a sane, meaningful number and set in a consistent way throughout the cluster.

SwiftStack bases disk weight on the number of gigabytes per disk. For example, the disk weight of a 4 TB disk is 4,000. This helps make it clearer to an operator what the full size of the disk is. It also establishes a relative weighting system for the cluster, which is useful for capacity adjustments.

If you want to bring a disk into the system gradually, you can do so by setting its weight to a number lower than its full relative weight. Hence, if you want to gradually add a 4 TB disk into the cluster, say by 25% of its full weight (4,000) at a time, you would increase its weight in the rings by 1,000 at a time. As discussed earlier, this requires the rings to be pushed four times (4×1,000) before the disk reaches its full weight in the system. Between ring pushes, it is prudent to wait until the cluster has rebalanced itself based on the latest rings. Once rebalancing has completed, the next incremental push can take place.
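Putting that together, one round of the gradual cycle might look like the following sketch. The device ID (d7) is hypothetical, and the ring push is whatever mechanism you use to copy ring files to your nodes:

# 4 TB disk already added with a weight of 1,000 and assigned device ID 7
swift-ring-builder object.builder set_weight d7 2000
swift-ring-builder object.builder rebalance
# copy the new object.ring.gz to every node and wait for replication to settle,
# then repeat with 3000 and finally 4000, the disk's full weight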

So why do we have to do all this tedious work to add or remove new disks in the cluster? Why not just add a disk in with 100% of its weight? The reason for gradually adding or removing disks is that you don’t want to overwhelm the cluster, which can negatively affect the experience of end users. When adjusting capacity, the replicators and auditors will be working harder, thus using more CPU, and the network traffic between nodes will increase as data is relocated between nodes to balance the cluster.

For example, maybe you have a 1 PB cluster that is starting to get full and you have just purchased and installed equipment with another 1 PB of storage. Let’s say your existing 1 PB is 90% full. If you were to add the new 1 PB immediately, what will happen as soon as you push the new rings out is that half of the existing data will start flowing onto the disks in the new 1 PB of equipment, in an attempt to rebalance the cluster to make every drive in the system become approximately 45% full (half of 90%). That’s 450 TB of data that needs to move, and if unleashed all at one time, it could saturate the network links in the cluster and hurt availability for end users.

As you are probably figuring out, capacity management can be tricky. It is important to do it carefully and with a plan. In the next few sections we will go over examples of how capacity adjustments can be made. At the end of this section of the chapter, we will outline how capacity adjustments are handled when using the SwiftStack Controller.

What to Avoid

Just as with any other storage device or system, completely filling up a Swift cluster should be avoided. Full disks, nodes, zones, and regions can be challenging and time-consuming to deal with.

At the end of this chapter we will cover some of the monitoring integration procedures we see commonly used to ensure that disks and clusters don’t fill up.

The only real way out of a full cluster is to add more capacity. Adding a few extra drives, if you have empty drive bays, is fast and can alleviate the immediate problem of a full cluster. However, adding new nodes can take a long time, especially if you don’t have spare, unused nodes sitting around in inventory. Procuring new hardware can sometimes take weeks, and if the cluster is already getting very full and more data is continuously being added, the cluster will reject new data when it fills up. The existing data itself will be safe, but the cluster might become slow as it traverses handoff locations attempting to find a place to put incoming data.
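If you run stock Swift, the swift-recon tool (assuming the recon middleware is enabled on your storage nodes) provides a quick, cluster-wide view of how full the drives are and whether any are unmounted, which makes it easier to act before a cluster fills up:

swift-recon --diskusage    # percent used across the object disks in the cluster
swift-recon --unmounted    # report any unmounted drives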

Adding Capacity

Based on the convention mentioned earlier of assigning a sensible weight to disks, throughout this chapter we will be using a full disk weight of 1,000, which corresponds to a 1 TB disk, for all disks. In the first example that follows, we start with one node with three disks in it, already provisioned. This is a single-node cluster. From there we will be adding capacity to our cluster by performing the following tasks:

  • Existing cluster: initial ring on node

    • Adding disks immediately
    • Adding disks gradually
  • Adding nodes

    • Adding a node immediately
    • Adding a node gradually

To make capacity adjustments, we will use the swift-ring-builder program. If you recall, we used it in Chapter 9 to build our first Swift cluster. It is used for all additions and removals of disks and nodes, and it works the same way on all the rings. In this chapter, because we will mainly be focusing on adding object capacity, we will predominantly operate on the object.builder file. Of course, if you were to add account or container capacity, you would use the account.builder or container.builder file instead.
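The same syntax applies to the other rings; only the builder file, and typically the service port, changes. For example (6002 and 6001 are the conventional default ports for the account and container servers and might differ in your deployment):

swift-ring-builder account.builder add r1z1-192.168.1.1:6002/n1d0 1000
swift-ring-builder container.builder add r1z1-192.168.1.1:6001/n1d0 1000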

Existing Cluster: Initial Ring on Node

Starting off, we have one node in our cluster. The following swift-ring-builder object.builder command shows our original, single-node cluster, including three disks (Swift devices) with a perfectly balanced object ring.

Lines in output have been broken over multiple lines to fit the book’s page size. The “Devices” line is a long heading, and it is followed by three lines, one for each device.

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 3
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 3 devices, 0.00
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d0 1000.00       1024    0.00
             1       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d1 1000.00       1024    0.00
             2       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d2 1000.00       1024    0.00

Adding disks immediately

To add devices to the ring, we use the swift-ring-builder add command, specifying the region, zone, IP address, port, name, and weight of the new device. In the following command, we add a device named n1d3 (short for node 1, device 3) with a weight of 1,000:

root@swift01:/etc/swift# swift-ring-builder object.builder add
r1z1-127.0.0.1:6000/n1d3 1000
Device d3r1z1-127.0.0.1:6000R127.0.0.1:6000/n1d3 with 1000.0 weight got id 3

If we run the swift-ring-builder object.builder command again with no further arguments, you will see that device 3 is now listed in the last row at the bottom. Also, note that the balance of the new disk is –100.00 because we have not yet rebalanced the cluster with the new disk:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 4
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 100.00
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d0 1000.00       1024   33.33
             1       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d1 1000.00       1024   33.33
             2       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d2 1000.00       1024   33.33
             3       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d3 1000.00          0 -100.00

Now it’s time to rebalance the object ring to assign partitions to the device. Rebalancing can be done by issuing the following command:

root@swift01:/etc/swift# swift-ring-builder object.builder rebalance
Reassigned 768 (75.00%) partitions. Balance is now 0.00.

After rebalancing, if you rerun the swift-ring-builder object.builder command, you should see something similar to:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 5
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 0.00
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d0 1000.00        768    0.00
             1       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d1 1000.00        768    0.00
             2       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d2 1000.00        768    0.00
             3       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d3 1000.00        768    0.00

Notice how the partitions are now evenly spread across all the devices and the balance is 0.00.

Adding disks gradually

After adding one disk immediately, as in our earlier example, the swift01 node and its object ring now have four disks, mapped to devices 0-3. Whereas the previous operation added a disk at 100% of its weight, this time we add a fifth disk, device 4 (d4), with a weight of 100, or 10% of its full weight of 1,000 (1 TB). In the following output you can see that partitions have already been assigned to the new disk, and the final columns show that its share of partitions is roughly 10% of that held by the drives with a full weight of 1,000.

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 5
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 1.24
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d0 1000.00        750    0.10
             1       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d1 1000.00        749   -0.04
             2       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d2 1000.00        749   -0.04
             3       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d3 1000.00        750    0.10
             4       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d4  100.00         74   -1.24

The distribution of partitions across disks nicely illustrates how Swift uses partitions to distribute data across the cluster and how disks can be of different sizes, allowing Swift to proportionally fill up disks based on their size and the number of partitions on each disk.

If you later want to increase the weight of the fifth disk (ID 4, n1d4) on node 1 from 100 to 500, you would run the following command:

root@swift01:/etc/swift# swift-ring-builder object.builder set_weight d4 500
d4r1z2-127.0.0.1:6000R127.0.0.1:6000/n1d4_"" weight set to 500.0

After you run the set_weight command, if you look in the weight column, you will see that the weight of d4 has been increased to 500.0:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 8
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 78.32
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d0 1000.00        750    9.86
             1       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d1 1000.00        749    9.72
             2       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d2 1000.00        749    9.72
             3       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d3 1000.00        750    9.86
             4       1     2       127.0.0.1  6000       127.0.0.1
  6000      n1d4  500.00         74  -78.32

However, due to the min_part_hours limit described in The create command, it might not be possible to rebalance the cluster immediately. If so, you will see a message similar to this:

root@swift01:/etc/swift# swift-ring-builder object.builder rebalance
No partitions could be reassigned.
Either none need to be or none can be due to min_part_hours [1].
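The lockout period comes from the builder’s min_part_hours setting, reported in the builder output as “The minimum number of hours before a partition can be reassigned.” As a rule of thumb, it should be at least as long as a full replication pass takes in your cluster. If you ever need to change it, swift-ring-builder can do so; a sketch using this example’s builder:

swift-ring-builder object.builder set_min_part_hours 1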

If you wait until the minimum time has passed and run the rebalance again, followed by swift-ring-builder object.builder, you will see that d4 now has partitions assigned to it and that the balance is very close to 0, which is a good sign:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 8
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 0.20
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d0 1000.00        683    0.05
             1       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d1 1000.00        683    0.05
             2       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d2 1000.00        682   -0.10
             3       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d3 1000.00        682   -0.10
             4       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d4  500.00        342    0.20

To ensure the new disk’s capacity is fully used, repeat the steps just shown until d4 reaches a weight of 1,000, or 100% of its capacity. At that point the ring is balanced and has an equal number of partitions across all the 1 TB disks in the node:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 10
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 5 devices, 0.10
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d0 1000.00        615    0.10
             1       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d1 1000.00        615    0.10
             2       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d2 1000.00        614   -0.07
             3       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d3 1000.00        614   -0.07
             4       1     1       127.0.0.1  6000       127.0.0.1
  6000      n1d4 1000.00        614   -0.07

Adding Nodes

Adding nodes is essentially the same process as adding disks, except that the disks are added under a different node, so the IP addresses will differ. To illustrate how to add a node, we will start with the same example. Again, let’s use the swift-ring-builder object.builder command to take a look at our starting point: a single-node cluster with three disks.

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 3
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 3 devices, 0.00
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
  6000      n1d0 1000.00       1024    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
  6000      n1d1 1000.00       1024    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
  6000      n1d2 1000.00       1024    0.00

Adding a node immediately

Just as before, adding nodes can be done in two ways: immediately or gradually. We will start by adding them immediately. However, it is worth reiterating that adding nodes immediately, especially if you are adding relatively large amounts of capacity, might trigger a significant amount of data to be moved from existing nodes and disks to the newly added ones, with negative impacts on cluster performance and thus the experience of the end users.

Therefore, if you are planning on adding nodes with many terabytes of new disks, consider adding the nodes gradually. There is no hard-and-fast rule for when to add capacity immediately or gradually, but as a general guideline, if you are increasing cluster capacity by more than 20%, you should probably consider doing it gradually. Conversely, if you are adding less than 20% of capacity, adding a node (or nodes) immediately is usually safe.

In the following example we now have two nodes:

  • 192.168.1.1 (original node)
  • 192.168.1.2 (new node)

You already know how to use the swift-ring-builder object.builder add command. The following example picks up after we have already run that command to add the new node and the disks in it as devices; the commands are sketched next. To make it clear which node contains the disks, we have named the new disks n2d0 through n2d2. As you can see in the last three lines of the ring output that follows (devices 3-5), the weight, partitions, and balance columns are populated, but the new devices have yet to be balanced into the cluster:
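For reference, those add commands look roughly like this (matching the IP address, port, and device names used in this example):

swift-ring-builder object.builder add r1z1-192.168.1.2:6000/n2d0 1000
swift-ring-builder object.builder add r1z1-192.168.1.2:6000/n2d1 1000
swift-ring-builder object.builder add r1z1-192.168.1.2:6000/n2d2 1000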

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 6
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 100.00
balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip
replication port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
  6000      n1d0 1000.00       1024  100.00
             1       1     1     192.168.1.1  6000     192.168.1.1
  6000      n1d1 1000.00       1024  100.00
             2       1     1     192.168.1.1  6000     192.168.1.1
  6000      n1d2 1000.00       1024  100.00
             3       1     1     192.168.1.2  6000     192.168.1.2
  6000      n2d0 1000.00          0 -100.00
             4       1     1     192.168.1.2  6000     192.168.1.2
  6000      n2d1 1000.00          0 -100.00
             5       1     1     192.168.1.2  6000     192.168.1.2
  6000      n2d2 1000.00          0 -100.00

Now let’s go ahead and start rebalancing the cluster, effectively reassigning partitions to the new disks in the second node, by running the rebalance command:

root@swift01:/etc/swift# swift-ring-builder object.builder rebalance
Reassigned 1024 (100.00%) partitions. Balance is now 37.70.

Note

Balance of 37.70 indicates you should push this ring, wait at least 1 hour, and rebalance/repush.

In this case, we are doubling the capacity of the cluster by adding three more disks to the existing three. Because the ring builder will not move more than one replica of any partition at a time, Swift cannot safely redistribute all the partitions in a single pass, so the first attempt results in a partial rebalance. To fully balance the cluster, it will be necessary to perform another rebalance and push the rings out again:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 7
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 37.70 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        669   30.66
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        705   37.70
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        674   31.64
             3       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d0 1000.00        341  -33.40
             4       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d1 1000.00        341  -33.40
             5       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d2 1000.00        342  -33.20

Now that we have the object ring rebalanced with the three new disks, it’s also necessary to distribute the rings to all nodes in the system. Remember, all the nodes must have the same rings, or Swift won’t operate correctly.
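If the recon middleware is enabled on your storage nodes, swift-recon offers a quick sanity check that every node is running the same rings by comparing the ring MD5 checksums reported by each node against your local copies:

swift-recon --md5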

After having given the cluster an hour or more (as suggested earlier) to replicate data to the new partitions on the new disks, it’s time to try another rebalance of the object ring. So we repeat the swift-ring-builder object.builder rebalance step:

root@swift01:/etc/swift# swift-ring-builder object.builder rebalance
Reassigned 512 (50.00%) partitions. Balance is now 0.20.

The output confirms that the remaining partitions have been reassigned, and the builder file for the object ring should now look similar to this:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 8
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 0.20 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        512    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        512    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        512    0.00
             3       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d0 1000.00        512    0.00
             4       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d1 1000.00        511   -0.20
             5       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d2 1000.00        513    0.20

Don’t forget that we’re still not 100% done: you still need to distribute the new rings to all the nodes. Once the new object ring (in this example) has been distributed to both nodes, your small cluster should balance itself within a matter of hours. In larger clusters, the redistribution of data can take longer. If you’re curious and want to see what’s going on under the hood, you can always tail the Swift logs on your nodes to watch the replicator, auditor, and other processes do their work.

Adding a node gradually

By now this should all seem very familiar to you. Adding capacity, whether nodes or just single disks, is very similar. Following the previous examples, we will add capacity by adding a new node gradually. Again, adding nodes gradually is typically the recommended way to add capacity to a cluster in production, because if too much of the cluster’s resources (CPU, RAM, input/output operations per second (IOPS), or networking) get tied up in replicating data as fast as possible to the new disks, users might experience a performance slowdown.
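Each gradual round follows the same pattern. The following sketch outlines the whole cycle for this example; the device names and IDs (n2d0-n2d2, d3-d5) are assumed to match the output shown next, and the ring push is whatever mechanism you use to distribute ring files:

# Round 1: add the new node's disks at 10% of their full weight
swift-ring-builder object.builder add r1z1-192.168.1.2:6000/n2d0 100
swift-ring-builder object.builder add r1z1-192.168.1.2:6000/n2d1 100
swift-ring-builder object.builder add r1z1-192.168.1.2:6000/n2d2 100
swift-ring-builder object.builder rebalance
# push object.ring.gz to all nodes and wait for replication to settle

# Round 2 and later: bump the weights, rebalance, push, and wait again
swift-ring-builder object.builder set_weight d3 200
swift-ring-builder object.builder set_weight d4 200
swift-ring-builder object.builder set_weight d5 200
swift-ring-builder object.builder rebalance
# ...repeat until each device reaches its full weight of 1,000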

In our first round of adding the new node and its new disks to the cluster, we will add 10% of the capacity of devices 3, 4, and 5:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 6
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 267.38 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        683  -26.63
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        683  -26.63
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        682  -26.74
             3       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d0  100.00        342  267.38
             4       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d1  100.00        341  266.31
             5       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d2  100.00        341  266.31

The next step is to distribute the rings to all the nodes and wait for the replicators to move data around to the new disks and settle down. Then we go ahead and add another 10% of the capacity of devices 3, 4, and 5, for a total of 20% (or weight 200):

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 10
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 100.39 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        683  -19.96
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        683  -19.96
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        682  -20.08
             3       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d0  200.00        342  100.39
             4       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d1  200.00        341   99.80
             5       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d2  200.00        341   99.80

Again, we distribute the new rings to all nodes. We repeat this process until, eventually, we apply the full weight (1,000), putting 100% of the new disks’ capacity to use in the cluster:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 14
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 0.20 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        512    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        512    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        512    0.00
             3       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d0 1000.00        513    0.20
             4       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d1 1000.00        511   -0.20
             5       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d2 1000.00        512    0.00

Then we do our final distribution of the rings.

Of course, this full process will need to be done every time capacity is added to a cluster.

Removing Capacity

Capacity removal is the opposite of adding capacity. It can be done either immediately or gradually, just as when adding capacity. In this section, we will demonstrate how to remove an entire node from the cluster.

There are many reasons why you would remove capacity, including:

  • You simply don’t need the capacity you currently have in the cluster anymore.
  • Planned obsolescence: A node is being replaced by another node with newer and better hardware.
  • The disks in a node are being upgraded with larger disks in order to increase cluster capacity, while still maintaining the same physical hardware footprint.

One critical thing to consider when removing capacity from a cluster is how much capacity will be left once the removal is complete. If you accidentally remove too much capacity, so that the remaining disks in the cluster become entirely full, you will create the problems described in What to Avoid, and the cluster will be hard to operate until you add capacity again. Always keep this in mind when removing capacity: not only could you create a lot of extra work for yourself, but you also risk severely degrading the experience of end users of the cluster.

With that warning out of the way, let’s continue with how to remove capacity. These are the steps we will go through:

  • Removing nodes
  • Removing disks

    • Removing disks immediately
    • Removing disks gradually

Removing Nodes

In this example, the initial object ring consists of a total of six disks in two different nodes, with three disks in each node:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 14
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 6 devices, 0.20 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        512    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        512    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        512    0.00
             3       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d0 1000.00        513    0.20
             4       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d1 1000.00        511   -0.20
             5       1     1     192.168.1.2  6000     192.168.1.2
6000      n2d2 1000.00        512    0.00

Removing a node is really the same as removing all disks on it. Hence, to remove node 2 (192.168.1.2), we will need to remove all disks on it from the ring. Disks can be removed one by one, using the swift-ring-builder object.builder remove command. Note that the remove command takes the device ID as the parameter indicating which disk (device) to remove:

root@swift01:/etc/swift# swift-ring-builder object.builder remove d3
d3r1z1-192.168.1.2:6000R192.168.1.2:6000/n2d0_"" marked for removal and will be
removed next rebalance.

root@swift01:/etc/swift# swift-ring-builder object.builder remove d4
d4r1z1-192.168.1.2:6000R192.168.1.2:6000/n2d1_"" marked for removal and will be
removed next rebalance.

root@swift01:/etc/swift# swift-ring-builder object.builder remove d5
d5r1z1-192.168.1.2:6000R192.168.1.2:6000/n2d2_"" marked for removal and will be
removed next rebalance.

As usual, once the ring builder commands have been issued, it’s time to rebalance the ring:

root@swift01:/etc/swift# swift-ring-builder object.builder rebalance
Reassigned 1024 (100.00%) partitions. Balance is now 0.00.
root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 18
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 3 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00       1024    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00       1024    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00       1024    0.00

If everything looks good, we need to distribute the rings to all the nodes in the cluster. Since no devices are left on node 2, updating the rings on node 2 isn’t strictly necessary. However, if your reason for removing node 2’s disks is to upgrade the node from the 1 TB disks it was using to 3 TB disks, you might still update the rings on it so that it has the latest rings for the cluster. On the other hand, if you don’t intend to use node 2 again, you can skip updating its rings and simply turn the node off.

Removing Disks

In the next two examples we are starting with a single node that has four disks in the object ring, device IDs 0-3:

object.builder, build version 20
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        768    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        768    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        768    0.00
             3       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d3 1000.00        768    0.00

Removing disks immediately

To remove device d3, type in the following command:

root@swift01:/etc/swift# swift-ring-builder object.builder remove d3
d3r1z1-192.168.1.1:6000R192.168.1.1:6000/n1d3_"" marked for removal and will be
removed next rebalance.

After d3 has been marked for removal, run a rebalance; inspecting the object ring then shows that only three devices are left in it:

root@swift01:/etc/swift# swift-ring-builder object.builder
object.builder, build version 22
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 3 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00       1024    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00       1024    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00       1024    0.00

Go ahead and push the new object ring out to your node. Once the new ring has been applied, the auditors and replicators will move data around to ensure that three replicas of every object are stored on the remaining disks in the cluster. In this particular example, because of the as-unique-as-possible principle of Swift, each of the three disks in our single-node cluster will hold one copy of each object.

Removing disks gradually

In our final capacity adjustment example, we will perform a gradual disk removal. We will begin by setting the weight to 500, reducing the capacity of device 3 by 50%, which will drain data off the disk and move it to other disks in the system. To do so, use the following command to reduce the weight of d3 to 500 (half its original weight of 1,000):

root@swift01:/etc/swift# swift-ring-builder object.builder set_weight d3 500
d3r1z1-127.0.0.1:6000R127.0.0.1:6000/n1d3_"" weight set to 500.0

Running the swift-ring-builder object.builder command should display something like:

object.builder, build version 27
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 1.22 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00        881    0.37
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00        867   -1.22
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00        881    0.37
             3       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d3  500.00        443    0.94

Rebalance and push the new object ring to the cluster and wait for the replicators to do their job. When the data has been redistributed to the other drives, repeat the set_weight command to lower the weight of device 3 to 0. The object.builder command should then show a ring like this:

object.builder, build version 30
1024 partitions, 3.000000 replicas, 1 regions, 1 zones, 4 devices, 0.00 balance
The minimum number of hours before a partition can be reassigned is 1
Devices:    id  region  zone      ip address  port  replication ip  replication
port      name weight  partitions balance meta
             0       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d0 1000.00       1024    0.00
             1       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d1 1000.00       1024    0.00
             2       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d2 1000.00       1024    0.00
             3       1     1     192.168.1.1  6000     192.168.1.1
6000      n1d3    0.00          0    0.00

To apply the changes, distribute the newly created object ring and let the cluster do its job.

Finally, remove the device from the ring completely using the remove command. At this point there should be no data left on the disk, as all of it has been moved to other object disks:

root@swift01:/etc/swift# swift-ring-builder object.builder remove d3
d3r1z1-192.168.1.1:6000R192.168.1.1:6000/n1d3_"" marked for removal and will be
removed next rebalance.

Run the swift-ring-builder object.builder rebalance command and push out the object ring, which now doesn’t include the d3 disk.

Managing Capacity Additions with SwiftStack

SwiftStack lets you effortlessly manage your cluster and its rings in a seamless and structured way, without all the pitfalls of manual ring management.

As you have probably come to appreciate by this point, doing capacity adjustments manually can be tedious and prone to human error. If you make mistakes, you can misconfigure your cluster and it can start misbehaving. It is one thing to stand up a cluster the first time and make the occasional capacity addition or removal. System administrators with relatively little Linux and Swift experience can typically read tutorials, copy and paste, and get a Swift cluster up and running at a basic level. But that will go only so far.

Consequently, automating capacity adjustments is an obvious choice for anything more than just a few Swift nodes. After all, Swift is intended to scale out to tens, maybe hundreds or thousands of nodes. Additionally, beyond the original setup of a Swift cluster, managing a cluster on a daily basis and through its lifecycle, including capacity adjustments, troubleshooting, middleware configuration, planned hardware obsolescence, institutional memory, and operator turnover, is not a trivial task. At SwiftStack, we have built a lot of our Swift tooling and system lifecycle processes around what has been covered in this chapter.

In the following sections, we will revisit capacity adjustments, this time seeing how the SwiftStack Controller manages adding and removing drives and nodes.

Adding Capacity

When you add new capacity to a SwiftStack cluster, such as additional disk drives, the SwiftStack Controller automates the process by:

  1. Detecting new devices that are added to a node
  2. Labeling drives with a unique label, so they can be mounted, unmounted, and remounted without losing their identity in the process
  3. Formatting the drives with the XFS filesystem
  4. Mounting the drives
  5. Adding the drives to the ring
  6. Deploying the new ring to all cluster nodes

Although the SwiftStack Controller automates the process of adding additional capacity, the operator still needs to manage and plan for capacity additions, and pay attention to how full existing drives are. A good rule of thumb is to keep at least 10-20% free on all drives to keep enough headroom for future capacity needs and provide ample time to order and install additional drives or nodes as capacity requirements increase.

During capacity adjustments, it’s important to note that other configuration changes that require a configuration push will interrupt an ongoing gradual add or remove cycle and set it back one cycle. This is in itself not a problem, but it means that making other cluster configuration changes while a capacity adjustment is in progress will cause the capacity adjustment to take longer to complete than it otherwise would.

Adding Drives

To add additional drives to an existing node in a SwiftStack cluster, install the physical drives in the node and follow these steps:

  1. In the SwiftStack Controller, go to Configure Cluster and choose Edit for the node to which you have added the drives. Note that the SwiftStack Controller will automatically detect additional drives in the cluster.
  2. Add drives on the node to your cluster through either the Add Gradually or Add Now button.
  3. Select Change.
  4. Lastly, use the Deploy Config to Cluster button to apply the changes.

With SwiftStack, you can automatically add additional capacity, such as new drives, into your existing cluster with two different options: Add Gradually and Add Now. With gradual capacity additions, the SwiftStack Controller will slowly increment the weights on each new device, rebalance the rings, and safely distribute them out to all the nodes. The SwiftStack Controller will also track information about replication activity so it knows when to do the next increment. On the node monitoring page, you can track the percent complete for each device.

When you select Add Now, the SwiftStack Controller adds the new cluster resources immediately, which will result in instant replication traffic between the nodes to evenly redistribute the data across the cluster as fast as possible. Immediately adding drives is a valid method when you have a new cluster, need to add capacity quickly, or are adding only a few disks or nodes to a relatively large cluster with much more capacity than what you are adding.

To determine whether Add Gradually or Add Now is most appropriate, use the following guidelines as a starting point:

  • If you’re adding more than 20% capacity, Add Gradually is probably most appropriate.
  • If you’re adding less than 20% capacity, Add Now might be more appropriate.

This, however, varies based on:

  • The actual size of the cluster
  • The percentage added
  • The cluster workload

For a cluster with more than a few dozen drives, it’s usually acceptable to use the Add Now feature. When adding a new node, however, it might be more appropriate to Add Gradually.

Adding a Node

To add a new node to an existing SwiftStack cluster, install the SwiftStack Node software on the node as described in the latter half of Chapter 10.

To determine whether you should Add Gradually or Add Now, and for a reminder about how to monitor the process, see Adding Drives.

The SwiftStack Controller will add the newly available capacity to the cluster.

Removing Capacity

SwiftStack also makes it easy to remove capacity from a cluster, which you’ll need to do when you want to upgrade to larger drives, swap out older drives, or replace a whole node. The process for removing capacity is similar to adding capacity. But instead of Add Gradually, you can Remove Gradually and then wait until all data has been removed from the disk(s) or node(s) so you can remove them from the cluster.

Don’t forget to deploy the new configurations to the cluster.

Removing a Node

The following procedures can be used when it is necessary to remove a node from a cluster. You might need to remove a node when upgrading hardware; when you’ve experienced hardware failure; or when you’re conducting operational and failure testing of a SwiftStack cluster. When conducting operational and failure testing, these procedures will enable you to safely simulate node and disk failures without potentially damaging the hardware by physically removing disks or forcing servers to shut down through a hard power-off event.

To safely remove a node from a SwiftStack cluster, do the following:

  1. Gracefully shut down the node by issuing sudo shutdown -h now. The node will shut down and the SwiftStack Controller will show the node as unreachable.

    Note

    When a node is down, Swift assumes that the disks are still healthy and that the data on those disks is recoverable. Consequently, if the node is powered back on, Swift will simply bring the node back into the cluster and will start syncing any new data to the node.

  2. Delete the node from the SwiftStack Controller’s GUI. When the node is deleted from the controller, a ring push will be initiated, which will completely remove the node from the cluster.

Removing a Disk

There are times when you will need to remove a disk. For example, if a disk has failed or is having issues, or if you want to replace a disk with a larger disk to increase capacity, you’ll need to remove it. As you already know, disks can be removed in two ways: now (immediately) or gradually (incrementally). Here is what happens behind the scenes in each of the scenarios:

  • If a disk is removed using the Remove Now button, it is immediately removed from the cluster and the data remains on the disk. As always when using the SwiftStack Controller, the removal or addition of capacity will trigger new rings to be built and a configuration push to be initiated.
  • If a disk is removed through Remove Gradually, data will be slowly removed from the drive and transferred to other disks in the cluster. The cluster will try to remove 25 GB per hour from the disk. Thus, removing data completely from a drive with 2 TB of data on it will take approximately 80 hours, or close to 3.5 days. However, if the cluster is under heavy load or busy with other processes, it might drain less than 25 GB of data per hour, which would make the total removal process slower. In the meantime, every hour there’s a new ring pushed to the cluster, which triggers a rebalancing of the disks in the cluster.

If for any reason a configuration push cannot be completed and the ring cannot be updated, the gradual removal of data will be interrupted. For example, if a node goes down and is not repaired and reinstated or removed from the cluster, the ring updating cannot continue until the cluster is healthy again. The gradual removal of data will continue once a new ring can be pushed to the cluster.

To immediately remove a disk in the SwiftStack Controller, follow these instructions:

  1. Go to the node from which you want to remove the disk.
  2. Click Manage.
  3. Find the disk you want to remove; for example, sde.
  4. From the dropdown menu, select Remove Now. The disk will be instantly deleted from the cluster and a configuration push will be initiated to rebalance the cluster.

To remove a disk gradually, in the SwiftStack Controller, follow these instructions:

  1. Go to the node from which you want to remove the disk.
  2. Click Manage.
  3. Find the disk you want to remove; for example, sde.
  4. From the dropdown menu, select Remove Gradually.

After removing a disk from a node, if you later want to reuse the disk in the cluster, you should ensure that the data on the disk is removed so that it looks like a new disk to Swift. One way of removing all data is to simply format the disk.

Monitoring Your Cluster

A Swift cluster has many moving parts: many daemons running in a distributed, shared-nothing fashion across nodes, all working together. It is therefore important to be able to tell what is going on inside the cluster when diagnosing issues, assessing performance, or planning capacity changes, which makes monitoring your Swift cluster infrastructure essential. Most organizations have their favorite monitoring tools, and because Swift runs on top of Linux, almost all the common monitoring tools are relatively easy to integrate into your Swift environment.

As has been noted earlier, hardware failures in a Swift cluster are usually not as critical as they tend to be in traditional storage systems. Still, you are surely going to want some kind of monitoring set up. When hard drives or servers fail, which they will, you are going to want to know about it. Perhaps more important, the networking and load-balancing equipment that provides access to the Swift cluster, and the replication links between nodes, also need to be monitored, even though they are not directly part of Swift itself.

Swift-Specific Metrics: What to Look For

When analyzing Swift monitoring data, the following are the key metrics to keep an eye on:

Async pendings

An async pending is a file created when an object is uploaded while there is contention during an update to a container listing record. If the upload traffic backs up or the container servers get busy, the container update fails and the Object Server writes a record to disk that says, in effect, “Hey, this container needs to be updated to increment this number of bytes by one object.” That file on disk, holding an update to be applied later, is called an async pending. Async pendings are normal in a Swift cluster. It’s not a crime to have async pendings on your disk, but what you need to watch for is whether they’re accumulating. That’s a problem.

What is important to track here is the number of async pendings over time. If you’re seeing your rate of generation go way up in comparison to the rate at which they’re serviced, that would be something to look into. Perhaps there’s too much contention and the account/container records need to be distributed across more or higher-performing media.
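With stock Swift, the recon middleware exposes these counts so you can watch them cluster-wide; for example (flag name as in the stock swift-recon tool):

swift-recon --async    # async pending counts reported by each object node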

CPU use

Load in general isn’t an interesting metric, but one basic machine stat that is important is CPU utilization. Proxy servers in particular are prone to becoming CPU-bound and causing bottlenecks in your throughput. This can surface if you underprovision your proxy servers or a heavy workload comes their way. It can also surface with a large number of requests per second that add up to relatively low volumes of data transfer (for example, lots of HEADs, PUTs, or GETs for small objects). With a workload like that, you’ll see a good number of requests per second and your proxy servers can become CPU-bound. So watching CPU utilization on proxy servers is particularly important.

The same goes for storage nodes’ CPU use, but it’s generally less of an issue because storage nodes tend to become I/O-bound before they become CPU-bound. A possible exception is an SSD-backed account/container server, where I/O capacity and latencies are so good that the CPU has a chance of becoming the bottleneck.

I/O use

Another useful metric is I/O utilization. These statistics can be tracked per drive, but rolling them up into a node and cluster view can be really handy. With per-drive stats, you’ll be able to spot any hotspots that show up. This might happen if requests are piling up on a particular account or container; in that case, you will see a huge number of concurrent PUTs to objects within the same container in the per-disk I/O.

Relocation activity

Spikes in replication and relocation activity can indicate that a drive is having trouble.

Timing statistics

Timing stats are reported for each request. Each part of the request is broken down very granularly so you can pinpoint where issues arise, if they do. You can see whether the request involved an account, container, or object. You can see the differences between the proxy server handling the request and the Account/Container/Object Server handling the request. Each request is also broken down by verb (GET/HEAD/PUT), so you can get a lot of information out of what each service is seeing.
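Swift emits these timing and counter metrics over StatsD when the log_statsd_* options are set in the server configuration files. A minimal sketch, assuming a StatsD agent listening locally on the default port:

# In the [DEFAULT] section of proxy-server.conf and the account-, container-,
# and object-server configs:
log_statsd_host = 127.0.0.1
log_statsd_port = 8125
log_statsd_default_sample_rate = 1.0
log_statsd_metric_prefix = swift01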

As a Swift operator, even if you have very little knowledge about what specific clients are experiencing, all these metrics can give you a window into the latency they might be subject to and any problems they might be encountering. You can see whether clients are getting a lot of 404 errors or whether all of a sudden 500 errors pop up. If you use a monitoring system that can alert you to these kinds of problems, it will help you detect problems early, so you can address them within the Swift cluster before clients start experiencing issues.

You really want to catch any problem before your clients experience it. All of these internal metrics enable operators to gauge how clients experience the Swift cluster.
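To make the verb and status-code breakdown concrete, Swift can emit StatsD metrics with names along the lines of proxy-server.<type>.<verb>.<status>.timing. The sketch below is a tiny stand-in listener that tallies such packets by status class and flags 5xx responses; the exact metric names, the port, and the idea of running your own listener instead of a real StatsD server are assumptions for illustration only.

    #!/usr/bin/env python
    """Tiny StatsD listener that tallies proxy timing metrics by verb and status
    class (sketch only; do not run alongside a real StatsD server on the same port)."""
    import socket
    from collections import Counter

    STATSD_PORT = 8125  # assumed StatsD port
    counts = Counter()

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", STATSD_PORT))

    while True:
        data, _addr = sock.recvfrom(4096)
        # A StatsD packet looks like "metric.name:value|type", possibly several per datagram.
        for line in data.decode("utf-8", "ignore").splitlines():
            name, _sep, rest = line.partition(":")
            if not name.startswith("proxy-server.") or "|ms" not in rest:
                continue
            parts = name.split(".")
            # Assumed form: proxy-server.<type>.<verb>.<status>.timing
            if len(parts) == 5 and parts[4] == "timing":
                rtype, verb, status = parts[1], parts[2], parts[3]
                key = "%s %s %sxx" % (rtype, verb, status[0])
                counts[key] += 1
                if status.startswith("5"):
                    print("5xx response seen: %s (%d so far)" % (name, counts[key]))

In a real deployment you would let your StatsD or monitoring pipeline do this aggregation and simply configure alert thresholds on the resulting 4xx/5xx counters.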

Monitoring and Logging Tools

Some of the most frequently used monitoring and logging systems with Swift are:

  • Monitoring

    • Nagios
    • Zabbix
    • Ganglia
    • Zenoss
  • Logging

    • Elasticsearch (Logstash, Kibana)
    • Splunk

There are surely others, too. The point here is that whatever your organization is already using, or is familiar with, will likely work well for keeping track of your Swift environment. Of course, the ultimate choice of which monitoring tool or tools to use, and how they alert an operator, is up to each operations team. Setting up some form of monitoring is critical, though, because you really want to know if and when problems occur.

SwiftStack Tools

If you are using SwiftStack, you can get lots of cluster and node statistics from the SwiftStack Controller. In addition, nodes installed with the SwiftStack management agent get Swift-specific Nagios plug-ins. Although the SwiftStack Controller provides extensive monitoring and reporting, you might want to consider integrating Swift with other upstream monitoring and alerting systems. Other monitoring tools might be able to detect anomalies that the SwiftStack system won’t report on or can’t specifically pinpoint in your environment.

The Nagios plug-ins installed on each individual SwiftStack node provide the following Swift-specific information (a minimal example of a similar check is sketched after the list):

  • Unmounted SwiftStack devices
  • Drive capacity use
  • Background Swift daemons
  • Most recent Swift backend daemon sweep time
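To give a flavor of what such a check looks like, here is a minimal, hypothetical Nagios-style plug-in that reports devices present under /srv/node but not mounted. It is not the SwiftStack plug-in itself, and the /srv/node path is an assumption about your layout.

    #!/usr/bin/env python
    """Hypothetical Nagios-style check for unmounted Swift devices (sketch only).
    Exit codes follow the Nagios convention: 0=OK, 2=CRITICAL, 3=UNKNOWN."""
    import os
    import sys

    SRV_NODE = "/srv/node"  # assumed devices mount point

    def main():
        try:
            devices = sorted(os.listdir(SRV_NODE))
        except OSError as err:
            print("UNKNOWN: cannot list %s: %s" % (SRV_NODE, err))
            return 3
        unmounted = [d for d in devices
                     if not os.path.ismount(os.path.join(SRV_NODE, d))]
        if unmounted:
            print("CRITICAL: unmounted devices: %s" % ", ".join(unmounted))
            return 2
        print("OK: all %d devices mounted" % len(devices))
        return 0

    if __name__ == "__main__":
        sys.exit(main())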

Using the SwiftStack Controller, you can also configure syslogging to the logging facility of your choice and redirect events to your logging tool. This is as easy as specifying the facility in the Controller configuration and providing the location of the logging server.

Lastly, if you use a monitoring tool that can receive SNMP traps, that’s also an option.

Cluster-level metrics

The SwiftStack Controller highlights the following top-level metrics in the Controller console for each cluster:

Cluster CPU use
The percentage of time that a node is using the CPU or performing disk I/O. CPU utilization is provided both for the overall cluster and for individual nodes. High rates (e.g., above 80%) are generally bad. If your CPU use is high, processes might have a harder time getting resources. This means that processes on the node will start to slow down and that it might be time to add additional nodes to the cluster.
Cluster proxy throughput
The aggregate throughput for all inbound and outbound traffic, displayed in bytes per second. This can help you spot network bottlenecks.
Average node memory use
The memory usage for all nodes in your cluster. Memory is displayed as used, buffered, cached, or free.
Total cluster disk I/O
Measures the total number of input/output operations per second (IOPS) on disks for the overall cluster and for individual nodes. Disk I/O is shown for both read IOPS and write IOPS. Note that because Swift constantly guards against bit rot, the cluster will continuously read some amount of data.
Top 5 Least Free Disks
The fullest disks in the cluster. Swift distributes data evenly across the cluster, so how full your disks are corresponds to how full the overall cluster is and indicates when you should consider adding more capacity.

The SwiftStack Controller also makes available several other monitoring graphs for your cluster, which you can use when tuning or troubleshooting your SwiftStack cluster. For the overall cluster, these graphs include:

  • Total Cluster Interface Bandwidth
  • StatsD Statistics Per Node
  • Average OpenVPN Traffic
  • Top 4 Average Node Process Groups by RSS
  • object-updater sweep Timing and Count
  • object-replicator.partition.update Req Timing and Count
  • Proxy Req Timing and Count
  • account-replicator replication Timing and Count

Node-level metrics

For each node, the following monitoring graphs are available:

  • Total node disk I/O
  • Per-disk read throughput
  • Per-disk write throughput
  • Per-disk read IOPS
  • Per-disk write IOPS
  • Node proxy server throughput
  • CPU utilization (all CPUs)
  • Per-process-group CPU usage
  • Account processes CPU usage
  • Container processes CPU usage
  • Object processes CPU usage
  • Memory utilization
  • Node interface bandwidth
  • StatsD statistics
  • OpenVPN traffic
  • Top 4 node process groups by RSS
  • Object replicator operations
  • Object-updater sweep timing and count
  • object-replicator.partition.update req timing and count
  • Proxy req timing and count
  • account-replicator replication timing and count

These metrics and graphs are available under the “View all graphs for this cluster” menu.

Operating with SwiftStack

We have looked at several scenarios for adding and removing capacity, and we have covered monitoring best practices.

At this point, you should have a good understanding of how Swift works and how it can and should be maintained, managed, and monitored.

Perhaps the greatest lesson is that although Swift is based on and relies on many common Linux packages and system administration best practices, Swift at scale isn’t necessarily easy to manage and maintain. There are many server processes and operational procedures to keep track of to make Swift run smoothly. At SwiftStack, we built the SwiftStack Controller to streamline the day-to-day tasks involved in running a Swift cluster. Using the SwiftStack Controller makes it significantly easier, less risky, and more cost-effective to run Swift clusters. Some of the highlights of the SwiftStack Controller are:

  • SwiftStack makes the process of adding and removing capacity easy and highly automated.
  • The SwiftStack Controller tracks not only server-level metrics such as CPU utilization, load, memory consumption, and disk usage, but also hundreds of Swift-specific metrics to understand what the different daemons are doing on each server. This helps the operator answer questions such as, “What’s the volume of object replication on node8?”, “How long is it taking?”, “Are there errors? If so, when did they happen?”
  • The SwiftStack Controller collects and stores monitoring data for over 500 metrics for each node in your SwiftStack cluster. A subset of these is reported in the SwiftStack Controller so you can get a bird’s-eye view of your cluster’s performance, with options to drill down into specific metrics for tuning and troubleshooting.
  • SwiftStack Nodes include StatsD, a simple statistics daemon. To avoid the problems inherent in middleware-based monitoring and after-the-fact log processing, StatsD metrics are integrated into Swift itself. With StatsD, metrics are sent in real time from the nodes to the SwiftStack Controller. The overhead of sending a metric is extremely low: a single UDP packet per metric (a minimal sketch of this wire format follows the list).
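For a sense of how lightweight that is, the sketch below emits a StatsD timer as a single UDP datagram using the standard metric:value|ms wire format. The metric name, host, and port are placeholders for illustration, not what Swift or the SwiftStack agent uses out of the box.

    #!/usr/bin/env python
    """Minimal StatsD-style sender: one metric, one UDP packet (sketch only)."""
    import socket
    import time

    STATSD_HOST = ("127.0.0.1", 8125)  # placeholder host/port
    _sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_timing(name, milliseconds):
        # StatsD timer wire format: "<metric name>:<value>|ms"
        payload = "%s:%0.4f|ms" % (name, milliseconds)
        _sock.sendto(payload.encode("utf-8"), STATSD_HOST)

    if __name__ == "__main__":
        start = time.time()
        time.sleep(0.05)  # stand-in for real work
        send_timing("example.request.timing", (time.time() - start) * 1000)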

Conclusion

This chapter covered how to best manage capacity in your Swift cluster both through Swift and SwiftStack; options for monitoring a cluster to prevent problems for your clients or with your data; and how to use SwiftStack tools and metrics to manage and maintain a cluster. In upcoming chapters we’ll move to failure handling, testing, and benchmarking and tuning.
