5.3. Techniques with Multi-zone Disk Drives

Numerous studies [12], [126], [176], [13], [67], [76] have described techniques in support of a hiccup-free display assuming a homogeneous collection of single-zone disk drives. Single-zone disk drives provide a constant transfer rate. A multi-zone disk drive, in contrast, consists of a number of regions (termed zones), each providing a different storage capacity and transfer rate. For example, the Seagate Barracuda 18 provides 18 gigabytes of storage and consists of 9 zones (see Table 5.11). To the best of our knowledge, there are only four techniques in support of hiccup-free display with multi-zone disk drives: IBM's Logical Track [166], Hewlett-Packard's Track Pairing [15], and USC's FIXB [65] and deadline-driven [68] techniques. Studies that investigate stochastic analytical models in support of admission control with multi-zone disks, e.g., [119], are orthogonal because they investigate only admission control (while the above four techniques describe how the disk bandwidth should be scheduled, the block size for each object, and admission control). Moreover, we are not aware of a single study that investigates hiccup-free display using a heterogeneous collection of multi-zone disk drives.

Table 5.11. Three different Seagate disk models and their zone characteristics.

Seagate Barracuda 4LP (introduced in 1994, 2 GBytes capacity, $1,200 price tag)

Zone id | Size (MB) | Track Size (MB) | No. of Tracks | Rate (MB/s)
0       | 506.7     | 0.0908          | 5579          | 10.90
1       | 518.3     | 0.0903          | 5737          | 10.84
2       | 164.1     | 0.0864          | 1898          | 10.37
3       | 134.5     | 0.0830          | 1620          | 9.96
4       | 116.4     | 0.0796          | 1461          | 9.55
5       | 121.1     | 0.0767          | 1579          | 9.20
6       | 119.8     | 0.0723          | 1657          | 8.67
7       | 103.2     | 0.0688          | 1498          | 8.26
8       | 101.3     | 0.0659          | 1536          | 7.91
9       | 92.0      | 0.0615          | 1495          | 7.38
10      | 84.6      | 0.0581          | 1455          | 6.97

Seagate Cheetah 4LP (introduced in 1996, 4 GBytes capacity, $1,100 price tag)

Zone id | Size (MB) | Track Size (MB) | No. of Tracks | Rate (MB/s)
0       | 1017.8    | 0.0876          | 11617         | 14.65
1       | 801.6     | 0.0840          | 9540          | 14.05
2       | 745.9     | 0.0791          | 9429          | 13.23
3       | 552.6     | 0.0745          | 7410          | 12.47
4       | 490.5     | 0.0697          | 7040          | 11.65
5       | 411.4     | 0.0651          | 6317          | 10.89
6       | 319.6     | 0.0589          | 5431          | 9.84

Seagate Barracuda 18 (introduced in 1998, 18 GBytes capacity, $900 price tag)

Zone id | Size (MB) | Track Size (MB) | No. of Tracks | Rate (MB/s)
0       | 5762      | 0.1268          | 45429         | 15.22
1       | 1743      | 0.1214          | 14355         | 14.57
2       | 1658      | 0.1157          | 14334         | 13.88
3       | 1598      | 0.1108          | 14418         | 13.30
4       | 1489      | 0.1042          | 14294         | 12.50
5       | 1421      | 0.0990          | 14353         | 11.88
6       | 1300      | 0.0923          | 14092         | 11.07
7       | 1268      | 0.0867          | 14630         | 10.40
8       | 1126      | 0.0807          | 13958         | 9.68

This section extends the four techniques to a heterogeneous disk subsystem. While these extensions are novel and a contribution in their own right, we believe that the primary contribution of this study is the performance comparison of these techniques and the quantification of their tradeoffs. This is because three of the described techniques assume certain characteristics about the target platform. Our performance results enable a system designer to evaluate the appropriateness of a technique and decide whether it is worthwhile to refine its design by eliminating its assumptions.

The rest of this section is organized as follows. Section 5.3.1 introduces five hiccup-free display techniques for a heterogeneous disk subsystem: two for IBM's Logical Track (termed OLT1 and OLT2), and one for each of the other techniques. Section 5.3.8 quantifies the performance tradeoff associated with these techniques. Our results demonstrate tradeoffs between the cost per simultaneous stream supported by a technique, its startup latency, throughput, and the amount of disk space that it wastes. For example, while USC's FIXB results in the best cost/performance ratio, the potential maximum latency incurred by each user is significantly larger than with the other techniques. The choice of a technique is application dependent: one must analyze the requirements of an application and choose a technique accordingly. For example, with nonlinear editing systems, the deadline driven technique is more desirable than the others because it minimizes the latency incurred by each user [97].

5.3.1. Five Techniques

In order to describe the alternative techniques, we assume a configuration consisting of K disk models: D0, D1, ..., DK-1. There are q_i disks belonging to disk model i: d_0^i, d_1^i, ..., d_{q_i-1}^i. A disk drive of model D_i consists of m_i zones. To illustrate, Figure 5.17 shows a configuration consisting of two disk models D0 and D1 (K=2). There are two disks of each model (q0 = q1 = 2), numbered d_0^0, d_1^0, d_0^1, and d_1^1. Disks of model 0 consist of 2 zones (m0 = 2) while those of model 1 consist of 3 zones (m1 = 3). Zone j of a disk (say d_p^i) is denoted as Z_j(d_p^i). Figure 5.17 shows a total of 10 zones for the 4 disk drives and their unique indexes. The kth physical track of a specific zone is indexed as PT_k(Z_j(d_p^i)).

Figure 5.17. OLT1


We use the set notation, {variable : properties}, to refer to a collection of tracks from different zones of several disk drives. This notation specifies a variable before the colon and the properties that each instance of the variable must satisfy after the colon. For example, to refer to the first track from all zones of the disk drives that belong to disk model 0, we write: { PT_0(Z_j(d_p^0)) : ∀ j, p where 0 ≤ j < m0 and 0 ≤ p < q0 }. With the configuration of Figure 5.17, this would expand to: { PT_0(Z_0(d_0^0)), PT_0(Z_0(d_1^0)), PT_0(Z_1(d_0^0)), PT_0(Z_1(d_1^0)) }.

5.3.2. IBM's Logical Track [166]

This section starts with a description of this technique for a single multi-zone disk drive. Subsequently, we introduce two variations of this technique, OLT1 and OLT2, for a heterogeneous collection of disk drives. While OLT1 constructs several logical disks from the physical disks, OLT2 provides the abstraction of only one logical disk. We describe each in turn.

With a single multi-zone disk drive, this technique constructs a logical track by combining one physical track from each distinct zone of the disk drive. Conceptually, this approach provides equi-sized logical tracks with a single data transfer rate such that one can apply traditional continuous display techniques [12], [176], [13], [67], [116], [126]. With K different disk models, a naive approach would construct a logical track LT_k by utilizing one track from each zone:

LT_k = { PT_k(Z_j(d_p^i)) : ∀ i, j, p }

where 0 ≤ j < m_i, 0 ≤ i < K, and 0 ≤ p < q_i. With this technique, the value of k is bounded by the zone with the fewest physical tracks, i.e., 0 ≤ k < Min[NT(Z_j(d^i))], where NT(Z_j(d^i)) is the number of physical tracks in zone j of disk model D_i. Large logical tracks result in a significant amount of memory per simultaneous display, rendering a continuous media server economically unviable. The next two sections describe optimized versions of this technique that render its memory requirements reasonable.
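To make this construction concrete, the following sketch enumerates the physical tracks that constitute LT_k under the naive scheme (Python; the function name, data layout, and per-zone track counts are illustrative, not taken from Table 5.11):

```python
# Sketch of the naive logical-track construction. tracks_per_zone[i][j]
# is the number of physical tracks in zone j of disk model i; q[i] is
# the number of disks of model i. All identifiers are illustrative.

def naive_logical_track(k, tracks_per_zone, q):
    """Return logical track k as a list of (model, disk, zone, track) ids."""
    lt = []
    for i, zones in enumerate(tracks_per_zone):   # each disk model D_i
        for p in range(q[i]):                     # each disk d_p^i
            for j, nt in enumerate(zones):        # each zone Z_j
                if k >= nt:                       # k is bounded by the
                    raise ValueError("k too big") # smallest zone
                lt.append((i, p, j, k))           # PT_k(Z_j(d_p^i))
    return lt

# The configuration of Figure 5.17: two models, two disks each, with
# 2 and 3 zones (illustrative track counts per zone).
tracks_per_zone = [[2000, 1500], [1800, 1600, 1400]]
print(len(naive_logical_track(0, tracks_per_zone, q=[2, 2])))  # 10
```

With the configuration of Figure 5.17, every logical track consists of 10 physical tracks, one per zone in the system.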

5.3.3. Optimized Logical Track 1 (OLT1)

Assuming that a configuration consists of the same number of disks for each model[7], OLT1 constructs logical disks by grouping one disk from each disk model, yielding q logical disks where q = q_0 = ... = q_{K-1}. For each logical disk, it constructs logical tracks consisting of one track from each physical zone of each participating disk drive. To illustrate, in Figure 5.17, we pair one disk from each model to form a logical disk drive. The two disks that constitute the first logical disk in Figure 5.17, i.e., disks d_0^0 and d_0^1, consist of a different number of zones (d_0^0 has 2 zones while d_0^1 has 3 zones). Thus, a logical track consists of 5 physical tracks, one from each zone.

[7] This technique is applicable as long as the number of disks for each model is a multiple of the number of disks of the model with the fewest disk drives: if min(q_i) (0 ≤ i < K) denotes the number of disks of that model, then every q_j (0 ≤ j < K) is a multiple of min(q_i).

Logical disks appear as a homogeneous collection of disk drives with the same bandwidth. There are a number of well known techniques that can guarantee hiccup-free display given such an abstraction, see [12], [176], [13], [67], [116], [126], [175]. Briefly, given a video clip X, these techniques partition X into equi-sized blocks that are striped across the available logical disks [13], [175], [176]: one block per logical disk in a round-robin manner. A block consists of either one or several logical tracks.

Let T_i denote the time to retrieve m_i tracks from a single disk of model D_i consisting of m_i zones: T_i = m_i × (a revolution time + a seek time). Then, the transfer rate of a logical track (R_LT) is: R_LT = size(LT) / Max[T_i], 0 ≤ i < K.

In Figure 5.17, to retrieve a LT from the first logical disk, disk d_0^0 incurs 2 revolution times and 2 seeks to retrieve its two physical tracks, while disk d_0^1 incurs 3 revolutions and 3 seeks to retrieve its three physical tracks. Assuming a revolution time of 8.33 milliseconds (7,200 rpm) and an average seek time of 10 milliseconds for both disk models, d_0^0 requires 36.66 milliseconds (T0 = 36.66) while d_0^1 requires 54.99 milliseconds (T1 = 54.99) to retrieve a LT. Thus, the transfer rate of the LT is determined by disk model D1. Assuming that a LT is 1 megabyte in size, its transfer rate is 1 MB / 54.99 ms = 18.19 megabytes per second.
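The arithmetic of this example is small enough to check directly; a minimal sketch, using the revolution and seek times assumed in the text:

```python
# T_i = m_i * (revolution + seek); the slowest model dictates R_LT.
revolution = 8.33            # milliseconds per revolution (7,200 rpm)
seek = 10.0                  # assumed average seek time, milliseconds
m = {"D0": 2, "D1": 3}       # zones per disk model (Figure 5.17)

T = {model: zones * (revolution + seek) for model, zones in m.items()}
# T == {"D0": 36.66, "D1": 54.99}

lt_size_mb = 1.0
r_lt = lt_size_mb / (max(T.values()) / 1000.0)   # ~18.19 MB/s
wasted_d0 = 1.0 - T["D0"] / T["D1"]              # ~0.333 of D0 sits idle
```

The last line anticipates the observation below: D0 finishes its track retrievals in 36.66 of every 54.99 milliseconds and idles for the remainder.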

This example demonstrates that OLT1 wastes disk bandwidth by requiring one disk to wait for another to complete its physical track retrievals. In our example, this technique wastes 33.3% of D0's bandwidth. In addition, this technique wastes disk space because the zone with the fewest physical tracks determines the total number of logical tracks. In particular, this technique eliminates the physical tracks of those zones that have more than NT_min tracks, NT_min = Min[NT(Z_j(d^i))], i.e., it eliminates every PT_k with NT_min ≤ k < NT(Z_j(d^i)), for all i and j, 0 ≤ i < K and 0 ≤ j < m_i.

5.3.4. Optimized Logical Track 2 (OLT2)

OLT2 extends OLT1 with the following additional assumption: each disk model has the same number of zones, i.e., mi is identical for all disk models, 0 ≤ i < K. Using this assumption, it constructs logical tracks by pairing physical tracks from zones that belong to different disk drives. This is advantageous for two reasons. First, it eliminates the seeks required per disk drive to retrieve the physical tracks. Second, assuming an identical revolution rate of all heterogeneous disks, it prevents one disk drive from waiting for another.

The details of OLT2 are as follows. First, it reduces the number of zones of each disk to that of the disk model with the fewest zones: m_min = Min[m_i] for all i, 0 ≤ i < K. Hence, we are considering only the zones Z_j(d_k^i) for all i, j, and k (0 ≤ i < K, 0 ≤ j < m_min, and 0 ≤ k < q). For example, in Figure 5.18, the slowest zone (Z2) of disks d_0^1 and d_1^1 is eliminated such that all disks utilize only two zones. This technique requires m_min disks of each disk model (m_min × K disks in total). Next, it constructs LTs such that no two physical tracks (from two different zones) in a LT belong to one physical disk drive. A logical track LT_k consists of a set of physical tracks:

Equation 5.29

LT_k = { PT_(k mod NT_min)(Z_j(d_l^i)) : ∀ i, j where 0 ≤ i < K and 0 ≤ j < m_min }
Figure 5.18. OLT2


where l = (⌊k / NT_min⌋ + j) mod m_min. The total number of LTs is m_min × NT_min; thus, 0 ≤ k < m_min × NT_min.
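A short sketch of this assignment, under the rotation rule as reconstructed in Equation 5.29 (the function name and arguments are illustrative):

```python
# OLT2: logical track k takes one physical track from each of the m_min
# retained zones of each disk model, rotating across the m_min disks of
# that model so that no two tracks in an LT share a physical disk.

def olt2_logical_track(k, K, m_min, nt_min):
    lt = []
    for i in range(K):                        # each disk model D_i
        for j in range(m_min):                # each retained zone Z_j
            l = (k // nt_min + j) % m_min     # disk index within model i
            lt.append((i, l, j, k % nt_min))  # PT_{k mod NT_min}(Z_j(d_l^i))
    return lt

# Within each model group, all m_min tracks come from distinct disks:
lt = olt2_logical_track(k=0, K=2, m_min=2, nt_min=1000)
assert len({(i, l) for (i, l, j, t) in lt}) == len(lt)
```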

OLT2 may use several possible techniques to force all disks to have the same number of zones. For each disk with δ_z zones more than m_min, it can either (a) merge two of its physically adjacent zones into one, repeatedly, until its number of logical zones is reduced to m_min, (b) eliminate its innermost δ_z zones, or (c) apply a combination of (a) and (b). With (a), the number of simultaneous displays is reduced because the bandwidth of two merged zones is reduced to the bandwidth of the slower participating zone. With (b), OLT2 wastes disk space while increasing the average transfer rate of the disk drive, i.e., the number of simultaneous displays. In [65], we describe a configuration planner that empowers a system administrator to strike a compromise between these two factors for one of the techniques described in this study (HetFIXB). The extension of this planner in support of OLT2 is a part of our future research direction.

5.3.5. Heterogeneous Track Pairing (HTP)

We describe this technique in three steps. First, we describe how it works for a single multi-zone disk drive. Next, we extend the discussion to a heterogeneous collection of disk drives. Finally, we discuss the tradeoffs associated with this technique.

Assuming a single disk drive (say d_0^0) with #TR_0 tracks, Track Pairing [15] pairs the innermost track (TR_{#TR_0-1}(d_0^0)) with the outermost track (TR_0(d_0^0)), working itself towards the center of the disk drive. The result is a logical disk drive that consists of #TR_0/2 logical tracks that have approximately the same storage capacity and transfer rate. This is based on the (realistic) assumption that the storage capacity of tracks increases linearly as one moves from the innermost track to the outermost track. Using this logical disk drive, the system may utilize one of the traditional continuous display techniques in support of hiccup-free display.

Assuming a heterogeneous configuration consisting of K disk models, HTP utilizes Track Pairing to construct track pairs for each disk. If the number of disks for each disk model is identical (q_0 = q_1 = ... = q_{K-1} = q), HTP constructs q groups of disk drives, each consisting of one disk from each of the K disk models. Next, it realizes logical tracks that consist of K track pairs, one track pair from each disk drive in the group. These logical tracks constitute a logical disk. Obviously, the disk with the fewest track pairs determines the total number of logical tracks for each logical disk. With such a collection of homogeneous logical disks, one can use one of the popular hiccup-free display techniques. For example, similar to both OLT1 and OLT2, one can stripe a video clip into blocks and assign the blocks to the logical disks in a round-robin manner.
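The pairing itself is mechanical; a minimal sketch with per-track capacities as input (all names are illustrative):

```python
# Track Pairing: pair the outermost track (index 0) with the innermost
# (index n-1) and work toward the center. If capacity grows roughly
# linearly from the innermost to the outermost track, every pair has
# approximately the same total capacity.

def track_pairs(track_capacity):
    n = len(track_capacity)
    return [(track_capacity[t], track_capacity[n - 1 - t])
            for t in range(n // 2)]

# HTP: one track pair from each of the K disks in a group forms a
# logical track; the disk with the fewest pairs bounds their number.
def htp_logical_tracks(group):   # group: one capacity list per disk
    pairs = [track_pairs(disk) for disk in group]
    n_lt = min(len(p) for p in pairs)
    return [[p[t] for p in pairs] for t in range(n_lt)]
```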

HTP wastes disk space in two ways. First, the number of tracks in a logical disk is determined by the physical disk drive with the fewest track pairs. For example, if a configuration consists of two heterogeneous disks, one with 20,000 track pairs and the other with 15,000 track pairs, then the resulting logical disk will consist of 15,000 track pairs. In essence, this technique eliminates 5,000 track pairs from the first disk drive. Second, while it is realistic to assume that the storage capacity of each track increases linearly from the innermost track to the outermost one, it is not 100% accurate [15]. Once the logical tracks are realized, the storage capacity of each logical track is determined by the track with the lowest storage capacity.

5.3.6. Heterogeneous FIXB (HetFIXB)

Please refer to Section 2.5.3 for a description of how the system guarantees a continuous display with a single multi-zone disk drive and the FIXB technique. Here, we describe the extensions of this technique to heterogeneous disk drives.

With a heterogeneous collection of disks, we continue to maintain a Tscan per disk drive. While the duration of a Tscan is identical for all disk drives, the amount of data produced by each Tscan is different. We compute the block size for each disk model (recall that blocks are equi-sized for all zones of a disk) such that the faster disks compensate for the slower disks by producing more data during their Tscan period. HetFIXB aligns the Tscan periods of the individual disk drives with one another such that they all start and end together.

To support N simultaneous displays, HetFIXB must satisfy the following equations.

Equation 5.30

M = RC × Tp

Equation 5.31

M = Σ_{i=0}^{K-1} M_i

Equation 5.32

M_i = M × (AvgR_i / Σ_{j=0}^{K-1} AvgR_j)

Equation 5.33

B_i = M_i / m_i

where 0 ≤ i < K.

To illustrate, assume a configuration consisting of 3 disks, see Figure 5.19. Assume the average transfer rates of the disks are AvgR0 = 80 Mbps, AvgR1 = 70 Mbps, and AvgR2 = 60 Mbps, respectively. When RC = 4 Mbps, 1.5 MBytes of data (M = 1.5 MB) is required every 3 seconds (Tp = 3 sec) to support a hiccup-free display. Based on the ratio among the average transfer rates of the disk models, M0 = 0.5715 MB, M1 = 0.5 MB, and M2 = 0.4285 MB. Thus, B0 = M0/m0 = 0.19 MB, B1 = M1/m1 = 0.25 MB, and B2 = M2/m2 = 0.14 MB. An object X is partitioned into blocks and the blocks are assigned to zones in a round-robin manner. When a request for X arrives, the system retrieves X0, X1, and X2 (M0 = 3 × B0 amount of data) from D0 during the first Tscan. A third of M (0.5 MB) is consumed during the same Tscan. Hence, some amount of data, 0.0715 MB, remains unconsumed in the buffer. In the next Tscan, the system retrieves X3 and X4 (M1 = 2 × B1 amount of data) from D1. While the same amount of data (0.5 MB) is retrieved and consumed during this Tscan, the accumulated data (0.0715 MB) still remains in the buffer. Finally, during the last Tscan, the system retrieves X5, X6, and X7 (M2 = 3 × B2 amount of data) from D2. Even though the amount of data retrieved in this Tscan (0.4285 MB) is smaller than the amount of data displayed during a Tscan (0.5 MB), there is no starvation because 0.0715 megabytes of data is available in the buffer. This process is repeated until the end of the display.
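The sizing of this example can be reproduced in a few lines (a sketch of the proportional split; the zone counts m_i are those implied by the block sizes above, and are an assumption of this sketch):

```python
# HetFIXB sizing: M = RC * Tp, split across models in proportion to
# their average transfer rates, then divided by each model's zone count.
RC_mbps, Tp_sec = 4.0, 3.0
M = RC_mbps * Tp_sec / 8.0            # 1.5 MBytes per period
avg_r = [80.0, 70.0, 60.0]            # AvgR_i in Mbps
m = [3, 2, 3]                         # zones per model (implied above)

M_i = [M * r / sum(avg_r) for r in avg_r]    # ~[0.5714, 0.5, 0.4286] MB
B_i = [Mi / mi for Mi, mi in zip(M_i, m)]    # ~[0.19, 0.25, 0.14] MB
```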

Figure 5.19. HetFIXB


5.3.7. Heterogeneous Deadline Driven (HDD)

With this technique, blocks are assigned across the disk drives in a random manner. A client issues block requests, each tagged with a deadline. Each disk services block requests using the Earliest Deadline First policy. In [68], we showed that the assignment of blocks to the zones in a disk should be independent of the frequency of access to the blocks. Thus, blocks are assigned to the zones in a random manner. The size of the blocks assigned to each disk model is different. The sizes are determined based on the weighted average transfer rate of each disk model. Let WR_i denote the weighted average transfer rate of disk model i:

Equation 5.34

WR_i = Σ_{j=0}^{m_i-1} (size(Z_j(d^i)) / capacity(d^i)) × R(Z_j(d^i))

Equation 5.35

B_i / B_j = WR_i / WR_j, 0 ≤ i, j < K

Assuming B_i > B_j for i < j and 0 ≤ i, j < K, an object X is divided into blocks such that the size of block X_i is B_(i mod K). Blocks with the size B_i are randomly assigned to the disks belonging to model i. A random placement may incur hiccups that are attributed to the statistical variation of the number of block requests per disk drive, resulting in varying block retrieval times. Traditionally, double buffering has been widely used to absorb the variance of block retrieval time: while a block in one buffer is being consumed, the system fills another buffer with data. However, we generalize double buffering to N buffering, prefetching N-1 buffers before initiating a display. This minimizes the hiccup probability by absorbing a wider variance of block retrieval time, because data retrieval is N-1 blocks ahead of data consumption.
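A sketch of the weighted-rate computation and the resulting random placement, consistent with the reconstruction of Equations 5.34 and 5.35 above (all identifiers are illustrative):

```python
import random

# WR_i: each zone's rate weighted by its share of the disk's capacity.
def weighted_rate(zone_sizes, zone_rates):
    total = sum(zone_sizes)
    return sum(s / total * r for s, r in zip(zone_sizes, zone_rates))

# Block i of an object has size proportional to WR_{i mod K} and is
# placed on a randomly chosen disk of model i mod K.
def place_blocks(num_blocks, wr, disks_per_model, base_size):
    K = len(wr)
    placement = []
    for i in range(num_blocks):
        model = i % K
        size = base_size * wr[model] / wr[0]   # B_i / B_0 = WR_i / WR_0
        disk = random.randrange(disks_per_model[model])
        placement.append((model, disk, size))
    return placement
```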

We assume that, upon a request for a video clip X, a client: (1) concurrently issues requests for the first N-1 blocks of X (to prefetch data), (2) tags the first N-1 block requests, X_i (0 ≤ i < N-1), with a deadline, and (3) starts the display as soon as the first block (X0) arrives. For example, when N = 4, the first three blocks are requested at the beginning. Then, the next block request is issued immediately after the display is initiated. Obviously, there are other ways of deciding both the deadlines of the prefetched blocks and when to initiate a display. In [68], we analyzed the impact of these alternative decisions and demonstrated that the combination of the above two choices enhances system performance.
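The start-up protocol can be sketched as follows (the deadline spacing used here is an assumption, since the original deadline expression is elided from the text):

```python
from dataclasses import dataclass

@dataclass
class BlockRequest:
    block: int
    deadline: float    # seconds after the request for X arrives

# N buffering: issue the first N-1 requests concurrently, start the
# display when X_0 arrives, and stay N-1 blocks ahead thereafter.
def startup_requests(N, block_display_time):
    # Assumed spacing: block X_i is needed i display periods into the
    # clip; the text's exact deadline formula is not reproduced here.
    return [BlockRequest(i, i * block_display_time) for i in range(N - 1)]

reqs = startup_requests(N=4, block_display_time=1.0)
# -> requests for X_0, X_1, X_2; X_3 is issued when display begins
```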

5.3.8. Performance Evaluation

In this section, we quantify the performance tradeoffs associated with the alternative techniques. While OLT1, OLT2, HTP, and HetFIXB were quantified using analytic models, HDD was quantified using a simulation study. We conducted numerous experiments analyzing different configurations with different disk models from Quantum and Seagate. Here, we report on a subset of our results in order to highlight the tradeoffs associated with the different techniques. In all results presented here, we used the three disk models shown in Table 5.11. Both the Barracuda 4LP and the Barracuda 18 spin at 7,200 rpm while the Cheetah spins at 10,000 rpm. Moreover, we assumed that all objects in the database require a 4 Mbps bandwidth for their continuous display.

Figure 5.20 shows the cost per stream as a function of the number of simultaneous displays supported by the system (throughput) for three different configurations. Figure 5.20a shows a system that was installed in 1994 and consists of 10 Barracuda 4LP disks. Figure 5.20b shows the same system two years later when it was extended with 10 Cheetah disks. Finally, Figure 5.20c shows this system in 1998 when it was extended with 10 Barracuda 18 disks. To estimate system cost, we assumed: a) the cost of each disk at the time it was purchased, with no depreciation, and b) the system is configured with sufficient memory to support the number of simultaneous displays shown on the x-axis. We assumed that the cost of memory is $7/MB, $5/MB, and $3/MB in 1994, 1996, and 1998, respectively. Additional memory is purchased at the time of the disk purchases in order to support additional users. (Once again, we assume no depreciation of memory.) While one might disagree with our assumptions for computing the cost of the system, note that the focus of this study is to compare the different techniques. As long as the assumptions are kept constant, we can make observations about the proposed techniques and their performance tradeoffs.

Figure 5.20. Throughput and cost per stream.

Figure 5.20a: One disk model (homogeneous).

Figure 5.20b: Two disk models (heterogeneous).

Figure 5.20c: Three disk models (heterogeneous).


In these experiments, OLT2 constructed logical zones in order to force all disk models to consist of the same number of zones. This meant that OLT2 eliminated the innermost zone (zone 10) of the Barracuda 4LP, split the fastest three zones of the Cheetah into six zones, and split the outermost zone of the Barracuda 18 into two, leaving each model with ten zones. Figure 5.20c does not show OLT1 and OLT2 because: a) their cost per stream is almost identical to that shown in Figure 5.20b, and b) we wanted to show the difference between HetFIXB, HDD, and HTP.

Figure 5.20 shows that HetFIXB is the most cost-effective technique; however, it supports fewer simultaneous displays as a function of heterogeneity. For example, with one disk model, it provides a throughput similar to the other techniques. However, with 3 disk models, its maximum throughput is lower than those provided by HDD and HTP. This is dependent on the physical characteristics of the zones because HetFIXB requires the duration of Tscan to be identical for all disk models. This requirement results in fragmentation of the disk bandwidth, which in turn limits the maximum throughput of the system. Generally speaking, the greater the heterogeneity, the greater the degree of fragmentation. However, the zone characteristics ultimately decide the degree of fragmentation. One may construct logical zones in order to minimize this fragmentation, see [65]. This optimization is not reported here because of space limitations; it raises many interesting issues that are not presented here. Regardless, the comparison shown here is fair because our optimizations are applicable to all techniques.

OLT1 provides inferior performance as compared to the other techniques because it wastes a significant percentage of the available disk bandwidth. To illustrate, Figure 5.21 shows the percentage of wasted disk bandwidth for each disk model with each technique when the system is fully utilized (the trend holds true for less than 100% utilization). OLT1 wastes 60% of the bandwidth provided by the Cheetah and approximately 30% of that provided by the Barracuda 18. Most of the wasted bandwidth is attributed to these disks sitting idle. The Cheetahs sit idle because they spin at 10,000 rpm as compared to the 7,200 rpm of the Barracudas. The Barracuda 4LP and 18 disks sit idle because of their zone characteristics. In passing, while the different techniques provide approximately similar cost/performance ratios, each wastes bandwidth in a different manner. For example, while both HTP and HetFIXB provide approximately similar cost/performance ratios, HTP wastes 40% of the Cheetah's bandwidth whereas HetFIXB wastes only 20% of the bandwidth provided by this disk model. HTP makes up for this limitation by harnessing a greater percentage of the bandwidth provided by the Barracuda 4LP and 18.

Figure 5.21. Wasted disk bandwidth.


Figure 5.22 shows the maximum latency incurred by each technique as a function of the load imposed on the system. In this figure, we have eliminated OLT1 because of its prohibitively high latency. (One conclusion of this study is that OLT1 is not a competitive strategy.) The results show that HetFIXB provides the worst latency while HDD's maximum latency is below 1 second. This is because HetFIXB forces a rigid schedule with the disk zones being activated in an orderly manner (across all disk drives). If a request arrives and the zone containing its referenced block is not active, then it must wait until the disk head visits that zone (even if idle bandwidth is available). With HDD, there is no such rigid schedule in place. A request is serviced as soon as there is available bandwidth. Of course, this is at the risk of some requests missing their deadlines. This happens when many requests collide on a single disk drive due to the random nature of requests to the disks. In these experiments, we ensured that such occurrences impacted one in a million requests, i.e., the hiccup probability is less than one in a million block requests.

Figure 5.22. Maximum startup latency.


OLT2 and HTP provide a better latency as compared to HetFIXB because they construct fewer logical disks [12], [70]. While OLT2 constructs a single logical disk, HTP constructs 10 logical disks, and HetFIXB constructs 30 logical disks. In the worst case scenario (assumed by Figure 5.22), with both HTP and HetFIXB, all active requests collide on a single logical disk (say d_bottleneck). A small fraction of them are activated while the rest wait for this group of requests to shift to the next logical disk (in the case of HetFIXB, they wait for one Tscan). Subsequently, another small fraction is activated on d_bottleneck. This process is repeated until all requests are activated. Figure 5.22 shows the latency incurred by the last activated request.

With three disk models (Figure 5.20c), OLT1 and OLT2 waste more than 80% of the disk space, HTP and HDD waste approximately 70%, while HetFIXB wastes 44% of the available disk space. However, this does NOT mean that HetFIXB is more space efficient than the other techniques. This is because the percentage of wasted disk space is dependent on the physical characteristics of the participating disk drives: the number of disk models, the number of zones per disk, the track size of each zone, and the storage capacity of individual zones and disk drives. For example, with two disk models (Figure 5.20b), HetFIXB wastes more disk space when compared with the other techniques.

Modern magnetic disk drives feature variable transfer rates due to a technique called zone-bit recording (ZBR), which increases the amount of data stored on a track as a function of its distance from the disk spindle. Several techniques have been proposed to harness the average transfer rate of the zones [85], [15], [65], [62], [171], as described in Chapter 2. These techniques are orthogonal to Disk Grouping, Staggered Grouping, and Disk Merging. Here, we describe only one approach and why its design is orthogonal. Assuming that the number of tracks in every zone is a multiple of some fixed number, [85] constructs Logical Tracks (LT) from the same-numbered physical track of the different zones. The order of tracks in a LT is by zone number. When displaying a clip, the system reads a LT on its behalf per time period. This forces the disk to retrieve data from the constituting physical tracks in immediate succession by zone order. An application observes a constant disk transfer rate for each LT retrieval. LTs are employed in Disk Grouping and Staggered Grouping as follows. With these techniques, each logical block is declustered across the participating physical disks into fragments. These fragments are then assigned to one or more LTs on each of the physical disks according to the size of the fragments. Similarly, for Disk Merging the logical blocks are assigned to LTs.
