Tape recording has played a large part in the history of digital audio, and continues to be important although the rapid adoption of recorders based on hard disks and optical disks is having a significant effect. Tape recording using the rotary-head principle pioneered in video recorders is used in digital audio alongside the more conventional stationary-head approach. Both of these will be considered here. The reader is referred to Chapters 6 and 7 for an explanation of coding and error-correction principles.
Digital audio became economic with the development of high-density recorders in the 1970s. The necessary rate of almost two megabits per second for a stereo signal can today be recorded with a very low tape consumption. It is not so long ago, however, that the data rate itself was a problem. When head and tape technology were less advanced than they are today, wavelengths on tape were long, and the only way that high frequencies could be accommodated was to use high speeds. High speed can be achieved in two ways. The head can remain fixed, and the tape can be transported rapidly, with obvious consequences, or the tape can travel relatively slowly, and the head can be moved. The latter is the principle of the rotary-head recorder. Figure 9.1 shows the general arrangement of the two major categories of rotary-head recorder. In transverse-scan recorders, relatively short tracks are recorded almost at right angles to the direction of tape motion by a rotating headwheel containing, typically, four heads. In helical-scan recorders, the tape is wrapped around the drum in such a way that it enters and leaves in two different planes. This causes the rotating heads to record long slanting tracks. In both approaches, the width of the space between tracks is determined by the linear tape speed. The track pitch can easily be made much smaller than in stationary-head recorders.
The use of rotary heads was instrumental in the development of the first video recorders. As video signals consist of discrete lines and frames, it was possible to conceal the interruptions in the tracks of a rotary-head machine by making them coincident with the time when the CRT was blanked during flyback.
If digital sample data are encoded to resemble a video waveform, which is known as pseudo-video or composite digital, they can be recorded on a fairly standard video recorder. Digital audio recorders have been made using quadruplex video recorders, one-inch video recorders, U-matic cassette recorders, and the smaller consumer formats. The device needed to format the samples in this way is called a PCM adaptor.
Digital audio recorders have also been made which use only the transport of a video recorder, with specially designed digital signal electronics. Instead of using analog FM, it is possible to use digital recording, as described in Chapter 6, to make a direct digital recorder. The machines built by Decca fall into this category.
The final category of digital audio recorder using rotary heads is one in which direct digital recording is used with a transport specially designed for audio use with no compromises due to a video-based ancestry. DAT is such a machine.
Figure 9.2 shows a block diagram of a PCM adaptor. The unit has five main sections. Central to operation is the sync and timing generation, which produces sync pulses for control of the video waveform generator and locking the video recorder, in addition to producing sampling-rate clocks and timecode. An ADC allows a conventional analog audio signal to be recorded, but this can be bypassed if a suitable digital input is available. Similarly a DAC is provided to monitor recordings, and this too can be bypassed by using the direct digital output. Also visible in Figure 9.2 are the encoder and decoder stages which convert between digital sample data and the pseudo-video signal.
An example of this type of unit is the PCM-1610/1630 which was designed by Sony for use with a U-matic Video Cassette Recorder (VCR) specifically for Compact Disc mastering. Chapter 4 showed how many audio sampling rates were derived from video frequencies. The Compact Disc format is an international standard, and it was desirable for the mastering recorder to adhere to a single format. Thus the PCM-1610 only worked in conjunction with a 525/60 monochrome VCR. There was no 625/50 version. Thus even in PAL countries Compact Discs were mastered on 60 Hz VCRs to allow the traditional international interchange of recordings. The PCM-1610 was intended for professional use, and thus was not intended to be produced in volume. For this reason the format is simple, even crude, because the LSI technology needed to implement more complex formats was not available.
A typical line of pseudo-video is shown in Figure 9.3. The line is divided into bit cells and, within them, black level represents a binary zero, and about 60 per cent of peak white represents binary one. The reason for the restriction to 60 per cent is that most VCRs use non-linear pre-emphasis and this operating level prevents any distortion due to the pre-emphasis causing misinterpretation of the pseudo-video. The use of a two-level input to a frequency modulator means that the recording is essentially frequency-shift keyed (FSK).
As the video recorder is designed to switch heads during the vertical interval, no samples can be recorded there. In all rotary-head recorders, some form of time compression is used to squeeze the samples into the active parts of unblanked lines. This is simply done by reading the samples from a memory at an instantaneous rate which is higher than the sampling rate. Owing to the interruptions of sync pulses, the average rate achieved will be the same as the sampling rate. The samples read from the memory must be serialized so that each bit is sent in turn.
It was shown in Chapter 7 that digital audio recorders use extensive interleaving to combat tape dropout. The PCM-1610 subdivides each video field into seven blocks of 35 lines each, and interleaves samples within the blocks. A simple crossword error-correction scheme is used. Some VCRs have dropout compensators built in, which repeat a section of the previous line to conceal the missing picture information. Such circuits must be disabled when used with PCM adaptors because they interfere with the error-correction mechanism.
A PCM adaptor for Compact Disc mastering was also developed by JVC.1 This format had a more powerful error-correction system, and could be used with VHS recorders; again only 525/60 machines were supported.
For consumer use, a PCM adaptor format was specified by the EIAJ2 which would record stereo with fourteen-bit linear quantizing. These units would be used with a domestic VCR. Since the consumer would expect to be able to use the VCR for conventional TV recording as well, the EIAJ format is in fact two incompatible formats. One uses a sampling rate of 44.0559 kHz in conjunction with 525/59.94 NTSC timing, and one uses 44.1 kHz sampling with 625/50 PAL timing. In the popular PCM-F1, Sony produced a variation on the format which allowed sixteen-bit linear quantizing.
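The link between these sampling rates and the video standards can be checked with a little arithmetic. The sketch below is illustrative only: it assumes three samples per channel on each usable line, 245 usable lines per field in 525-line systems (seven blocks of 35 lines, as in the PCM-1610) and 294 usable lines per field in 625-line systems. The per-line figure and the 625-line figure are assumptions not stated in this section, and the function name is hypothetical.

```python
from fractions import Fraction

# Assumption (not stated in this section): three samples per channel
# are carried on each usable video line.
SAMPLES_PER_LINE = 3


def sampling_rate(field_rate_hz, usable_lines_per_field):
    """Sampling rate in Hz implied by a field rate and a usable-line budget."""
    return field_rate_hz * usable_lines_per_field * SAMPLES_PER_LINE


# 245 = 7 blocks x 35 lines; 294 is an assumed figure for 625-line systems.
rate_525_60 = sampling_rate(Fraction(60), 245)              # monochrome 525/60
rate_525_ntsc = sampling_rate(Fraction(60000, 1001), 245)   # 525/59.94 NTSC
rate_625_50 = sampling_rate(Fraction(50), 294)              # 625/50 PAL

for name, rate in [("525/60", rate_525_60),
                   ("525/59.94", rate_525_ntsc),
                   ("625/50", rate_625_50)]:
    print(f"{name}: {float(rate) / 1000:.4f} kHz")
```

Under these assumptions the 525/60 and 625/50 cases both give exactly 44.1 kHz, while the NTSC colour field rate pulls the figure down to about 44.0559 kHz, which is why the two consumer variants cannot interchange.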
The PCM-F1 was built with LSI technology for low mass-production cost. Owing to the low cost of the product, it found application in professional circles, and indeed served as the introduction to digital audio for many people. Being a consumer product, only one convertor was used between digital and analog domains. This was multiplexed between the two audio channels, resulting in a timeshift between samples of half the sample period, or about 11 μs. This was not a problem in normal use, since the opposite shift was introduced by the multiplexed convertor used for replay. The standard PCM-F1 was not equipped with digital outputs or inputs, and accordingly not too much trouble was taken in controlling DC offsets due to convertor drift. When enthusiasts began to modify the unit to fit digital connections, these problems became significant.
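The size of the inter-channel offset follows directly from the sampling rate. The following is a minimal sketch assuming 44.1 kHz sampling; the helper `phase_shift_deg` is a hypothetical name, added only to show how the fixed delay translates into a frequency-dependent phase difference between channels.

```python
SAMPLING_RATE_HZ = 44_100  # assumed: the 44.1 kHz variant of the format

sample_period_us = 1e6 / SAMPLING_RATE_HZ
# One convertor shared between two channels: the second channel is
# sampled half a sample period after the first.
channel_offset_us = sample_period_us / 2


def phase_shift_deg(frequency_hz):
    """Inter-channel phase difference caused by the fixed offset."""
    return channel_offset_us * 1e-6 * frequency_hz * 360


print(f"sample period:        {sample_period_us:.2f} us")
print(f"inter-channel offset: {channel_offset_us:.2f} us")
print(f"phase shift at 20 kHz: {phase_shift_deg(20_000):.1f} degrees")
```

The offset is harmless while the same multiplexed arrangement is used on replay, but once the samples leave the unit digitally the shift is no longer cancelled, and at high audio frequencies it amounts to a substantial fraction of a cycle.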
Several companies manufactured adaptor units incorporating digital filters to remove DC offsets and the 11 μs shift to produce a standard AES/EBU output.
In contrast to the formats described above, which used the signal circuitry of video recorders virtually unmodified, the digital recorders developed by Decca3 for vinyl and Compact Disc mastering use only the transport and servomechanisms of a 625/50 one-inch open-reel video recorder, and make a direct digital recording on the diagonal tracks using MFM channel code.
When an existing video recorder is used as a basis for a digital audio recorder, the video bandwidth is already defined, and in most cases is much greater than necessary. Furthermore, the signal-to-noise ratio of video recorders is much too high for the purposes of storing binary. The result of these factors is that the tape consumption of such a machine will be far higher than necessary.
As digital audio became established, and markets opened up for large numbers of machines, it was no longer necessary to borrow technology from other disciplines, because it was economically viable to design a purpose-built product. The first of this generation of machines is DAT (digital audio tape). By designing for a specific purpose, the tape consumption can be made very much smaller than that of a converted video machine. In fact the DAT format achieved more bits per square inch than any other form of magnetic recorder at the time of its introduction. The origins of DAT are in an experimental machine built by Sony,4 but the DAT format has grown out of that through a process of standardization involving some eighty companies.
The general appearance of the DAT cassette is shown in Figure 9.4. The overall dimensions are only 73 mm × 54 mm × 10.5 mm, which is rather smaller than the Compact Cassette. The design of the cassette incorporates some improvements over its analog ancestor.5 As shown in Figure 9.5, the apertures through which the heads access the tape are closed by a hinged door, and the hub drive openings are covered by a sliding panel which also locks the door when the cassette is not in the transport. The act of closing the door operates brakes which act on the reel hubs. This results in a cassette which is well sealed against contamination due to handling or storage. The short wavelengths used in digital recording make it more sensitive to spacing loss caused by contamination.
As in the Compact Cassette, the tape hubs are flangeless, and the edge guidance of the tape pack is achieved by liner sheets. The flangeless approach allows the hub centres to be closer together for a given length of tape. The cassette has recognition holes in four standard places so that players can automatically determine what type of cassette has been inserted. In addition there is a write-protect (record-lockout) mechanism which is actuated by a small plastic plug sliding between the cassette halves. The end-of-tape condition is detected optically and the leader tape is transparent. There is some freedom in the design of the EOT sensor. As can be seen in Figure 9.6, transmitted-light sensing can be used across the corner of the cassette, or reflected-light sensing can be used, because the cassette incorporates a prism which reflects light around the back of the tape. Study of Figure 9.6 will reveal that the prisms are moulded integrally with the corners of the transparent insert used for the cassette window.
The high-coercivity (typically 1480 oersteds) metal powder tape is 3.81 mm wide, the same width as Compact Cassette tape. The standard overall thickness is 13 μm. A striking feature of the metal tape is that the magnetic coating is so thin, at about 3 μm, that the tape appears translucent. The maximum capacity of the cassette is about 60 m.
When the cassette is placed in the transport, the slider is moved back as it engages. This releases the lid lock. Continued movement into the transport pushes the slider right back, revealing the hub openings. The cassette is then lowered onto the hub drive spindles and tape guides, and the door is fully opened to allow access to the tape.
In DAT, threading is simplified because the digital recording does not need to be continuous. DAT extends the technique of time compression used to squeeze continuous samples into intermittent video lines. Blocks of samples to be recorded are written into a memory at the sampling rate, and are read out at a much faster rate when they are to be recorded. In this way the memory contents can be recorded in less time. Figure 9.7 shows that when the samples are time-compressed, recording is no longer continuous, but is interrupted by long pauses. During the pauses in recording, it is not actually necessary for the head to be in contact with the tape, and so the angle of wrap of the tape around the drum can be reduced, which makes threading easier. In DAT the wrap angle is only 90° on the commonest drum size. As the heads are 180° apart, this means that for half the time neither head is in contact with the tape. Figure 9.8 shows that the partial-wrap concept allows the threading mechanism to be very simple indeed. As the cassette is lowered into the transport, the pinch roller and several guide pins pass behind the tape. These then simply move toward the capstan and drum and threading is complete. A further advantage of partial wrap is that the friction between the tape and drum is reduced, allowing power saving in portable applications, and allowing the tape to be shuttled at high speed without the partial unthreading needed by videocassettes. In this way the player can read subcode during shuttle to facilitate rapid track access.
The track pattern laid down by the rotary heads is shown in Figure 9.9. The heads rotate at 2000 rev/min in the same direction as tape motion, but because the drum axis is tilted, diagonal tracks 23.5 mm long result, at an angle of just over six degrees to the edge. The diameter of the scanner needed is not specified, because it is the track pattern geometry which ensures interchange compatibility. For portable machines, a small scanner is desirable, whereas for professional use, a larger scanner allows additional heads to be fitted for confidence replay and editing. It will be seen from Figure 9.9 that azimuth recording is employed as was described in Chapter 6. This requires no spaces or guard bands between the tracks. The chosen azimuth angle of ± 20° reduces crosstalk to the same order as the noise, with a loss of only 1 dB due to the apparent reduction in writing speed.
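The timing consequences of partial wrap can be worked out from the figures quoted so far. This is a rough sketch using the 2000 rev/min drum speed, the 90° wrap and the two heads 180° apart; the variable names are illustrative, and the compression factor shown is only the minimum implied by the geometry.

```python
DRUM_SPEED_REV_PER_MIN = 2000   # from the text
WRAP_ANGLE_DEG = 90             # commonest drum size, from the text
HEADS = 2                       # two heads, 180 degrees apart

rev_period_ms = 60_000 / DRUM_SPEED_REV_PER_MIN        # one drum revolution
track_time_ms = rev_period_ms * WRAP_ANGLE_DEG / 360   # one head sweep on tape
tracks_per_second = HEADS * DRUM_SPEED_REV_PER_MIN / 60

# Fraction of the time during which some head is in contact with the tape.
contact_fraction = HEADS * WRAP_ANGLE_DEG / 360
# Data arriving continuously must be squeezed into the contact periods.
compression_factor = 1 / contact_fraction

print(f"revolution period: {rev_period_ms:.1f} ms")
print(f"time per track:    {track_time_ms:.1f} ms")
print(f"tracks per second: {tracks_per_second:.2f}")
print(f"contact fraction:  {contact_fraction:.2f}")
print(f"time compression:  {compression_factor:.1f}x")
```

Each revolution takes 30 ms and lays down two tracks of 7.5 ms each, so the data must be read from memory at least twice as fast as they were written.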
In addition to the diagonal tracks, there are two linear tracks, one at each edge of the tape, where they act as protection for the diagonal tracks against edge damage. Owing to the low linear tape speed the use of these edge tracks is somewhat limited.
Several related modes of operation are available, some of which are mandatory whereas the remainder are optional. These are compared in Table 9.1. The most important modes use a sampling rate of 48 kHz or 44.1 kHz, with sixteen-bit two’s complement uniform quantization. Alongside the audio samples can be carried 273 kbits/s of subcode (about four times that of Compact Disc) and 68.3 kbits/s of ID coding, whose purpose will be explained in due course. With a linear tape speed of 8.15 mm/s, the standard cassette offers 120 min unbroken playing time. Initially it was proposed that all DAT machines would be able to record and play at 48 kHz, whereas only professional machines would be able to record at 44.1 kHz. For consumer machines, playback only of prerecorded media was proposed at 44.1 kHz, so that the same software could be released on CD or prerecorded DAT tape. Now that a SCMS (serial copying management system) is incorporated into consumer machines, they too can record at 44.1 kHz. For reasons which will be explained later, contact duplicated tapes run at 12.225 mm/s to offer a playing time of 80 min. The same subcode and ID rate is offered. The above modes are mandatory if a machine is to be considered to meet the format.
Table 9.1 The significance of the recognition holes on the DAT cassette. Holes 1, 2 and 3 form a coded pattern, whereas hole 4 is independent. 1 = hole present; 0 = hole blanked off.
Hole 1 | Hole 2 | Hole 3 | Function |
0 | 0 | 0 | Metal powder tape or equivalent/13 μm thick |
0 | 1 | 0 | MP tape or equivalent/thin tape |
0 | 0 | 1 | 1.5 TP/13 μm thick |
0 | 1 | 1 | 1.5 TP/thin tape |
1 | any | any | (Reserved) |
Hole 4 | Function |
0 | Non-prerecorded tape |
1 | Prerecorded tape |
Option 1 is identical to 48 kHz mode except that the sampling rate is 32 kHz. Option 2 is an extra-long-play mode. In order to reduce the data rate, the sampling rate is 32 kHz and the samples change to twelve-bit two’s complement with non-linear quantizing. Halving the subcode rate allows the overall data rate necessary to be halved. The linear tape speed and the drum speed are both halved to give a playing time of four hours. All the above modes are stereo, but option 3 uses the sampling parameters of option 2 with four audio channels. This doubles the data rate with respect to option 2, so the standard tape speed of 8.15 mm/s is used.
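The relationships between the modes can be confirmed by computing the raw audio payload of each, before any redundancy, subcode or ID is added. The sketch below uses only the sampling parameters given in the text; the `Mode` class and its field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    sampling_rate_hz: int
    bits_per_sample: int
    channels: int

    @property
    def audio_kbit_per_s(self) -> float:
        """Raw audio payload, excluding redundancy, subcode and ID."""
        return self.sampling_rate_hz * self.bits_per_sample * self.channels / 1000


MODES = [
    Mode("48 kHz", 48_000, 16, 2),
    Mode("44.1 kHz", 44_100, 16, 2),
    Mode("option 1", 32_000, 16, 2),
    Mode("option 2", 32_000, 12, 2),
    Mode("option 3", 32_000, 12, 4),
]

for mode in MODES:
    print(f"{mode.name:>9}: {mode.audio_kbit_per_s:7.1f} kbit/s")
```

Option 2 carries exactly half the audio payload of the 48 kHz mode, which is what allows the tape and drum speeds to be halved, and option 3 brings the payload back to the 48 kHz figure, which is why it runs at the standard tape speed.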
Figure 9.10 shows a block diagram of a typical DAT recorder, which will be used to introduce the basic concept of the machine and the major topics to be described. In order to make a recording, an analog signal is fed to an input ADC, or a direct digital input is taken from an AES/EBU interface. The incoming samples are subject to interleaving to reduce the effects of error bursts. Reading the memory at a higher rate than it was written performs the necessary time compression. Additional bytes of redundancy computed from the samples are added to the data stream to permit subsequent error correction. Subcode information such as the content of the AES/EBU channel status message is added, and the parallel byte structure is fed to the channel encoder, which combines a bit clock with the data, and produces a recording signal according to the 8/10 code which is free of DC (see Chapter 6). This signal is fed to the heads via a rotary transformer to make the binary recording, which leaves the tape track with a pattern of transitions between the two magnetic states.
On replay, the transitions on the tape track induce pulses in the head, which are used to re-create the record current waveform. This is fed to the 10/8 decoder which converts it to the original data stream and a separate clock. The subcode data are routed to the subcode output, and the audio samples are fed into a de-interleave memory which, in addition to time-expanding the recording, functions to remove any wow or flutter due to head-to-tape speed variations. Error correction is performed partially before and partially after de-interleave. The corrected output samples can be fed to DACs or to a direct digital output.
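The way interleaving breaks up an error burst can be illustrated with a simple block interleave. This is a generic sketch, not the actual DAT interleave, which is considerably more elaborate; the array size, the burst position and the function names are all arbitrary choices made for the illustration.

```python
def interleave(samples, rows, cols):
    """Write the samples row by row, then read them column by column."""
    if len(samples) != rows * cols:
        raise ValueError("block size mismatch")
    return [samples[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(samples, rows, cols):
    """Inverse of interleave(): restore the original sample order."""
    if len(samples) != rows * cols:
        raise ValueError("block size mismatch")
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = samples[i]
            i += 1
    return out


ROWS, COLS = 4, 6   # arbitrary illustrative block dimensions
original = list(range(ROWS * COLS))
on_tape = interleave(original, ROWS, COLS)

# Simulate a dropout: four consecutive symbols on tape are lost.
damaged = on_tape.copy()
for i in range(8, 12):
    damaged[i] = None

recovered = deinterleave(damaged, ROWS, COLS)
error_positions = [i for i, s in enumerate(recovered) if s is None]
print(error_positions)
```

After de-interleaving, the four adjacent errors on tape are spread out so that each is surrounded by good samples, which is the condition the error-correction system needs in order to deal with them.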
In order to keep the rotary heads following the very narrow slant tracks, alignment patterns are recorded as well as the data. The automatic track-following system processes the playback signals from these patterns to control the drum and capstan motors. The subcode and ID information can be used by the control logic to drive the tape to any desired location specified by the user.
As with any recorder intended for consumer use, economy of tape consumption is paramount, and this involves numerous steps to use the tape area as efficiently as possible. As magnetic tape is flexible and is manufactured to finite tolerances, there will always be some error between the path of the replay head and the recorded track. In the relatively wide tracks of analog audio recorders this is seldom a problem. The high-output metal tape used in DAT allows an adequate signal-to-noise ratio to be obtained with very narrow tracks on the tape. This reduces tape consumption and allows a small cassette, but it becomes necessary actively to control the relative position of the head and the track in order to maximize the replay signal and minimize the error rate.
The track width and the coercivity of the tape largely define the signal-to-noise ratio. A track width has been chosen which makes the signal-to-crosstalk ratio dominant in cassettes which are intended for user recording.
Prerecorded tapes are made by contact duplication, and this process only works if the coercivity of the copy is less than that of the master. The output from prerecorded tapes at the track width of 13.59 μm would be too low, and would be noise-dominated, which would cause the error rate to rise. The solution to this problem is that in prerecorded tapes the track width is increased to be the same as the head pole. The noise and crosstalk are both reduced in proportion to the reduced output of the medium, and the same error rate is achieved as for normal high-coercivity tape.
The 50 per cent increase in track width is achieved by raising the linear tape speed from 8.15 to 12.225 mm/s, and so the playing time of a prerecorded cassette falls to 80 min as opposed to the 120 min of the normal tape.
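The figures for prerecorded tape all scale by the same ratio. The sketch below assumes that track width is simply proportional to linear tape speed, with drum speed unchanged; it also estimates the tape length needed for the standard playing time. The variable names are illustrative.

```python
NORMAL_SPEED_MM_S = 8.15
PRERECORDED_SPEED_MM_S = 12.225
NORMAL_PLAY_MIN = 120
NORMAL_TRACK_WIDTH_UM = 13.59

speed_ratio = PRERECORDED_SPEED_MM_S / NORMAL_SPEED_MM_S
# The same length of tape is consumed faster, so playing time falls.
prerecorded_play_min = NORMAL_PLAY_MIN / speed_ratio
# Assumption: track width scales directly with linear tape speed.
prerecorded_track_width_um = NORMAL_TRACK_WIDTH_UM * speed_ratio
# Tape needed for the standard playing time at normal speed.
tape_length_m = NORMAL_SPEED_MM_S * NORMAL_PLAY_MIN * 60 / 1000

print(f"speed ratio:             {speed_ratio:.2f}")
print(f"prerecorded playing time: {prerecorded_play_min:.0f} min")
print(f"prerecorded track width:  {prerecorded_track_width_um:.2f} um")
print(f"tape length for 120 min:  {tape_length_m:.1f} m")
```

The tape length works out a little under 60 m, consistent with the stated maximum capacity of the cassette.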
The track-following principles are the same for prerecorded and normal cassettes, but there are detail differences which will be noted. Tracking is achieved in conventional video recorders by the use of a linear control track which contains one pulse for every diagonal track. The phase of the pulses picked up by a fixed head is compared with the phase of pulses generated by the drum, and the error is used to drive the capstan. This method is adequate for the wide tracks of analog video recorders, but errors in the mounting of the fixed head and variations in tape tension rule it out for high-density use. In any case the control-track head adds undesirable mechanical complexity. In DAT, the tracking is achieved by reading special alignment patterns on the tape tracks themselves, and using the information contained in them to control the capstan.
DAT uses a technique called area-divided track following (ATF) in which separate parts of the track are set aside for track-following purposes. Figure 9.11 shows the basic way in which a tracking error is derived. The tracks at each side of the home track have bursts of pilot tone recorded in two different places. The frequency of the pilot tone is 130 kHz, which has been chosen to be relatively low so that it is not affected by azimuth loss. In this way an A head following an A track will be able to detect the pilot tone from the adjacent B tracks.
In Figure 9.12(a) the case of a correctly tracking head is shown. The amount of side-reading pilot tone from the two adjacent B tracks is identical. If the head is off track for some reason, as shown in Figure 9.12(b), the amplitude of the pilot tone from one of the adjacent tracks will increase, and the other will decrease. The tracking error is derived by sampling the amplitude of each pilot-tone burst as it occurs, and holding the result so the relative amplitudes can be compared.
There are some practical considerations to be overcome in implementing this simple system, which result in some added complication. The pattern of pilot tones must be such that they occur at different times on each side of every track. To achieve this there must be a burst of pilot tone in every track, although the pilot tone in the home track does not contribute to the development of the tracking error. Additionally there must be some timing signals in the tracks to determine when the samples of pilot tone should be made. The final issue is to prevent the false locking which could occur if the tape happened to run at twice normal speed.
Figure 9.13 shows how the actual track-following pattern of DAT is laid out.6 The pilot burst is early on A tracks and late on B tracks. Although the pilot bursts have a two-track cycle, the pattern is made to repeat over four tracks by changing the period of the sync patterns which control the pilot sampling. This can be used to prevent false locking. When an A head enters the track, it finds the home pilot-burst first, followed by pilot from the B track above, then pilot from the B track below. The tracking error is derived from the latter two. When a B head enters the track, it sees pilot from the A track above first, A track below next, and finally home pilot. The tracking error in this case is derived from the former two. The machine can easily tell which processing mode to use because the sync signals have a different frequency depending on whether they are in A tracks (522 kHz) or B tracks (784 kHz). The remaining areas are recorded with the interblock gap frequency of 1.56 MHz which serves no purpose except to erase earlier recordings. Although these pilot and synchronizing frequencies appear strange, they are chosen so that they can be simply obtained by dividing down the master channel-bit-rate clock by simple factors. The channel-bit-rate clock, Fch, is 9.408 MHz; pilot, the two sync frequencies and erase are obtained by dividing it by 72, 18, 12 and 6 respectively. The time at which the pilot amplitude in adjacent tracks should be sampled is determined by the detection of the synchronizing frequencies. As the head sees part of three tracks at all times, the sync detection in the home track has to take place in the presence of unwanted signals. On one side of the home sync signal will be the interblock gap frequency, which is high enough to be attenuated by azimuth. On the other side is pilot, which is unaffected by azimuth. This means that sync detection is easier in the tracking-error direction away from pilot than in the direction towards it. 
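The division ratios quoted above are easily verified. The following sketch divides the channel-bit-rate clock by each factor; the dictionary keys are just labels chosen for the illustration.

```python
FCH_HZ = 9_408_000  # channel-bit-rate clock, Fch

DIVIDERS = {
    "pilot": 72,
    "sync (A tracks)": 18,
    "sync (B tracks)": 12,
    "erase (interblock gap)": 6,
}

frequencies_khz = {name: FCH_HZ / n / 1000 for name, n in DIVIDERS.items()}

for name, f_khz in frequencies_khz.items():
    print(f"{name:>22}: {f_khz:8.2f} kHz")
```

The results agree with the rounded figures in the text: roughly 130 kHz for pilot, 522 kHz and 784 kHz for the two sync frequencies, and 1.56 MHz for erase.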
There is an effective working range of about +4 and –5 μm due to this asymmetry, with a dead band of 4 μm between tracks. Since the track-following servo is designed to minimize the tracking error, once lock is achieved the presence of the dead zone becomes academic. The differential amplitude of the pilot tones produces the tracking error, and so the gain of the servo loop is proportional to the playback gain, which can fluctuate due to head contact variations and head tolerance. This problem is overcome by using AGC in the servo system. In addition to subtracting the pilot amplitudes to develop the tracking error, the circuitry also adds them to develop an AGC voltage. Two sample-and-hold stages are provided which store the AGC parameter for each head separately. The heads can thus be of different sensitivities without upsetting the servo. This condition could arise from manufacturing tolerances, or if one of the heads became contaminated.
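The effect of the AGC can be seen in a simple numerical model. The sketch below is an assumption about how the sum and difference might be combined, namely by dividing one by the other; real machines may apply the AGC voltage differently, and the function name and amplitude values are hypothetical.

```python
def tracking_error(pilot_a: float, pilot_b: float) -> float:
    """Gain-normalized tracking error from two sampled pilot amplitudes.

    pilot_a and pilot_b are the held amplitudes of the pilot bursts read
    from the tracks on either side of the home track.
    """
    agc = pilot_a + pilot_b            # sum follows the overall playback gain
    if agc <= 0:
        raise ValueError("no pilot detected")
    return (pilot_a - pilot_b) / agc   # difference gives direction and size


centred = tracking_error(0.5, 0.5)
off_track = tracking_error(0.7, 0.3)
# Same mistracking, but a head with half the output (e.g. contaminated).
off_track_weak_head = tracking_error(0.35, 0.15)

print(centred, off_track, off_track_weak_head)
```

With this normalization the weak head reports essentially the same error as the healthy one for the same mistracking, so the loop gain does not depend on head sensitivity.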
One of the most important aspects of DAT maintenance is to ensure that tapes made on a particular machine meet the specifications laid down in the format. If they do, then it will be possible to play those tapes on any other properly aligned machine. In this section the important steps necessary to achieve interchange between transports will be outlined.
When the cassette is lowered into the transport it seats on pillars which hold it level. The tape within the cassette is guided by liner sheets which determine the height of the tape pack above the transport baseplate. The first step in aligning the transport is to ensure that all the guides the tape runs past on its way to and from the scanner are at the same height as the tape. Figure 9.14 shows that the guides are threaded so that they can be screwed up and down. In the correct position, the tape will stay in the cassette plane and distortion will be avoided.
Once the tape can be passed through the machine without damage, the basic transport functions can be checked. Since tape tension affects the track angle, obtaining the correct tension is essential before attempting any adjustments at the scanner. As the DAT mechanism is so small, it is not possible to fit a conventional tape tension gauge. Instead, special test cassettes are made which incorporate torque meters into the reel hubs. Use of these test cassettes will allow the tension to be checked in various transport and shuttle modes. Since the scanner friction is in the opposite sense when the tape is reversed, the back tension must be higher than for forward mode to keep the average scanner tension constant. In some transports the tension-sensing arm is not statically balanced, and the tape tension becomes a function of the orientation of the machine. In this case the adjustment must be made with the machine in the attitude in which it is to be used.
The track spacing on record is determined by the capstan speed, which must be checked. As the capstan speed will be controlled by a frequency-generating wheel on the capstan shaft, it is generally only necessary to check that the capstan FG frequency is correct in record mode. A scratch tape will be used for this check.
Helical interchange can now be considered. Tape passing around the scanner is guided in three ways. On the approach, the tape is steered by the entrance guide, which continues to affect the first part of the scanner wrap. The centre part of the scanner wrap is guided by the machined step on the scanner base. Finally the last part of the wrap is steered by the exit guide. Helical interchange is obtained by adjusting the entrance and exit guide heights so that the tape passes smoothly between the three regions. In video recorders, which use wider tape, it will often also be found necessary to adjust the angle of the guides. In DAT the tape is so narrow that it will flex to accommodate angular errors, and the guides only need a height adjustment.
As tape is flexible, it will distort as it passes round the scanner if the entrance and exit guides are not correctly set. The state of alignment can be assessed by working out the effect of misalignments on the ability of the replay head to follow tape tracks. Figure 9.15(a) shows an example of the entrance guide being too low. The tape is forced to climb up to reach the scanner step, and then it has to bend down again to run along the step. If straight tracks were originally recorded on the tape, they will no longer be straight when the tape is distorted in this way. Figure 9.15(b) shows what happens to the track. Since in an azimuth recording machine the head is larger than the track, small distortions of this kind will be undetectable. It is necessary to offset the tracking deliberately so that the effect of the misalignment can be seen. If the head is offset upwards, then the effect will be that the RF signal grows in level briefly at the beginning of the track, giving the envelope an onion-like appearance on an oscilloscope. If the head is offset downwards, the distortion will take the track away from the head path and the RF envelope will be waisted. If the misalignment is in the exit guide, then the envelope disturbances will appear at the right-hand end of the RF envelope. The height of both the entrance and exit guides is adjusted until no disturbance of the RF envelope is apparent whatever tracking error is applied. One simple check is to disable the track-following system so that the capstan runs at approximately playback speed with no feedback. The tracking will slowly drift in and out of registration. Under these conditions the RF envelope should remain rectangular, so that the amplitude rises and falls equally over the entire head sweep. A rough alignment can be performed with a tape previously recorded on a trustworthy machine, but final alignment requires the use of a reference tape.
Once the mechanical geometry of the transport is set up, straight tracks on tape will appear straight to the scanner, and it is then possible to set up the automatic track-following system so that optimum tracking occurs when the tracking error is zero. This is done by observing the RF level on playback and offsetting the tracking adjustment to each side until two points are found where the level begins to fall. The adjustment is then placed half-way between these points. If the machine has a front panel tracking adjustment, it should be set to zero whilst the internal adjustment is made.
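The half-way procedure can be expressed as a short calculation on a set of measurements. This is only a sketch: the 5 per cent tolerance used to decide where the level "begins to fall" is an arbitrary assumption, as are the function name and the sample readings, and the offsets are in whatever units the tracking adjustment provides.

```python
def tracking_centre(offsets, levels, tolerance=0.05):
    """Midpoint of the range of offsets over which the RF level holds up.

    offsets   -- tracking offsets at which the RF level was measured
    levels    -- RF envelope level measured at each offset
    tolerance -- assumed fractional drop from peak taken as 'beginning to fall'
    """
    if len(offsets) != len(levels) or not offsets:
        raise ValueError("need one level per offset")
    peak = max(levels)
    plateau = [o for o, lvl in zip(offsets, levels)
               if lvl >= peak * (1 - tolerance)]
    return (min(plateau) + max(plateau)) / 2


# Hypothetical readings taken while sweeping the tracking adjustment.
offsets = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
levels = [0.50, 0.70, 0.90, 0.99, 1.00, 1.00, 1.00, 0.98, 0.80]

print(tracking_centre(offsets, levels))
```

In this made-up data set the level holds up from -1 to +3, so the adjustment would be placed at +1 rather than at the nominal zero.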
The final interchange adjustment is to ensure that the scanner timing is correct. Even with correct geometry, the tracks can be laid down at the correct angle and spacing, but at the wrong height on the tape, as Figure 9.16 shows. The point where recording commences is determined by the sensor which generates a pulse once per revolution of the scanner. The correct timing can be obtained either by physically moving the sensor around the scanner axis, or by adjusting a variable delay in series with an artificially early fixed sensor. A timing reference tape is necessary that has an observable event in the RF waveform. The tape is played, and the sensor or delay is adjusted to give the specified relative timing between the event on the reference tape and the sensor pulse. When this is correct, the machine will record tracks in the right place along the helical sweep.
The channel code used in DAT is designed to function well in the presence of crosstalk, to have zero DC component to allow the use of a rotary transformer, and to have a small ratio of maximum to minimum run length to ease overwrite erasure. The code used is a group code where eight data bits are represented by ten channel bits, hence the name 8/10. The details of the code are given in Chapter 6.
The basic unit of recording is the sync block shown in Figure 9.17. This consists of the sync pattern, a three-byte header and 32 bytes of data, making 36 bytes in total, or 360 channel bits. The subcode areas each consist of eight of these blocks, and the PCM audio area consists of 128 of them. Note that a preamble is only necessary at the beginning of each area to allow the data separator to phase-lock before the first sync block arrives. Synchronism should be maintained throughout the area, but the sync pattern is repeated at the beginning of each sync block in case sync is lost due to dropout.
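The sizes quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the constant and function names are illustrative, not taken from the DAT specification, and it assumes that the sync pattern occupies one symbol position, as the 36-byte total implies.

```python
# Sizes quoted in the text; all names here are illustrative only.
SYNC_BYTES = 1              # sync pattern occupies one symbol position
HEADER_BYTES = 3            # ID, block address, parity
DATA_BYTES = 32
CHANNEL_BITS_PER_BYTE = 10  # 8/10 group code: ten channel bits per data byte

BLOCK_BYTES = SYNC_BYTES + HEADER_BYTES + DATA_BYTES


def area_channel_bits(n_sync_blocks: int) -> int:
    """Channel bits occupied by an area of n sync blocks (preamble excluded)."""
    return n_sync_blocks * BLOCK_BYTES * CHANNEL_BITS_PER_BYTE


print(BLOCK_BYTES)             # 36 bytes per sync block
print(area_channel_bits(1))    # 360 channel bits per sync block
print(area_channel_bits(8))    # 2880 channel bits in one subcode area
print(area_channel_bits(128))  # 46080 channel bits in the PCM audio area
```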
The first byte of the header contains an ID code which in the PCM audio blocks specifies the sampling rate in use, the number of audio channels, and whether there is a copy-prohibit in the recording. The second byte of the header specifies whether the block is subcode or PCM audio with the first bit. If set, the least significant four bits specify the subcode block address in the track, whereas if it is reset, the remaining seven bits specify the PCM audio block address in the track. The final header byte is a parity check and is the exclusive-OR sum of header bytes one and two.
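The header rules just described might be expressed as follows. This is a minimal sketch rather than a description of any real decoder: the function names and the returned tuple are hypothetical, and the "first bit" of the second byte is taken to be its most significant bit.

```python
def header_parity(id_byte: int, address_byte: int) -> int:
    """Third header byte: exclusive-OR sum of header bytes one and two."""
    return id_byte ^ address_byte


def parse_header(id_byte: int, address_byte: int, parity_byte: int):
    """Return (kind, block_address), or None if the parity check fails."""
    if header_parity(id_byte, address_byte) != parity_byte:
        return None
    if address_byte & 0x80:
        # First bit set: subcode block, address in the four LSBs.
        return ("subcode", address_byte & 0x0F)
    # First bit reset: PCM audio block, address in the remaining seven bits.
    return ("pcm", address_byte & 0x7F)


print(parse_header(0x00, 0x85, 0x85))  # ('subcode', 5)
print(parse_header(0x12, 0x7F, 0x6D))  # ('pcm', 127)
print(parse_header(0x12, 0x7F, 0x00))  # None: parity mismatch
```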
The data format within the tracks can now be explained. The information on the track has three main purposes, PCM audio, subcode data and ATF patterns. It is necessary to be able to record subcode at a different time from PCM audio in professional machines in order to update or post-stripe the timecode. The subcode is placed in separate areas at the beginning and end of the tracks. When subcode is recorded on a tape with an existing PCM audio recording, the heads have to go into record at just the right time to drop a new subcode area onto the track. This timing is subject to some tolerance, and so some leeway is provided by the margin area which precedes the subcode area and the interblock gap (IBG) which follows. Each area has its own preamble and sync pattern so the data separator can lock to each area individually even though they were recorded at different times or on different machines.
The track-following system will control the capstan so that the heads pass precisely through the centre of the ATF area. Figure 9.18 shows that, in the presence of track curvature, the tracking error will be smaller overall if the ATF pattern is placed part-way down the tracks. This explains why the ATF patterns are between the subcode areas and the central PCM audio area. The data interleave is block-structured. One pair of tape tracks (one + azimuth and one – azimuth), corresponding to one drum revolution, make up an interleave block. Since the drum turns at 2000 rev/min, one revolution takes 30 ms and, in this time, 1440 samples must be stored for each channel for 48 kHz working.
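The interleave-block arithmetic can be verified exactly using rational numbers. The sketch below assumes that the 2000 rev/min drum speed applies at every sampling rate shown; the names are illustrative.

```python
from fractions import Fraction

DRUM_REV_PER_MIN = 2000  # drum speed quoted in the text


def revolution_seconds() -> Fraction:
    """Duration of one drum revolution, as an exact fraction of a second."""
    return Fraction(60, DRUM_REV_PER_MIN)


def samples_per_revolution(fs_hz: int) -> Fraction:
    """Samples per channel that must be stored in one interleave block."""
    return fs_hz * revolution_seconds()


print(revolution_seconds())           # 3/100 of a second, i.e. 30 ms
print(samples_per_revolution(48000))  # 1440
print(samples_per_revolution(44100))  # 1323
print(samples_per_revolution(32000))  # 960
```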
The first interleave performed is to separate both left- and right-channel samples into odd and even. The right-channel odd samples followed by the left even samples are recorded in the + azimuth track, and the left odd samples followed by the right even samples are recorded in the – azimuth track. Figure 9.19 shows that this interleave allows uncorrectable errors to be concealed by interpolation. At (b) a head becomes clogged and results in every other track having severe errors. The split between right and left samples means that half of the samples in each channel are destroyed instead of every sample in one channel. The missing right even samples can be interpolated from the right odd samples, and the missing left odd samples are interpolated from the left even samples. Figure 9.19(c) shows the effect of a longitudinal tape scratch. A large error burst occurs at the same place in each head sweep. As the positions of left- and right-channel samples are reversed from one track to the next, the errors are again spread between the two channels and interpolation can be used in this case also.
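The odd/even split and the resulting concealment can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the track contents follow the allocation given above, the function names are hypothetical, concealment is plain linear interpolation between neighbours, and a real machine would work on integer samples within the full interleave.

```python
def split_tracks(left, right):
    """Distribute odd and even samples over the two azimuth tracks."""
    plus = {"R_odd": right[1::2], "L_even": left[0::2]}
    minus = {"L_odd": left[1::2], "R_even": right[0::2]}
    return plus, minus


def conceal(surviving, n, surviving_parity):
    """Rebuild n samples (n >= 2) when only the even (0) or odd (1) ones survive.

    Each missing sample is the mean of its surviving neighbours; at the
    ends of the block the single available neighbour is repeated.
    """
    out = [None] * n
    out[surviving_parity::2] = surviving
    for i in range(1 - surviving_parity, n, 2):
        neighbours = [out[j] for j in (i - 1, i + 1) if 0 <= j < n]
        out[i] = sum(neighbours) / len(neighbours)
    return out


left = list(range(8))                 # a ramp: 0, 1, ..., 7
right = [10 * x for x in range(8)]    # a steeper ramp
plus, minus = split_tracks(left, right)

# Suppose the - azimuth head clogs: left odd and right even samples are lost.
left_rebuilt = conceal(plus["L_even"], len(left), 0)
right_rebuilt = conceal(plus["R_odd"], len(right), 1)
```

For these ramps the interpolated values coincide with the originals everywhere except at the block ends, where only one neighbour is available.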
The error-correction system of DAT uses product codes and was treated in detail in Chapter 7.
In all DAT applications it is important to be able to read subcode in shuttle so that wanted areas of the recording can be reached rapidly. In audio recorders, it is also useful to be able to hear at least some sound in shuttle so that the desired part of the recording can be located by ear.
When helical scan recordings are made, the geometry of the tape tracks results from the ratio of the scanner and tape speeds. If it is desired to follow the tracks properly at other than normal speed, then both scanner speed and tape speed must change in the same proportion. Since the scanner speed is locked directly to the sampling clock, it follows that some speed variation can be had simply by changing the clock frequency on replay. If a reference sampling rate fed to a professional DAT machine is reduced in frequency slightly, this will have the effect of slowing down the scanner and all signal processing logic. The slower scanner will now find that the tape tracks are passing through the machine too quickly, and the ATF system will build up a tracking error which in turn causes the capstan to slow down.
This mechanism will be adequate over a small range, perhaps half a semitone, but clearly cannot be used for shuttle. Even if it were possible to turn the scanner at 200 times normal speed, it is doubtful whether any useful head contact would be achieved.
When the tape is shuttled, the track-following process breaks down and the heads cross tracks randomly. The head-to-tape speed is the vector sum of the scanner peripheral speed and the tape linear speed. In most formats this is dominated by the head speed, and so the angle at which the heads cross the tracks is relatively shallow. Figure 9.20 shows that using a replay head which is wider than the track allows a reasonable length of track to be correctly recovered even at shuttle speeds. Provided the sync blocks are made shorter than the minimum distance shown, it is possible to recover some data. DAT takes advantage of this effect to allow some sound to be heard in shuttle, to allow the subcode to be read for track searching, and to pick up timecode. The heads in a two-headed machine will typically be 20 μm wide, which is a full 50 per cent wider than the track. This wider replay head is also necessary to replay the wider tracks which are used on contact-duplicated tapes owing to the reduced coercivity needed by the duplication process.
The shuttle readout process is aided by modifying the scanner speed so that the head-to-tape speed remains the same whatever the linear tape speed. Since the scanner turns with the tape direction in DAT, this means speeding up the scanner in forward shuttle and slowing it down in reverse. The effect is that offtape signals have a constant frequency, and so the filters and phase-locked loops in the replay circuits will be able to stay in lock and recover data whenever the head is sufficiently close to the track centreline.
The data areas of the track consist of numerous short sync blocks, and each one of these is self-contained in that the data separator can resynchronize at the beginning of each, and each contains a Reed–Solomon codeword. If the head crosses the tracks at a shallow angle, then it is highly probable that one or more sync blocks will be recovered correctly. Clearly it is not possible to predict which blocks these will be. In fact total recovery is not necessary, because the goal is to produce a simulation of the recording at much more than normal speed, and this can readily be done by sending to the output convertors only every nth sample from the recording. Provided the tape format is designed with this in mind, the track crossings in shuttle will automatically reduce the offtape data rate.
Figure 9.21 shows the principle in simplified form. A head crosses a number of tracks in one rotation and picks up a sync block from each. The interleave used on recording means that the memory which is filled from the successfully recovered sync blocks now contains samples which are more or less evenly spaced throughout a number of tracks. Since each sync block must be independent in this mode, it is important that both bytes of a given sample are in the same sync block. Reference to Figure 7.38 will show that the use of alternate symbols in the columns when forming inner codewords has this effect.
Each subcode track consists of eight sync blocks, but the subcode data rate which can be supported is much less than this would indicate because it is necessary to repeat the subcode information many times in successive tracks to guarantee that it can be read when the heads cross tracks in shuttle. There are a number of incompatible timecodes which have been designed for the various television standards, but it was not appropriate to adopt them because it was desired to have a world standard for DAT timecode which would be independent of television standards. Since the scanner speed in DAT is locked to the sampling rate, it is possible to deduce the exact time by counting head revolutions. There are exactly 100 revolutions in 3 seconds, so it is possible to have a timecode for DAT which counts in hours, minutes, seconds and DAT frames. This timecode is recorded on tape, but real machines will have gearbox software which allows them to convert the tape timecode into any of the television or film timecode formats where necessary. For synchronizing two or more DAT machines, the DAT timecode can be used directly.
The subcode of DAT is recorded in areas outside the ATF patterns, physically distinct from the PCM area. As a result, the subcode can be independently edited after an audio recording has been made. The DAT subcode performs the functions of program access in much the same way as in the Compact Disc, but it also has a subset of codes for professional use which allows the recording of timecode for synchronizing and edit control purposes.
The PCM audio data are primarily intended to be played at normal speed, with a reduced quality at other speeds. In contrast, the subcode must function well over a wide speed range so that it can be used for high-speed searching to cues. For this reason the structure of the subcode is repetitive to increase the chance of pickup, but it has no outer redundancy, as outer codes could not be assembled in shuttle.
Figure 9.22 shows the general arrangement of the subcode sync blocks. Like the PCM sync blocks, the subcode blocks have eight bytes of C1 redundancy in every other block, so there is a two-block sequence. The subcode data are assembled into standard-sized messages known as packs which contain eight bytes. In the first block of the pair, up to four packs can be accommodated, whereas in the second, only three are present because of the presence of the C1 redundancy.
The header structure of the subcode is identical to that of the PCM data. Figure 9.23 shows that following the sync byte there are two header bytes, followed by a parity byte generated on the first two. The MSB of the block address byte is always 1 in the subcode, to distinguish subcode from PCM data. There are eight subcode blocks at each end of the track, so the four LSBs of the block address byte convey the block number 0–15. The LSB of the block number allows the player to determine whether the first or second block of the subcode sequence has been found.
The smaller range of block addresses leaves three bits in the second header byte for other purposes. In the first block of the pair, these three bits form the Format ID which specifies the number of packs which have been recorded in the pair of blocks.
The first byte of the header in an even-numbered block is split into the Control ID and the Data ID. The Control ID consists of four individual flags. TOC-ID is set if the block contains Table of Contents packs. Skip-ID causes the machine to fast forward to the next Start-ID, which serves a similar function to the P-flag in Compact Disc. Finally, Priority-ID is set if the Program Number (P-No.) in the odd-numbered subcode header has been edited, so that the subcode P-No. has priority over any P-No. in the PCM-ID which cannot be edited independently of the audio. Data-ID serves the same purpose as ID-0 in PCM-ID; all zeros indicate subcode should be interpreted as digital audio standard, 1000 indicates DDS format (Digital Data Storage) subcode.
The majority of subcode data are stored in eight-byte packs in the subdata area. Figure 9.24(a) shows the basic layout of a pack. The four MSBs of the first Pack Contents (PC) symbol contain the Item code which defines the meaning of the rest of the pack. The last PC is a simple XOR parity symbol calculated by adding PC 1 through PC 7 in modulo-2.
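The pack rules can be sketched in a few lines. The function names are hypothetical, and a pack is represented simply as a list of eight integers, PC 1 first.

```python
from functools import reduce
from operator import xor


def pack_parity(pc1_to_pc7):
    """PC 8: the modulo-2 (XOR) sum of PC 1 through PC 7."""
    if len(pc1_to_pc7) != 7:
        raise ValueError("expected exactly seven symbols, PC 1 to PC 7")
    return reduce(xor, pc1_to_pc7, 0)


def pack_is_valid(pack):
    """Check an eight-symbol pack against its parity symbol."""
    return len(pack) == 8 and pack_parity(pack[:7]) == pack[7]


def item_code(pack):
    """Item code: the four MSBs of PC 1."""
    return pack[0] >> 4


body = [0x30, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06]
pack = body + [pack_parity(body)]
print(hex(pack[7]))         # 0x37
print(pack_is_valid(pack))  # True
print(item_code(pack))      # 3
```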
Figure 9.24(b) shows the current valid Item codes. Program Time, Absolute Time and Running Time all have the same basic pack layout which is shown in Figure 9.25. As stated, the first four bits of PC 1 are the Item code. Bit 3 of PC 1 must be zero, leaving three bits in PC 1 and the whole of PC 2 to form an eleven-bit Program Number (P-No.). In this context the word ‘Program’ corresponds to a band on a vinyl LP; it is one song or movement. PC 3 contains the index code which optionally allows a Program to be subdivided. The remaining symbols PC 4–7 carry the time information in hours, minutes, seconds and DAT scanner frames (33.33 … Hz).
When DAT is to be used for professional applications, timecode recording is often essential. DAT timecode is carried in a pack known as Professional Running Time, abbreviated to Pro R time.
There are many forms of timecode arising from the variety of frame rates used in television and film. As DAT is in international use, the adoption of a single timecode standard to the exclusion of others is not acceptable. The solution is to record a universal form of timecode on the tape, and to use conversion circuitry appropriate to the frame rate of the system with which it is proposed to work. Internally DAT Pro R time records hours, minutes, seconds and DAT frames (33.33 … Hz) which relate simply to the scanner speed. The relationship of DAT frames to frames in one of the standard timecodes produces a variety of phase relationships as shown in Figure 9.26.
Figure 9.26(a) shows the example of EBU 25 Hz television timecode being fed into a DAT recorder. The phase relationship between the frame boundaries changes from frame to frame. The phase relationship measured in samples is known as the Timecode Marker. It is recorded in the Pro R time pack along with the DAT frame number. The pack is also recorded with the sampling rate in use and the type of timecode being input. On replay, there is sufficient information in the pack to allow a suitable processor to compute from the DAT timecode and marker the position and content of EBU timecode frames which will have the same relationship to the audio samples as they originally had. The timecode marker consists of a binary number which can vary from zero up to the number of sample periods in a DAT frame (959, 1322 or 1439 according to the sampling rate in use). Figure 9.26(b) shows the situation with 24 Hz film timecode.
When a DAT recorder contains a built-in timecode generator, it will be simple to synchronize it to the sampling rate, and this is the preferred mode of operation. In this case the next timecode marker can be calculated by subtracting a constant from the previous one and expressing the result modulo the number of sample periods in a DAT frame.
Synchronous timecode and sampling rate are essential if a tape is to be played into a digital system via a timecode synchronizer. The system cannot lock to two things at once, so if a tape has asynchronous timecode and sampling rate, the synchronizer will make the replay sampling rate drift, or the sampling-rate reference will make the timecode drift. However, if the replay is to be done in the analog domain, the sampling-rate drift is of no consequence, and asynchronous working is acceptable. The DAT timecode system still works without synchronism between the external timecode signal and the scanner speed. The only difference is that the Timecode Marker parameter cannot be predicted, but will have to be measured at each scanner rotation. Figure 9.26(c) shows an example of asynchronous working.
Pro R time is recorded using a modification of the R time pack (Item 0011) as shown in Figure 9.24. Normally bit 3 of PC 1 in this pack is set to zero; for Pro R time it is set to 1 and the interpretation of the pack changes.
Bits F0 and F1 in PC 2 reflect the audio sampling rate. The Sub-Pack bits SPI-0 and SPI-1 determine whether the timecode recorded is one of the film/television timecodes, or whether it is the embedded timecode of the AES/EBU digital audio interface. When these bits are both zero, the pack is in film/television mode, and bits T0, T1 and T2 specify the frame rate in use. The remaining three bits of PC 2 and the whole of PC 3 form the eleven-bit Timecode Marker.
DAT timecode is measured in the usual hours, minutes, seconds and frames, with the prefix R. RH, RM, RS and RF are all two-digit BCD numbers. Since there are not a whole number of DAT frames in a second, two of the seconds contain 33 frames and the third contains 34 frames. This results in exactly 100 frames in 3 seconds. As in all packs, PC 8 is the modulo-2 sum of all of the other PCs.
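The 33, 33, 34 frame pattern lends itself to a simple conversion from a running count of DAT frames. The sketch below assumes that frames are counted from zero at midnight and that the long second is the third of each group, as stated above; the function names are illustrative and no wrap at 24 hours is attempted.

```python
def dat_timecode(frame_count: int):
    """Convert a count of DAT frames into (RH, RM, RS, RF)."""
    group, rem = divmod(frame_count, 100)  # exactly 100 frames per 3 s
    if rem < 33:
        sec_in_group, rf = 0, rem          # first second: 33 frames
    elif rem < 66:
        sec_in_group, rf = 1, rem - 33     # second second: 33 frames
    else:
        sec_in_group, rf = 2, rem - 66     # third second: 34 frames
    total_seconds = 3 * group + sec_in_group
    rh, remainder = divmod(total_seconds, 3600)
    rm, rs = divmod(remainder, 60)
    return rh, rm, rs, rf


def to_bcd(value: int) -> int:
    """Pack a number 0-99 as two BCD digits in one byte."""
    return ((value // 10) << 4) | (value % 10)


print(dat_timecode(99))      # (0, 0, 2, 33)
print(dat_timecode(100))     # (0, 0, 3, 0)
print(dat_timecode(120000))  # (1, 0, 0, 0)
```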
The sample address form of timecode conveyed in the AES/EBU digital audio interface (see Chapter 8) can be carried in the pack with the Sub-Pack bits set to 01. The AES/EBU channel-status data frame repeats every 192 sample periods (4 ms at 48 kHz) and contains (among other data) a 32-bit code which is a binary count of the number of sample clocks since midnight at the beginning of the frame. Since there will be several AES/EBU frames in one DAT frame, the DAT pack records the sample address of the AES/EBU frame during which a DAT frame began, converted to DAT timecode, and the Timecode Marker parameter is the number of sample periods from the beginning of the AES/EBU frame to the beginning of the DAT frame. The principle is shown in Figure 9.27.
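The relationship between the two frame structures can be sketched as follows. This is an illustration only: it assumes that the 192-sample channel-status frames are aligned with sample zero at midnight, which the interface itself does not guarantee, and the function name is hypothetical.

```python
AES_FRAME_SAMPLES = 192  # channel-status frame length in sample periods


def aes_reference(dat_frame_start: int):
    """Locate a DAT frame boundary within the AES/EBU channel-status frames.

    dat_frame_start is the sample count since midnight at which the DAT
    frame began. Returns (sample address of the AES/EBU frame in progress,
    Timecode Marker in sample periods).
    """
    marker = dat_frame_start % AES_FRAME_SAMPLES
    return dat_frame_start - marker, marker


# At 48 kHz a DAT frame lasts 1440 samples, so frame n starts at 1440 * n.
print(aes_reference(1440))  # (1344, 96)
print(aes_reference(2880))  # (2880, 0)
```

With these assumptions the marker alternates between 0 and 96 at 48 kHz, because 1440 is seven and a half channel-status frames.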
Conversion from the various forms of input timecode into DAT timecode is based on the fact that all forms of timecode begin from midnight with all parameters at zero. Knowledge of the basic frame rate of the standard concerned allows any actual timecode values to be converted to real time. This can then be converted back to DAT timecode. As a result, the timecode recorded by DAT is truly international. A recording made with EBU 25 Hz timecode as an input results in DAT timecode on the tape. This could, with a suitable player, generate SMPTE timecode when the tape is played. The timecode conversion equations are given in Appendix 9.1.
For replay only, it is possible to dispense with the scanner and ATF servos in some applications. The scanner free-runs at approximately twice normal speed, whilst the capstan continues to run at the correct speed. The rotary heads cross tracks randomly, but because of the increased speed, virtually every sync block is recovered, many of them twice. The increased scanner speed requires a higher clock frequency in the data separator.
Each pair of sync blocks contains two inner codewords, and those which are found to be error-free or which contain correctable random errors can be used. Each sync block contains an ID pattern and this is used to put the data in the correct place in the product block. If a second copy of any sync block is recovered it is discarded at this stage.
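The way the product block is filled in a non-tracking machine can be sketched as below. This is a simplification: the inner-code check is reduced to a flag supplied with each block, whereas in the format the inner codewords span a pair of sync blocks, and the function name and data layout are hypothetical.

```python
def assemble_product_block(recovered, n_blocks):
    """Place recovered sync blocks into memory by address.

    recovered: iterable of (block_address, data, inner_code_ok) in the
    order the heads happened to read them.
    Returns (memory, erasures), where erasures lists the addresses that
    were never recovered and must be treated as dropouts.
    """
    memory = [None] * n_blocks
    for address, data, inner_code_ok in recovered:
        if not inner_code_ok:
            continue                  # unusable: failed the inner check
        if not 0 <= address < n_blocks:
            continue                  # corrupt ID pattern
        if memory[address] is None:
            memory[address] = data    # first good copy wins
        # any second copy of the same block is discarded
    erasures = [a for a, d in enumerate(memory) if d is None]
    return memory, erasures


reads = [(0, b"a", True), (2, b"c", True), (0, b"x", True), (1, b"?", False)]
memory, erasures = assemble_product_block(reads, 4)
print(memory)    # [b'a', None, b'c', None]
print(erasures)  # [1, 3]
```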
Once the product code memory is full, the de-interleave and error-correction process can occur as normal. Any blocks which are not recovered due to track crossing will be treated as dropouts by the error-correction system, as will genuine dropouts.
In personal portable machines and car-dashboard players the above approach allows a cost saving since two servo systems are eliminated. A further advantage is that alignment of the scanner is not necessary during manufacture, and tapes which are recorded on misaligned machines can still be played. Mistracking resulting from shock and vibration has no effect since the system is mistracking all the time.
The Sony NT (Non-Tracking) Format uses this approach. The rotary-head format uses a postage stamp-sized cassette and has no scanner servo in replay. The non-tracking approach means that interchange alignment is unnecessary. The slant guides on each side of the scanner are actually moulded into the cassette, reducing mechanical complexity and cost. A 32 kHz sampling rate and data reduction allow a realistic playing time despite the minute cassette.
Following work which suggests that a rotary-head machine can accept spliced tape, Kudelski [7] proposed a format for 1/4-inch tape using a rotary head which became that of the NAGRA D. This machine offers four independently recordable channels of up to 20-bit wordlength and timecode facilities. The block structure is basically that of the audio channels of the D-1 DVTR. The format is restricted to low-density recording because of the potential for contamination with open reels. Whilst the recording density is not as great as in DAT, it is still competitive with professional analog machines and as the NAGRA D is a professional only product, tape consumption is of less consequence than reliability. Manual splicing of a helical scan tape causes a serious tracking and data loss problem at the splice. The principle of jump editing (see Chapter 11) is used so that the area of the splice is not played.
A number of manufacturers have developed low-cost digital multitrack recorders for the home studio market. These are based on either VHS or Video-8 rotary-head cassette tape decks and generally offer eight channels of audio. Recording of individual audio channels is possible because the slant tape tracks are divided up into separate blocks for each channel with edit gaps between them. Some models have timecode and include synchronizers so that several machines can be locked together to offer more tracks. These machines represent the future of multitrack recording as their purchase and running costs are considerably lower than those of stationary head machines. It is only a matter of time before a low-cost 24-track is offered.
The audio samples in a DVTR are binary numbers just like the video samples, and although there is an obvious difference in sampling rate and wordlength, this only affects the relative areas of tape devoted to the audio and video samples. The most important difference between audio and video samples is the tolerance to errors. The acuity of the ear means that uncorrected audio samples must not occur more than once every few hours. There is little redundancy in sound, and concealment of errors is not desirable on a routine basis. In video, the samples are highly redundant, and concealment can be effected using samples from previous or subsequent lines or, with care, from the previous frame. Major differences can be expected between the ways that audio and video samples are handled in a DVTR. One such difference is that the audio samples have 100 per cent redundancy: every one is recorded using about twice as much space on tape as the same amount of video data.
In DVTR formats the audio samples are carried by the same channel as the video samples. Using separate heads would have increased tape consumption and machine complexity. The use of the same rotary heads for video and audio reduces the number of preamplifiers and data separators needed in the system, whilst increasing the bandwidth requirement by only a few per cent even with double recording. In order to permit independent audio and video editing, the tape tracks are given a block structure. Editing will require the heads momentarily to go into record as the appropriate audio block is reached. Accurate synchronization is necessary if the other parts of the recording are to remain uncorrupted. The concept of a head which momentarily records in the centre of a track which it is reading is the normal operating procedure for all computer disk drives, as will be seen in Chapter 10. There are in fact many parallels between digital helical recorders and disk drives. Perhaps the only major difference is that in one the heads move slowly and the medium revolves, whereas in the other, the medium moves slowly and the heads revolve. Disk drives support their heads on an air bearing, achieving indefinite head life at the expense of linear density. Helical digital machines must use high-density recording and so there will be head contact and a wear mechanism. With these exceptions, the principles of disk recording apply to DVTRs, and some of the terminology has migrated.
One of these terms is the sector. In moving-head disk drives, the sector address is a measure of the angle through which the disk has rotated. This translates to the phase of the scanner in a rotary-head machine. The part of a track which is in one sector is called a block. The word ‘sector’ is often used instead of ‘block’ in casual parlance when it is clear that only one head is involved. However, as DVTRs have two heads in action at any one time, the word ‘sector’ means the two side-by-side blocks in the segment. As there are four independently recordable audio channels, there are four audio sectors. In D-1 (Figure 9.28), the audio is in the centre of the track, so there must be two video sectors and four audio sectors in one head sweep, and since there are two active heads, in one sweep there will be four video blocks written and eight audio blocks. In D-2 and D-3 there are also two active heads in each sweep, but the audio blocks are at the ends of the tracks, so that there are only two video blocks in the centre.
There is a requirement for the DVTR to produce pictures in shuttle. In this case, the heads cross tracks randomly, and it is most unlikely that complete video blocks can be recovered. To provide pictures in shuttle, each block is broken down into smaller components called sync blocks in the same way as is done in DAT. These contain their own error checking and an address, which in disk terminology would be called a header, which specifies where in the picture the samples in the sync block belong. In shuttle, if a sync block is read properly, the address can be used to update a frame store. Thus it can be said that a sector is the smallest amount of data which can be written and is that part of a track pair within the same sector address, whereas a sync block is the smallest amount of data which can be read. Clearly there are many sync blocks in a sector.
The sync block structure continues in the audio because the same read/write circuitry is almost always used for audio and video data. Clearly the address structure must also continue through the audio. In order to prevent audio samples from arriving in the video frame store in shuttle, the audio addresses are different from the video addresses. In all formats, the arrangement of the audio blocks is designed to maximize data integrity in the presence of tape defects and head clogs. The allocation of the audio channels to the sectors is often changed from one segment to the next. If a linear tape scratch damages the data in a given audio channel in one segment, it will damage a different audio channel in the next. Thus the scratch damage is shared between all four audio channels, each of which needs to correct only one quarter of the damage. It will also be seen that the relationship of the audio channels to the physical tracks rotates by one track against the direction of tape movement from one audio sector to the next. The effect of this is that, if a head becomes clogged, the errors will be distributed through all audio channels, instead of causing severe damage in one channel. In the D-2 format the audio blocks are at the ends of the head sweeps; the audio information is split so that half is recorded at each edge of the tape, and each half will be played with a different head.
In each sector, the track commences with a preamble to synchronize the phase-locked loop in the data separator on replay. Each of the sync blocks begins, as the name suggests, with a synchronizing pattern which allows the read sequencer to deserialize the block correctly. At the end of a sector, it is not possible simply to turn off the write current after the last bit, as the turnoff transient would cause data corruption. It is necessary to provide a postamble such that current can be turned off away from the data. It should now be evident that any editing has to take place a sector at a time. Any attempt to rewrite one sync block would result in damage to the previous block owing to the physical inaccuracy of replacement, damage to the next block due to the turnoff transient, and inability to synchronize to the replaced block because of the random phase jump at the point where it began. The sector in a DVTR is analogous to the cluster in a disk drive. Owing to the difficulty of writing in exactly the same place as a previous recording, it is necessary to leave tolerance gaps between sectors where the write current can turn on and off to edit individual write blocks. For convenience, the tolerance gaps are made the same length as a whole number of sync blocks. The first half of the tolerance gap is the postamble of the previous block, and the second half of the tolerance gap acts as the preamble for the next block. The tolerance gap following editing will contain, somewhere in the centre, an arbitrary jump in bit phase, and a certain amount of corruption due to turnoff transients. Provided that the postamble and preamble remain intact, this is of no consequence.
The number of audio sync blocks in a given time is determined by the number of video fields in that time. It is only possible to have a fixed tape structure if the audio sampling rate is locked to video. With 625/50 machines, the sampling rate of 48 kHz results in exactly 960 audio samples in every field.
For use on 525/60, it must be recalled that the 60 Hz is actually 59.94 Hz. As this is slightly slow, it will be found that in sixty fields, exactly 48 048 audio samples will be necessary. Unfortunately 60 will not divide into 48 048 without a remainder. The largest number which will divide 60 and 48 048 is 12; thus in 60/12 = 5 fields there will be 48 048/12 = 4004 samples. Over a five-field sequence the product blocks contain 801, 801, 801, 801 and 800 samples respectively, adding up to 4004 samples.
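The field-sequence arithmetic can be reproduced exactly with rational numbers. The sketch below takes the 525/60 field rate as 60000/1001 Hz, which is the exact value implied by 48 048 samples in sixty fields; the function name is illustrative.

```python
from fractions import Fraction
from math import gcd

FS_HZ = 48000


def field_sequence(field_rate: Fraction):
    """Shortest run of fields holding a whole number of audio samples.

    Returns (fields_in_sequence, samples_in_sequence).
    """
    samples_per_field = Fraction(FS_HZ) / field_rate
    fields = samples_per_field.denominator   # fraction is in lowest terms
    return fields, int(samples_per_field * fields)


print(field_sequence(Fraction(50)))           # (1, 960) for 625/50
print(field_sequence(Fraction(60000, 1001)))  # (5, 4004) for 525/60

# The same result by the route taken in the text.
print(gcd(60, 48048))                         # 12
print(60 // 12, 48048 // 12)                  # 5 4004
print(sum([801, 801, 801, 801, 800]))         # 4004
```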
In order to comply with the AES/EBU digital audio interconnect, wordlengths between sixteen and twenty bits can be supported, but it is necessary to record a code in the sync block to specify the wordlength in use. Pre-emphasis may have been used prior to conversion, and this status is also to be conveyed, along with the four channel-use bits. The AES/EBU digital interconnect (see Chapter 8) uses a block-sync pattern which repeats after 192 sample periods corresponding to 4 ms at 48 kHz. He who confuses block sync with sync block is lost. Since the block size is different from that of the DVTR interleave block, there can be any phase relationship between interleave-block boundaries and the AES/EBU block-sync pattern. In order to re-create the same phase relationship between block sync and sample data on replay, it is necessary to record the position of block sync within the interleave block. It is the function of the interface control word in the audio data to convey these parameters. There is no guarantee that the 192-sample block-sync sequence will remain intact after audio editing; most likely there will be an arbitrary jump in block-sync phase. Strictly speaking, a DVTR playing back an edited tape would have to ignore the block-sync positions on the tape, and create new block sync at the standard 192-sample spacing. Unfortunately the DVTR formats are not totally transparent to the whole of the AES/EBU data stream, as certain information is not recorded.
Stationary-head digital audio recorders have fixed heads like an analog recorder and often resemble their analog ancestors closely. Stationary-head multitrack recorders were developed in preference to rotary-head machines because of the perceived need to support splicing and because the electronic circuitry required was simpler. Stationary-head recording is not as efficient as rotary-head recording, and in the long term the familiar digital multitrack will give way to rotary cassette-based formats with electronic editing. The stereo stationary-head PCM recorder has already succumbed to DAT and hard disks in professional use. The use of compression allows the efficiency problem to be overcome for consumer products and this resulted in the digital compact cassette (DCC).
Professional stationary-head recorders were specifically designed for record production and mastering, and had to be able to offer all the features of an analog multitrack. Digital multitracks mimicked analog machines so exactly that they could be installed in otherwise analog studios with the minimum of fuss. When the stationary head formats were first developed, the necessary functions of a professional machine were: independent control of which tracks record and play, synchronous recording, punch-in/punch-out editing, tape-cut editing, variable-speed playback, offtape monitoring in record, various tape speeds and bandwidths, autolocation and the facilities to synchronize several machines.
In both theory and practice a rotary-head recorder can achieve a higher storage density than a stationary-head recorder, thus using less tape. When multitrack digital audio recorders were first proposed, the adaptation of an analog video-recorder transport had to be ruled out because it lacked the necessary bandwidth. For example, a 24-track machine requires about 20 megabits per second. A further difficulty is that helical-scan recorders were not designed to handle tape-cut edits which were then considered necessary. Accordingly, multitrack digital audio recorders evolved with stationary heads and open reels; they look like analog recorders, but offer sufficient bandwidth and support splicing.
A stationary-head digital recorder is basically quite simple, as the block diagram of Figure 9.29 shows. The transport is not dissimilar to that of an analog recorder. The tape substrate used in professional analog recording is quite thick to reduce print-through, whereas in digital recording, the tape is very thin, rather like videotape, to allow it to conform closely to the heads for short-wavelength working. Print-through is not an issue in digital recording. The roughness of the backcoat has to be restricted in digital tape to prevent it embossing the magnetic layer of the adjacent turn when on the reel, since this would nullify the efforts made to provide a smooth surface finish for good head contact. It is the roughness of the backcoat which allows the boundary layer of air to bleed away between turns when the tape is spooled; with that roughness restricted, digital recorders cannot spool as quickly as analog recorders. They cannot afford to risk the edge damage which results from storing a poor tape pack. The digital transport has rather better tension and reel-speed control than an analog machine. Some transports offer a slow-wind mode to achieve an excellent pack on a tape prior to storage.
Control of the capstan is rather different too, being more like that of a video recorder. The capstan turns at constant speed when a virgin tape is being recorded, but for replay, it will be controlled to run at whatever speed is necessary to make the offtape sample rate equal to the reference rate. In this way, several machines can be kept in exact synchronism by feeding them with a common reference. Variable-speed replay can be achieved by changing the reference frequency. It should be emphasized that, when variable speed is used, the output sampling rate changes. This may not be of any consequence if the samples are returned to the analog domain, but it prevents direct connection to a digital mixer, since these usually have fixed sampling rates.
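The consequence of variable-speed replay for the output sampling rate can be illustrated with a minimal sketch. The function `varispeed` and its dictionary keys are hypothetical names; the sketch assumes that the capstan servo scales tape speed in direct proportion to the reference frequency, and the semitone figure is simply the usual logarithmic measure of a frequency ratio.

```python
from math import log2


def varispeed(nominal_rate_hz, reference_rate_hz):
    """Describe replay when the reference differs from the recorded rate.

    Assumes the capstan servo scales tape speed in proportion to the
    reference, so the offtape sample rate equals the reference rate.
    """
    ratio = reference_rate_hz / nominal_rate_hz
    return {
        "speed_ratio": ratio,
        "output_rate_hz": reference_rate_hz,
        "pitch_shift_semitones": 12 * log2(ratio),
        # A mixer with a fixed sampling rate can only accept the nominal rate.
        "suits_fixed_rate_mixer": reference_rate_hz == nominal_rate_hz,
    }


result = varispeed(48_000, 50_880)
print(result)
```

Raising the reference by 6 per cent raises both the tape speed and the output sampling rate by the same factor, which is why the output can no longer feed equipment locked to 48 kHz.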
The major items in the block diagram have been discussed in the relevant chapters. Samples are interleaved, redundancy is added, and the bits are converted into a suitable channel code. In stationary-head recorders, the frequencies in each head are low, and complex coding is not difficult. The lack of the rotary transformer of the rotary-head machine means that DC content is less of a problem. The codes used generally try to emphasize density ratio, which keeps down the linear tape speed, and the jitter window, since this helps to reject the inevitable crosstalk between the closely spaced heads. DC content in the code is handled using adaptive slicers as detailed in Chapter 6. On replay there are the usual data separators, timebase correctors and error-correction circuits.
The DASH format [8] was the most successful of the stationary-head formats. It was not one format as such, but a family of like formats, supporting a number of different track layouts. With ferrite-head technology, it was possible to obtain adequate channel SNR with 24 tracks on half-inch tape (H) and eight tracks on quarter-inch tape (Q). The reason that these numbers are not pro-rata is that the same number of analog and control tracks are necessary for both, and take up proportionately more space on the narrower tape. This gave rise to the single-density family of formats known as DASH I. The most successful member of this family was the Sony PCM-3324.
The dimensions of the 24-track tape layout are shown in Figure 9.30. The analog tracks are placed at the edges where they act as guard bands for the digital tracks, protecting them from edge lifting. Additionally there is a large separation between the analog tracks and the digital tracks. This prevents the bias from the analog heads from having an excessive erasing effect on the adjacent digital tracks. For the same reason AC erase may have to be ruled out. One alternative mechanism for erasure of the analog tracks is to use two DC heads in tandem. The first erases the tape by saturating it, and the second is wound in the opposite sense, and carries less current, to return the tape to a near-demagnetized state.
In the half-inch format, the timecode and control tracks are placed at the centre of the tape, where they suffer no more skew with respect to the digital tracks than those at the edge of quarter-inch tape in the presence of tape weave.
The construction of a bulk ferrite multitrack head is shown in Figure 9.31, where it will be seen that space must be left between the magnetic circuits to accommodate the windings. Track spacing is improved by putting the windings on alternate sides of the gap. The parallel close-spaced magnetic circuits have considerable mutual inductance, and suffer from crosstalk. This can be compensated when several adjacent tracks record together by cross-connecting antiphase feeds to the record amplifiers.
Using thin-film heads, the magnetic circuits and windings are produced by deposition on a substrate at right angles to the tape plane, and as seen in Figure 9.32 they can be made very accurately at small track spacings. Perhaps more importantly, because the magnetic circuits do not have such large parallel areas, mutual inductance and crosstalk are smaller, allowing a higher practical track density.
The so-called double-density version, known as DASH II, uses such thin-film heads to obtain 48 digital tracks on half-inch tape and sixteen tracks on quarter-inch tape. The 48-track version of DASH II is shown in Figure 9.33 where it will be seen that the dimensions allow 24 of the replay head gaps on a DASH II machine to align with and play tapes recorded on a DASH I machine. In fact the PCM-3348 could take 24-track tapes and record a further 24 tracks on them.
The DASH format supported three sampling rates and the tape speed is normalized to 30 in./s at the highest rate. The three rates are 32 kHz, 44.1 kHz and 48 kHz. This last frequency was originally 50.4 kHz, which had a simple fractional relationship to 44.1 kHz, but this was dropped in favour of 48 kHz when arbitrary sampling rate conversion was shown to be feasible. In fact most stationary-head recorders will record at any reasonable sampling rate just by supplying them with an external reference, or word clock, at the appropriate frequency. Under these conditions, the sampling-rate switch on the machine only controls the status bits in the recording which set the default playback rate.
In the digital domain it is quite easy to distribute samples from one audio channel over a number of tape tracks. In DASH-F, the fast version, one audio track requires one tape track, and the tape moves at its greatest speed. In DASH-M, the medium version, one audio channel is spread over two tape tracks, and the tape runs at half speed. In DASH-S, the slow version, one audio channel is spread over four tape tracks, and the tape runs at one quarter speed. In twin DASH, the data corresponding to one audio channel are recorded twice, giving advantages in splice tolerance. Clearly the number of audio channels must be halved in twin DASH-F, but in DASH-M and DASH-S, the tape speed could be doubled instead.
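The trade between tape tracks, audio channels and tape speed can be tabulated with a short sketch. The names `TRACKS_PER_CHANNEL`, `DIGITAL_TRACKS` and `dash_layout` are illustrative and not part of the format. The track counts are those quoted earlier for DASH I and DASH II, the speed is the 30 in./s figure for the highest sampling rate, and twin recording is modelled here only in its simplest form, by halving the number of channels.

```python
from fractions import Fraction

# Tape tracks used by one audio channel in each version.
TRACKS_PER_CHANNEL = {"F": 1, "M": 2, "S": 4}

# Digital tracks available for each density (I, II) and tape width (H, Q).
DIGITAL_TRACKS = {
    ("I", "H"): 24, ("I", "Q"): 8,
    ("II", "H"): 48, ("II", "Q"): 16,
}


def dash_layout(version, density, width, twin=False, full_speed_ips=30):
    """Return (audio channels, tape speed in in./s) for one DASH variant.

    Assumption: twin recording is modelled by doubling the tracks per
    channel at unchanged speed, i.e. by halving the channel count.
    """
    tracks = TRACKS_PER_CHANNEL[version]
    speed = Fraction(full_speed_ips, tracks)
    if twin:
        tracks *= 2
    channels = DIGITAL_TRACKS[(density, width)] // tracks
    return channels, speed


print(dash_layout("F", "I", "H"))             # 24 channels at 30 in./s
print(dash_layout("S", "I", "Q"))             # 2 channels at 7.5 in./s
print(dash_layout("F", "I", "H", twin=True))  # 12 channels at 30 in./s
```

`Fraction` keeps the quarter-speed figure exact; a twin DASH-M or DASH-S machine that doubles the tape speed instead of halving the channels is not covered by this sketch.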
By way of example, the well-known PCM-3324 is a DASH-FIH machine:
F = Fast format, one channel per track
I = Single density
H = Half-inch tape, hence 24 tape tracks and 24 audio channels
The track-allocation mechanisms for S, M and F are shown in Figure 9.34 which also depicts the relationship with the control track.
The error-correction strategy of DASH is to form codewords which are confined to single-tape tracks. DASH uses cross-interleaving, which was described in principle in Chapter 7. In all practical recorders measures have to be taken for the rare cases when the error correction is overwhelmed by gross corruption. In open-reel stationary-head recorders, one obvious mechanism is the act of splicing the tape and the resultant contamination due to fingerprints.
The use of interleaving is essential to handle burst errors; unfortunately it conflicts with the requirements of tape-cut editing. Figure 9.35 shows that a splice in cross-interleave destroys codewords for the entire constraint length of the interleave. The longer the constraint length, the greater the resistance to burst errors, but the more damage is done by a splice.
In order to handle dropouts or splices, samples from the convertor or direct digital input are first sorted into odd and even. The odd/even distance has to be greater than the cross-interleave constraint length. In DASH, the constraint length is 119 blocks, or 1428 samples, and the odd/even delay is 204 blocks, or 2448 samples. In the case of a severe dropout, after the replay de-interleave process, the effect will be to cause two separate error bursts, first in the odd samples, then in the even samples. The odd samples can be interpolated from the even and vice versa in order to conceal the dropout. In the case of a splice, samples are destroyed for the constraint length, but Figure 9.36 shows that this occurs at different times for the odd and even samples. Using interpolation, it is possible simultaneously to obtain the end of the old recording and the beginning of the new one. A digital crossfade is made between the old and new recordings.
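The odd/even concealment principle can be illustrated with a toy model. The functions `burst_flags` and `conceal` are hypothetical, the burst length and delay are deliberately tiny (DASH uses an odd/even delay of 2448 samples), and simple averaging of the two neighbours stands in for whatever interpolator a real machine uses.

```python
def burst_flags(length, start, burst, odd_even_delay):
    """Validity flags after de-interleave for one burst error on tape.

    Toy model: the burst destroys odd samples first and even samples
    one odd/even delay later, so the two gaps never coincide.
    """
    flags = []
    for i in range(length):
        first = start if i % 2 else start + odd_even_delay
        flags.append(not (first <= i < first + burst))
    return flags


def conceal(samples, valid):
    """Replace each invalid sample by the mean of its valid neighbours."""
    out = list(samples)
    for i, ok in enumerate(valid):
        if ok:
            continue
        neighbours = [samples[j] for j in (i - 1, i + 1)
                      if 0 <= j < len(samples) and valid[j]]
        # With no valid neighbour the sample cannot be concealed; mute it.
        out[i] = sum(neighbours) / len(neighbours) if neighbours else 0.0
    return out


samples = [float(n) for n in range(40)]   # a ramp, exactly interpolable
valid = burst_flags(len(samples), start=5, burst=8, odd_even_delay=20)
concealed = conceal(samples, valid)
print(valid.count(False), concealed == samples)
```

Because the delay exceeds the burst length, every destroyed sample still has intact neighbours of the opposite parity. The ramp is recovered exactly only because it is linear; real audio suffers the bandwidth reduction that interpolation implies.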
The interpolation during concealment and splices causes a momentary reduction in frequency response which may result in aliasing if there is significant audio energy above one quarter of the sampling rate. This was overcome in twin DASH machines in the following way. All incoming samples will be recorded twice, which means twice as many tape tracks or twice the linear speed is necessary. The interleave structure of one of the tracks will be identical to the interleave already described, whereas on the second version of the recording, the odd/even sample shuffle is reversed. When a gross error occurs in twin DASH, it will be seen from Figure 9.37 that the result after de-interleave is that when odd samples are destroyed in one channel, even samples are destroyed in the other. By selecting valid data from both channels, a full bandwidth signal can be obtained and no interpolation is necessary. In the presence of a splice, when odd samples are destroyed in one track, even samples will be destroyed in the other track. Thus at all times, all samples will be available without interpolation, and full bandwidth can be maintained across splices. Figure 9.38 shows the results of a splice in twin DASH. The status bits in the control track of twin DASH reflect the use of twin recording.
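The selection process in twin recording can be sketched in the same toy fashion. The function `twin_select` is illustrative only; the sketch assumes that de-interleaving has already produced, for each of the two copies, a list of samples and a matching list of validity flags.

```python
def twin_select(copy_a, valid_a, copy_b, valid_b):
    """Pick each sample from whichever copy survived.

    Returns None where both copies are lost, which a real machine
    would have to conceal by other means.
    """
    out = []
    for a, ok_a, b, ok_b in zip(copy_a, valid_a, copy_b, valid_b):
        if ok_a:
            out.append(a)
        elif ok_b:
            out.append(b)
        else:
            out.append(None)
    return out


samples = list(range(16))
damaged = range(4, 10)  # span affected by a gross error or splice

# The reversed odd/even shuffle means the same event destroys odd
# samples in one copy and even samples in the other.
valid_a = [not (i in damaged and i % 2 == 1) for i in samples]
valid_b = [not (i in damaged and i % 2 == 0) for i in samples]

recovered = twin_select(samples, valid_a, samples, valid_b)
print(recovered == samples)
```

No arithmetic is performed on the samples, so the recovered signal keeps its full bandwidth.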
DCC is a stationary-head format in which the tape transport is designed to play existing analog Compact Cassettes in addition to making and playing digital recordings. This backward compatibility means that an existing Compact Cassette collection can still be enjoyed whilst newly made or purchased recordings will be digital [9]. To achieve this compatibility, DCC tape is the same width as analog Compact Cassette tape (3.81 mm) and travels at the same speed (1⅞ in./s or 4.76 cm/s). The formulation of the DCC tape is different; it resembles conventional chrome video tape, but the principle of playing one ‘side’ of the tape in one direction and then playing the other side in the opposite direction is retained.
Although the DCC cassette has similar dimensions to the Compact Cassette so that both can be loaded in the same transport, the DCC cassette is of radically different construction. The DCC cassette only fits in the machine one way; it cannot be physically turned over as it only has hub drive apertures on one side. The head access bulge has gone and the cassette has a uniform rectangular cross-section, taking up less space in storage. The transparent windows have also been deleted as the amount of tape remaining is displayed on the panel of the player. This approach has the advantage that labelling artwork can cover almost the entire top surface. The same approach has been used in pre-recorded MiniDiscs (see Chapter 12). As the cassette cannot be turned over, all transports must be capable of playing in both directions. Thus DCC is an auto-reverse format. In addition to a record lockout plug, the cassette body carries identification holes. Combinations of these specify six different playing times from 45 min to 120 min as in Table 9.1.
The apertures for hub drive, capstans, pinch rollers and heads are covered by a sliding cover formed from metal plate. The cover plate is automatically slid aside when the cassette enters the transport. The cover plate also operates hub brakes when it closes and so the cassette can be left out of its container. The container fits the cassette like a sleeve and has space for an information booklet.
DCC uses a form of data reduction which Philips call Precision Adaptive Sub-band Coding (PASC). PASC is based on MPEG audio compression as described in Chapter 5 and its use allows the recorded data rate to be about one quarter that of the original PCM audio. This allows for conventional chromium tape to be used with a minimum wavelength of about one micrometre instead of the more expensive high-coercivity tapes normally required for use with shorter wavelengths. The advantage of the conventional approach with linear tracks is that tape duplication can be carried out at high speed. This makes DCC attractive to record companies. Even with data reduction, the only way in which the bit rate can be accommodated is to use many tracks in parallel.
Figure 9.39 shows that in DCC audio data are distributed over eight parallel tracks along with a subcode track which together occupy half the width of the tape. At the end of the tape the head rotates about an axis perpendicular to the tape and plays the remaining tracks in reverse. The other half of the head is fitted with magnetic circuits sized for analog tracks and so the head rotation can also select the head type which is in use for a given tape direction.
However, reducing the data rate to one quarter and then distributing it over eight tracks means that the data rate recorded on each track is only 96 kbits/s or about that of a PCM machine recording a single audio channel with a single head. The linear tape speed is incredibly low by stationary-head digital standards in order to obtain the desired playing time. The rate of change of flux in the replay head is very small due to the low tape speed, and conventional inductive heads are at a severe disadvantage because their self-noise drowns the signal. Magneto-resistive (MR) heads are necessary because they do not have a derivative action, and so the signal is independent of speed. A magneto-resistive head uses an element whose resistance is influenced by the strength of flux from the tape and its operation was discussed in Chapter 6. Magneto-resistive heads are unable to record, and so separate record heads are necessary. Figure 9.40 shows a schematic outline of a DCC head. There are nine inductive record heads for the digital tracks, and these are recorded with a width of 185 μm and a pitch of 195 μm. Alongside the record heads are nine MR replay gaps. These operate on a 70 μm band of the tape which is nominally in the centre of the recorded track. There are two reasons for this large disparity between the record and replay track widths. First, replay signal quality is unaffected by a lateral alignment error of ±57 μm and this ensures tracking compatibility between machines. Second, the loss due to incorrect azimuth is proportional to track width and the narrower replay track is thus less sensitive to the state of azimuth adjustment. In addition to the digital replay gaps, a further two analog MR head gaps are present in the replay stack. These are aligned with the two tracks of a stereo pair in a Compact Cassette.
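The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch assumes sixteen-bit stereo PCM at 48 kHz as the source and an exact four-to-one reduction, neither of which is stated precisely in the text, and the variable names are illustrative. The audio payload per track comes out at 48 kbits/s; the 96 kbits/s quoted above is higher, presumably because the recorded signal also carries error-correction redundancy and channel-coding overhead, but that explanation is an assumption.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Assumed source: 48 kHz, sixteen-bit, stereo.
source_rate = pcm_bit_rate(48_000, 16, 2)
reduced_rate = source_rate // 4            # assumed exact 4:1 data reduction
audio_rate_per_track = reduced_rate // 8   # eight parallel audio tracks

# Track geometry quoted in the text, in micrometres.
RECORD_WIDTH_UM = 185
REPLAY_WIDTH_UM = 70
TRACK_PITCH_UM = 195

# The replay gap can move sideways until it reaches the track edge.
alignment_tolerance_um = (RECORD_WIDTH_UM - REPLAY_WIDTH_UM) / 2
guard_band_um = TRACK_PITCH_UM - RECORD_WIDTH_UM

print(source_rate, reduced_rate, audio_rate_per_track)
print(alignment_tolerance_um, guard_band_um)
```

The computed tolerance of 57.5 μm agrees with the ±57 μm lateral alignment error quoted above.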
The twenty-gap head could not be made economically by conventional techniques. Instead it is made lithographically using thin film technology.
Tape guidance is achieved by a combination of guides on the head block and pins in the cassette. Figure 9.41 shows that at each side of the head is fitted a C-shaped tape guide. This guide is slightly narrower than the nominal tape width. The reference edge of the tape runs against a surface which is at right angles to the guide, whereas the non-reference edge runs against a sloping surface. Tape tension tends to force the tape towards the reference edge. As there is such a guide at both sides of the head, the tape cannot wander in the azimuth plane. The tape wrap around the head stack and around the azimuth guides is achieved by a pair of pins behind the tape which are part of the cassette. Between the pins is a conventional sprung pressure pad and screen.
Figure 9.42 shows a block diagram of a DCC machine. The audio interface contains convertors which allow use in analog systems. The digital interface may be used as an alternative. DCC supports 48, 44.1 and 32 kHz sampling rates, offering audio bandwidths of 22, 20 and 14.5 kHz respectively with eighteen-bit dynamic range. Between the interface and the tape subsystem is the PASC coder. The tape subsystem requires error-correction and channel coding systems not only for the audio data but also for the auxiliary data on the ninth track.
As explained in the text, conversion from one timecode standard to Pro R time consists of finding the number of the last timecode frame completed before the beginning of the current Pro R time frame. The beginning of both frames is then expressed in real time, and the timecode marker (TCM) measures the difference between them, in sample periods $T_s$.
The upper part of Figure 9A.1 shows EBU timecode frames of period $T_C$ beginning from time zero. The number of complete timecode frames before the DAT frame in question begins is the Timecode Frame Count, $F_C$, which is an integer.
The lower section of the diagram shows DAT frames of period $T_D$, which did not necessarily begin at time zero. The DAT Offset $O_D$, which is a constant, measures the relationship between the beginning of the first DAT frame and time zero.
The DAT Frame Count $F_D$ is the number of completed DAT frames before the one in question. The time difference between the beginnings of the respective frames is $\mathrm{TCM} \cdot T_s$.
The absolute time at the beginning of a given DAT frame is:
$$F_D T_D + O_D$$
The timecode difference $\mathrm{TCM} \cdot T_s$ is simply the absolute time expressed modulo $T_C$. Thus:
$$\mathrm{TCM} \cdot T_s = (F_D T_D + O_D) \bmod T_C$$
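The modulo relationship described in this appendix can be sketched in a few lines. The function `timecode_marker` and its argument names are illustrative. All times are expressed as whole numbers of sample periods to avoid rounding. The example values assume a 48 kHz sampling rate, a 30 ms DAT frame (1440 samples) and a 40 ms EBU timecode frame (1920 samples), and the offset of 100 samples is arbitrary.

```python
def timecode_marker(dat_frame_count, dat_frame_period, dat_offset,
                    timecode_frame_period):
    """Return (timecode frame count, timecode marker).

    All periods and the offset are in sample periods, so the marker
    is returned directly as a number of samples.
    """
    # Absolute time at the beginning of the DAT frame in question.
    absolute = dat_frame_count * dat_frame_period + dat_offset
    # Quotient: complete timecode frames; remainder: the marker.
    return divmod(absolute, timecode_frame_period)


# Assumed example: 48 kHz, 30 ms DAT frames, 40 ms EBU timecode frames.
frame_count, marker = timecode_marker(
    dat_frame_count=3, dat_frame_period=1440,
    dat_offset=100, timecode_frame_period=1920)
print(frame_count, marker)
```

A single `divmod` yields both the integer count of completed timecode frames and the marker, since the marker is just the remainder of the same division.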
1. Yamada, Y., Fujii, Y., Moriyama, M. and Saitoh, S., Professional use PCM audio processor with a high efficiency error-correction system. Presented at the 66th Audio Engineering Society Convention (Los Angeles, 1980), Preprint 1628(G7)
2. Ishida, Y., Nishi, S., Kunii, S., Satoh, T. and Uetake, K., A PCM digital audio processor for home use VTRs. Presented at the 64th Audio Engineering Society Convention (New York, 1979), Preprint 1528
3. Griffiths, F.A., A digital audio recording system. Presented at the 65th Audio Engineering Society Convention (London, 1980), Preprint 1580(C1)
4. Nakajima, H. and Odaka, K., A rotary-head high-density digital audio tape recorder. IEEE Trans. Consum. Electron., CE-29, 430–437 (1983)
5. Itoh, F., Shiba, H., Hayama, M. and Satoh, T., Magnetic tape and cartridge of R-DAT. IEEE Trans. Consum. Electron., CE-32, 442–452 (1986)
6. Hitomi, A. and Taki, T., Servo technology of R-DAT. IEEE Trans. Consum. Electron., CE-32, 425–432 (1986)
7. Kudelski, S., et al., Digital audio recording format offering extensive editing capabilities. Presented at the 82nd Audio Engineering Society Convention (London, 1987), Preprint 2481(H-7)
8. Doi, T.T., Tsuchiya, Y., Tanaka, M. and Watanabe, N., A format of stationary-head digital audio recorder covering wide range of applications. Presented at the 67th Audio Engineering Society Convention (New York, 1980), Preprint 1677(H6)
9. Lokhoff, G.C.P., DCC: Digital compact cassette. IEEE Trans. Consum. Electron., CE-37, 702–706 (1991)