Chapter 2

Introduction to Digital Video

This introduction to digital video technology describes some of the fundamental features common to all digital video formats, as well as some of the features that distinguish them. Chapter 1 covered the fundamental sound differences; to delve more deeply we must venture into the picture side of the equation, because the distinguishing features among recording formats are often related to the picture. Although there are some sound differences among digital video formats, they are smaller than the range of picture qualities. Thus a practical-sized sound book can cover far more formats than a similarly detailed picture-oriented book could, and learning about just a few distinguishing issues for sound formats makes it possible to cover the whole field.

The rapid development of digital video over the past several years has set the stage for a number of new workflows and recording paradigms to emerge, principally involving the refinement of video compression schemes and the proliferation of tapeless media such as memory cards, optical discs, and hard disks. What these systems share with the older tape-based formats that first made digital video viable is the method of recording digital signals to represent distinctly analog phenomena: traveling waves of sound and light. Moreover, many of the basic techniques, compromises, and audio trade-offs are the same for tape-based and newer file-based cameras. What newer file-based systems add is a non-linear recording paradigm, extending the non-linear, random-access nature of digital editing to the world of production video capture. The process of digital recording, the major characteristics of digital video, and the peculiarities that inform sound recording for digital video projects will be covered in turn.

Basic Digital

The basic process of capturing picture and sound for all digital formats involves conversion of the continuously varying signals produced by the image and sound sensing devices into numbers. Digital video cameras use one or more image sensors, typically charge-coupled devices (CCDs) or complementary metal-oxide semiconductor (CMOS) sensors, to convert the varying brightness of the three color components of light striking the sensor into a numerical value for each pixel. For sound, the moment-by-moment amplitude of the sound pressure level versus time is converted by one or more microphones into an electrical voltage versus time, which is then represented as a series of numbers. In each case, it is the amplitude of the signal that changes in time and is recorded as a numerical value at specific moments in time, once per frame for the video and typically thousands of times per second for digital audio. The reliability of storing numbers rather than directly recording the smoothly and continuously varying signals that constitute analog phenomena is a primary justification for digital techniques. The advantage of the digital technique is that copies are literally clones of their masters if certain prerequisites are met: copies are then just as good as the original. This is never true with analog techniques, where each stage of copying exhibits some inevitable generation loss, blurring the variations in detail and tone that made up the original work, the way a copy machine introduces slight imperfections to a picture over successive generations.

This difference between analog and digital reproduction makes digital techniques well suited to distribution channels, where getting it right for large numbers of copies is important. That is why the impact of digital methods on the sound industry in particular began with the growth of a distribution format, the CD (compact disc). Later it was found that digital techniques were valuable in sound production, not just replication, once they were made to exhibit all of the qualities necessary. In digital video, the first users were high-end postproduction companies that needed to maintain quality among the generations needed to make video masters. Later in the development of digital techniques came the explosion caused by the near-simultaneous introduction of DV and DVD for capture and distribution. Ultimately, digital video introduced many advantages as it began to replace older analog workflows. For example, digital processing makes it possible to select one particular color in a scene—say that of the grass skirts of Hawaiian dancers—and change it from green to chartreuse without affecting anything else in the scene, something that conventional film chemical and exposure techniques could never achieve.

Another advantage that has helped digital video solidify its position in the marketplace is cost. Digital camcorders and cameras have effectively ousted analog video cameras from the market since the turn of the century because of the tremendous quality that can be achieved by digital means at relatively low cost. Compared to shooting on film or high-end digital cinema formats, high-definition digital video is eminently affordable, offering high-quality image and sound recording at great value. In particular, cameras with professional audio features such as external microphone inputs and manual level controls allow high-quality sound to be recorded alongside the picture, while growth in the digital audio recording market has made double-system recording viable for many independent productions shooting on DSLRs and other cameras without professional audio capture built in. So digital technology, as it develops, continues to bring the possibility of recording professional-looking (and sounding) programs to a wider base of people than ever before.

Some Basic Video for Audio People

Frame Rates

Film and video share a common feature: both produce moving images as a sequence of frames taken at uniform intervals through time that are typically not seen separately, but instead blend together into continuous motion through our persistence of vision. A purely mechanical 19th-century invention, the Zoetrope, shows that blending still images together through time can produce the illusion of motion.

Theatrical films have used a frame rate of 24 fps ever since the coming of sound forced standardization in this area, around 1927. This is a universal standard, used around the globe. Actually, in most instances each frame is shown twice on the cinema screen before being advanced to the next frame, so there are 48 flashes showing 24 unique pictures each second. The reason for this has to do with the perception of flicker. Shown at just 24 fps and at the brightness used in theaters, pictures seem to flicker (the term flicks used as slang to describe the movies probably originated from this effect).

U.S.-based NTSC video started out at 30 fps. The reason it could not be made 24 fps was the higher brightness of television compared to theatrical exhibition. Even showing each frame twice still produced visible flicker at 24 fps, because we perceive more flicker at higher brightness, and television sets were routinely operated some two to six times brighter than theater screens. So the black-and-white television standard adopted was 30 fps. Films shown on television used a convention of repeating some film frames across three video fields (half-frames)1 instead of two, so that while the film traveled through the equipment at 24 fps, 30 video frames per second were produced. This is called a 3:2 sequence (sometimes 2:3) and is illustrated in Figure 2-1.

Figure 2-1 Films shown on television or transferred to video in the U.S. are fit to the higher frame rate of NTSC television by a process called 3:2 insertion, resulting in a sequence of images with portions of some film frames doubled.
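
To make the cadence concrete, here is a minimal sketch (in Python, with illustrative frame labels A through D) of how the 2:3 sequence distributes four film frames over ten video fields, that is, five interlaced video frames. It is a simplified model of the process shown in Figure 2-1, not the output of any particular telecine.

```python
# Sketch: map film frames A, B, C, D onto NTSC video fields using a 2:3 cadence.
# Each film frame is held for alternately two and three video fields, so four
# film frames fill ten fields = five interlaced video frames (24 fps -> ~30 fps).

def pulldown_fields(film_frames, cadence=(2, 3)):
    """Return the sequence of video fields produced by repeating film frames."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields.extend([frame] * cadence[i % len(cadence)])
    return fields

fields = pulldown_fields("ABCD")
video_frames = [fields[i:i + 2] for i in range(0, len(fields), 2)]  # pair fields into frames
print(fields)        # ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D']
print(video_frames)  # [['A', 'A'], ['B', 'B'], ['B', 'C'], ['C', 'D'], ['D', 'D']]
```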

Then came color. Color television standardization had the requirement that the color picture had to be fit into the same broadcast space on the air—called bandwidth—which black-and-white television had been using. The color component of the broadcast, transmitted as a subcarrier, could not interfere with the black-and-white broadcast, nor with the sound. In order to accommodate all of these requirements, among other things the frame rate was changed slightly, to 29.97 fps,2 for color broadcasts. An ordinary black-and-white set had no problem recovering a picture that was slightly off speed, and a color set could recover the color subcarrier and audio with minimal interference, so this rate was standardized.

In Europe, the standard adopted for color television was 25 fps, and U.S. theatrical films are portrayed on European television by being played 4 percent faster, so that 25 film frames are shown on the television each second. Audio played faster to maintain sync with the image sounds higher-pitched if untreated, so film-to-video transfers in Europe often have an attendant pitch increase in the sound. This effect is noticeable even on actors' voices, so it should be corrected in transfer through the use of a hardware or software device called a pitch shifter, but in practice only certain titles are treated this way to recover the correct final pitch. Whereas pitch correction of time-scaled material was formerly thought to introduce artifacts into the audio and was in limited use, recent advances in digital processing have improved the output quality that can be expected from professional equipment.
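
As a rough illustration of the numbers involved, the following sketch computes the actual speedup factor (25/24) and the resulting pitch rise in semitones; the figures are approximate and assume no pitch correction is applied.

```python
import math

# Sketch: speedup and pitch rise when 24 fps film is played at 25 fps for PAL.
speed_factor = 25 / 24                            # ~1.0417, about 4 percent faster
pitch_rise_semitones = 12 * math.log2(speed_factor)

print(f"Speedup: {(speed_factor - 1) * 100:.2f}%")            # 4.17%
print(f"Pitch rise: {pitch_rise_semitones:.2f} semitones")    # ~0.71 semitones
```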

For film shot for theatrical release at exactly 24 fps, the conversion for postproduction editing to NTSC video involves the 2:3 field sequence of repetition to get to a nominal 30 fps, and a slowdown on the telecine or film scanner to 23.976 fps, together called 2:3 (or 3:2) pulldown. This means that sound recorded double-system also has to be slowed down by the same amount to maintain synchronization. This is most often accomplished by simply playing the audio 0.1% slower than it was originally recorded, whether off an analog tape or a digital bitstream. For example, digital audio recorded on set at 48 kHz will only be played back during telecine or editing at a rate of 47,952 samples per second, and will often be resampled to match the 48 kHz sample rate of audio recorded in postproduction in sync with the pulled-down picture. One way of avoiding the need to resample digital audio when film is pulled down to video speed is to record audio on set at 48,048 Hz (sampling 0.1% more often per second) so that, when copied into the postproduction editing environment and run at 48 kHz, it will maintain sync, having been slowed down the same amount as the picture.3 Unlike the effect of the 4% speedup of film transferred to 25 fps video, the slight change in pitch (0.1%) attendant with pulldown to NTSC video is not noticeable4 and need not be corrected.
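
A small sketch can make the arithmetic explicit. The NTSC pulldown is exactly the ratio 1000/1001, and 48,048 Hz is chosen precisely because slowing it by that ratio lands exactly on 48 kHz; this is only a numerical illustration, not a description of any particular recorder's settings.

```python
from fractions import Fraction

# Sketch: the NTSC pulldown is exactly 1000/1001 (a 0.1% slowdown).
pulldown = Fraction(1000, 1001)

film_rate = Fraction(24)                      # true film speed
video_rate = film_rate * pulldown             # 24000/1001 = 23.976... fps

set_sample_rate = 48_048                      # record on set at 48,048 Hz
post_sample_rate = set_sample_rate * pulldown # what it becomes after pulldown

print(float(video_rate))         # 23.976023976...
print(float(post_sample_rate))   # 48000.0 -- audio slowed by the same 0.1% as picture
```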

The previous discussion demonstrates one of the difficulties of dealing with film originals in an environment that will include video postproduction. Video original for video post is easier, but the standards are less universal, and adapting to them is potentially more complicated. For instance, if you shoot NTSC video and wish to release internationally, conversion to PAL can be complicated, expensive, and imperfect. An emerging method of making video more universal is to capture 24 fps (using the 24P mode on a digital video camera) and treat the video like a film original, as explained later.

Interlaced Video

Most video cameras operate at 29.97 fps for NTSC or 25 fps in PAL countries in interlaced format. Interlaced scanning means that each frame of video is split into two fields, or half-frames, where the odd-numbered lines 1, 3, 5, and so on of each frame are scanned by the camera during recording (and by the television during playback) in one pass, followed by the even-numbered lines (2, 4, 6, …) in a separate pass. Each video field is evenly spaced in time so that a camera recording 29.97 fps video is actually recording fields at 59.94 fields per second, and a 25 fps camera captures fields at 50 fields per second.

The reasons for interlacing the picture in early television systems were practical: refreshing the television display 50 or 60 times per second resulted in less visible flicker than only refreshing at 25 or 30 Hz, just as film projected with a double shutter to show an image 48 times a second flickered less than true 24 fps projection in theaters. But while film projectors illuminated an entire frame of film at once, television sets scanned images one row at a time, and the time required for the electron gun to scan each line meant that only half a frame of video could be displayed in 1/60th of a second at the desired frame size. Rather than settling for a lower vertical resolution and scanning each frame twice, the full vertical resolution of what we call standard-definition television was maintained and only half the image (a single video field) was scanned in one 1/60th of a second pass. Then the electron gun returned to the top of the screen and scanned the second field of the image in the next 1/60th of a second. This method of scanning and transmitting video was called interlaced because the even and odd scanlines, stored and transmitted separately on the medium, were interlaced back together visually during playback.

When high-definition television (HDTV) was introduced, the standards included both interlaced and progressively scanned (non-interlaced) modes of operation. During its adoption, HDTV existed (and still, to some extent, does exist) alongside traditional standard-definition content and distribution channels, so interlacing remained partly as a historical artifact of standard-definition systems. However, the practical advantage of increasing the effective vertical resolution that can be represented in a given bandwidth is another reason interlaced broadcast remains popular for some high-resolution signals: it is a space-saving measure. With cathode-ray tube displays largely supplanted in the marketplace by newer technologies, such as LCD and plasma displays, which don’t rely on an electron scanning gun to illuminate an image pixel by pixel, true interlaced signals must be de-interlaced by modern displays. That fact alone may be a driving factor in progressively scanned video becoming increasingly common; indeed, progressive scanning is also considered desirable as part of the “film look” camera manufacturers advertise when promoting 24P modes on their products—the P stands for “progressive.”

If that’s the case, why do we say that most cameras still operate in interlaced format? The answer is threefold. First, many new cameras still offer the option of shooting standard-definition video at lower data rates, which is typically recorded in the U.S. as interlaced NTSC video. Second, since high-definition standards include both interlaced and non-interlaced format options, most camera manufacturers offer 60i (technically 59.94 Hz interlaced) or 50i recording in addition to 30p or 25p non-interlaced scanning and, often, a 24p “film” mode.5 Finally, even the progressive scan modes on most digital video cameras store images in an interlaced format on the medium, taking the even and odd scanlines captured at a single moment in time and storing them into the two fields of the image as if they were captured separately. This results in an image that does not need to be de-interlaced for display on a computer or modern television even though it is stored in an interlaced container format on tape or disc. When digital video cameras with progressive scanning first came to market, the existing video infrastructure and the DV codec family almost universally in use dictated that interlaced NTSC video be recorded on tapes, so 24p and other progressively scanned modes were fit into the existing infrastructure using techniques such as the 3:2 sequence familiar from film to NTSC transfers. The technique of fitting progressive content into an interlaced format persists even in tapeless devices today because, in addition to improving backward compatibility, it is easier to build a camera with a single recording format—interlaced video—and treat progressively scanned images as if they were interlaced.6

“Film” Look

There are a number of reasons video looks different from film, even when filmed images are transferred to video and viewed on television. One is the size of the sensor area of a video camera compared to the area exposed on a film negative. The CCD or CMOS sensor on a typical digital video camera is much smaller than a full-frame 35 mm negative, so it is much harder to achieve shallow depth of field effects on video than it is with film. In other words, it is difficult to keep just the subject in focus and throw the background out of focus, a common technique in film cinematography, using video.

Another is the higher frame rate of NTSC video compared to 24 fps film, combined with the fact that video has traditionally been captured in interlaced format whereas film cameras expose the entire frame at once. To compensate for some of these differences, producers have increasingly relied on the “film look” of 24P for entertainment shot on video but intended to have the emotional and dramatic scope of motion pictures traditionally shot on film. As explained above, 24P also offers an emerging pathway to a more universal video master format than 25 fps or 29.97 fps modes, which require conversion for video to be transferred from one system to another.

The 24P mode on a video camera means that 24 frames of video are captured each second using progressive scanning, reading each line from the image sensor in order (1, 2, 3, 4, …) at a single moment in time, as opposed to interlaced video captured one field at a time. One problem with interlaced scanning that progressive scanning mitigates is a lack of sharpness when the picture moves noticeably between the first and second fields of a frame, which would turn a moving vertical line into a zigzag, for instance. If you look closely, you can sometimes see this effect on interlaced video when the camera pans and then comes to a stop: once the pan is over, vertical edges are sharper than when the camera was moving. The laser disc version of Jurassic Park, which uses interlaced video, shows this effect when the bad guy is stealing the genetic material of the dinosaurs from the vault. The camera pans over the vertical lines of bars in the vault, and before and after the pan, the vertical lines are sharp; but while the camera is panning, the vertical lines appear jaggy on a conventional television with interlaced scanning.

Another advantage that makes progressive scanning seem more film-like is the increased density of information each time the picture on a modern progressive display is refreshed. With each interlaced frame split into two fields 1/60th of a second apart in time, the lines written during the first half of an interlaced scan are fading by the time the second half is written.7 We perceive this as lesser sharpness in interlaced compared to progressive scanning. Thought of another way, interlaced scanning produces one-half the scanning lines in 1/60th of a second, but progressive produces all of the lines in the same amount of time, so more information is presented per unit time by progressive scanning, and this is seen as better sharpness.

Even though it is called 24P, the actual rate at which frames are captured is 23.976 fps on most camcorders in the U.S. because the video is encapsulated into a 29.97 fps interlaced signal for recording, as described above. The two fields of interlaced video recorded on tape can be put back together in digital editing systems to reproduce the progressively scanned image as it was originally read off the camera’s image sensor, so the benefits of progressive scan video are preserved by this arrangement. With video editing software set to correctly recover the 24 original frames out of every 30 recorded frames, 24P footage can be edited directly at 23.976 fps, and the only practical difference is the 0.1% slowdown from film to NTSC speed (24 fps to effectively 23.976 fps) caused by the camera clock running at NTSC speed.8 This speed differential can be reversed by “pulling up” the video to film speed in order to create a true 24 fps film print, with a corresponding time correction to the audio accompanying picture, or the video can be sped up by about 4 percent and output in PAL format for European distribution and to other parts of the world using the PAL system. Of course, the video can also be exported back to NTSC format tapes or discs by repeating fields in a 3:2 sequence like the camcorder would have during capture or a telecine would have for a film transfer to video, and this output can be used with equipment based on the NTSC system traditionally used in the United States, Japan, and other parts of the world. Thus 24P camera originals become source masters useful for film release and for transfer to both of the world’s major video standards, kind of a Great Grand Master.

While most camcorders and even DSLRs with a 24P mode actually record 23.976 video frames per second,9 some professional cameras like the Sony F900 and various offerings from RED Digital Cinema allow true 24 fps recording in addition to 23.976 fps shooting for compatibility with NTSC video. When recording at one of these speeds and playing a tape back at the other, or playing a file back at the other speed with audio pulled up or down in software to match, this difference is not noticeable. However, if the sound is separated from the picture and the assumption about speed is made incorrectly, the audio will no longer sync with picture because of the 0.1 percent speed difference. In that case, audio recorded or reproduced at the rate opposite that of the camera will go out of sync visibly within a little over a minute. This is one of the many sources of potential problems for audio, because any sync problem that occurs is almost invariably considered an audio problem even if it is the picture that is running off the stated speed. So it’s important to be aware that consumer and prosumer cameras without separate 23.976 and 24 fps settings may well be operating at 23.976 fps even when they say 24 fps or refer to 24P in menu settings or operating manuals.
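
The following sketch estimates how quickly the 0.1 percent (1000/1001) mismatch accumulates, assuming that roughly one to two frames of error is where sync problems become visible.

```python
# Sketch: how fast a 1000/1001 (0.1%) speed mismatch accumulates sync error at 24 fps.
fps = 24.0
error_frames_per_second = fps * (1 - 1000 / 1001)   # drift accumulated each second

for frames_off in (1, 2):
    seconds = frames_off / error_frames_per_second
    print(f"{frames_off} frame(s) out of sync after about {seconds:.0f} seconds")
# ~42 s for one frame, ~83 s for two -- "a little over a minute" to become obvious
```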

Now that 24P recording has become commonly available, the field of digital video is developing to more closely approximate the look of film and high-end digital cinema in other ways as well. One of the driving factors behind the adoption of digital SLRs that shoot video has been the possibility of capturing higher-quality images for a fraction of the cost of high-end digital video cameras. When shooting video on a DSLR, videographers have access to the same range of interchangeable lenses used by still photographers on a particular camera. The image sensors in DSLRs with professional-quality movie modes are also larger than their counterparts in most digital camcorders, allowing a greater range of depth of field effects as discussed earlier and what many consider a more artistic look. Some DSLRs, such as the Canon 5D, have so-called “full frame” sensors matching the size of a 35 mm still frame—quite a bit larger than a 35 mm motion picture frame due to the orientation of the image on the negative. Still others, like the Canon 7D, more closely approximate the size of a 35 mm motion picture negative. And even outside the world of DSLRs, higher-end digital video cameras like the Sony F3 now offer sensor sizes comparable to traditional motion picture film frames. Many purpose-built digital video cameras in prosumer and professional lines can also be used with a variety of cinema lenses via 35 mm lens adapters, and the features brought to the table by DSLRs promise to spur future development in this area of the prosumer digital video market as the artistic capabilities of digital video continue to grow.

Resolution and Aspect Ratio

For decades, the most obvious distinguishing feature of film shot for theatrical distribution was its widescreen aspect ratio compared to the narrower 4:3 aspect ratio of video and television broadcast. That’s one reason the 16:9 widescreen aspect ratio was introduced as an integral part of HDTV and high-definition video standards, and continues to be adopted even in emerging markets like mobile and Internet video platforms. Widescreen images can be produced in several ways. One is simply cutting off the top and bottom of the picture with black bars, but this limits resolution of the remaining picture because of the loss of 25 percent of the image area in pixels at the top and bottom of the frame. A second way is to use a squeeze lens, an anamorphic attachment that squeezes the image horizontally so that a 16:9 (1.78:1) picture fits onto a 4:3 (1.33:1) image sensor (Fig. 2-2), and then is unsqueezed in playback on a 16:9 monitor so that it takes up the entire widescreen frame. Both of these methods are used by various film formats: the 1.85:1 aspect ratio commonly used in the United States results from cropping off the top and bottom portion of each frame of a 35 mm print, whereas wider aspect ratios such as 2.35:1 rely on the use of anamorphic lenses.10 The third method of capturing widescreen video is to use native 16:9 chips in the camera, which is what is done today in high-definition cameras designed specifically for the 16:9 aspect ratio.

Figure 2-2 (a) An image in the 16:9 widescreen aspect ratio. (b) The same image shrunk horizontally by an anamorphic lens.
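
The 25 percent figure mentioned above follows directly from the two aspect ratios, as this small sketch shows.

```python
# Sketch: fraction of a 4:3 frame given up to black bars when letterboxing a 16:9 image.
full_aspect = 4 / 3
wide_aspect = 16 / 9

active_fraction = full_aspect / wide_aspect   # portion of the lines that carry picture
lost_fraction = 1 - active_fraction

print(f"Active picture: {active_fraction:.0%}, lost to bars: {lost_fraction:.0%}")
# Active picture: 75%, lost to bars: 25%
```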

The original prosumer digital video formats (DV, DVCAM, DVCPRO) recorded video at a resolution of 720 × 480 pixels: 480 lines of video with the image sampled horizontally 720 times per line. Although there were 1.5 times as many pixel locations on the horizontal axis as on the vertical axis (more than the 1.33:1 ratio of a 4:3 image), the image was captured and intended to be displayed in the 4:3 aspect ratio; you could think of the pixels represented on a DV tape as non-square pixels, as shown in Figure 2-3. To shoot in anamorphic widescreen on prosumer DV cameras, anamorphic lenses or, in some cameras like the Canon XL-2, 16:9 CCDs and digital processing were needed to fit the 16:9 widescreen image into the 720 × 480 pixels of standard-definition DV formats.

Figure 2-3 The arrangement of pixels in a 720 × 480 standard-definition DV frame. Notice that the density of pixels is greater along the horizontal axis than along the vertical axis.
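
The non-square pixel idea can be expressed numerically: dividing the display aspect ratio by the storage aspect ratio (720/480) gives the shape of each pixel. The sketch below is simplified and ignores the small difference between the full 720-sample line and the slightly narrower active picture used in some standards.

```python
# Sketch: pixel aspect ratio (pixel width / pixel height) for a 720 x 480 frame.
def pixel_aspect_ratio(display_aspect, width=720, height=480):
    return display_aspect / (width / height)

print(round(pixel_aspect_ratio(4 / 3), 3))    # 0.889 -- 4:3 DV pixels are narrower than tall
print(round(pixel_aspect_ratio(16 / 9), 3))   # 1.185 -- anamorphic 16:9 pixels are wider than tall
```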

While standard-definition digital video uses 480 active lines, high-definition video may use 720 or 1080 lines, either progressively scanned or interlaced. Table 2-1 shows the different high-definition video standards in use in North America. While 1280 × 720 video can be stored or broadcast at 60 full frames per second in progressive scan mode, a 1920 × 1080 image contains more than twice as many pixels, so, to fit into the same bandwidth at comparable image quality, 1080-line video must be interlaced into 60 fields (half-frames) per second or captured at half the frame rate (30 fps) in progressive scan mode. For this reason, 1080 is not necessarily always better than 720; it is for a static image, but moving pictures are complicated by the interplay of interlacing and allowable frame rates. In addition to 30 and 60 fps modes, the high-definition standards also allow for 24 fps (and 23.976 fps) video natively, without the need for the 3:2 sequencing used by early NTSC-based 24P systems. So, although many cameras still record 24P images in a 59.94 fps interlaced container format, future development will be in the direction of native 24 fps recording.12

Table 2-1 High-definition and digital standard-definition broadcast formats defined by the Advanced Television Systems Committee (ATSC), which correspond to formats in use for current and future digital video recording systems in North America.
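
A quick comparison of raw pixel throughput shows why 1080-line video at 60 Hz has traditionally been interlaced or halved in frame rate. The numbers below count luma samples only and ignore chroma subsampling and blanking.

```python
# Sketch: raw luma pixels per second for common high-definition modes.
modes = {
    "720p60":  (1280,  720, 60, 1.0),   # progressive: whole frames each refresh
    "1080i60": (1920, 1080, 60, 0.5),   # interlaced: 60 half-frame fields per second
    "1080p30": (1920, 1080, 30, 1.0),
    "1080p60": (1920, 1080, 60, 1.0),   # roughly double the data rate of the others
}

for name, (width, height, rate, fraction) in modes.items():
    mpix_per_s = width * height * rate * fraction / 1e6
    print(f"{name}: {mpix_per_s:.1f} Mpixel/s")
# 720p60 ~55.3, 1080i60 ~62.2, 1080p30 ~62.2, 1080p60 ~124.4
```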

The standardization of multiple high-definition video formats has a bearing on sound because of the different frame rates permissible and typically used with each mode, including both NTSC-based and integer frame rates. Although many combinations of resolution and frame rate are possible, the most common formats for high-definition digital video capture are 1080i60, 1080p30, 1080p24, and 720p60,13 usually at NTSC-based (non-integer) frame rates (23.976, 29.97, and 59.94 fps). Generally, if audio is not recorded on camera single-system, an audio recorder would be set to 29.97 fps time code in any of these situations, but to 30 fps time code for material recorded at film-based (integer) frame rates. Time code settings, including the difference between drop-frame and non-drop-frame time code, are discussed below.

Under- and Over-Cranked Camera

Motion-picture cameras may be equipped with variable-speed motors, allowing them to shoot at a precise 24 fps most of the time, but also allowing the operator to run them off speed over a wide speed range for special effects. Until recently, these effects were hard to come by with video, because all the timing signals and so forth in a camera are based on the frame rate—many things have to change together to vary the frame rate. Today, some high-end video cameras offer variable speed recording, but usually only in specific increments. Thus it may be possible with some setups to capture slo-mo or sped-up action by over- or under-cranking the camera, respectively. Audio for over- or under-cranked situations is usually invented in postproduction, although one of the sources used may be a separate recording of the same event recorded in real time and then sped up or slowed down by the same amount as the camera to maintain synchronization.

Operational Matters

Quality Modes and Codecs

Recording raw pixel values uncompressed would take an enormous amount of space on recording media, so digital video today is stored and subsequently transmitted using a data compression scheme known as a codec (short for compressor–decompressor). The DV family of codecs (including DVCAM and DVCPRO) provided a means of recording standard-definition video at bit rates between 25 Mbit/s for consumer variants and 100 Mbit/s for the highest bit rate professional formats, providing about a 2:1 to 5:1 reduction in bit rate compared to uncompressed standard-definition video. The recording times and capabilities of DV tape formats varied depending not only on the features and bit rates of the codecs in use, but also on the particular cassette formats in use and the settings selected on the camera. Today, high-definition cameras continue to offer users a trade-off between video quality and recording time, although newer video codecs are in use on high-definition cameras.
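
As a rough check on the 5:1 figure, the sketch below estimates the uncompressed bit rate of 8-bit, 4:1:1-sampled standard-definition video and compares it to DV's 25 Mbit/s. The assumptions (720 × 480 active picture, 29.97 fps) are simplifications rather than the exact figures of any one format.

```python
# Sketch (approximate): uncompressed 8-bit 4:1:1 SD video vs. DV's 25 Mbit/s.
width, height, fps, bits = 720, 480, 30000 / 1001, 8

luma_samples = width * height                 # full-resolution luma
chroma_samples = 2 * (width // 4) * height    # two chroma channels at 1/4 horizontal resolution

uncompressed_bps = (luma_samples + chroma_samples) * bits * fps
dv_bps = 25e6

print(f"Uncompressed: {uncompressed_bps / 1e6:.0f} Mbit/s")       # ~124 Mbit/s
print(f"Compression ratio: {uncompressed_bps / dv_bps:.1f}:1")    # ~5.0:1
```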

Uncompressed high-definition video takes more space than standard-definition material, for obvious reasons, so the compression ratios involved in storing HD video are generally greater than the 5:1 ratio achieved by digital video in its infancy. The most common codecs in use today are MPEG-2 and MPEG-4 variants, including AVC/H.264, and typical compression ratios are on the order of 20:1. The digital video recording landscape mirrors the distribution world of Blu-ray and DVD in that respect, since standard-definition DVDs encode video using the MPEG-2 codec, and both MPEG-2 and AVC/H.264 are standard Blu-ray encoding formats. An important difference between the worlds of capture and distribution lies in how audio is encoded on the medium. The AC-3 audio codec, a perceptual coding scheme also known as Dolby Digital, and similar offerings from DTS are commonly used to compress audio for authoring to DVD, and to a lesser extent Blu-ray, where lossless “high-definition audio” formats like Dolby TrueHD and DTS-HD are now commonly offered as well (see Chapter 11 for information on encoding final audio for distribution). Perceptual coding is well suited to compressing media for distribution, where it will not have to be re-encoded or edited down the line. In capturing video, on the other hand, recording full-quality, uncompressed PCM audio is important in ensuring that audio can be edited and processed in postproduction while maintaining the full fidelity of the original recordings.

Similar to SP and LP recordings on older tape formats, many cameras offer a choice between standard play (SP) and long play (LP), standard quality (SQ) and high-quality (HQ), or other similarly named recording modes. By using a lower video bit rate, low-quality or long-playing modes fit more video onto a recording medium, but at the expense of image quality and often sound quality as well. Accepting the trade-off in picture quality for increased recording time may be useful in recording home videos or simple videos for the web, but with the space availability on high-capacity discs and memory cards today, low bit rate modes are normally not needed in professional or industrial video production.

The caveat for sound is that, on consumer camcorders in particular, compressed video modes may be tied to a lossy audio format, the rationale being that the SQ or LP is designed to save space on the memory card and that more space can be saved by heavily compressing audio as well as video. For home video recordings designed to be played back directly with no editing or postprocessing, this may be fine, depending on the particular codec implementation and bit rate used. But for professional purposes or situations in which audio may be processed and combined with other elements during postproduction, lossy audio recordings can introduce audible artifacts into a program when the audio is decoded (and potentially re-encoded) in the editing room. Indeed, more than a few student and even independent films have been compromised by low-quality, “ringy” audio caused by MPEG encoding on an incorrectly set-up recorder. The important thing when selecting a camera is to make sure that a video mode exists that will record uncompressed PCM audio, and to use it during recording.

Interchangeability

In the early days of video, tape formats were designed to allow video to be recorded on any camera using a particular cassette format and played back on any deck handling that format. In digital video, material could still be interchanged between two Mini DV or two DVCAM cameras, although complications could arise when recording on a camera made by one manufacturer and playing back off another manufacturer’s device. On the other hand, certain DV cassette formats were used by both consumer and professional systems, leading to the question of whether a tape recorded in one digital format would play back on another format of camera. Generally, the answer was that higher-end format decks or cameras could play lower-end tapes, but not the other way around. That was when everyone was using the same codec family (DV/DVCAM) and recording medium (¼-inch/6.35 mm magnetic tape).

Today, there are so many variables involved in recording an image on tape or, more commonly, memory card or hard disk file systems that it’s unlikely a card recorded on one camera will be readable even on another camera that uses the same media type. Within a single manufacturer, camera, or model line, it may be possible to share cards on a production, but generally each card will be read out to one or more hard drives for backup and ingestion into editing software once it is full, or at a natural break in shooting, and reformatted to start from scratch in whatever camera it ends up in.14 Memory cards, optical discs, and hard disks can all be read by computer15 and backed up there; the camera remains an instrument for capturing video, but is no longer needed as a vehicle for playback and ingestion the way it was with tape-based workflows.

So interchangeability has decreased with growth in the number of camera formats on the market, but on the other hand the necessity of playback from a camera or deck during postproduction has been removed as tape originals no longer represent the archival format they once did. Instead of ensuring that the tapes in use on a production are all interchangeable and can be played back on a particular deck or camera during postproduction, the burden of interchangeability in a tapeless workflow becomes making sure that the files recorded all play nicely in the editing room. And for that, it remains important not to mix wildly different formats and expect them to work together seamlessly. The best advice for both picture and sound recording is to select a method and stick with it. For sound in particular, be sure to select a recording platform that records uncompressed PCM audio that can be used transparently for editing and processing in postproduction.

Off-Line/Online

In one conventional method of working, video is first copied from source tapes at low resolution into an off-line editing system. The lowered resolution allows seeing what is happening, but also permits keeping a great deal of content available simultaneously because of the trade-off between picture quality and running time.16 The picture is edited off-line (without accessing the full quality originals) and the result is an Edit Decision List (EDL), a file containing the starting and ending time codes of each clip included in the program. This EDL, in electronic form, is then taken to an online facility along with the original camera or other source tapes. The online system rapidly re-creates the cuts by selective recording from the source tapes to an Edit Master at full resolution. Off-line time is much cheaper per hour than online suites, which offer not only higher quality but also special effects that have traditionally not been available in basic editing systems.
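
At its core, the bookkeeping an EDL carries is just time code arithmetic: each event names a source reel and in/out points, from which durations can be computed. The sketch below shows that arithmetic with a hypothetical event; it is not a parser for any specific EDL format such as CMX 3600.

```python
# Sketch: the time code arithmetic behind an EDL event (hypothetical field names).
FPS = 30  # nominal NTSC frame count (non-drop-frame counting)

def tc_to_frames(tc, fps=FPS):
    """Convert 'HH:MM:SS:FF' into a total frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

def frames_to_tc(frames, fps=FPS):
    ss, ff = divmod(frames, fps)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"

event = {"reel": "003", "src_in": "01:02:10:15", "src_out": "01:02:25:03"}
duration = tc_to_frames(event["src_out"]) - tc_to_frames(event["src_in"])
print(frames_to_tc(duration))   # 00:00:14:18
```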

In contrast, the digitization of video at the camera level and tremendous growth in hard disk storage capacity have today enabled the transfer of full resolution video into non-linear editing systems, giving the editor a clone of the camera original throughout the editing process. Full-resolution editing is possible even when a great deal of content is kept online. The enabling technology is the digital video compression employed between the CCD and the recording medium to reduce the bit rate of even high-definition video to practical levels for recording onto memory cards and storing on hard disk, along with the remarkable gains in storage capacity and falling cost of storage over the last several decades. Thirty years ago, I (Tom Holman) bought a hard drive that cost $2.43 per megabit, whereas today you can buy a drive for $0.00001 per megabit, an improvement in cost-performance of a factor of 240,000:1! This fact demonstrates why editing systems today can afford to handle full-resolution content even for large projects, whereas formerly they could not.

Time Code

Almost all prosumer digital video camcorders, many consumer models, and some DSLRs, have the capability of recording time code in the SMPTE format. The time code 05:15:22:23 means 5 hours, 15 minutes, 22 seconds, and 23 frames, marking a single frame with a number that may be used as an index address to that specific frame. The numbers start at the beginning of a tape or disc at 00:00:00:00 on many consumer models, but on more professional devices the starting number is presettable.

Time code is simple in the hours, minutes, and seconds digits, incrementing upward serially just as a clock would, rolling over from 59 seconds to zero and incrementing the minutes counter, for instance. However, when it comes to the frames counter, things are a little more complicated because of the various frame rates in use in digital video. For NTSC television and NTSC-based formats, the counter increments from zero through 29, then back to zero, incrementing the seconds counter, while for PAL-based formats the frames counter increments from zero through 24, then back to zero at the beginning of each new second. This process thus yields a nominal 30 frames per second (fps) for video in NTSC countries and 25 fps in PAL countries. For true 24P material, the frames counter increments from zero through 23 and then starts from zero once more, incrementing the seconds counter. However, for 24P material embedded in a 50i or 60i video stream as described above, 25 fps or nominal 30 fps time code would be used. One reason for this approach is that material recorded this way is backwards compatible with existing video infrastructure and can be easily played back on any television.

Now let us turn our attention to the word nominal used with 30 fps in the foregoing discussion. Although 29.97 fps video in NTSC countries plays fewer than 30 frames on average every second, we don’t count time code in increments smaller than a frame, so the 30 fps time code sequence is used even for 29.97 fps video. However, because frames are now very slightly longer in playing time than those played at precisely 30 fps, an error accumulates over time. The time code is running slow compared to a wall clock, so playing out frames at this rate will result in the program running slightly long. One hour of time code will actually run one hour plus 108 frames of clock time. For many purposes, this error is unimportant. For such uses, the counting sequence that includes all numbers in the normal, incrementing sequence is used, and programs run very slightly long in comparing their time code to a wall clock. The name for this, rather unfortunately, is non-drop-frame (NDF) time code. We’ll come back to why we say the name is unfortunate in a moment.
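
The 108-frame figure is easy to verify: one hour of non-drop-frame code counts 108,000 frame numbers, but each frame actually lasts 1001/30,000 of a second.

```python
# Sketch: one hour of non-drop-frame time code versus one hour on the wall clock.
actual_fps = 30000 / 1001                 # 29.97... frames really play each second
frame_numbers_per_hour = 30 * 60 * 60     # 108,000 numbers counted in a time code hour

clock_seconds = frame_numbers_per_hour / actual_fps
excess_frames = (clock_seconds - 3600) * actual_fps

print(f"One time code hour lasts {clock_seconds:.1f} s of clock time")  # 3603.6 s
print(f"That is about {excess_frames:.0f} frames longer than an hour")  # ~108 frames
```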

Alternatively, if you are delivering to a broadcast operation that runs on clock time, 108 frames per hour of error is very important. For time-sensitive operations, a means of keeping the time code in sync with a wall clock is needed, but it can’t be done if all of the frame numbers are used— quite the conundrum. The solution to the problem is to skip 108 frame numbers per hour, jumping over them completely. No frames are in fact skipped, but rather the number sequence skips certain numbers; if video frames were in fact skipped, you would often perceive the jump as a mistake. This is why we consider the name unfortunate, because this code sequence is called drop frame (DF) time code when, in fact, no frames are dropped!

In drop frame (DF) time code, frames 00 and 01 are skipped once every minute (which would yield a difference of 120 frames per hour), except in the minutes 00, 10, 20, 30, 40, and 50 (for a total of 108 frames per hour). So the actual displayed time is not precise at most points in time during an hour but adds up to one displayed hour of time code at the end of one actual hour. You can tell if video has been recorded DF or NDF by seeing whether these particular frame numbers are skipped. A common method of indicating DF code is to change one or more separators of the time code parts from a colon (:) to a semicolon (;).
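
The counting rule can be captured in a few lines of code. The sketch below converts a running count of actual frames (at 29.97 fps) into its drop-frame label, skipping numbers ;00 and ;01 at the start of every minute except the tenth minutes; it is a simplified illustration of the rule rather than production time code software.

```python
# Sketch: label a running 29.97 fps frame count with drop-frame (DF) time code.
# Numbers 00 and 01 are skipped each minute except minutes 00, 10, 20, 30, 40, 50.
def frames_to_df_timecode(frame_count):
    frames_per_min = 30 * 60 - 2                 # 1798 real frames in a "dropping" minute
    frames_per_10min = frames_per_min * 10 + 2   # 17,982 real frames per 10-minute block

    blocks, rem = divmod(frame_count, frames_per_10min)
    skipped = 18 * blocks                        # 9 dropping minutes per complete block
    if rem > 1:
        skipped += 2 * ((rem - 2) // frames_per_min)

    n = frame_count + skipped                    # count as if no numbers had been skipped
    ff = n % 30
    ss = (n // 30) % 60
    mm = (n // 1800) % 60
    hh = n // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"   # semicolon marks drop frame

print(frames_to_df_timecode(1800))    # 00:01:00;02 -- numbers ;00 and ;01 were skipped
print(frames_to_df_timecode(17982))   # 00:10:00;00 -- tenth minutes keep all numbers
```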

The choice of NDF or DF code to be made at the time of recording depends on the editing system and the requirements of the delivery master. Most conventional broadcast uses require DF time-coded master tapes, so real-time television operations can be timed by looking at the time code. Thus this requirement backs up into editorial and even, sometimes, into the camera tape specification. At one time it was difficult for off-line editing systems to make the internal calculations necessary for counting frames with DF code, when some numbers were jumped over, and thus the most common way of working was to record NDF code on the camera, work with it editorially, and then make the final delivery master with DF code. In some cases, the choice of video format may affect the type of time code available for use on a particular camera. For example, even on cameras that allow switching between DF and NDF time code, selecting a 24P mode may limit the camera to recording NDF time code.

Today more editing systems can handle DF code internally, so it is possible to shoot it on many cameras and keep it throughout postproduction. As a sound person, this is something that you should bring up to producers: “What type of time code should we use, non-drop-frame or drop frame?”17 The answer to this question is too infrequently known during shooting because most of the consequences occur in postproduction. One of the biggest problems occurs if the answer is not known during production and different parts of a production randomly choose DF or NDF code. In such a case, editing is at least greatly complicated, and sometimes made impossible on certain editing systems, by the mixed code.

The simplest code generators start at 00:00:00:00 each time the camera is powered on or a new disc or tape is inserted into the camcorder. In fact, some camcorders will lose track of the last time code recorded if taken out of standby mode or if a tape is rewound for playback and then cued up for subsequent recording. In that case, the same time code sequence may be recorded a second time on the same tape—a potential difficulty for the editing system to sort out if trying to go back and find a particular clip off a source tape. Other cameras are able to read the code of an existing tape or from files on the medium up to the point where no recording has been made and then pick up recording the time code from where they left off.

An important difference lies in the two possible modes of generating and recording time code on the camera: FREE RUN and REC RUN time code. For DV cameras, typically the time code generator runs when the tape rolls and pauses when the tape is paused. This is called Record Run (REC RUN, R RUN) and it produces continuous code throughout a tape even with camera starts and stops, so long as the power isn’t interrupted or the tape removed and put back in. Free Run (FREE RUN, F RUN) means that the time code generator is typically set to clock time and runs continuously, storing discontinuous code with each start of the camera, but providing evidence of the actual time of an event, sometimes needed for court requirements, for instance. Cameras recording to more professional formats than DV may be switched to REC RUN or to FREE RUN, although REC RUN is used far more often. REC RUN is generally preferred because editing systems are then presented with continuous time code, and this is more easily dealt with than discontinuous code.

Setting the time in FREE RUN mode can be accomplished by a function called JAM SYNC, which sets the camera’s internal time code generator from an external source. Jam sync is used to synchronize camera time code generators when multiple cameras are used to shoot the same event, or to synchronize one or more camera time code generators to the time code generated by an audio recorder when double-system sound is in use. Higher-end cameras may also provide a genlock video input that permits sources shot on separate cameras to be matched precisely in time (the first line of video will start at the same time on each camera). Genlocking multiple cameras permits overlaying a graphic while switching between cameras in real time, for instance, whereas jam sync alone is not enough to accomplish this.

Many prosumer and professional cameras have the capacity for the user to set a starting time code even in REC RUN mode, as do a few consumer models. In tape-based situations with large amounts of footage—including documentary programs—the resetting to 00:00:00:00 each time a new tape or disc is inserted is a nuisance for logging media, because the time code then can’t be used to distinguish one out of a multiplicity of tapes for a given project. Running time code continuously from one disc or tape to the next can reduce or eliminate the appearance of repeated source time codes over the length of a project. In one conventional method of working, the minutes and seconds of time code are reset each time a new tape or disc is inserted, while the hour counter is incremented by hand to match the tape number (or, more generally, the digital reel number) in use.

In an off-line/online workflow (described above), the EDL would then contain information about which source tape a clip comes from embedded in the time code, in addition to any notation elsewhere in the metadata for a clip. Without that extra information, very good records must be kept of which tape is meant for each picture and/or sound edit, because the online editing system will ask for a specific tape by an identifier such as a six-digit code to be mounted into a playback machine and then will shuttle to the time code indicated on whatever tape is provided. If it is not known what tape a particular time code event comes from, then the wrong tape may be used and the wrong picture or sound inserted into the program. Moreover, if a given tape or disc contains passages with identical time codes at multiple locations on the medium, it is especially ambiguous which clip is meant.

Luckily, the need for fulfilling the bookkeeping requirements of the off-line/online method of editing is not as great as it once was because newer cameras and editing systems allow full-resolution digital video editing in most situations, precluding the need to go back to camera original tapes or media during an online session since source video files on the editing system are clones of the original recordings. In most systems in use for simple digital video editing today, the distinction between off-line and online quality does not exist.18 Thus there is no need to repeat the import of audio and video for an online session, and keeping track of sources is not quite so vital. In fact, source media are frequently re-used in solid-state recording systems once footage has been transferred to a primary hard drive and one or more backup drives, so the original sources may no longer exist once editing has commenced in earnest. Still, for backup purposes, it is always a good idea to know where the source video came from, in the case of an editing disaster or a question about syncing sound and picture or matching multi-cam picture, so labeling and logging tapes with a distinguishing code such as at least A, B, C,… Z, AA, AB, AC… is good practice. Even in the case of reusable media being recorded over several times on a production, the file structure of each individual card is typically preserved when the card is initially transferred, so notes on which card a clip originated on are still useful in case of emergency.

In recording time code on an audio recorder separate from the camera, one nuance is that 30 fps or 29.97 fps time code is used on NTSC material even for cameras operating in 24P modes. Usually 29.97 fps time code is used for material shot at 29.97, 59.94, or 23.976 fps. When recording audio at 48,048 Hz for a project shot on film (or at precisely 24 fps) that will be pulled down to video speed, 30.00 fps NDF time code is used, as it will also be slowed down to 29.97 fps when the audio is clocked out at 48 kHz to match the video derived from the film. Once again, note that even where the camera is operating at 24.000 fps, the audio recorder is set to 30.000 fps time code.

User Bits

Within the SMPTE specification for time code is a place for the user to note additional information, called user bits or UB (sometimes wrongly and hilariously called the user’s bit by manufacturers). This can be information regarding scene and take, time of day, particular filters in use, method of recording the frames (whether interlaced or progressive), and so forth. Because these are literally the user bits, they are not subject to strict standardization, and so mean different things on different cameras. Consumer camcorders generally do not allow the user to write user bits, whereas prosumer and professional digital video cameras may. Capabilities in this area vary by format, manufacturer, and even specific camera model.

PAL Formats

Generally PAL formats, those used in Europe and other 50-Hz power line countries, use 25 fps code and do not have the complications associated with 29.97 fps time code described previously. Frames are numbered with time code 00 to 24, and then increment the seconds counter and reset to 00. The time code runs according to a wall clock, so there is no notion of drop-frame versus non-drop-frame time code in PAL-based systems. One difficulty, noted above, is that in order to transfer to film for theatrical exhibition, the 25 fps video must be slowed down by some 4 percent to achieve film standard 24 fps projection. Sound accompanying PAL video will also be slowed down accordingly in order to stay in sync, and without correction this will result in all the pitches being 4 percent lower—the opposite effect of the 4 percent upward pitch shift on film or 24P originals transferred to 25 fps video. Once again, this is noticeable even on actors' voices, so it should be corrected in transfer through the use of a hardware or software pitch shifter.

Locked versus Unlocked Audio Sample Rate

Camcorders contain internal oscillators, much like those used in quartz crystal watches, to set various timings. They may use one master clock that is then divided as necessary down to various rates for audio and video, or they may employ two separate clocks for the audio and the video. The separation into two clocks makes the design somewhat simpler. On the other hand, avoiding the possibility of sound and picture drifting out of sync over the course of a long recording requires the audio and video clocks (those doing the audio sampling and making the video frames) to run in a synchronized relationship to each other, called a locked audio sample rate. For NTSC standard U.S. video at 29.97 fps (including 59.94 Hz interlaced), every five video frames contain exactly 8008 audio samples.19
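
The 8008-sample relationship falls directly out of the exact NTSC frame rate of 30,000/1001 fps, as this sketch shows.

```python
from fractions import Fraction

# Sketch: locked audio -- 48 kHz samples per frame of exact-rate NTSC video.
video_rate = Fraction(30000, 1001)            # exact NTSC frame rate
samples_per_frame = Fraction(48_000) / video_rate

print(samples_per_frame)        # 8008/5 -- not a whole number per frame...
print(samples_per_frame * 5)    # 8008   -- ...but exactly 8008 samples every 5 frames
```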

On tape-based formats, where large sync drifts could occur over the length of a continuously recorded tape if audio and video were separated and then brought back together in an editing system, this was a particularly important consideration. The consumer DV format (also known as Mini DV) used an unlocked audio sample rate, in which no special hardware or firmware was included in the camera to ensure that the audio and video clocks remained in sync over the length of a tape. Note that tapes played back would not exhibit drift since audio and video were interleaved on the tape, but since the total number of audio samples recorded per video frame was not necessarily the same for every frame, programs would drift in some editing systems that strictly reassigned 48,000 samples of audio to each second of video.20 Other formats such as DVCAM and DVCPRO used locked audio so that no matter what happened down the line, even with audio and video transferred separately, so long as certain conditions were met,21 lip sync (audio–video synchronization) would be preserved.

Not having a strict requirement on audio sample rate makes cameras somewhat simpler and cheaper, so cameras in the early DV format and certain other models employ unlocked audio in the interest of cost savings. While prosumer cameras generally should employ synchronized (locked) audio and video clocks, it is still a good idea to check that audio and video on a new camera maintain sync through to the editing system over the course of a long recording during preproduction, as described in Chapter 3. In an hour-long interview recorded on a single disc or memory card, or an hour of tape playback, a typical amount of drift that unlocked audio may cause in an editing system (with sound and picture transferred digitally) is 20 frames out of sync. Because one frame out of sync is visible, and two frames is quite noticeable, 20 frames, or about two-thirds of a second, is quite bad. Most takes are not 60 minutes long, and a six-minute take would be two frames out of sync and would be on the edge of becoming unacceptable. However, transferring full-length tapes into an editing system all at once can result in later takes being badly out of sync.
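
To put the 20-frames-per-hour figure in perspective, the sketch below works out the implied clock error and how quickly drift reaches the one-to-two-frame range where it becomes noticeable; the numbers are illustrative, matching the example above.

```python
# Sketch: what 20 frames of drift per hour implies about an unlocked audio clock.
fps = 30000 / 1001
drift_frames_per_hour = 20

drift_seconds_per_hour = drift_frames_per_hour / fps
clock_error_ppm = drift_seconds_per_hour / 3600 * 1e6
minutes_per_frame = 60 / drift_frames_per_hour

print(f"Implied clock error: about {clock_error_ppm:.0f} ppm")             # ~185 ppm
print(f"About one frame of drift every {minutes_per_frame:.0f} minutes")   # every 3 minutes
# So a 6-minute take ends roughly 2 frames out of sync, as described above.
```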

Instant Playback

One tremendous benefit of file-based (as opposed to tape-based) recording on modern digital video cameras is the ability to instantly play back a take through the on-camera screen or an externally connected monitor without rewinding the tape. In particular, this reduces the danger of recording over material after rewinding a tape and eliminates the need to cue the tape back up to the end of a recording after playback. Non-linear access to material has long been a hallmark of digital editing systems, and now the availability of random access media in camera original recording has introduced the non-linear paradigm to on-set production as well. While perhaps not as far-reaching a transformation as the genesis of digital photo cameras capable of displaying a photograph on screen instantaneously, this capability of modern digital video cameras nonetheless adds ease of use to the process of reviewing video on set.

Interconnecting Video

In addition to viewing video on a camera's LCD screen or through its viewfinder, video can be output for live monitoring to an external monitor on set, or played back in a studio or office to view dailies after production.22 Video consists of three color signals, but these may be carried on anywhere from one to five wires, with varying degrees of quality and complexity, as outlined in Table 2-2.

Table 2-2 Video interconnects.

Conclusion

The principal differences among the various digital video formats are frame rates, codecs in use (which can include undesirable audio codecs on some cameras or in some camera modes), resolution, and aspect ratio. Particular camera characteristics that can affect sound include a locked versus an unlocked sample rate, the ability to handle simple or more complex time code requirements, and the capability for over- or under-cranking, which introduces special sound considerations that should be addressed in preproduction. Except in low-quality or long-playing modes, though, basic audio quality remains fairly consistent among cameras that offer professional audio inputs. Normally, for professional production, audio is recorded on camera as two channels of 48-kHz, 16-bit audio (or in some cases 24-bit), or with an external recorder capturing two or more channels of 48-kHz, 16- or 24-bit audio.
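
For a sense of scale, the data rate implied by that common two-channel, 48-kHz, 16-bit recording format works out as follows (a quick illustrative calculation, not a figure quoted from any camera's documentation):

    SAMPLE_RATE = 48_000      # samples per second, per channel
    BIT_DEPTH = 16            # bits per sample
    CHANNELS = 2

    bits_per_second = SAMPLE_RATE * BIT_DEPTH * CHANNELS
    print(bits_per_second / 1_000_000)           # about 1.54 Mbit/s
    print(bits_per_second / 8 * 60 / 1_000_000)  # about 11.5 MB per minute of program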

Director’s Cut

  • Choose a single standard format and stick to it, because intermixing types is next to impossible. NTSC is the U.S. standard video system; PAL is the European standard. NTSC records at 29.97 fps, so it is difficult, although possible, to derive a film output at 24 fps. PAL records at 25 fps and may simply be slowed down 4 percent to make a 24 fps film output. Even though PAL and NTSC are technically standard-definition video systems, the frame rates derived from PAL and NTSC television carry over to American and European high-definition video systems, including 24P video modes.

  • Aspect ratio: while standard-definition video is usually 4:3, high-definition is 16:9. (Some specific anamorphic video modes such as widescreen video on DVD are intended for 16:9 playback even though recorded in a standard-definition 4:3 frame.)

  • Understand interlaced versus progressive scanning. Most traditional video is interlaced; most computer work is done progressively. High-definition video may be interlaced or progressive depending on frame size (1080 or 720 active lines), frame rate, and bandwidth requirements. Many cameras store 24P video as interlaced video, although the picture itself is progressively scanned and can be reassembled into a single progressive frame in editing, which leads to better film outputs if necessary. 24P in effect mimics the universal nature of 24 fps film, which may be released as film, transferred to PAL video at 25 fps by a 4.17 percent speed-up, or transferred to NTSC video by a slowdown to 23.976 fps and the introduction of a 2:3 repeating field sequence, resulting in a video frame rate of 29.97 fps (a short sketch of this arithmetic follows this list).

  • Use high-quality video modes to ensure that both picture and sound are recorded at high quality. Don’t use long-playing (LP) or “standard”-quality video modes except in absolute emergencies.

  • Understand time code terms such as Non-Drop-Frame (NDF), Drop Frame (DF), Record Run, Free Run, and Jam Sync, described in the text. Choose one method of working and stick to it throughout a production. For multicamera shoots, use Genlock if possible.

  • Know that audio may go out of sync in some editing systems for DV tapes, or even on discs or longer files on memory cards, because of the use of unlocked audio and video clocks. Test for sync drift before recording a program.

  • Interconnecting video ranges from one-wire composite video, with its compromises, through S-Video, to component video, each step offering fewer compromises. In addition to playing video on a connected monitor on set, it is also possible to review digital video on an on-camera LCD screen and take advantage of the non-linear nature of modern digital video recording.
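
The sketch referred to in the scanning bullet above gathers the 24P frame-rate relationships in one place (purely illustrative arithmetic, written with the exact fractional rates):

    from fractions import Fraction

    FILM = Fraction(24)                       # 24 fps film, or 24P video
    PAL = Fraction(25)                        # PAL frame rate
    NTSC_24 = FILM * Fraction(1_000, 1_001)   # 24P slowed to 23.976... fps

    pal_speedup = (PAL / FILM - 1) * 100
    print(float(pal_speedup))                 # 4.1666...: the PAL speed-up in percent

    # With a 2:3 repeating field sequence, four film frames fill ten fields,
    # i.e., five interlaced video frames, giving the familiar NTSC rate.
    ntsc_rate = NTSC_24 * Fraction(5, 4)
    print(float(ntsc_rate))                   # 29.97002997...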

1 A field represents half a frame of interlaced video, typically the odd or even scanlines of a single frame. Video fields and the reasons for interlacing video are discussed in more detail below.

2 This number, while close to the exact value, is technically an approximation. The exact frame rate of NTSC color television and any derived formats called nominally 29.97 or 23.976 (or, less precisely, 23.98) fps is slower than 30 fps (or 24 fps) by a ratio of 1000:1001, so that 29.97 would be written without rounding as the repeating decimal value 29.970029970029….

3 Most high-end portable production recorders offer 48,048 Hz in addition to 48 kHz in their sample rate settings, although it may be referred to as 0.1% pulldown or by some other name. They may also offer a 47,952 Hz or 0.1% pullup sampling rate for video projects that will be pulled up to film speed, although that situation would be far less common. When shooting on film and editing on video, it is important to make the decision whether to record audio at 48,048 Hz or 48 kHz before production and stick with it; mixing audio recorded at different sample rates will most likely lead to headaches during postproduction.

4 Even by those with perfect pitch. The minimum noticeable pitch difference for the 0.01 percent of the population with perfect pitch on an absolute basis (not comparing side by side) is about 20 percent of a half-step, or 1.2 percent.

5 Here 50i and 60i refer to interlaced capture, with the number preceding the letter “i” indicating how many fields are captured per second. For NTSC devices, the image capture frequency is usually rounded up to the nearest integer, so that 60i is used when what is actually meant is 59.94i video. Similarly, 30p is usually used to represent 29.97 fps progressively scanned video, the “p” indicating that each image is captured as a complete frame at a single moment in time rather than in two separate fields per frame.

6 Treating an interlaced image as progressive would result in a “combing” effect on the image during motion since every pair of adjacent lines would have been captured at a different moment in time. On the other hand, a progressively scanned image will look correct even when treated as interlaced video and viewed on an interlaced display.

7 This was especially a problem in CRT displays using a scanning electron gun to illuminate phosphors, although the principle remains true on newer progressive displays in that progressive content may be displayed all at once (and refreshed every 1/60th of a second or more) whereas interlaced content must be de-interlaced or shown field by field to appear correctly on a progressive scan monitor.

8 The alternative to editing at 23.976 fps is editing the 29.97 fps video stream as if it were an NTSC video original, but this eliminates many of the benefits of shooting in 24P since interlacing artifacts introduced by the 3:2 sequencing of frames will then be present in the edited program, and cuts may not occur precisely on 24P frame boundaries. When shooting 24P, it is better to use editing software capable of interpreting and editing 24P material directly.

9 24 × 1000/1001 = 23.9760239760… to be exact.

10 Anamorphic widescreen video is also widely known as a distribution format for wide-screen movies released on DVD, replacing the older method of letterboxing with black bars at the top and bottom of the frame. As in anamorphic capture, the goal in this case is to preserve greater vertical resolution by utilizing every pixel in the frame to store useful picture information.

12 One of the earliest cameras to offer “native” 23.976 fps recording was the Panasonic HVX200. Recording 24P video natively rather than embedded in a 30 or 60 fps video stream saved space on the recording medium since redundant frames were no longer stored.

13 The chief advantage of 720p over 1080-line formats is that each frame of 720-line video contains less data, so more frames can be stored per second, and the higher frame-rate of 60 full frames per second allows for smoother motion. That’s why it is particularly useful for sports or action programs with a great deal of motion on the screen. That’s also why use of a lower frame rate such as 720p24 is uncommon, since at that frame rate the bandwidth limitations on 1080-line video do not come into play.

14 The nuances of working in a tapeless production environment and managing media will be discussed in more detail in Chapter 7.

15 With the appropriate software if a proprietary video file format or file system is in use.

16 “S..t Tom, you gotta tell George about this,” said Stanley Kubrick to author Holman in a phone call. Kubrick went on to say he’d memorized the film dailies, so he didn’t need high quality picture in the editing system, and that the flexibility offered by a non-linear editor opened up a whole new experience cutting for him.

17 Unless the answer is prescribed by the format. Some consumer camcorders, including most early DV cameras, for example, only support dropframe time code. Most 24P modes only support non-dropframe.

18 Although there may still be a finishing step in an online suite that can offer more extensive video processing than is available in off-line systems. In this case, an export of a file containing the video is made from the off-line system to the online one. In the conventional case, all that is exported from the off-line system is an Edit Decision List (EDL), and the edits must be reproduced by the online system. Editing at full resolution in a non-linear editing system relieves the requirement of recopying from source tapes to the master during an online session and possibly eliminates the need for a session altogether.

19 Note that no smaller number of frames contains an integer number of audio samples.

20 What is done to maintain sync while playing back the tape in real time is that the audio sample rate is sped up or slowed down by a small amount during playback of the tape to keep the audio in step with the video.

21 Namely, that the clocks doing the playback of the audio and video have the same time base, like setting watches to the same time before a mission (“synchronize your watches, gentlemen”) in all those World War II movies.

22 Video can also be transferred into an editing system or dubbed to disk or tape with standard video cables or Firewire cables, although that was more common in the era of tape-based video formats, including both analog and DV formats. Today, direct file transfer from solid state media to hard disks is more common, as described in Chapter 7.
