Chapter 18. Internet Production

Computer and digital technologies in general and the Internet in particular have provided the means to produce and disseminate audio on an unprecedented scale. What was once a relatively closed medium—the purview of professionals in studio-based environments—is now accessible to practically anyone, with the potential to reach a worldwide market at the touch of a button.

Opportunities abound. They include producing audio for an online company, recording online sessions interactively, promoting and selling your own work, hiring out your production services to others, podcasting, producing sound for Web pages, and creating audio just for the fun of it. In producing sound for professional purposes, however, it serves to rephrase avant-garde writer Gertrude Stein’s famous line “A rose is a rose is a rose is a rose”: audio production is audio production is audio production.

As the methods of producing sound change, the wisdom of that proverb is sometimes forgotten. How audio is produced is only the means to an end. The basic principles related to that end are similar regardless of where production is carried out—in a major production facility with state-of-the-art gear or in a project studio with only a personal computer. The end is always to produce a high-quality product, technically and aesthetically. The point can be made in another way: whether writing with a pen, a typewriter, or a computer, it is not the medium that creates the poetry; it is the poet.

Because the world of computers and the Internet changes continuously, the approach to this chapter is generic, dealing with the processing of audio in a general way and avoiding as much as possible proprietary references and detailed operational information. This material also assumes some experience with computers and the Internet.

Data Transfer Networks

The delivery and the playback of audio over the Internet differ from the traditional methods used in the conventional audio studio. They involve, in computer parlance, a network—a system of computers configured and connected in a way that allows them to communicate with one another. Such a system allows the transfer of information among computers. The Internet is simply an extremely large computer network comprising many smaller networks.

Local-Area Networks

A local-area network (LAN) is configured for a small, localized area such as a home or business. Generally, a LAN is more controllable than a wide-area network because it is small enough to be managed by only a few people. LANs often feature fast data transfer speeds because the connection types and the hardware of all computer systems on the network can be controlled. One approach that takes advantage of LANs is Audio over Ethernet (AoE).

In some measure AoE is similar to Voice over Internet Protocol (VoIP), which was discussed in Chapters 4 and 17. The difference lies in the greater bandwidth, which can accommodate the higher sampling rates and bit-depth resolution associated with uncompressed, high-quality digital audio. When developing AoE for use on a LAN, an audio or broadcast engineer can take full advantage of low latency—a short delay of usually no more than 6 ms—and fast connectivity within a stable and reliable transmission environment. Depending on configuration and cabling, AoE protocols can transmit anywhere from 32 to 64 channels of audio at a 48 kHz sampling rate at 16- or 24-bit resolution. Other sampling rates and bit depths may be accommodated, but there may be a trade-off in channel capacity. Overall, LAN and Ethernet-based networking is geared to produce high-quality audio not possible on less controlled wide-area networks.
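To see why channel capacity trades off against sampling rate and bit depth, consider the raw numbers. The Python sketch below computes only the raw PCM payload; actual AoE protocols add packet framing and clocking overhead on top, so the figures are illustrative.

```python
# A rough payload calculation for uncompressed audio over Ethernet.
# Figures are illustrative; real AoE protocols add framing overhead.

def raw_audio_bitrate_bps(channels: int, sample_rate_hz: int, bit_depth: int) -> int:
    """Raw PCM bit rate before any network packaging."""
    return channels * sample_rate_hz * bit_depth

# 64 channels at 48 kHz / 24-bit:
payload = raw_audio_bitrate_bps(64, 48_000, 24)
print(f"{payload / 1e6:.1f} Mbps")  # 73.7 Mbps -- close to the ceiling of
                                    # 100 Mbps Fast Ethernet, which is why
                                    # higher rates trade away channel count
```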

Wide-Area Networks

A wide-area network (WAN) is configured for a large geographical area. Often a WAN is composed of several interconnected LANs. The Internet is the largest and best-known example of a WAN. A WAN is generally less controllable than a LAN because it comprises too many computers and individual LANs for any one person or group to manage. Some computers on a WAN may be quite fast, whereas others may be quite slow. When producing audio for the Web, a designer must take into consideration that WANs are extremely sizable and multifaceted.

Servers

A server is a computer dedicated to providing one or more services to other computers over a network, typically through a request-response routine. Users of the Internet are familiar with the practice of visiting various Web sites. Web sites are stored on a server, which receives Web page requests from users and transmits the desired page. Any audio associated with that page (or requested by a user in other ways) is likewise sent from a server across the network to the user. A network usually has one or several servers serving a large number of client computers. A server can be configured for specific tasks, such as managing e-mail, printing, or Web pages, or to store and serve audio files, for example, on a radio station’s Web site.

Clients

A client is a computer connected to a network and configured to make requests to a server. A client could be a personal home computer, an office workstation, or another server. Client computers are often the limiting factor when developing audio (or anything else) for use on the Internet. Whereas servers are fast, robust, and capable of quickly processing large amounts of data, clients come in all shapes and sizes. A video gamer might have an extremely fast computer connected to the Internet over a high-speed connection, whereas a casual Internet user may use an older PC connected through a telephone modem and a dial-up connection at a very slow speed. Therefore, when designing audio for the Web, the limitations of the client computers must be considered. When developing audio to be used over a LAN, however, clients are often configured and controlled by the same team that manages the servers, so LAN clients are often more capable than the typical client computers found on a WAN or the Internet.

Audio Fidelity

Networks, including the Internet, have a variety of limiting factors that reduce the speed at which data can be transmitted. Large amounts of data can take a great deal of time to be fully transmitted from a server to a client. Such large transfers of information can also interfere with or impede other transmissions across a network. The rate at which information moves from a server to a client is known as the connection speed.

We are all familiar with the experience of clicking on a Web link and waiting for endless minutes as an audio file opens. Because audio files are often quite large, transmitting them from server to client can take a long time. To reach the largest audience possible, a Web sound designer must design audio for the slowest likely connection. The slower the connection speed, the poorer the sound quality. If a designer has the advantage of creating sound to be used in a very fast LAN application, higher-quality sound can be used because of the greater bandwidth and faster connection speeds. A Web designer creating audio for a site likely to be viewed by users on slow connections would do well to reduce audio file size so that the sound remains accessible to those users.

It is worthwhile to remember that, unlike telephone or radio sound transmission, the transfer of data over a network does not require that sound be sent in real time (although it can be, if desired). Sound sent by a server via the Internet can and often does take longer to reach its intended client than it takes for the client to play it back. Waiting for long periods of time to download audio over a network is frustrating. A number of solutions have been developed to address this problem, such as improvements in connection speed, reducing file size by manipulation, and data compression. These techniques allow information to flow faster across a network but in some cases not without a trade-off in audio quality.

Connection Speed

The greater the connection speed, the faster a network can transmit information, including audio information. Connection speed is measured in kilobits per second (Kbps). For example, played in real time, a connection speed of 150 Kbps can allow a bandwidth of 20,000 Hz and stereo CD-quality sound, whereas a connection speed of 8 Kbps produces a bandwidth of 2,500 Hz and mono telephone-quality sound (see 18-1).

Table 18-1 Changes in audio quality in relation to connection speed.

Connection Speed (Kbps)   Mode     Audio Quality        Bandwidth (kHz)   Application
128–150                   Stereo   CD quality           20                Intranets
96                        Stereo   Near CD quality      15                ISDN, LAN
56–64                     Stereo   Near FM quality      12                ISDN, LAN
24–32                     Stereo   Boombox FM quality   7.5               28.8 modem
16                        Mono     AM quality           4.5               28.8 modem
12                        Mono     SW quality           4                 14.4 modem
8                         Mono     Telephone sound      2.5               Telephone

ISDN = Integrated Services Digital Network; LAN = local-area network

In fact, most Internet audio is not transmitted in real time, so connection speed is viewed as a limitation on the speed of transmission rather than as a means of determining a transmission’s sound quality. A 150 Kbps connection transmits 150 kilobits per second, so at that bit rate you can transfer an 18.75-kilobyte (KB) audio file in one second. The file could be a short clip of high-quality audio or a lengthy clip at reduced audio quality. In either case, the file would still take one second to travel to the client. This delay is known as latency. Obviously, a faster connection speed allows the same audio file to transfer more quickly.
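The arithmetic is straightforward. A minimal sketch in Python, assuming a connection speed expressed in kilobits per second (1 Kb = 1,000 bits):

```python
# Time to transfer a file at a given connection speed.
def transfer_seconds(file_size_bytes: int, connection_kbps: float) -> float:
    return (file_size_bytes * 8) / (connection_kbps * 1_000)

# One minute of 44.1 kHz / 16-bit stereo WAV is roughly 10 MB:
one_minute_wav = 10_584_000
print(f"{transfer_seconds(one_minute_wav, 150):.0f} s")     # ~564 s at 150 Kbps
print(f"{transfer_seconds(one_minute_wav, 10_000):.1f} s")  # ~8.5 s at 10 Mbps
```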

File Manipulation

Another way to improve Internet sound is by reducing the size of audio files because size reduction facilitates file transfer between computers. This can be done through manipulation and compression (compression is covered later in this section).

There are different ways to handle file size reduction by manipulation, such as reducing the sampling rate, reducing the word length, reducing the number of channels, reducing playing time by editing, and, with music, using instrumental rather than vocal-based tracks (see 18-2).

Table 18-2 Approximate file sizes for one minute of audio recorded at differing sampling rates and resolutions.

Word Length/Sampling Rate   44.1 kHz   22.05 kHz   11.025 kHz
16-bit stereo               10 MB      5 MB        2.5 MB
16-bit mono/8-bit stereo    5 MB       2.5 MB      1.26 MB
8-bit mono                  2.5 MB     1.26 MB     630 KB

1,000 KB = 1 MB
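The figures in 18-2 follow from multiplying sampling rate, bytes per sample, number of channels, and duration; the table’s megabyte values correspond to a divisor of 1,048,576 bytes. A quick check in Python:

```python
def minute_of_audio_bytes(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed size of 60 seconds of PCM audio."""
    return sample_rate_hz * (bit_depth // 8) * channels * 60

print(minute_of_audio_bytes(44_100, 16, 2) / 1_048_576)  # ~10.09 -> "10 MB"
print(minute_of_audio_bytes(11_025, 16, 1) / 1_048_576)  # ~1.26 -> "1.26 MB"
```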

Often these reductions in file size result in a loss of sound quality because many of them actually remove information from the file. Because the Internet is a huge, varied, and sometimes slow network, most audio distributed across the Web is reduced in file size to improve transmission speed, and therefore most Web audio is of reduced quality. That said, if longer download times are acceptable, audio quality does not have to suffer.

Reducing the Sampling Rate

Some sounds, such as a solo instrument, a sound effect, or a straight narration, do not require a high sampling rate of, say, 44.1 kHz. For these sounds 32 kHz may suffice. Be aware, however, that reducing the sampling rate may have unintended consequences for the fidelity of a complex sound with a rich overtone or harmonic structure.
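As one illustration of the technique, here is how a 44.1 kHz signal might be reduced to 32 kHz, assuming NumPy and SciPy are available; resample_poly applies an anti-aliasing filter before decimating:

```python
import numpy as np
from scipy.signal import resample_poly  # filters before decimating

def reduce_sample_rate(samples: np.ndarray, from_rate: int = 44_100,
                       to_rate: int = 32_000) -> np.ndarray:
    g = np.gcd(from_rate, to_rate)            # 32000/44100 reduces to 320/441
    return resample_poly(samples, to_rate // g, from_rate // g)

one_second = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)  # A440 tone
smaller = reduce_sample_rate(one_second)
print(len(smaller))  # ~32,000 samples: about 27 percent less data
```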

Reducing the Word Length

Audio with a narrow dynamic range, such as speech, simple sound effects, and certain instruments (see 19-33), does not require longer word lengths. An 8-bit word length should support such sound with little difficulty. Of course, wider dynamics require longer word lengths—16-bit and higher. But a given audio recording does not comfortably fit into either a narrow or a wide dynamic range; it almost always includes both. To counter the adverse effects caused by having too short a word length to handle a changing dynamic range, one technique is to use a sliding window—selecting an 8-bit format that moves up and down on the 16-bit scale, depending on whether the sound level is high or low.
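One way to picture the sliding window is block-by-block requantization: each block of 16-bit samples is shifted so that its loudest sample fits in 8 bits, and the shift value records where the window sat on the 16-bit scale. The sketch below is purely illustrative, not a specific commercial scheme:

```python
import numpy as np

def sliding_window_8bit(samples16: np.ndarray, block: int = 256):
    """Requantize 16-bit samples to 8 bits per block; the stored shift says
    where the 8-bit 'window' sat on the 16-bit scale for that block."""
    encoded = []
    for i in range(0, len(samples16), block):
        chunk = samples16[i:i + block].astype(np.int32)
        peak = max(int(np.abs(chunk).max()), 1)
        shift = max(peak.bit_length() - 7, 0)    # fit the peak into 8 signed bits
        encoded.append((chunk >> shift, shift))  # 8-bit data + window position
    return encoded
```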

Reducing the Number of Channels

Stereo audio uses twice as much data as mono. If an audio file can be handled in mono, it makes sense to do so. Sound effects are usually produced in mono; effects in stereo do not provide as much flexibility to the editor, mixer, and rerecordist in postproduction because the spatial imaging has already been committed. Speech and vocals can be monaurally imaged because they are usually centered in a mix. Music is the one sonic element that requires fuller spatial imaging and therefore stereo. But on the Web, music is ordinarily distributed in a mono format to reduce file size and improve transmission speed.
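The fold-down itself is simple arithmetic: average the left and right channels, halving the data at the cost of the stereo image. A minimal sketch:

```python
import numpy as np

def to_mono(stereo: np.ndarray) -> np.ndarray:
    """Average left and right; input shape is (num_samples, 2)."""
    return stereo.mean(axis=1)

stereo = np.zeros((44_100, 2), dtype=np.float32)  # one second of silence
print(stereo.nbytes, to_mono(stereo).nbytes)      # 352800 vs 176400 bytes
```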

Reducing Playing Time by Editing

Obviously, reducing playing time reduces file size. Of course, this is not always feasible or desirable, but when the opportunity does present itself, editing the file will help. In this case, editing refers to eliminating extraneous bits of data (as opposed to conventional editing, discussed in Chapter 20). Three ways to handle such editing are through compression, noise reduction, and equalization.

Compression (in relation to dynamic range) reduces the distances between the peaks and the troughs of a signal. This raises the level of the overall signal, thereby overriding background noise.

Noise reduction is best handled by listening to the silent parts of a file to check for unwanted noise. If it is present, that section of the file can be cleaned, taking care to retain all desirable sound.

Equalization comes in handy to add the high frequencies that are usually lost in compression (data reduction).
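A minimal static version of the compression step described above might look like the following; real compressors add attack and release envelopes, and the threshold and ratio here are arbitrary illustrative values:

```python
import numpy as np

def compress_dynamics(samples: np.ndarray, threshold: float = 0.5,
                      ratio: float = 4.0) -> np.ndarray:
    """Reduce peaks above the threshold by the ratio, narrowing the distance
    between the loudest and quietest parts of the signal."""
    mag = np.abs(samples)
    over = mag > threshold
    out = samples.copy()
    out[over] = np.sign(samples[over]) * (threshold + (mag[over] - threshold) / ratio)
    return out

x = np.sin(np.linspace(0, 200, 44_100)) * np.linspace(0.1, 1.0, 44_100)
print(x.max(), compress_dynamics(x).max())  # peak pulled down toward the mean
```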

Using Instrumental Music Instead of Vocal-Based Tracks

Reducing the data of instrumental music is easier than that of vocal-based music backed by an accompanying ensemble. Therefore it is another way to eliminate unwanted bits so long as production requirements make it feasible to do so.

Compression

Compression is the most common way to reduce the size of large audio files by removing psychoacoustically unimportant data. The process is handled by a codec: software or hardware that runs the intensive mathematical procedures (algorithms) that make data compression possible.

Compression can be either lossless or lossy. In lossless compression the original information is preserved; in lossy compression it is not. Lossless compression preserves binary data bit-for-bit but does not compress data at very high ratios. Historically, lossless codecs have not handled audio data as well because their operations are based on strict mathematical functions more conducive to handling video. But this is changing with open-source solutions such as Ogg and the Free Lossless Audio Codec (FLAC).

Lossy compression delivers high compression ratios. A lossy codec approximates what the human ear can hear and filters out the frequency ranges to which the ear is insensitive.

There are several compression formats in use today: adaptive differential pulse code modulation, a-law and μ-law, RealAudio, MPEG, MPEG-2 layer 3 technology, MPEG-2 AAC, MPEG-4 AAC, Ogg, and FLAC.

  • Adaptive differential pulse code modulation (ADPCM) uses the fact that audio is continuous in nature. At thousands of samples per second, one sample does not sound much different from the next. ADPCM records the differences between samples of sound rather than the actual sound itself. This technique reduces the word length from 16-bit to 4-bit, compressing the original data to one-fourth its size (a toy sketch of this difference-coding idea appears after this list).

  • A-law and μ-law constitute an 8-bit codec format developed for sending speech via digital telephone lines. They use the principle of redistributing samples to where sound is loudest—during speech—from where it is softest, during pauses in speech. μ-law is used in North America and Japan; a-law is used in the rest of the world.

  • RealAudio is a proprietary system using a compression method called CELP at low bit rates for speech. For music it uses the Dolby Digital codec at higher bit rates. The Dolby system is known as AC-3 (the AC stands for audio coding) (see Chapter 22).

  • MPEG stands for the Moving Picture Experts Group. It was established by the film industry and the International Organization for Standardization (ISO) to develop compression software for film. MPEG uses an analytical approach to compression called acoustic masking. It works by establishing what the brain perceives the ear to be hearing at different frequencies. When a tone, called a masker, is at a particular frequency, the human ear cannot hear audio on nearby frequencies if they are at a low level. Based on this principle, MPEG developers determined various frequencies that could be eliminated. Their perceptual coders use filterbanks to divide the audio into multiple frequency bands, each assigned a masking threshold below which all data is eliminated.

  • MPEG-2 layer 3 technology is commonly known as MP3. It is considered an excellent system for speech, sound effects, and most types of music. Bit rates range from 16 to 128 Kbps. MP3 is among the most commonly used codecs because of its ability to dramatically reduce audio file size while retaining fairly good sound quality. It supports a variety of sound applications and Web authoring environments.

  • MPEG-2 AAC (Advanced Audio Coding) is a compression format from MPEG. It is approximately a 30 percent improvement over MPEG-2 layer 3 technology and 100 percent better than AC-3.

  • MPEG-4 AAC with SBR (Spectral Band Replication), or MP4, is the most recent coding system from MPEG. It allows near-CD-quality stereo to be transmitted over connection speeds as low as 48 Kbps. A data rate of 96 Kbps for a 44.1 kHz/16-bit stereo signal is also possible. MP4 compression is replacing the popular MP3 format because it offers higher-quality audio over the same connection speeds. In addition to mono and stereo, MPEG-4 AAC with SBR supports various surround-sound configurations, such as 5.1 and 7.1.

Most AAC encoder implementations are real-time capable. AAC supports several audio sampling rates ranging from 8 to 96 kHz. This makes it suitable for high-quality audio in applications with limited channel or memory capacities.

  • Ogg is an open-source multimedia container format, comparable to the MPEG program stream or QuickTime. Its suite of video, audio, and metadata compression formats and solutions, however, is designed as a nonproprietary—that is, free—alternative and is attractive to individuals as well as to third-party audio and media developers. Ogg Vorbis is a professional audio encoding and streaming technology. The encoder uses a variable bit rate (VBR) and can produce output from about 45 to 500 Kbps for a CD-quality (44.1 kHz/16-bit) stereo signal, improving the transparency associated with lossy compression.

  • Free Lossless Audio Codec (FLAC) is an open-source, cross-platform file format that achieves lossless compression rates of 40 to 50 percent for most musical material. It can handle a wide range of sampling rates, bit-depth resolutions from 4- to 32-bit, and between one and eight channels, making stereo and 5.1 groupings possible. The format is popular among groups sharing music and among those interested in archiving music recorded on vinyl and CD. Unlike MPEG formats, FLAC has the advantage of storing exact duplicates of the original audio, including metadata from a recording such as track order, silences between tracks, cover art, and text.
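As promised in the ADPCM entry above, here is a toy fixed-step difference coder. Real ADPCM adapts its step size continuously; this simplified sketch only shows why storing the change between neighboring samples needs so few bits. The step size of 512 is an arbitrary assumption.

```python
import numpy as np

def delta_encode_4bit(samples16, step: int = 512):
    """Store each sample as a 4-bit code (-8..7) for the quantized change
    from the previous reconstructed sample, instead of a full 16-bit value."""
    codes, prev = [], 0
    for s in samples16:
        code = int(np.clip(round((int(s) - prev) / step), -8, 7))
        codes.append(code)
        # Track the decoder's reconstruction so errors do not accumulate:
        prev = int(np.clip(prev + code * step, -32768, 32767))
    return codes

tone = (10_000 * np.sin(2 * np.pi * 440 * np.arange(480) / 48_000)).astype(np.int16)
print(len(delta_encode_4bit(tone)))  # one 4-bit code per 16-bit sample
```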

File Formats

Digital audio files must be saved in a particular format so that the programs that play them can recognize them. The process adds an extension (a period followed by a two- to four-letter descriptor) to the file name, identifying it as an audio file. It also adds header information, which tells the playback program how to interpret the data. File identifiers specify the name of the file, file size, duration, sampling rate, word length (bits), number of channels, and type of compression used.
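Most of these identifiers can be read directly from a file’s header. A minimal sketch using Python’s standard wave module, where example.wav is a placeholder file name:

```python
import wave

# Read the header identifiers of a WAV file; "example.wav" is a placeholder.
with wave.open("example.wav", "rb") as wf:
    print("channels:   ", wf.getnchannels())
    print("sample rate:", wf.getframerate(), "Hz")
    print("word length:", wf.getsampwidth() * 8, "bits")
    print("duration:   ", wf.getnframes() / wf.getframerate(), "seconds")
```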

A large number of formats are available on the Internet, and they come and go; a few, however, have been in relatively widespread use for a while (see 18-3). These include MPEG formats (.mpa, .mp2, .mp3, .mp4), Windows Media Audio (.wma), WAV (.wav), RealAudio (.ra), OGG (.ogg), FLAC (.flac), and Flash (.swf).

Table 18-3 Common file formats and their associated extensions.

Lossless File Formats                   Extensions
WAV                                     .wav
AIFF (Audio Interchange File Format)    .aif, .aiff
RealAudio                               .ra
Broadcast Wave Format                   .wav
Windows Media Audio                     .wma
Free Lossless Audio Codec               .flac

Lossy File Formats                      Extensions
MPEG Audio                              .mpa, .mp2, .mp3, .mp4
ATRAC (used with MiniDisc)
Flash                                   .swf
Sun                                     .au
Ogg                                     .ogg

Metafile Formats
(Metafile formats are designed to interchange all the information needed to reconstruct a project.)
AES-31 (AES is the Audio Engineering Society)
OMF (Open Media Framework)

Downloadable Nonstreaming Formats

Downloadable nonstreaming formats require that the user first download the entire file and store it on the computer’s hard disk. A plug-in then plays the file. No specific server (transmitter) software is needed.

The difference between nonstreaming and streaming is that streaming sends audio data across a computer network in a way that keeps playback at the receiving end apparently continuous and uninterrupted. In practice, however, interruptions do occur with slower connections.

Downloading nonstreaming data is usually slow even when files are only moderately large. Because a WAV file with one minute of digital-quality sound—44.1 kHz/16-bit stereo—occupies roughly 10 MB of disk space, this format is generally used for audio of a minute or so. 18-4 provides an idea of the various formats’ ease of use, quality versus size, and portability.

Table 18-4 Common downloadable formats in relation to ease of use, quality versus size, and portability (support across different platforms, such as Windows, Macintosh, and Unix).

Format       Ease of Use   Quality Versus Size   Portability
AU[*]        Good          Good                  Excellent
MPEG (MP3)   Fair          Excellent             Good
MPEG-4 AAC   Fair          Excellent             Good
WAV/AIFF     Excellent     Poor                  Good
WMA          Excellent     Excellent             Good
Flash        Fair          Good                  Excellent
FLAC         Good          Excellent             Good
Ogg          Good          Excellent             Excellent

Strengths
AU: Can be used across all platforms
MPEG (MP3): High compression ratios that can shrink CD files to one-tenth their size with no apparent loss of quality
MPEG-4 AAC: Higher quality than MP3; small file sizes
WAV/AIFF: Because these are the default audio types for all operating systems, they are extremely easy to use on all systems and applications
WMA: Native to the Windows environment, so it plays on any Windows machine; good sound quality
Flash: Due to widespread plug-ins, Flash files are playable on 90 percent of computers; the format also allows the creation of interactivity not available in other formats
FLAC: High-quality compression; open source; fast streaming
Ogg: High-quality compression; open source; can be used across many platforms

Recommendations and reasons
AU: Use in short audio clips (<30 seconds), background sound tracks, sound effects, and voice clips. Reasons: high portability; good sound with small file sizes.
MPEG (MP3): Use in audio that requires CD quality. Reasons: highly compressible; extremely popular.
MPEG-4 AAC: Use in audio that requires CD quality. Reasons: will replace MP3 eventually.
WAV/AIFF: Use in short audio clips of voice and solo instrumental music. Reasons: files too large; nonportable if compressed.
WMA: Use in short to medium-length clips intended for Internet playback. Reasons: works well in Windows Media Player embedded into a Web page.
Flash: Use in audio that needs to be interactive or features integrated user controls. Reasons: allows the inclusion of programming, artwork, and other nonaudio elements.
FLAC: Use in audio that requires CD quality. Reasons: lossless compression and flexibility.
Ogg: Use in audio that requires CD quality. Reasons: highly efficient compression.

[*] AU is a simple audio file format developed by Sun Microsystems.

Nonstreaming formats are easily portable and useful for file-sharing applications. Because the file downloads to the computer before playback, it can be easily copied, saved, manipulated, and redistributed (note, however, that copyrighted material should not be distributed without express permission of the copyright holder).

Nonstreaming formats are not useful if the desire is to have audio play automatically within a Web browser or Web page. Nonstreaming files often need to be started manually by the user and often take some time to download before playback can begin.

Downloadable Streaming Formats

The principle behind streaming technologies is buffering. As the player, which is built into or added onto a user’s Web browser, receives data, it stores it in a buffer before use. After data builds a substantial buffer, the player (such as RealPlayer) begins to play it. All incoming data is first stored in the buffer before playing.

Thus the player is able to maintain a continuous audio output despite the changing transmission speeds of the data. Sometimes the buffer runs empty, and the player has to wait until it receives more data, in which case the computer displays the message “Web congestion, buffering.” Data is sent over the Internet in packets. Each packet bears the address of the sender and the receiver. Because the Internet is lossy and non-deterministic, packets may get lost and arrive at different times. One company has developed a system that allows audio to flow even if packets are lost or appear later. Nonetheless, streaming technologies come closest to approximating traditional broadcast.
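The buffering principle can be simulated in a few lines. In the sketch below, packets arrive at an uneven rate, playback begins only after a preroll threshold is met, and an empty buffer forces the rebuffering pause described above; all the numbers are arbitrary.

```python
from collections import deque

PREROLL = 5                 # packets required before playback starts
buffer, started = deque(), False
arrivals = [2, 1, 0, 3, 1, 0, 0, 0, 0, 0, 0]  # packets received per tick

for tick, n in enumerate(arrivals):
    buffer.extend([object()] * n)        # receive whatever the network delivers
    if not started and len(buffer) >= PREROLL:
        started = True                   # enough preroll: begin playback
    if started:
        if buffer:
            buffer.popleft()             # play one packet per tick
        else:
            print(f"tick {tick}: buffer empty -- rebuffering")
```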

Streaming formats cannot be permanently saved and are purged from the buffer after playback. This is sometimes desirable when distributing copyrighted material, for instance. At other times the inability to save material is a limitation. Streaming formats also tend to be of lower sound quality than downloadable formats. Often users experience choppiness or stuttering during playback as well as dropouts and artifacts, especially with slower connection speeds.

Transmission of Streaming Formats

With streaming technology the transmission process passes through the encoder, the server, the Internet, and the player. The encoder is responsible for translating the original audio signal into one of the data compression formats. The compressed audio bitstream is then sent to a server. Sometimes the encoder and the server can run on the same computer, but this is not usually the case for high-capacity servers or encoders running at high bit rates. The server is responsible for repackaging the bitstream for delivery over the Internet, sending the same bitstream to multiple users. This link is the weakest in the chain because the Internet was not designed with real-time streaming applications in mind. At the receiving end is the user’s computer with the appropriate applications software to play the audio file. This player unpacks the streaming data from the server, recovers the compressed audio bitstream, and decodes it back into an audio signal.

Progressive Download Formats

Web developers working with sound also take advantage of progressive download formats. These files download like a nonstreaming file, allowing users to save them for later use. They buffer like a streaming format, however, allowing playback to begin before the entire file has been completely saved. This improves performance and playback over the Web while retaining the advantages of a nonstreaming format.

Online Collaborative Recording

Integrated Services Digital Network (ISDN) made it possible to produce a recording session—from a radio interview to music—in real time between studios across town or across the country (see Chapter 6). Various software applications allow multiple users to connect to one another in real time and record or perform together. These systems generally require very fast connection speeds and robust server and client setups.

The Internet has seen an explosion in social networking sites, where audiophiles and music lovers form online communities. Audio—especially music—can be created, performed in virtual venues, and distributed or shared online.

For many years it has been possible for individuals to collaborate by sharing files to produce audio materials such as spot announcements, radio programs, music recordings, film sound tracks, and more. An entire project can be posted to a secure server and downloaded by one or more members of the production team, who can provide additional audio and then edit and mix the project. All the updating can then be posted back to the server. The process can be repeated as many times as necessary from anywhere in the world with an Internet connection. Often these systems can be configured to pass audio files back and forth automatically, reducing the likelihood that various team members will accidentally work on different versions of a file or overwrite each other’s work. Some companies have developed in their online software the features of a virtual studio, facilitating fast, convenient ways to collaborate with others synchronously and in real time.

The possibilities of this technology seem limitless. With recent innovations in online collaborative recording, it is possible to assemble a team of musicians from around the globe without requiring travel, relocation, or adjustments to personal schedules. Musicians from different cities, countries, or continents can be engaged for a recording session in a virtual online studio. These musicians report to studios near their own homes, or work from home, yet still perform together as a group just as they would in a single studio. The key elements that make such collaboration possible involve access to adequate bandwidth, as well as the development of data-thinning schemes and algorithms that ameliorate latency and delay issues that hinder real-time musical interaction.

It is also becoming easier to arrange a recording session in one country but monitor and mix it elsewhere. Wireless technology makes it possible to record sound effects in the field and transmit them to a studio. As Internet technology and connection speeds improve, the sound quality to be found in such collaborative experiences will equal that of the best studios.

Currently, however, Internet technology is still somewhat limited in making online collaborative recording viable for many potential users. Within a LAN, connection speeds and client performances are high enough to make these techniques practical and sometimes beneficial, although LANs usually do not cover a wide geographical area. Across the Internet, where slow connections and outdated technology still abound, online collaborative recording techniques are still in their infancy.

But online collaboration is happening in the production of commercials and bona fide programs, as Thomas Friedman points out in his book The World Is Flat. He tells of a program called Higglytown Heroes and how this

all-American show was being produced by an all-world supply chain.... The recording session is located near the artist, usually in New York or L.A., the design and direction is done in San Francisco, the writers network in from their homes (Florida, London, New York, Chicago, L.A., and San Francisco), and the animation of the characters is done in Bangalore (India).... These interactive recording/writing/animation sessions allow [recording] an artist for an entire show in less than half a day, including unlimited takes and rewrites.[1]

Although not interactive, there is online collaborative recording, of sorts, available from services that make professional musicians available for personal recording projects. Using the Internet, the client submits an MP3 example of the music and indicates the type of musical support needed. The service then discusses the musical direction to take, the musicians it has available, and the fee. Working in their own studios, the hired musicians record their individual parts. As each instrument recording is completed, the service compiles an MP3 rough mix and sends it to the client for approval. The process is repeated until completion, at which point the client receives the edit master files for final mixing. If the client prefers, the service can provide a professional engineer to handle the final mix.

Podcasting

Podcasting is an increasingly popular mode of media production and distribution. Through rapid advances, Internet technology allows users to create and distribute their own audio (and video) productions over the Web. The term podcasting is a combination of the words pod, referring to the iPod sound player, and casting, short for broadcasting. It points to the essential idea behind podcasting: to create a media broadcast suitable for playback on mobile media, more often than not an MP3-compatible player like the iPod, a cell phone, or a personal digital assistant (PDA).

From a technical perspective, there are three reasons why podcasting is more related to Internet technology than to audio production. First, the pioneers of podcasting tended to be Web designers and developers rather than audio professionals. Much has gone into the development and the deployment of the various Web technologies that make podcasting possible, but less attention has gone into the good use of sound in a podcast.

Second, podcasters hope to reach large audiences, but the emphasis is on achieving a widespread distribution through RSS feeds—a method of distributing data about a podcast to a widespread group of Internet subscribers—and other Web technologies (see discussion later in this section).[2] Less attention is given to sound production technique and sound quality as a means of increasing audience size.

Third, the medium’s surging popularity has tended to dilute the emphasis on sound quality. As in any field, a relatively small group of professionals can better implement and control high production standards. The large and ever-growing group of novices getting into podcasting is more intent on learning how to create a podcast and (because of inexperience) not necessarily on producing one that is professional sounding. Typical podcasts feature poor-quality audio and amateurish production values. That said, there are also many well-done podcasts, usually on large, heavily trafficked Web sites that feature professionally produced audio—narration, music, and sound effects. They are essentially professional radio broadcasts distributed via the Internet.

Though the current emphasis in podcasting is on Web technology and Web distribution, the podcaster still faces many of the same limitations as any other developer who distributes audio over the Internet. File size is still of paramount concern. Though fast connections make the download of podcasts relatively speedy, it is still advantageous for a podcaster to compress audio and reduce file sizes to reach as wide an audience as possible. It is also important for the podcaster to consider that users may not be willing to load a large podcast onto an iPod or other portable MP3 player that is already full of other media. A smaller file size leaves a smaller footprint on an MP3 player, making storage and playback easier for the listener.

Most audio podcasts are distributed in MP3 format, which is playable on virtually all computers and MP3 players, although some MP3 players require a specific type of encoding for certain MP3 files to be played. Despite this limitation MP3 is still the file type of choice for podcasting.

Creating a podcast MP3 is relatively straightforward. Computer technology notwithstanding, the equipment required is the same used in conventional radio broadcasting. What accounts for differences in sound quality is whether the podcaster employs consumer-grade equipment with no signal-processing capability or any of the higher-quality podcasting equipment packages available.

Conceptually, a podcast can take any number of forms. Some are released in sequence or on a schedule—daily, weekly, or monthly audio blogs or “webisodes.” Many are simply experiments or onetime releases. It is becoming increasingly common for news outlets, both print and broadcast, to create podcast versions of some content (both audio and video) so that users can listen or watch on the go, using mobile media. Whatever the format, if the podcaster is serious about the product and building an audience, it pays to keep good Internet sound production practices in mind.

There are many technical details involved in placing a podcast on the Web. Some Web sites offer free or paid services that allow users to handle this with relative ease. More ambitious podcasters and professional Web developers generally learn how to distribute a podcast manually. Whatever the approach, there are certain requirements for a podcast to become more than an audio file sitting in a user’s home computer.

First, the audio file must be uploaded to a Web server, via File Transfer Protocol (FTP), where it is accessible to anyone with an Internet connection. The file is now available, but a user may not know it exists. Linking to the file from a Web site can help, as can including useful metatag data in the header of the Web page. Successful podcasting requires a more robust approach to distribution, however. One popular method of getting one’s work out there makes use of the RSS feed.

An RSS feed is specified in XML (Extensible Markup Language), a format that facilitates sharing data over the Internet. The feed format lends itself to frequent updates and may include a range of metadata about the podcast: from content summary, to transcript, to dates and authorship. The feed is a computer file in a standardized format that lists addresses and information about the various podcasts available on a particular server. It is made available on the server in a semi-permanent location known as the feed URI or feed URL.[3] Ideally, this feed is made available in a variety of places across the Internet. Users who find the feed can look through it using a software application called an aggregator, or podcatcher,[4] and select various podcasts to which to subscribe.
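At its core, a feed is a structured text file. The sketch below assembles a minimal RSS 2.0 skeleton in Python; every title and URL is a made-up placeholder, and a production feed would carry far more metadata:

```python
from xml.sax.saxutils import escape

def rss_item(title: str, mp3_url: str, size_bytes: int) -> str:
    """One podcast entry; the <enclosure> tag points at the audio file."""
    return (f"  <item>\n    <title>{escape(title)}</title>\n"
            f'    <enclosure url="{escape(mp3_url)}" length="{size_bytes}"'
            f' type="audio/mpeg"/>\n  </item>\n')

feed = (
    '<?xml version="1.0"?>\n<rss version="2.0">\n<channel>\n'
    "  <title>Example Podcast</title>\n"
    "  <link>http://example.com/podcast</link>\n"
    + rss_item("Episode 1", "http://example.com/ep1.mp3", 4_200_000)
    + "</channel>\n</rss>\n"
)
print(feed)  # upload the result to the server's feed location
```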

A feed of this kind is more advantageous than a simple link because as the feed is updated with new podcasts or changed information, the aggregator on the user’s computer will see the changes and bring them to the user’s attention automatically. Some services will also download to the user’s MP3 player so that new podcasts are always available.

Audio Production for Mobile Media

Producing audio for mobile media such as iPods, MP3 players, cell phones, and PDAs involves numerous considerations, especially where they intersect with multiple platforms and the parameters of Internet delivery (see Chapters 14, 17, and 23). Unlike production for a state-of-the-art cinema experience, home entertainment center gaming or film environment, or multichannel playback system geared to the highest fidelity, producing audio for mobile media presents particular technical and aesthetic challenges that are generally without historical precedent.

You will recall that mobile devices are designed to handle compressed audio—based on the need to manage access speed and sound quality according to available bandwidth and storage capacity. Although this may seem to be purely a technical matter, the implications affect creative and aesthetic decision-making at every stage of production. The full range and power of a symphony orchestra or the subtle sonic complexity of nature’s dawn chorus may not be possible to render in a compressed audio format; too much would be lost. Other music and voices may fare better from a listener’s perspective. To some extent, form follows function in relation to what is possible.

Playback on mobile media is often through small piezoelectric loudspeakers with limited frequency response and dynamic range (such as inboard and outboard computer speakers of varying quality and cell-phone speakers). The latter are designed with the human voice in mind and therefore have a correspondingly narrow frequency response, ranging from around 100 Hz to 8 kHz. Earbuds are capable of handling a broader spectrum, from 10 Hz to 20 kHz, but they too are limited in playing back acoustically rich and detailed audio because of their size and their unnatural positioning as a sound source at the outer edge of the ear canal. Headphones, particularly those with noise-canceling capability, offer the better option for individual listening.

Editing and mixing audio for mobile media generally requires careful and selective equalization in the upper and lower ranges of the program material; highs and lows are rolled off, and the midrange is emphasized. Dynamics usually must be compressed so that program material is heard clearly through tiny loudspeakers.
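A crude illustration of that roll-off, assuming NumPy and using simple first-order filters; the 150 Hz and 6 kHz corner frequencies are arbitrary assumptions, not a standard:

```python
import numpy as np

def one_pole_lowpass(x: np.ndarray, sr: int, cutoff_hz: float) -> np.ndarray:
    """First-order low-pass filter with a gentle 6 dB/octave roll-off."""
    a = np.exp(-2 * np.pi * cutoff_hz / sr)
    y, acc = np.empty_like(x), 0.0
    for i, s in enumerate(x):
        acc = (1 - a) * s + a * acc
        y[i] = acc
    return y

def midrange_emphasis(x: np.ndarray, sr: int = 44_100) -> np.ndarray:
    lows_rolled = x - one_pole_lowpass(x, sr, 150.0)   # roll off below ~150 Hz
    return one_pole_lowpass(lows_rolled, sr, 6_000.0)  # roll off above ~6 kHz
```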

Apart from the technical challenges, the conditions for listening to mobile media for those on the go are not necessarily conducive to focused, attentive listening. People are more likely to listen to an iPod on the bus, while riding a bicycle, or while walking down a city street than in the solitude of a quiet room. These environments are often acoustically competitive and many sounds are masked by ambient din. To compound the situation, a listener may be engaged in any number of other activities—reading, texting, exercising, and so on. In a word: multitasking. The listening experience is therefore compromised on many levels and exists in a multisensory continuum of stimulation.

From a sound design perspective, there is also a need to consider how program audio associated with picture functions with regard to scale in small mobile-media devices. While it is possible to download a feature film or television episode to an iPod or a cell phone, much is lost in translation. These materials were designed with a larger picture and acoustic image space in mind and the experience was, more often than not, a collective one: a cinema or living room where people gathered together. Consider watching a film like Lawrence of Arabia with its sweeping, epic desert scenes and a cast of thousands on a screen that is 2 inches wide. Individuals are rendered in little more than a few pixels. Although the sound track, even though it is compressed audio, can be turned up and experienced with much of its drama and grandeur intact through headphones, the essential relationship of sound to picture no longer exists with regard to scale and appropriate, meaningful perspective.

As more and more content is created specifically for mobile media, from short-form webisodes, to audio/video blogs, podcasts, and games, a new aesthetic is emerging. With a viewing area reduced to a few inches, production designers are tending to adjust their scene and shot compositions to accommodate more elemental information: the massing of bold and simple shapes; bright color schemes using primary and secondary colors; light and shadow relationships that need to stand out on an LCD screen being pushed into higher contrast ratios; and close-ups being favored over long and medium camera shots. For the sound designer working with mobile media, the sonic palette remains somewhat restricted and the production parameters limited, at least until the technical means afford greater possibilities with respect to aesthetic ends.

All this is not to forget the aesthetic challenge created by the disproportionate “sizes” of the sound and the picture. Compared with the video, in sonically good reproducing systems the sound heard through headphones or earbuds can be enveloping, overwhelming the video image and therefore dominating it, if not separating from it entirely (see Chapter 14).

Main Points

  • Computer technology in general and the Internet in particular have provided the means to produce and disseminate audio on an unprecedented scale.

  • The delivery and the playback of audio over the Internet differ from the traditional methods used in the conventional audio studio. They involve, in computer parlance, a network—a system of computers configured and connected in a way that allows them to communicate with one another.

  • A local-area network (LAN) is configured for a small, localized area such as a home or business. A wide-area network (WAN) is configured for a large geographical area.

  • A server is a computer dedicated to providing one or more services to other computers over a network, typically through a request-response routine. A client is a computer connected to a network configured to make requests to a server.

  • Achieving digital-quality sound transmission over the Internet depends on several factors, not the least of which are a computer and loudspeakers capable of delivering high-quality audio.

  • Among the factors relevant to high-quality audio are the connection speed and the file size.

  • Reducing file size can be done by file manipulation or data compression.

  • File manipulation includes reducing sampling rate, word length, number of channels, and playing time and, with music, using instrumental instead of vocal-based tracks.

  • Playing time can be reduced by editing through compression of the dynamic range, noise reduction, and equalization.

  • Compression can be either lossless, preserving the original information, or lossy, with high compression ratios where some data is filtered out.

  • Protocols used for data compression include adaptive differential pulse code modulation (ADPCM); a-law and μ-law; RealAudio; MPEG; MPEG-2 layer 3 technology (MP3); MPEG-2 AAC (Advanced Audio Coding); MPEG-4 AAC with SBR (Spectral Band Replication), or MP4; Ogg; and FLAC (Free Lossless Audio Codec).

  • File formats facilitate the saving of digital audio files.

  • The difference between streaming and nonstreaming is that streaming allows audio data to be sent across a computer network with no interruptions at the receiving end, although, in practice, interruptions do occur with slower connections.

  • Downloading nonstreaming data is usually slow and therefore limited to small files.

  • The principle behind streaming technologies is buffering.

  • With streaming technology the transmission process passes through the encoder, the server, the Internet, and the player.

  • Using secure file servers, it is possible to do collaborative audio production online by uploading and downloading such audio materials as music, voice-overs, and sound effects. An entire project can be posted to a server and then downloaded by one or more members of the production team, who can provide additional audio and then edit and mix the project.

  • Podcasting involves Internet technology that allows users to create and distribute their own audio (and video) productions over the Web.

  • File size is of paramount concern to the podcaster. Though fast connections make the download of podcasts relatively speedy, it is still advantageous to compress audio and reduce file sizes to reach as wide an audience as possible.

  • Most audio podcasts are distributed in MP3 format, which is playable on virtually all computers and MP3 players.

  • Sound quality in podcasts varies widely for several reasons: the emphasis is more on achieving widespread distribution than on audio production values; equipment runs from consumer-grade to professional; and many podcasters have little or no experience with audio production techniques.

  • The RSS feed is a robust computer file in a standardized format that lists addresses and information about the various podcasts available on a particular server.

  • Unlike production for a state-of-the-art cinema experience, home entertainment center gaming or film environment, or multichannel playback system geared to high fidelity, producing audio for mobile media presents particular technical and aesthetic challenges that are generally without historical precedent.

  • Playback through mobile media is often through small piezoelectric loudspeakers with a limited frequency response and dynamic range.

  • Editing and mixing audio for mobile media generally requires careful and selective equalization in the upper and lower ranges of the program material; highs and lows are rolled off, and the midrange is emphasized. Dynamics usually must be compressed so that program material is heard clearly through tiny loudspeakers.

  • The conditions for listening to mobile media for those on the go are not necessarily conducive to focused, attentive listening.

  • For the sound designer working with mobile media, the sonic palette remains somewhat restricted and the production parameters limited, at least until the technical means afford greater possibilities with respect to aesthetic ends.

  • There is also the aesthetic challenge in mobile media that, when listening through good audio-reproducing headphones or earbuds, the audio imaging can be disproportionately greater than the video imaging.



[1] Thomas L. Friedman, The World Is Flat (New York: Farrar, Straus, and Giroux, 2005), p. 72.

[2] RSS is a family of Web feed formats used to publish frequently updated digital content, such as a podcast, blog, or news feed. The initials RSS are variously used to refer to Really Simple Syndication, Rich Site Summary, and RDF Site Summary. RSS formats are specified in XML (a generic specification for data formats). RSS delivers its information as an XML file called an RSS feed, Web feed, RSS stream, or RSS channel. (From Wikipedia.)

[3] A URI, or Uniform Resource Identifier, is a compact string of characters used to identify or name a resource. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. URIs are defined in a specific syntax and associated protocols. A URI can be classified as a locator, a name, or both. A URL, or Uniform Resource Locator, is a URI that, in addition to identifying a resource, provides means of acting on or obtaining a representation of the resource by describing its primary access mechanism or network “location.” (From Wikipedia.)

[4] An aggregator is client software that uses a Web feed to retrieve syndicated Web content such as blogs, podcasts, and mainstream mass media Web sites and, in the case of a search aggregator, a customized set of search results. Aggregators reduce the time and the effort needed to regularly check Web sites for updates, creating a unique information space or “personal newspaper.” (From Wikipedia.)
