Chapter 14. Internet Audio

Digital technology in general and the Internet in particular have provided the means to produce and disseminate audio on an unprecedented scale. Opportunities abound and include producing audio for myriad online venues and applications; recording, editing, and producing online sessions interactively; the marketing and distribution of your own work; hiring out your production services to others; podcasting; and creating audio just for the fun of it. With that said, the basic principles involved in creating a professional audio product are similar regardless of where production is carried out—in a major production facility with state-of-the-art gear or on a laptop. The goal is always to produce work of a high technical and aesthetic caliber.

Because digital technology, the Internet, and mobile media are rapidly and continually evolving, this chapter is necessarily more general in scope, touching on key technical aspects and production considerations.

Data Transfer Networks

The delivery and the playback of audio over the Internet rely on a system of computers configured and connected in a way that allows them to communicate with one another—in other words, a network. When you download a ringtone to your cell phone, stream a radio program to your laptop, share your music among an online community, or upload a podcast from your computer, the audio is transmitted from a source, generally a server, to a user or client across one or more networks. Think of the Internet as an extremely large network comprising many smaller networks.

Local-Area Networks

A local-area network (LAN) is a network configured for a small, localized area such as a home, classroom, or business. Generally, a LAN is more controllable than a wide-area network because it is small enough to be managed by only a few people. LANs often feature fast data transfer speeds because the connection types and the hardware of all computer systems on the network can be controlled. When developing audio for use on a LAN, an audio designer may take advantage of this fast connectivity to produce high-quality audio not possible on less controlled wide-area networks.

Wide-Area Networks

A wide-area network (WAN) is a network configured for a large geographical area. Often, a WAN is composed of several interconnected LANs. The Internet is the largest and best-known example of a WAN. A WAN is generally less controllable than a LAN because it comprises too many computers and individual LANs for any one person or group to manage. Some computers on a WAN may be quite fast, whereas others may be quite slow. When producing audio for the Web, a designer must take into consideration that WANs are extremely sizable and multifaceted.

Servers

A server is a computer configured on a network to provide information to other computers. Websites are stored on a server, which receives Web page requests from users and transmits the desired page. Any images, video, and audio associated with that page are likewise sent from a server across the network to the user.

Clients

A client is a computer connected to a network configured to make requests to a server. A client could be a personal home computer, an office workstation, or another server. Client computers are often the limiting factor when developing audio (or anything else) for use on the Internet. Whereas servers are fast, robust, and capable of quickly processing large amounts of data, clients come in all shapes and sizes. A video gamer might have an extremely fast computer connected to the Internet over a high-speed connection. On the other hand, a casual Internet user may use an older PC connected through a telephone modem at a very slow speed. Therefore, when designing audio for the Web, the limitations of the client computers must be considered.

Audio Fidelity

Networks, including the Internet, have a variety of limiting factors that reduce the speed at which data can be transmitted. The rate at which information moves from a server to a client is known as the connection speed.

We are all familiar with the experience of clicking on a Web link and waiting for endless minutes as an audio file opens. Because audio files are often quite large, transmitting them from server to client can take a long time. To reach the largest audience possible, a Web sound designer must design audio for the slowest likely connection. The slower the connection speed, the poorer the sound quality.

Sound sent by a server across the Internet can and often does take longer to reach its intended client than it takes for the client to play it back. Waiting for long periods of time to download audio over a network is frustrating. A number of solutions have been developed to address this problem, such as improvements in connection speed, file-size reduction by manipulation, and data compression. These techniques allow information to flow faster across a network but in some cases not without a tradeoff in audio quality.

Connection Speed

The greater the connection speed, the faster a network can transmit information, including audio information. Connection speed is measured in kilobits per second (Kbps). For example, a connection speed of 150 Kbps can allow stereo CD-quality sound if played in real time. In contrast, if sound is transmitted or streamed in real time at a connection speed of 8 Kbps, the result is low, telephone-quality sound (see Table 14-1). However, most Internet audio is not transmitted in real time, so connection speed is viewed as a limitation on the speed of transmission rather than as a means of determining a transmission’s sound quality.

Table 14-1. Changes in audio quality in relation to connection speed.

Connection Speed (Kbps)  Mode    Audio Quality       Bandwidth (kHz)  Application
128-150                  Stereo  CD quality          20               Intranets
96                       Stereo  Near CD quality     15               ISDN, LAN
56-64                    Stereo  Near FM quality     12               ISDN, LAN
24-32                    Stereo  Boombox FM quality  7.5              28.8 modem
16                       Mono    AM quality          4.5              28.8 modem
12                       Mono    SW quality          4                14.4 modem
8                        Mono    Telephone sound     2.5              Telephone

ISDN = Integrated Services Digital Network; LAN = local-area network.
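
The rates in Table 14-1 assume data-compressed audio; uncompressed CD-quality audio requires roughly ten times the table’s top rate. As a rough check, here is a short Python sketch (the figures are the standard CD parameters of 44.1 kHz, 16 bits, and two channels, not values taken from the table) that works out the raw bit rate and the compression ratio implied by a 128 Kbps “CD quality” stream.

```python
# Uncompressed CD-audio bit rate versus the connection speeds in Table 14-1.
SAMPLE_RATE = 44_100   # samples per second per channel (CD standard)
BIT_DEPTH = 16         # bits per sample
CHANNELS = 2           # stereo

raw_kbps = SAMPLE_RATE * BIT_DEPTH * CHANNELS / 1_000
print(f"Uncompressed CD audio: {raw_kbps:.0f} Kbps")            # ~1411 Kbps

# A 128 Kbps stream that still sounds "CD quality" therefore implies
# roughly an 11:1 data-compression ratio.
print(f"Implied compression ratio at 128 Kbps: {raw_kbps / 128:.1f}:1")
```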

File Manipulation

Another way to improve Internet sound is to reduce the size of audio files, because smaller files transfer between computers more quickly. This can be done through file manipulation and data compression. To avoid confusion: the term compression may refer either to audio level (volume) compression or to audio data compression, in which the amount of information in a recorded sample of audio is reduced to facilitate faster transmission.

There are different ways to handle file size reduction by manipulation such as reducing the sampling rate; reducing the bit depth; reducing the number of channels; reducing playing time by editing; and, with music, using instrumental rather than vocal-based tracks (see Table 14-2).

Table 14-2. Approximate file sizes for one minute of audio recorded at differing sampling rates and resolutions.

Word Length / Sampling Rate  44.1 kHz  22.05 kHz  11.025 kHz
16-bit stereo                10 MB     5 MB       2.5 MB
16-bit mono/8-bit stereo     5 MB      2.5 MB     1.26 MB
8-bit mono                   2.5 MB    1.26 MB    630 KB

1,000 KB = 1 MB
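
The sizes in Table 14-2 follow directly from the arithmetic of digital audio: bytes per second = sampling rate × (bit depth ÷ 8) × number of channels. A minimal Python sketch of that arithmetic (using the table’s 1,000 KB = 1 MB convention; the table’s entries are rounded approximations):

```python
def file_size_mb(sample_rate_hz, bit_depth, channels, seconds=60):
    """Approximate uncompressed audio file size in MB, using the table's
    convention of 1,000 KB = 1 MB (1 MB = 1,000,000 bytes)."""
    bytes_total = sample_rate_hz * (bit_depth // 8) * channels * seconds
    return bytes_total / 1_000_000

# One minute of 16-bit stereo at 44.1 kHz -- the top-left cell of Table 14-2.
print(file_size_mb(44_100, 16, 2))   # ~10.6 MB (the table rounds to 10 MB)

# One minute of 8-bit mono at 11.025 kHz -- the bottom-right cell.
print(file_size_mb(11_025, 8, 1))    # ~0.66 MB, near the table's 630 KB
```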

Often, these reductions in file size result in a loss of sound quality because many of them actually remove information from the file. Because the Internet is a huge, varied, and sometimes slow network, most audio distributed across the Web is reduced in file size to improve transmission speed, and therefore most Web audio is of reduced quality. However, if longer download times are acceptable, audio quality does not have to suffer.

Reducing the Sampling Rate

Some sounds, such as a solo instrument, a sound effect, or a straight narration, to name just a few, do not require a CD-quality sampling rate of 44.1 kHz. For these sounds, 32 kHz may suffice.
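
As a naive illustration of the idea (real sampling-rate conversion low-pass filters the audio first to prevent aliasing), halving the sampling rate amounts to keeping every other sample:

```python
def downsample_by_2(samples):
    """Naive 2:1 downsample: keep every other sample.
    Real resampling must low-pass filter first to prevent aliasing."""
    return samples[::2]

# 44.1 kHz -> 22.05 kHz: the data (and the file size) are halved.
original = list(range(44_100))              # stand-in for one second of audio
reduced = downsample_by_2(original)
print(len(original), "->", len(reduced))    # 44100 -> 22050
```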

Reducing Bit Depth

Audio with a narrow dynamic range, such as speech, simple sound effects, and certain instruments, does not require a high bit depth. A resolution of 8 bits could support such sound with little difficulty. Of course, the wider dynamic range associated with music requires a greater bit depth—16 bits or higher.
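
A minimal sketch of the idea: requantizing signed 16-bit samples to 8 bits discards the least significant byte of each sample, halving the data at the cost of a higher quantization-noise floor.

```python
def to_8_bit(sample_16):
    """Requantize one signed 16-bit sample (-32768..32767) to signed 8-bit
    (-128..127) by discarding the low byte. This halves the data but raises
    the noise floor, which narrow-dynamic-range material can tolerate."""
    return sample_16 >> 8

print(to_8_bit(32767))    # 127
print(to_8_bit(-32768))   # -128
print(to_8_bit(256))      # 1
```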

Reducing the Number of Channels

Stereo audio uses twice as much data as mono. If an audio file can be handled in mono, it makes sense to do so.
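
A bare-bones sketch of a stereo-to-mono mixdown (real mixdowns may weight or limit the sum to avoid clipping): averaging the two channels sample by sample halves the amount of data.

```python
def stereo_to_mono(left, right):
    """Mix stereo to mono by averaging corresponding samples.
    The result carries half as much data as the stereo original."""
    return [(l + r) // 2 for l, r in zip(left, right)]

left = [100, 200, -300]
right = [300, 0, -100]
print(stereo_to_mono(left, right))   # [200, 100, -200]
```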

Reducing Playing Time by Editing

Obviously, reducing playing time reduces file size. Of course, this is not always feasible or desirable, but when the opportunity arises, editing the file will help. In this case, editing refers to eliminating extraneous bits of data (as opposed to conventional editing). Three ways to handle such editing are through compression, noise reduction, and equalization.

Compression

Compression (in relation to dynamic range as opposed to data) reduces the difference between the peaks and the troughs of a signal, narrowing its dynamic range. The overall level can then be raised, lifting the program material above background noise.
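
A minimal sketch of the principle, using a simple static gain curve on normalized samples (real compressors add attack and release smoothing; the threshold, ratio, and makeup-gain values here are invented for illustration):

```python
def compress(samples, threshold=0.5, ratio=4.0, makeup_gain=1.5):
    """Very simple static compressor on normalized samples (-1.0..1.0).
    Magnitudes above `threshold` are reduced by `ratio`; makeup gain then
    raises the whole signal, lifting quiet material above background noise."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        sign = 1 if s >= 0 else -1
        out.append(max(-1.0, min(1.0, mag * makeup_gain * sign)))
    return out

# Quiet material is boosted while peaks are reined in: the dynamic range narrows.
print(compress([0.1, 0.9, -0.8]))   # approximately [0.15, 0.9, -0.86]
```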

Noise reduction is best handled by listening to the silent parts of a file to check for unwanted noise. If it is present, that section of the file can be cleaned, taking care to retain all desirable sound.

Equalization comes in handy to restore high frequencies that are often lost in data compression.

Audio Data Compression

Audio data compression is the most common way to reduce the size of large audio files by removing psychoacoustically unimportant data. The process is handled by a codec, software or hardware that runs the intensive mathematical procedures (algorithms) that make data compression possible.

Several audio compression formats are in use today; the most familiar are those developed by the Moving Picture Experts Group (MPEG). The MPEG approach to compression is based on what is known as acoustic masking: when a tone, called a masker, is present at a particular frequency, the human ear cannot hear quieter sounds at nearby frequencies. Based on this principle, MPEG developers determined which frequency components could be eliminated, thereby thinning out the audio data contained in a file.
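
The following toy Python sketch illustrates the masking idea only in spirit; an actual MPEG psychoacoustic model works on filter-bank subbands with far more elaborate masking curves and bit allocation. The 100 Hz range and 20 dB depth are invented for the demonstration.

```python
# Toy illustration of frequency masking -- not a real MPEG psychoacoustic model.
# Each (frequency_hz, level_db) pair is a spectral component. A component that
# sits close to a much louder neighbor is treated as masked and dropped.

def drop_masked(components, mask_range_hz=100, mask_depth_db=20):
    kept = []
    for freq, level in components:
        masked = any(
            abs(freq - f2) <= mask_range_hz and level < lvl2 - mask_depth_db
            for f2, lvl2 in components
        )
        if not masked:
            kept.append((freq, level))
    return kept

spectrum = [(1000, 60), (1050, 30), (5000, 55)]   # the 1050 Hz tone is masked
print(drop_masked(spectrum))                      # [(1000, 60), (5000, 55)]
```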

MPEG Layer 3 technology is commonly known as MP3. It is considered an excellent system for speech, sound effects, and most types of music. It supports a variety of sound applications and Web authoring environments. MPEG is not the only option available to the online audio producer.

File Formats

Because programs that play digital audio need to recognize files before they can play them, digital audio files must be saved in a particular format. The process adds an extension (a period followed by a short descriptor, usually two to four letters) to the file name, identifying it as an audio file. It also adds header information, which tells the playback program how to interpret the data. File identifiers specify the file name, file size, duration, sampling rate, bit depth, number of channels, and type of compression used.
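
Python’s standard-library wave module, for instance, exposes exactly this kind of header information from a .wav file (a sketch; the file name example.wav is a placeholder):

```python
import wave

# Read the header of a .wav file -- the file identifiers described above.
with wave.open("example.wav", "rb") as wf:
    print("channels:     ", wf.getnchannels())
    print("bit depth:    ", wf.getsampwidth() * 8)   # bytes -> bits
    print("sampling rate:", wf.getframerate(), "Hz")
    duration = wf.getnframes() / wf.getframerate()
    print("duration:     ", round(duration, 2), "seconds")
```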

A large number of both proprietary and open source formats are available on the Internet, and they come and go; a few, however, have been in relatively widespread use for a while (see Table 14-3). These include MPEG formats (.mpa, .mp2, .mp3, .mp4), Windows Media Audio (.wma), WAV, or Waveform Audio File Format (.wav), RealAudio (.ra), QuickTime (.mov), and Flash (.swf) as well as Free Lossless Audio Codec (.flac) and Ogg Vorbis (.ogg, .oga, .ogx, .ogv, .spx).

Table 14-3. Common file formats and their associated extensions.

Lossless File Formats                          Extensions
WAV (Waveform Audio File Format)               .wav
AIFF (Audio Interchange File Format)           .aif, .aiff
Broadcast Wave Format                          .wav
Free Lossless Audio Codec (FLAC)               .flac

Lossy File Formats                             Extensions
MPEG audio                                     .mpa, .mp2, .mp3, .mp4
RealAudio                                      .ra
Windows Media Audio                            .wma
ATRAC (used with MiniDisc)
Flash                                          .swf
Sun                                            .au
Ogg Vorbis                                     .ogg

Metafile Formats

(Metafile formats are designed to interchange all the information needed to reconstruct a project.)

AES-31 (AES is the Audio Engineering Society)

OMF (Open Media Framework)

Nonstreaming Versus Streaming Audio

The difference between nonstreaming and streaming audio is that streaming sends audio data across a computer network so that playback can begin before the entire file arrives, ideally with no interruptions at the receiving end and what appears to be continuous playback. In practice, however, interruptions do occur with slower connections.

Downloadable nonstreaming formats require that the user first download the entire file to some form of media, be it a hard disk or a thumb drive. A plug-in application then plays the file. No specific server (transmitter) software is needed.

A key principle behind streaming technologies is buffering. As the player, which is built into or added onto a user’s Web browser, receives encoded data, it stores the data in a buffer before use. Once a substantial amount of data has accumulated, the player (such as RealPlayer) begins to play it. All incoming data is first stored in the buffer before playing, so the player is able to maintain continuous audio output despite the changing transmission speeds of the data. Sometimes the buffer runs empty, and the player has to wait until it receives more data.
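
A simple simulation of that behavior (the arrival rates, playback rate, and buffer threshold below are invented for the demonstration; a real player reads from a network socket):

```python
# Simulate streaming playback: data arrives at a varying rate, playback
# consumes it at a fixed rate, and playback starts only after the buffer
# reaches a threshold. All rates and sizes are invented for the demo.
arrivals = [8, 2, 0, 6, 1, 0, 9, 4]   # KB received per tick (network varies)
PLAYBACK_RATE = 3                     # KB consumed per tick (fixed)
START_THRESHOLD = 10                  # KB buffered before playback begins

buffer_kb = 0
playing = False
for tick, received in enumerate(arrivals):
    buffer_kb += received
    if not playing and buffer_kb >= START_THRESHOLD:
        playing = True                # enough data buffered: start playback
    if playing:
        if buffer_kb >= PLAYBACK_RATE:
            buffer_kb -= PLAYBACK_RATE
        else:
            playing = False           # buffer ran dry: playback stalls
    print(f"tick {tick}: buffer={buffer_kb} KB, playing={playing}")
```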

Streaming is useful if the desire is to have audio play automatically within a Web browser or Web page. Nonstreaming files often need to be started manually by the user and often take some time to download before playback can begin.

Online Collaborative Recording

Integrated Services Digital Network (ISDN) made it possible to produce a recording session in real time between studios across town, across the country, or overseas. Various software applications allow multiple users to connect to each other in real time and record or perform together. These systems generally require very fast connection speeds and robust server and client setups.

The Internet has also made it possible for individuals to collaborate by sharing files to produce such audio materials as spot announcements, music recordings, and sound effects. An entire project can be posted to a secure server and downloaded by one or more members of the production team, who can provide additional audio and then edit and mix the project.

The possibilities of this technology seem nearly limitless. With online collaborative recording, it will be possible to assemble a team of top sound artists from across the globe without requiring travel, relocation, or adjustments to personal schedules. Musicians from different cities or states could be engaged for a recording session controlled from a central control room. These musicians could report to studios near their own homes yet still perform together as a group just as they would in a single studio.

It will also be possible to arrange a recording session in one country but monitor and mix it elsewhere. Wireless technology makes it possible to record sound effects in the field and transmit them instantly to a studio control room. As Internet technology and connection speeds improve, the sound quality in such collaborative experiences will equal that of the best studios.

Podcasting

Podcasting is a development in Internet technology that allows users to create and distribute their own audio (and video) productions over the Web. The term podcasting is a combination of the words pod, referring to the iPod sound player, and casting, short for broadcasting. It points to the essential idea behind podcasting: to create an audio broadcast suitable for use on a portable MP3 player like the iPod.

From a technical perspective, podcasting is more related to Internet technology than to audio production. Much attention has gone into the development and the deployment of the various Web technologies that make podcasting possible, but less attention has gone into the good use of sound in a podcast. Typical podcasts feature poor-quality audio and amateurish production values. However, some podcasts are well done, usually on large, heavily trafficked websites that feature professionally produced audio—narration, music, and sound effects. They are essentially professional radio broadcasts distributed via the Internet.

Though the current emphasis in podcasting is on Web technology and Web distribution, the podcaster still faces many of the same limitations as any other developer who distributes audio over the Internet. File size is still of paramount concern. Though fast connections make the download of podcasts relatively speedy, it is still advantageous for a podcaster to compress audio and reduce file sizes to reach as wide an audience as possible. It is also important for the podcaster to consider that users may not be willing to load a large podcast onto an iPod or other portable MP3 player already full of other media. A smaller file size leaves a smaller footprint on an MP3 player, making storage and playback easier for the listener.

Most audio podcasts are distributed in MP3 format, which is playable on virtually all computers and MP3 players, although some MP3 players require a specific type of encoding for certain MP3 files to be played. Despite this limitation, MP3 is still the file type of choice for podcasting.

Creating a podcast MP3 is relatively straightforward. Computer technology notwithstanding, the equipment required is the same as that used in conventional radio broadcasting. What accounts for differences in sound quality is whether the podcaster employs consumer-grade equipment with no signal-processing capability or any of the higher-quality podcasting equipment packages available.

Conceptually, a podcast can take any number of forms. Some are released in sequence or on a schedule—daily, weekly, or monthly. Many are simply experiments or onetime releases. It is becoming increasingly common for news outlets to create podcast versions of some broadcasts (both audio and video) so that users can listen or watch on the go. Whatever the format, if the podcaster is serious about the product and building an audience, it pays to keep good Internet sound production practices in mind.

Audio Production for Mobile Media

Producing audio for mobile media such as iPods, MP3 players, cell and smart phones, and PDAs involves numerous considerations, especially where they intersect with multiple platforms and the parameters of Internet delivery. Unlike production for a state-of-the-art cinema experience or home entertainment center geared to the highest fidelity, producing audio for mobile media presents particular technical and aesthetic challenges that are generally without historical precedent. It is a paradox that with the opportunity to produce better sound than ever before—and to hear it through great systems—too many listeners are hearing their digital audio in compressed form, much of which reduces sound quality. To exacerbate the condition, many listen through mediocre headphones or earbuds, often in noisy ambient environments, or through a less-than-sonorous computer speaker or loudspeaker system. In relation to aesthetic satisfaction from mobile media, less is definitely not more.

You will recall that mobile devices are designed to handle compressed audio—based on the need to manage access speed and sound quality according to available bandwidth and storage capacity. Although this may seem purely a technical matter, the implications affect creative and aesthetic decision-making at every stage of production.

Playback through mobile media is often through small loudspeakers with a limited frequency response and dynamic range (such as inboard and outboard computer speakers of varying quality and cell phone speakers). Cell phone speakers in particular are designed with the human voice in mind and therefore have a correspondingly narrow frequency response. Earbuds are capable of handling a broader spectrum, but they too are limited in playing back acoustically rich and detailed audio because of their size and their unnatural position as a sound source at the outer edge of the ear canal. Headphones, particularly those with noise-canceling capability, offer the best option for individual listening.

Editing and mixing audio for mobile media generally require careful and selective equalization in the upper and lower ranges of the program material; highs and lows are rolled off, and the midrange is emphasized. Dynamics usually must be compressed so that program material is heard clearly through tiny loudspeakers.
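
As a sketch of that kind of processing (assuming NumPy and SciPy are available; the 300 Hz and 5 kHz corner frequencies are illustrative choices, not recommended settings), a band-pass filter rolls off the lows and highs, leaving the midrange emphasized:

```python
import numpy as np
from scipy.signal import butter, lfilter

def midrange_emphasis(samples, sample_rate, low_hz=300, high_hz=5000):
    """Roll off lows and highs with a Butterworth band-pass filter, leaving
    the midrange -- where small speakers and the voice live -- intact.
    The 300 Hz / 5 kHz corners are illustrative, not prescriptive."""
    nyquist = sample_rate / 2
    b, a = butter(2, [low_hz / nyquist, high_hz / nyquist], btype="band")
    return lfilter(b, a, samples)

# One second of white noise as stand-in program material.
rng = np.random.default_rng(0)
noise = rng.standard_normal(44_100)
filtered = midrange_emphasis(noise, 44_100)
print(filtered.shape)   # (44100,)
```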

Apart from the technical challenges, the conditions for listening to mobile media for those on the go are not necessarily conducive to focused, attentive listening. People are more likely to listen to an iPod on the bus, while riding a bicycle, or while walking down a city street than in the solitude of a quiet room. These environments are often acoustically competitive and many sounds are masked by ambient din. To compound the situation, a listener may be engaged in any number of other activities—reading, texting, exercising, and so on. In a word: multitasking. The listening experience is therefore compromised on many levels and exists in a multisensory continuum of stimulation.

There is also the incongruous problem of the audio being too good compared to the video. That is, in some reproducing systems, it is bigger in “size” than the video and its sonic impact imbalances the sound-picture relationship.

As more and more content is created specifically for mobile media, from short-form webisodes to audio/video blogs, podcasts, and games, a new aesthetic is emerging. For the audio producer working with mobile media, the sonic palette remains somewhat restricted and the production parameters limited, at least until the technical means afford greater possibilities with respect to aesthetic ends.

Main Points

  • Computer technology in general and the Internet in particular have provided the means to produce and disseminate audio on an unprecedented scale.

  • The delivery and the playback of audio over the Internet rely on a system of computers configured and connected in a way that allows them to communicate with one another—in other words, a network.

  • A local-area network (LAN) is a network configured for a small, localized area such as a home or business. A wide-area network (WAN) is a network configured for a large geographical area.

  • A server is a computer configured on a network to provide information to other computers. A client is a computer connected to a network configured to make requests to a server.

  • Among the factors relevant to high-quality audio are the connection speed and the file size.

  • Reducing file size can be done by file manipulation or data compression.

  • File manipulation includes reducing sampling rate, bit depth, number of channels, and playing time.

  • Playing time can be reduced by editing through compression of the dynamic range, noise reduction, and equalization.

  • File formats facilitate the saving of digital audio files.

  • The difference between nonstreaming and streaming is that streaming allows audio data to be sent across a computer network with no interruptions at the receiving end, although in practice, interruptions do occur with slower connections.

  • The principle behind streaming technologies is buffering.

  • With streaming technology, the transmission process passes through the encoder, the server, the Internet, and the player.

  • Downloading nonstreaming data is usually slow and therefore limited to small files.

  • It is possible to do collaborative audio production online by uploading and downloading such audio materials as music, voiceovers, and sound effects. An entire project can be posted to a server and then downloaded by one or more members of the production team, who can provide additional audio and then edit and mix the project.

  • Podcasting is a development in Internet technology that allows users to create and distribute their own audio (and video) productions over the Web.

  • File size is of paramount concern to the podcaster. Though fast connections make the download of podcasts relatively speedy, it is still advantageous to compress audio and reduce file sizes to reach as wide an audience as possible.

  • Most audio podcasts are distributed in MP3 format, which is playable on virtually all computers and MP3 players.

  • Sound quality in podcasts varies widely for several reasons: The emphasis is more on achieving widespread distribution than on audio production values; equipment runs from consumer-grade to professional; and many podcasters have little or no experience with audio production techniques.

  • Unlike production for a state-of-the-art cinema experience or home entertainment center geared to high fidelity, producing audio for mobile media presents particular technical and aesthetic challenges that are generally without historical precedent.

  • Playback through mobile media is often through small loudspeakers with a limited frequency response and dynamic range.

  • Editing and mixing audio for mobile media generally requires careful and selective equalization in the upper and lower ranges of the program material; highs and lows are rolled off, and the midrange is emphasized. Dynamics usually must be compressed so that program material is heard clearly through tiny loudspeakers.

  • The conditions for listening to mobile media for those on the go are not necessarily conducive to focused, attentive listening.

  • There is the incongruous problem of the audio being too good compared to the video; in some reproducing systems, it is bigger in “size” than the video, and its sonic impact imbalances the sound-picture relationship.

  • For the audio producer working with mobile media, the sonic palette remains somewhat restricted and the production parameters limited, at least until the technical means afford greater possibilities with respect to aesthetic ends.
