4  Audio Interfaces

4.1  Introduction to Audio Interfaces

In the previous chapter, practical examples were given for each of the layers concerned with the transport of data from one place to another. Many of the examples are related directly to the audio studio, such as those describing fibre and other types of cables. IP is also used within the audio industry, for instance in the AES24 recommendation, which details the control of sound systems over computer networks and is discussed in more detail later.

This chapter presents and explains the most common technologies and mechanisms used specifically for the transfer of audio data, with particular emphasis on those candidates used for delivering the QoS required by performance-oriented streams of data. Such interfaces are already prevalent within the audio industry. Where possible, these are compared to the 7-layer model, for reference against the computer-networking counterparts discussed in Chapter 3.

The technologies within this chapter are presented with the intention of exposing the inner mechanisms for the purposes of utilization and comparison with other communications technologies.

4.2  Audio Interfaces

As concluded in Chapter 1, standards play an important role in most technical fields, including audio and digital audio. Many of the audio interfaces presented in this chapter are based on a core set of standards, but contain enough differences such that two apparently similar products will not interconnect directly. For instance, the Sony and Philips initiative and the Audio Engineering Society (AES) produced related works, with the S/PDIF standard now considered to be a consumer deployment of the harmonizing digital audio standard IEC 958.

4.2.1  Topology

Professional digital audio equipment used in some recording studios is connected in a manner that reflects the unidirectional flow of signal towards an endpoint, such as a tape device as shown in the configuration in Figure 4.1. There may be no provision for data to be carried in the opposite direction between devices. Instead, the return path takes an alternative route through equipment, towards a new endpoint, the loudspeakers. Multiple destinations may be envisaged in the event that surround systems deliver digital information to the loudspeakers, or recording equipment is added, for instance. Added to this is the need to monitor and make small adjustments to the assignments during any mixing process, perhaps repeatably, by recording control data.

Figure 4.1  Schematic-style diagram of signal path to multiple endpoints. Audio signal effectors connected together form a network with an intended direction to the signal flow towards the endpoint, such as loudspeakers (shown) or recording device.

To support a studio-related operation in this manner, the data must also remain synchronized throughout the entire process, whilst ideally maintaining as little delay as possible.

At a simpler level, the routing of audio through various paths, such as inside a mixing console, can be likened to routing over links in a network. The biggest difference when cabling audio connections compared to a network is that there is very much a direction to the audio, depending upon the connection of inputs and outputs. The engineer or operator makes adjustments to the audio signal’s destination within the console. An example of this is routing a single audio signal through successive buses, or routing the signal to some other processing such as a compressor or delay, via an effects send and return as shown in Figure 4.2.

Figure 4.2  Simplistic representation of audio routing assignments, showing connections made inside a console. In this configuration, channels assigning output to the auxiliary bus send the signal to an external effects rack before returning to the auxiliary return, and on to the stereo master output.

This is as opposed to computer networks where devices broadcast data directly onto a shared medium, and information contained within the transmission itself determines the destination.

Audio devices within professional audio studios are usually connected together directly, using a single dedicated cable between each device in a point to point topology.

The most common interfaces that make any external digital connections within Figure 4.2 describe point to point protocols. Several devices may be connected, although this is achieved by daisy chaining through further machines and each device receives the audio data in turn. As such, audio interfaces are designed to transfer data in real-time streams, with no QoS requirement for multicasting or destination information contained within the data stream. This differs from the QoS of computer networks, which developed with the efficiency of communication channels in mind. Computer networks are designed principally for the purpose of file transfer, with a QoS that is not time critical in nature.

The fundamental difference between the two types of interface is described by the difference in QoS. The area between the two is explored more fully in references to voice over IP (Held, 1998), Internet radio, the Internet2 project, and IPv6. The discussion specifically concerns the ability of computer network technology, particularly the Internet, to deliver real-time multimedia information. It is also worth considering the QoS for other data types such as synchronization and control data as well as digital audio data, since these could also utilize simplified interconnection.

4.2.2  Cable Strategy

Point to point devices do not normally share a cable and attempt to transmit onto it at the same time. Instead, each connection is unidirectional (or simplex), meaning that data only travel in one direction down the cable. Two-way communication is achieved by employing a second communications channel, which might be an extra cable pair or fibre, giving an RX/TX connection. RX and TX cables are identical, since the TX (transmit) of one device connects to the RX (receive) of the other device, representing the output and input of the data stream.

In computer networks, the receipt of data is checked for errors so that lost information can be requested and retransmitted. In digital audio point to point connections, only a minimal mechanism is used, since the protocols, interfaces, devices, and topology are all built to service the QoS requirements of streamed media.

There are a number of types of digital interface in use throughout the professional and consumer audio worlds. Some of these are standards, and some of these are proprietary, or manufacturer specific. All of those in common use are designed to carry audio encoded at a 44.1 kHz sampling rate and 16-bit word length at the very least. Most are also designed to carry other resolutions, including 32 kHz, 48 kHz, 96 kHz and up to 24-bit word length. Most carry streams of stereo information, and some can carry several channels at once.

It is not intended to detail the exact specifications of all the standards here, and descriptions of the interfaces are deliberately limited to outline information. The interested reader is directed to The Digital Interface Handbook by Rumsey and Watkinson and the further reading section.

4.2.3  Audio Standards

Within the audio industry, the Audio Engineering Society has led the way in determining digital audio transfer standards. The AES is a professional society and not a standards body; its committees are accredited by ANSI and its recommendations have been the basis for many standards, especially those involving two-channel audio interfaces. Standards bodies such as ANSI, BSi, EIAJ (Electronic Industries Association of Japan) and IEC (International Electrotechnical Commission) administrate and co-ordinate the opinions of all interested parties when creating a standard.

The AES first published its viewpoints on two-channel digital transfer in the document AES3–1985. This work was undertaken to cover professional applications, and parallel work on a consumer version performed by Sony and Philips resulted in the S/PDIF interface for the CD system slightly earlier, in 1984.

Many of the standards bodies produced standards based on the AES3–1985 recommendations and each body assigned its own document identity to the standard. Therefore, the same standard may be called by a number of different names. Furthermore, although the standards are very similar there are enough differences to ensure that interconnection cannot always be assumed.

One of the important concepts to understand is that the format of the data is not necessarily related to the electrical characteristics of the interface as explained by the modularity of the 7-layer model in Chapter 2.

4.2.4  AES/EBU

The AES/EBU interface is perhaps one of the most popular standards in use in the audio facility and is described almost identically in four documents: AES3–1992, IEC 958 (Type 1), CCIR Rec. 647 and also in EBU Tech. 3250E.

The standard allows for two channels of digital audio to be transferred over a balanced interface. The interface specifies serial transmission of data, and the clock signal is contained within that data.

At the physical layer of the 7-layer model, the interface can use a range of media types, such as balanced or unbalanced, optical and coaxial cables.

Balanced Interface

All the standards referring to professional or broadcast use specify a balanced electrical interface conforming to CCITT Rec. V.11. There are similarities between this specification and the RS422A standard, and RS422 drivers are used in many cases, but the two are not identical.

A 110 Ω impedance was originally recommended for output, cable, and input. Later revisions changed the wording slightly to recommend that the impedance value be the same as that of the transmitter and the transmission line.

The standard specifies audio XLR-3 connectors (IEC 60268–12 – see Notes and further reading), as used commonly in audio studios and illustrated in Figure 4.3.

Figure 4.3  IEC 60268–12 connector (also known as XLR or AES/EBU electrical connectors).

Pin 1 is the shield and pins 2 and 3 are the balanced data signal. The polarity is not essential, although manufacturers generally stick to the convention that pin 2 is assigned as ‘+’ and pin 3 is ‘–’.

4.2.5  S/PDIF and IEC 958 Type 2

S/PDIF is becoming more common in equipment with higher specification due to market forces. As the price of digital electronics falls, or more accurately as more digital power is offered for the same money, so it has become cheaper to offer digital audio manipulation that was previously only available to the higher budgets of the professional audio world. These types of electronics include sound cards for PCs which contain digital signal processors (DSP) designed for the manipulation and routing of audio. The kind of equipment available now for a few hundred pounds can match the functionality of equipment previously sold for several thousands of pounds, and follows the general trend of technology products.

Standards and Administration

The AES along with the European Broadcast Union (EBU) produced almost identical documents administrated together by the International Electrotechnical Commission (IEC).

The IEC is a worldwide organization for standardization comprising national electrotechnical committees. The object of the IEC is to promote international co-operation on all questions concerning standardization in the electrical and electronic fields. The IEC collaborates closely with the International Organization for Standardization (ISO) in accordance with conditions determined by agreement between the two organizations.

The consumer interface developed by Sony and Philips (S/PDIF) coincides with the AES3–1985 recommendations and IEC 958 is closely related to the S/PDIF interface. The main difference between the professional and consumer standard is that the consumer interface uses an unbalanced connection and recommends an impedance of 75 Ω. The standard does not specifically mention that unbalanced connections should be used within consumer products.

An optical interface is also available as a possible medium for transmission and is mentioned as ‘under consideration’ within the original IEC 958 committee documents. Since then the standard has been completely reviewed and renumbered accordingly, with the latest revision split into several parts. The new publication number is IEC 60958. This has four parts, three of which were published in 1999.

Part 1 describes a serial, simplex, self-clocking interface for the interconnection of digital audio equipment for consumer and professional applications, using linear PCM coded audio samples. It provides the basic structure of the interface (IEC 60958–1 – see Notes and further reading).

The second part is entitled IEC/TR3 60958–2 (1994–07) Digital audio interface – Part 2: Software information delivery mode. This part proposes a software information delivery mode to be used for the transmission of basic software information. This is a transmission format that uses a channel status bit, assigning mode bits 6 and 7 as identification bits.

Part 3 describes consumer applications and Part 4 describes professional applications intended for use with shielded twisted-pair cables over distances of up to 100 m.

Physical Media

The standard was designed as an interface for the transfer of digital audio data to be included on consumer products such as CD players and DAT machines. Implementations of the unbalanced interface use RCA phono connectors (Figure 4.4) and coaxial cable, although optical fibre is also used on some audiophile equipment. Media filters are available to convert from fibre to coaxial cable or vice versa.

Figure 4.4  RCA connectors.

Optical Interface

The generally adopted optical interface is described within the document EIAJ CP-340 and consists of a transmitter using a wavelength of 660 ± 30 nm with a power of between –15 dBm and –21 dBm. Receivers complying with the specification should correctly interpret the data down to –27 dBm.
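
The dBm figures above express optical power relative to 1 mW on a logarithmic scale. As a rough illustration (not part of any standard), the following Python sketch converts the CP-340 levels quoted above into linear units:

def dbm_to_milliwatts(dbm):
    """Convert an optical power level in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)

# Transmitter window (-15 to -21 dBm) and receiver sensitivity limit (-27 dBm).
for level in (-15, -21, -27):
    print(f"{level} dBm is about {dbm_to_milliwatts(level) * 1000:.1f} microwatts")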

The preferred connector is specified in EIAJ RCZ-6901 and is shown in Figure 4.5.

Figure 4.5  EIAJ RCZ-6901 AES/EBU preferred optical connector. One connection is required for each direction of transmission – transmit and receive (TX/RX).

This interface is usually found in consumer products such as DAT recorders, sound cards (and other PC peripherals), audiophile standalone D/A converters and CD players.

The transmission mechanism is normally a light emitting diode (LED) which is cheaper to implement than the laser version. The receiving device contains a photodetector. The TOSLink interface from Toshiba is a popular implementation using a 0 to 5 volt source, with an identical data structure to the electrical interface.

Coaxial Interface

The possibilities for transferring AES3 data over coaxial cable are discussed in the AES SC-02–02 Working Group on Digital Input/Output Interfacing (Meeting, May 1999), where it is acknowledged that the provisional document AES-3id–1995 is very similar to SMPTE 276M–1995. This document specifies a 75 Ω video-like coaxial cable to carry audio signals over distances of 1000 m. The signal level is similar to that of video at around 1 volt, although the data structure remains the same as AES3.

Initial tests of the format were successful in transmitting an AES3 digital audio signal over 1300 km without any noticeable corruption.

Data Link

The lack of any requirement to account for shared access to the transmission medium within these interfaces results in a far simpler data link layer for audio interfaces.

Encoding

When data are placed onto the cable, time slots 4 to 31 are encoded using the bi-phase mark scheme. Like the related Manchester encoding mechanism, this incorporates a clock with the data, which has the advantage of ensuring the correct understanding of bit timing and boundaries during the transmit/receive process; this is known as clock recovery.

Each bit is represented by a symbol comprising two consecutive binary states; the first state of a symbol is always different from the second state of the previous symbol. The second state of the symbol is identical to the first, if the bit to be transmitted is logical 0. However, it is different if the bit is a 1 (see Figure 4.6).

Figure 4.6  Bi-phase mark coding implementation in IEC 60958.
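
As an informal illustration of the rule just described, the following Python sketch (a toy encoder, not drawn from the standard text) turns a sequence of bits into the pairs of line states produced by bi-phase mark coding:

def biphase_mark_encode(bits, initial_level=0):
    """Encode a bit sequence using bi-phase mark coding.

    Each input bit becomes two half-bit states. There is always a
    transition at the start of a symbol; a logical 1 adds a second
    transition in the middle of the symbol, a logical 0 does not.
    """
    level = initial_level
    states = []
    for bit in bits:
        level ^= 1            # transition at every bit boundary
        states.append(level)
        if bit:
            level ^= 1        # mid-symbol transition only for a logical 1
        states.append(level)
    return states

print(biphase_mark_encode([1, 0, 1, 1, 0]))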

The original version of the standard from 1985 specified a voltage of between 3 and 10 volts but the standard was changed in order to conform more closely to the practice of many manufacturers of connecting an RS-422 driver directly between the two legs of the source.

Framing

Audio data are sampled and each sample is then placed within a word of a fixed length using the PCM mechanism. The recommendations describe two samples, one from each of the two stereo channels, each placed within a subframe, with the pair transmitted over one sample period.

The function of the subframe is to indicate the start and end of each sample. The starting signature of the subframe consists of one of three patterns, each of which deliberately breaks the rules of bi-phase mark coding in order to make it easily identifiable by the receiving device. A further 4 bits of additional data are also carried within the subframe, and this extra space can be used for a number of purposes, such as a buffering zone in case increased word lengths are used during the sample phase. Other information contained within the subframe is the validity bit, the channel status bit, a user-assignable bit and a parity bit.

Some flexibility is accounted for within the standard, allowing payloads of 20 and 24 bits or fewer. A subframe consists of 32 time slots, numbered from 0 to 31 as shown in Figure 4.7.

Figure 4.7  IEC 60958 subframe format. Source: IEC 60958–1 Digital Audio Interface – Part 1: General (1999).

Slots 0 to 3 carry one of the three permitted preambles and slots 4 to 27 carry the audio sample word payload. The MSB is designated as slot 27.

When a 24-bit coding range is used, the LSB is located in slot 4 and when a 20-bit coding range is used, the LSB is located in time slot 8, with time slots 8 to 27 carrying the audio sample word. Slots 4 to 7 are then left for other applications, and known as auxiliary sample bits. If the source provides too few bits, the auxiliary bits are set to 0.
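
A minimal Python sketch of how a sample might be packed into the 32 time slots, using the slot numbering given above. The preamble is left as a placeholder, and the assignment of the final four slots to the validity, user, channel status and parity bits follows common IEC 60958 practice rather than anything stated explicitly in the text above:

def pack_subframe(sample, bits=24, validity=0, user=0, status=0):
    """Place one audio sample into the 32 time slots of an IEC 60958 subframe."""
    assert bits in (20, 24)
    slots = ["P", "P", "P", "P"] + [0] * 28   # slots 0-3: preamble placeholder
    lsb_slot = 4 if bits == 24 else 8         # 20-bit coding leaves slots 4-7 as auxiliary bits
    for i in range(bits):
        slots[lsb_slot + i] = (sample >> i) & 1   # LSB first, so the MSB lands in slot 27
    slots[28], slots[29], slots[30] = validity, user, status   # assumed V/U/C slot order
    slots[31] = sum(slots[4:31]) % 2          # parity bit: even number of 1s across the word
    return slots

print(pack_subframe(0x123456, bits=24))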

Error Checking

The parity bit is set such that the number of 1s within a word is always even, and in the event that a parity error is detected, an error handling technique is invoked. In the event of an error burst, where multiple parity errors are detected, muting is applied.

Information and instruction are encoded into the stream using the channel status bit, which is removed by the receiving device and stored in consecutive order to create a 24-byte word, every 192 frames. Each binary bit or group of bits within this 24-byte word has a specific meaning relating to, for instance, the negotiation of sample rates, or some other interface instruction or operation.
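
A receiver can assemble the 24-byte channel status block simply by shifting one bit per frame into a buffer until 192 frames have passed. A sketch of that accumulation, assuming most-significant-bit-first packing within each byte:

def assemble_channel_status(status_bits):
    """Collect 192 channel status bits (one per frame) into a 24-byte block."""
    assert len(status_bits) == 192
    block = bytearray(24)
    for n, bit in enumerate(status_bits):
        if bit:
            block[n // 8] |= 1 << (7 - (n % 8))   # assumed MSB-first order within each byte
    return bytes(block)

bits = [0] * 192
bits[0] = 1                     # set the first bit purely for illustration
print(assemble_channel_status(bits).hex())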

Channel Status

Apart from the difference in physical interconnections, another key difference between consumer and professional interfaces is mentioned in the context of the channel status bits. The data format of the subframe specified for the consumer interface is the same as that used in the professional format, although the use of the channel status bit is significantly different. The second byte in the consumer interface is set aside for category codes, which are used to determine the type of consumer usage. The user bits of the subframe carry information such as track identification and cue point, and ensure that the track start ID is transferred along with the audio data.

4.2.6  AES10–1991, Standard Multi-Channel Interface

In 1991, a number of digital audio manufacturers combined to propose a multi-channel serial interface with the working title of AES10–1991. The interface described within the document became known as MADI (multi-channel audio digital interface). The MADI interface is also based on the AES3–1985 recommendations and is designed to be transparent to the AES/EBU data.

General Description

MADI allows for up to 56 channels of audio information, and has applications within large-scale digital routing systems and the interconnection of multi-channel audio equipment. MADI uses serial transfer of information and a much higher transfer rate in order to move the increased amount of data around.

Physical

MADI is intended to be asynchronous in nature, and devices are therefore locked to a common clock signal distributed separately, which uses the same reference signal specified within the AES/EBU recommendations.

A new recommendation within MADI, required because of the increased amount of data, is that the data rate is specified as 125 Mbit/s, regardless of the sampling rate or number of channels.

The maximum cable length between two MADI devices is specified at 50 m, although longer distances can be achieved by using fibre as the interconnecting medium.

Frames

In order to achieve similarity with the AES3 recommendations, AES10 uses the same basic subframe structure with either 20- or 24-bit audio data, along with the same status bit structure from AES3. In order to transfer multiple channels, there are some important differences that are worth detailing.

The first 4 bits of the subframe do not break the rules of bi-phase mark coding in order to mark the start of a frame as in the AES recommendations. Instead, the bits are used as header information. Additionally, the AES/EBU frames are linked together to form a superframe containing up to 56 AES/EBU type subframes (Figure 4.8).

Figure 4.8  MADI superframe, showing time-division multiplexing by channel. Source: Rumsey, Francis and Watkinson, John (1995) The Digital Interface Handbook, Second Edition, Focal Press, p. 127.

As per AES3, audio is sampled using the PCM sampling method and placed within a fixed length word. Within the MADI superframe, audio samples must not use different sampling rates, and so when a change to one channel’s sample rate is required, MADI enabled devices change the sample rate for all channels at once, so retaining synchronization.

Addressing

The word is surrounded by the AES3 subframe structure, and each of the 56 subframes is placed side by side to create the superframe. Audio channels are correctly identified within the superframe by using this simple ordering process. As such, a primitive addressing scheme is achieved without the need for address space within the bit structure of the frame.

The regularity requires a constant data rate within the communication channel, and a padding mechanism is added to fill any unused gaps. The regularity in the frame structure enables the MADI frame structure to be used in audio routing applications. By using this stable structure, multi-channel matrix mixers and routers have been developed to direct the audio channels to the correct destination (see the Studer, Pro-Bel, etc. product ranges).
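
Because a channel's identity is given purely by its position in the superframe, a router need do nothing more than re-order subframes. The sketch below illustrates this positional addressing; the data structures are invented for the example and are not part of AES10:

def route_superframe(superframe, routing_map):
    """Re-order the subframes of a MADI superframe according to a routing map.

    routing_map[output_channel] = input_channel; channels not mentioned pass
    straight through. No address field inside the frame needs to be rewritten.
    """
    return [superframe[routing_map.get(out, out)] for out in range(len(superframe))]

superframe = [f"subframe-{ch}" for ch in range(56)]    # one entry per audio channel
routed = route_superframe(superframe, {0: 1, 1: 0})    # swap channels 0 and 1
print(routed[:4])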

4.3  Control Data

The difference between digital audio data, MIDI, synchronization and any other kind of control data is a distinct one. Digital audio data contain audio information which, when decoded, can be detected by the ear. Control data, however, are simply data designed to control a device. As mentioned in Chapter 1, the device may be a toy car or a musical instrument, and the control data used to control it are inevitably designed specifically for that purpose.

4.3.1  MIDI

The musical instrument digital interface (MIDI) protocol provides a standardized and efficient means of conveying musical performance information as electronic data, which has been in use for some time. MIDI information is transmitted in ‘MIDI messages’, which instruct a music synthesizer or sound module on how to perform a piece of music, comparable to the human task of reading sheet music. The synthesizer receiving the MIDI data generates the actual audio.

General Description

The MIDI interface on a MIDI enabled device will generally include three different MIDI connectors, labelled IN, OUT, and THRU. The devices are connected in a chain, with the OUT or THRU port of the first device connected to the IN port of the second device. The OUT or THRU port of the second device in the chain will be connected to the IN port of the third device and so on. The MIDI data stream originates from a source, such as a musical instrument keyboard, or MIDI sequencer. A MIDI sequencer is a device which allows MIDI data sequences to be captured and stored by saving files on, for example, a computer. The files can then be edited, combined, and replayed. Commonly this is implemented within software on a PC, although dedicated sequencing devices are also available. Figure 4.9 shows a simple MIDI system, consisting of a MIDI sequencer and/or MIDI sound modules. It should be noted that a sequencer or DAW could replace the device represented as a workstation. In the case of a DAW, the cabling is further confused as the audio network returns to this machine, creating a loop connection between control and audio networks.

Figure 4.9  Simple MIDI network, as shown within box B, illustrates the concept of a separate cabling requirement for control data, when combined with the appropriate audio network in box A.

Standards and Administration

The MIDI 1.0 Detailed Specification provides a complete description of the MIDI protocol.

Although the MIDI Specification is still called MIDI 1.0, the original specification was written in 1984 and there have been many enhancements and updates to the document since. Besides the addition of new MIDI messages such as the MIDI Machine Control and MIDI Show Control messages, there have also been improvements to the basic protocol, adding features such as Bank Select, All Sound Off, and many other new controller commands.

Until 1995, five separate documents covered basic MIDI, additions (MSC & MMC), Standard MIDI Files and General MIDI.

In January 1995, the latest versions of these documents were compiled together into the 95.1 version. The basic MIDI specification that was used within the 95.1 compilation was version 4.2, which was a compilation of the Detailed Specification v4.2 document and the 4.2 Addendum. Version 95.1 integrated the existing documents and fixed some minor errors.

The MIDI Manufacturers Association was formed in 1984 as a trust to maintain the MIDI specification as an open standard and provides forums for discussion of proposals aimed at improving and standardizing the capabilities of MIDI-related products. The MMA provides a process for adoption and subsequent publication of any enhancements or clarifications resulting from these activities (MIDI Manufacturers Association Inc. – see Notes and further reading).

Physical Interface

According to the MIDI 1.0 Specification, the only approved physical connector is a 5-pin DIN plug as shown along with the pin designations in Figure 4.10. It is also possible to send MIDI messages using other connectors and cables and due to the limited space on many PC adapters, many manufacturers use either a serial port or a joystick port to connect to MIDI instruments. A few MIDI instruments are actually equipped with an 8-pin mini DIN serial port, which makes it possible to connect those devices directly to some computers. However, the MMA (MIDI Manufacturers Association) does not currently approve the use of any other connectors for MIDI 1.0. Furthermore, although many Sound Card MIDI adapters are available, not all are designed according to the electrical standards defined by the MMA.

Figure 4.10  MIDI specified connector and pin assignments.

Data Interface

The MIDI data stream is a unidirectional asynchronous bit stream transmitted at 31.25 kbit/s. Note that many MIDI keyboard instruments include both the keyboard controller and the MIDI sound module functions within the same unit. In these units, there is an internal link between the keyboard and the sound module, which may be enabled or disabled by setting the ‘local control’ function of the instrument to on or off, respectively. When set to off, the instrument will not sound when it is played, but MIDI messages representing the performance will be transmitted in the normal way, allowing other sound modules to be played remotely.

The limitation imposed by the speed of the interface can cause problems, especially with the advent of more complex and realtime controls being made over instruments in the studio. For instance, it is not uncommon to perform all operations on a musical score within a computer running some form of sequencing software. The software may control all aspects of many sounds, including very slight variations in obscure real-time parameters. Some or all of the parameters of the sound may change constantly throughout a performance, and each slight change will be sent to the instrument via a new MIDI message. When this happens for a number of different parameters and musical instruments on the same MIDI network, then the number of messages quickly mounts up, and some messages may be lost as devices correct timing problems, or playback occurs in an untimely fashion. In the worst cases, careful routing should be considered so that data are selectively transmitted over different cables.
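
A rough feel for why the interface saturates can be had from the arithmetic below. The figure of ten transmitted bits per MIDI byte (start bit, eight data bits, stop bit) comes from the asynchronous serial format and is an assumption not spelled out in the paragraph above:

MIDI_BAUD = 31_250      # bits per second, from the MIDI 1.0 specification
BITS_PER_BYTE = 10      # start bit + 8 data bits + stop bit

def messages_per_second(message_bytes=3):
    """Upper bound on the number of MIDI messages a single cable can carry per second."""
    return (MIDI_BAUD / BITS_PER_BYTE) / message_bytes

# Roughly 1041 three-byte messages per second; a few continuously swept
# controllers across several instruments can use much of this budget.
print(f"{messages_per_second():.0f} messages per second")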

MIDI is particularly interesting in terms of transfer, since it is the first example from the audio industry to introduce any real addressing information into the bit structure of the data.

The physical interface is divided into 16 logical channels by the inclusion of a 4-bit channel number within the applicable MIDI message types. A MIDI enabled device, such as a musical instrument keyboard, can generally be set to transmit or receive on any one of the 16 MIDI channels. A MIDI sound source, or sound module, can be set to receive on specific MIDI Channels. In the system depicted in Figure 4.9, the sound module would have to be set to receive the channel that the keyboard controller is transmitting on in order to play sounds.

Although only 16 channels are available in the original MIDI specification, some sequencer software can support enhanced versions of the MIDI specification, allowing multiple networks and increased channel numbers, thus allowing instruments to be split onto physically separate networks. In such cases, each physical network will support 16 channels exactly as per the MIDI 1.0 specification. These enhancements are usually in the form of software routing programs that can assign a MIDI message output to one or other hardware MIDI physical interfaces designed to work with the software.

Addressing

The limitation on the number of channels comes from the 8-bit status information header of each frame, as described in the specification. The first 4 bits of any MIDI frame indicate the message type, and the second 4 bits indicate the channel number. Since only 4 bits are assigned for the channel number, the maximum number of channels that can be represented is 2⁴ (i.e. 16).
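
The split of the status byte into message type and channel fields can be shown in a couple of lines of Python; the example value is purely illustrative:

def parse_status_byte(status):
    """Split a MIDI channel-message status byte into its type and channel fields."""
    message_type = (status >> 4) & 0x0F   # upper 4 bits: message type
    channel = status & 0x0F               # lower 4 bits: channel, 0-15 on the wire (1-16 to the user)
    return message_type, channel

print(parse_status_byte(0x97))   # a Note On (type 0x9) addressed to the eighth channel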

Figure 4.11 illustrates the structure of MIDI messages, which can be up to 3 bytes in length, although not all messages require the frame to be this long.

Figure 4.11  General structure of a MIDI message. The ‘sss’ bits are used to define the message type, the ‘nnn’ bits are used to define the channel number, whilst the ‘xxxxxxx’ and ‘yyyyyyy’ bits carry message data. Source: Rumsey, Francis (1994) MIDI Systems and Control, Second Edition, Focal Press, p. 42.

Each device on the network will be assigned to a channel number, although this does not have to be unique for each device, allowing multiple devices to respond to any particular message.

Using Figure 4.9 as an example again, two devices are connected together in point to point fashion, using a single MIDI cable. A performer plays instrument A, and this device is set to transmit information about the performance on MIDI channel 07. Provided that instrument B is set to receive on channel 07, then it will respond to the data being sent from instrument A, and will also sound. A maximum of seven devices can be attached in this way and if each device is set to receive on channel 07, each device will sound in response to the performance.

4.3.2  AES-24

AES-24 makes possible the control and monitoring, via a digital data network, of different audio devices from disparate manufacturers, using a unified set of commands within a standard format (AES-24–1–1999, 4.1 Function – see Notes and further reading).

General Description

AES-24 is a recent work undertaken by the Audio Engineering Society. It is an application layer protocol, as described by the title, although links to other layers cannot be avoided, and the relevant definition in context of the 7-layer model is ‘With appropriate software, AES-24 commands are capable of being carried by most modern transport networks’. (AES-24–1–1999, 0.8 Transport networks – see Notes and further reading). This suggests that the full AES-24 committee work will cover aspects of the top three layers of the 7-layer model.

Standards and Administration

The Audio Engineering Society has a structure of technical committees responsible for discussing different areas of audio development. One such committee is titled the Technical Committee on Networking Audio Systems.

Each committee may spawn a working party for any area that it feels is worthy of more investigation, and each such group may decide that certain aspects of its scope are better left to other subcommittees, in order that best efforts are concentrated within a particular area, as intricacies unfold.

SC-10, formally known as the Audio Engineering Society Standards Committee (AESSC) SC-10 Subcommittee for Sound System Control, is the only control-related standards group within the AES. The group was the outgrowth of an earlier AESSC working group titled the WG-10 Working Group for Sound System Control, formed in the early 1990s.

In 1992, following the successful publication of AES15–1991, AES Recommended practice for sound-reinforcement systems – Communications interface (PA-422), the SC-10 Subcommittee on Sound System Control saw the necessity to upgrade the standardization of sound system control to use the higher speed networks and efficient programming techniques then coming into use. SC-10 put forward a vision of a protocol that would be extensible and interoperable. Because the next available AES numbered standard was then AES24, the new protocol was dubbed AES-24 (Audio Engineering Society, Inc., 1999).

AES-24 is intended to comprise four documents, whose titles explain their roles in the overall development of the single protocol:

AES-24–1

AES standard for sound system control – Application protocol for controlling and monitoring audio devices via digital data networks – Part 1: Principles, formats, and basic procedures.

AES-24–2

Data types, constants, and class structure.

AES-24–3

Transport requirements.

AES-24–4

Internet protocol (IP) transport of AES-24.

Part 1 of the standard explains the concepts and defines a hierarchical structure, in the fashion of object orientation. The application of this structure is described in more detail in Part 2 of AES-24. Although the remaining two parts are not fully defined, Part 3 would appear to describe the necessary QoS and interfaces to the transport layer, and Part 4 to consider the use of IP as a network protocol. Unlike MIDI, no attempt is made to define a physical interface.

Object Hierarchy

The protocol imagines devices consisting of objects. Each object will perform some control or monitoring function, such as increasing, decreasing or monitoring the output level of a particular audio channel. Some or all of the objects are presented to the network and those that are made available can be controlled by messages from the network, or can transmit messages onto the network.

Each object is an instance of a class, and objects contain methods, parameters, and events. Each object is identified by an address unique within the device, and each device will therefore require a unique address on the network (again unlike MIDI, but similar to digital data networks). Each object also contains data that defines its status. The objects presented to the transport network by a device define the logical transport interface of the device.

Considerations

Without strong support from manufacturers, the momentum for a common control protocol came from the consulting and systems integrator community, who stood to benefit the most from a standard.

Without a clear ability to profit from a standard, it was difficult to keep the attention of the very companies for whom it was intended. In the end, AES-24 languished, although some of the ideas presented within the committee structure have been adopted within other significant industry initiatives.

4.3.3  Advanced Control Network

In recent years, a parallel effort to AES-24 has evolved within the Entertainment Services and Technology Association (ESTA – see Notes and further reading). Their work on a common control method for entertainment systems is now headed in a similar technical direction to AES-24. In contrast to the SC-10 AES-24 effort, approximately 50 companies from the entertainment industry financially support the ESTA Technical Standards Program.

The advanced control network (ACN) is intended to provide the next generation standard for the distribution of data in lighting control networks. However, ACN is not limited to lighting; work has been undertaken to support audio control and stage automation.

The advanced control network (ACN) protocol task group has the direct involvement of nine companies who provide engineering-level support for the development effort. Most notably, the ACN task group has asked for liaison support from the AES in their effort to include sound system control in their protocol.

Michael Karagosian (MKPE Consulting – see Notes and further reading) was appointed the chair of SC-10–02 and also worked with the ACN group within ESTA, creating a coupling between the standards-making groups, and allowing the work performed by either to cross over.

4.3.4  Conclusion to Control Data

MIDI deals with the physical layer and assumes a dedicated network, whilst AES-24 makes no presumptions about how the messages arrive. Extending this, ACN specifies IP as the interface of choice, allowing the protocol to exist on many different media and network types. This means that the format of messages and how they can be interpreted is concentrated upon, allowing IP to handle the network and transmission functionality.

MIDI was limited by the technology that was available and yet solved the whole interconnection problem for a limited set of instructions, whilst AES-24 and ACN build on modern techniques to create a more flexible set of instructions, leaving the lower layer technicalities alone.

In terms of internetworking, MIDI has an address space of only 4 bits, making the total possible number of addresses just 16. AES-24 and ACN assume that a modern data network will be capable of carrying the messages, and make no comment on their transport.

4.4  Synchronization and Timecode

Synchronization information is intended to allow several devices to operate with a common understanding of time. This is especially required in video applications, for instance where speech needs to match correctly with the video, so that when someone on the screen is talking, the words sound at the same time. Any delay beyond a human perceived tolerance will be immediately noticeable and result in a less than perfect experience for the viewer.

Synchronization is extremely important in the audio and visual fields, and especially so in post-production, which involves a combination of both fields. Several sync standards are in use within the audio industry, with several more related to the visual industries. Positions are specified through timecode, which allows the identification of a particular position in time relative to a start position, used for locating a particular video frame in a recorded sequence, for instance.

A timing signal can be generated in two main ways. The first method is to use a dedicated synchronization reference signal generator designed for the job. There are two main advantages to using such devices. The first is that they are often designed to send out a variety of different synchronization signals, each synchronized exactly to the other, since each signal is generated from the same timing. The second advantage is the supply of multiple ports on the unit with which to connect to other devices. In this way, the signal need not be looped through several devices in a single chain, and can be patched into different equipment with ease.

The second method for providing a clock signal is to assign one of the items of equipment to act as the master clock. Most items of equipment that have a need to be synchronized, such as DAT recorders and DAWs, already come supplied with an internal clock that can be sent to other devices by setting that device to master and all the other devices to slave. This method does not usually include any kind of information about time relative to an actual time, or any other fixed point (such as the start of a reel, as described in the relevant sections).

This clock master arrangement commonly provides an immediate understanding of time between two devices, but does not generally provide a tool for positioning in time, and so the structure of the signal may be somewhat simplified compared to timecode proper, and is more correctly referred to as a clock signal.

Synchronization techniques take many forms and different mechanisms are used for different purposes. It is not the intention to describe all the intricacies of each technique, since this subject would fill a complete book on its own (Ratcliffe, 1996).

Instead, a selection of techniques will be examined in the context of shared access, addressing, and distribution.

As we have seen from previous chapters, it is necessary to synchronize two devices that are exchanging information in order that streams of binary information can be interpreted accurately over time. At this level, synchronization is required in order that two communicating devices understand where the boundary for each bit lies, so that consecutive bits can be identified and interpreted.

To achieve the identification of bit boundaries, techniques such as bi-phase mark encoding identify the bit boundary with a transition between the two states, from high- to low-level signals for instance.

Such techniques become an integral part of the stream of information, and perform the specific requirement of synchronization between the two devices for the purpose of communication, but do not necessarily perform any synchronization with any other part of the process of audio or A/V production.

4.4.1  The Society of Motion Picture and Television Engineers

SMPTE is an international, technical society devoted to advancing the theory and application of motion-imaging technology including film, television, video, computer imaging, and telecommunications.

The Society was founded in 1916 as the Society of Motion Picture Engineers and the ‘T’ was added in 1950 to embrace the emerging television industry. The Society is recognized as a leader in the development of standards and authoritative, consensus-based, recommended practices and engineering guidelines.

The synchronization standard used throughout the audio and visual industry has the title SMPTE 12M–1995: for Television, Audio, and Film – Time and Control Code (available from http://209.29.37.166/stds/index.html email: [email protected]) and specifies a digital time and control code for use in television, film, and accompanying audio systems operating at 30, 25, and 24 frames per second.

Within the standard, time representation in a frame-based system is described within clauses 4, 5, and 6. To illustrate the problems of understanding time, clause 4 begins with a definition of NTSC time. NTSC is the American system of television, and NTSC time is defined against real time as 1 s NTSC time = 1.001 s real time. In contrast, in those sections that cover 25 and 24 frames per second systems, 1 s is defined as exactly the time taken to scan 25 or 24 frames, respectively.
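
The practical consequence of the 1.001 factor can be seen from a short calculation; the one-hour span is chosen purely for illustration:

NTSC_FACTOR = 1.001     # 1 s of NTSC time = 1.001 s of real time

def ntsc_lag(real_seconds):
    """Seconds by which a clock counted in NTSC time falls behind real time."""
    return real_seconds - real_seconds / NTSC_FACTOR

# About 3.6 seconds per hour, the discrepancy that drop-frame counting compensates for.
print(f"{ntsc_lag(3600):.2f} s per hour")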

Clause 7 of the proposal describes the structure of the time address and control bits of the code, and sets guidelines for the storage of user data in the code. This consists of 16 4-bit groups, split into two halves of eight 4-bit groups. The first of these contains the timecode represented as hours, minutes, and seconds in such a way as to use 26 bits. The remaining 6 bits are used to determine the operational mode of the time and control code. Clause 8 specifies the modulation method and interface characteristics of a linear timecode source.

Clause 9 specifies the modulation method for inserting the code into the vertical interval of a television signal and clause 10 summarizes the relationship between the two forms of time and control code.

SMPTE reference signals include several standards such as: SMPTE 303M, SMPTE RP 154–1994, SMPTE RP 176–1993, SMPTE 274M–1995, SMPTE 295M–1997 and SMPTE 296M.

During 1998 discussions within the AES SC-06–02 working group on IEC 61883–6 (mLAN – see IEEE 1394), the proposal noted that traditional SMPTE timecode may not have sufficient precision or accuracy to act as a timecode reference going forward. Furthermore, it was noted that adoption of a global time standard by SMPTE might fit well into further refinement of timecode standards. The SMPTE Reference Signal Workgroup began work on this, with proposals centred on using GPS time and navigational data (SC-06–02-C task group on Synchronisation in IEEE 1394 – see Notes and further reading).

It should be noted that different visual standards have different bit assignments. In the 625-line 50 Hz system used in Europe (PAL) and the 525-line 60 Hz system used in the United States (NTSC), bits 10, 11, 27, 43, 58, and 59 do not carry time or user data.

4.4.2  Longitudinal Timecode (LTC)

When videotape recorders were initially developed, a voice-quality audio track was incorporated along with the high quality audio track, designed for the purposes of talkback. The playback rate of 30 frames per second, along with the bandwidth of the track, allowed a digital signal of 2400 bits/s to be recorded. This yields an 80-bit word, permitting 2⁸⁰ combinations to be represented in each frame.

LTC requires 26 bits (2²⁶ combinations) to represent frame-accurate time in hours, minutes, and seconds. The spare capacity becomes user-assignable bits, and these are grouped together in 4-bit words and are used for a variety of purposes, such as date, take or reel numbers.

When applied to video and audio recording, the code can be continuous and evenly spaced and runs the length (or longitude) of the tape, much like a system of evenly placed pulses. Alternatively, the code represents the time of day, which is the implementation used most often on multiple reel video production. In this case, information included within the code represents the time of day that the video sequence was shot. Each code word starts at the clock edge, immediately before the first bit (bit 0) with 80 bits per frame. The available data rate varies depending upon the implementation, so for instance 24 frames/s systems achieve a rate of 1920 bits/s, whilst 30 frames/s systems yield 2400 bits/s.
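
The data rates quoted above follow directly from the 80-bit word length, as the short calculation below shows:

LTC_BITS_PER_FRAME = 80

def ltc_bit_rate(frames_per_second):
    """Longitudinal timecode data rate for a given frame rate."""
    return LTC_BITS_PER_FRAME * frames_per_second

for fps in (24, 25, 30):
    print(f"{fps} frames/s gives {ltc_bit_rate(fps)} bits/s")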

The second application for LTC is within television recorders, where different international standards have resulted in variations on the original.

The EBU and the Society of Motion Picture and Television Engineers (SMPTE – see Notes and further reading) standards have been incorporated into IEC standard 421:1986. This has been implemented within the EBU as standard N12:1994, in the UK as BS 6865:1987, and in the USA as SMPTE 12M:1995 (which also encompasses High Definition Television uses). The bit assignments for the entire 80-bit packet are shown in Figure 4.12.

Figure 4.12  SMPTE LTC bit assignments within the 10-byte word. Eight of the bytes carry time and control data, and two carry synchronization and direction information. Source: Ratcliffe, John (1997) Timecode: A User's Guide, Second Edition, Focal Press, p. 32.

4.4.3  Vertical Interval Timecode (VITC)

One limitation of LTC that may not be immediately apparent is that it is difficult to read when the tape is stationary or being moved very slowly, such as may occur when trying to find an exact point within the tape in which to perform an edit.

The solution is to carry a new timecode in those lines within the vertical scanning interval of the visual signal which are not used to carry test signals or other information. When the tape is stationary, the rotating head can still read these lines and therefore recover the timecode information.

As the active line period in 625/50 and 525/60 systems is around 52 μs, it is possible to incorporate a 90-bit code into one or more spare lines; however, the actual lines specified within the two systems are different. Data begin with two synchronizing pulses followed by 8 bytes of time and user information, which is in turn followed by 1 byte of cyclic redundancy check code as shown in Figure 4.13. The CRC word takes the place of the synchronization word used within LTC and, although there is no automatic error correction, other techniques are employed to improve the immunity to errors.

Figure 4.13  The VITC word comprises 8 data bytes containing time and control information, followed by a single byte for error detection. Each byte is preceded by two synchronizing bits. Source: Ratcliffe, John (1997) Timecode: A User's Guide, Second Edition, Focal Press, p. 42.

4.4.4  AES11

Understandably, the AES also has some recommendations regarding synchronization of digital audio signals, and these are to be found in the document AES11–1991. The document describes synchronization in terms of sample frequency and phase synchronization and recommends that all machines should be synchronized to a reference signal taking the form of a two-channel interface signal of a stable frequency and within defined tolerances. The paper also recommends that each machine has a separate input for the synchronization signal.

Multiple signals are considered to be synchronous when they have the same sampling rate, although small differences are accounted for by allowing phase errors to exist between the reference clock and the digital audio data. This allows for effects such as propagation delay in the cables and so on.

The frame boundaries of the input signal must be within ±25% of the reference signal's frame boundary, and the output should be within ±5%.

Two grades of signal are specified, grade 1 and grade 2. Grade 1 has a specified long-term frequency tolerance of ±1 ppm and is intended for larger facilities, which may run several studios from a single reference signal. Grade 2 is intended for single studios, where a greater need for accuracy is not required, and specifies ±10 ppm.
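
To put the two grades into concrete terms, the sketch below works out the permitted frequency window around a 48 kHz reference; the choice of 48 kHz is only an example:

def frequency_window(nominal_hz, tolerance_ppm):
    """Return the (low, high) frequency limits for a given ppm tolerance."""
    delta = nominal_hz * tolerance_ppm * 1e-6
    return nominal_hz - delta, nominal_hz + delta

for grade, ppm in (("grade 1", 1), ("grade 2", 10)):
    low, high = frequency_window(48_000, ppm)
    print(f"{grade}: {low:.3f} Hz to {high:.3f} Hz")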

4.4.5  MIDI and Timecode

As we have seen, there are two types of MIDI message: system and channel. System messages start with a 4-bit system message code, followed by a message type code. A system message has its header set to 1111, and 16 types of system message are available (although not all are implemented). MIDI clock signal is transmitted relative to the rate of the music being played through the MIDI interface and is defined as 24 clock periods per quarter note (ppqn – crotchet). These messages are best thought of as a metronome, as they contain no time-related information. The start message causes a sequencer to start playing its sequence from the beginning, and the stop message stops the sequencer playing. The continue message is used to tell the sequencer to start playing at whatever position in the sequence the pointer has reached. The pointer can be positioned anywhere within the sequence by counting the intervals from the start of the sequence.

Synchronization between externally generated clock signals and MIDI can be achieved, although this is sometimes difficult because of the tempo-related clock signal used within MIDI.

It is often a requirement for equipment using MIDI and IEC timecodes (such as longitudinal timecode and vertical interval timecode, covered in more detail in the preceding sections) to be interfaced, especially in post-production environments. To accommodate this, a system of representing real time within MIDI has evolved, called MIDI timecode (MTC).

MIDI Timecode

MTC is specific to MIDI enabled equipment and is designed to be transmitted over the MIDI interface. Most modern MIDI interfaces can convert longitudinal timecode to MTC for the purpose of synchronizing a computer sequencer to a tape or other recording machine.

In order for MTC to be accepted by MIDI machines, it must take the same form as other MIDI messages. Therefore, it must have a status byte and data bytes. There are two main types of message sent over MTC. The first and most common is the running timecode and is known as a quarter-frame message. This can be likened to sending seconds over the MIDI interface.

The second type of message is a one-off information message such as might be sent when rewinding a tape, to indicate the time position that has been reached during the rewinding process. This is sent as a universal system exclusive (sysex) message. Each message type is identified by the header byte.

MTC Quarter-Frame Messages

The quarter-frame message is preceded by a System Common header (F1h) and is used to send out real-time data, and so is transmitted regularly over time.

Longitudinal timecode and vertical interval timecode assign two binary groups each to hours, minutes, seconds, and frames. This is too much information to fit into a single MIDI frame, and so the information is split into eight different frames. In simplified terms, the frame is made up as illustrated in Figure 4.14.

Figure 4.14  General format of the quarter-frame MTC message. Source: Ratcliffe, John (1997) Timecode: A User's Guide, Second Edition, Focal Press.

The first bit is a zero, the next 3 bits of the word indicate whether the message contains frames, seconds, minutes or hours, and the remaining 4 bits represent the actual value of that particular time division. To reassemble the data from the eight quarter-frame messages into an understandable time value, the messages are paired up in the receiving device to form 8-bit bytes.
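
A sketch of how a timecode value might be split into the eight quarter-frame messages described above. The piece numbering and nibble order follow the usual MTC layout, and the frame-rate flag bits that share the final message are omitted for simplicity:

def quarter_frame_messages(hours, minutes, seconds, frames):
    """Split one timecode value into eight MTC quarter-frame messages."""
    values = [frames, seconds, minutes, hours]
    messages = []
    for piece in range(8):                    # two messages per time division, low nibble first
        value = values[piece // 2]
        nibble = (value >> 4) & 0x0F if piece % 2 else value & 0x0F
        messages.append((0xF1, (piece << 4) | nibble))
    return messages

for status, data in quarter_frame_messages(1, 23, 45, 12):
    print(f"{status:02X} {data:02X}")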

Full-Frame Messages

The system exclusive header (F0h) precedes the full-frame message. Full-frame messages are generally used in situations where it is impractical to send the quarter-frame messages. An example of this would be during a fast rewind of a tape machine, where too much data would be sent over the slow link if a message was sent for every time position in the sequence which the pointer covered during the rewind process. In these instances, a one-time update of the current position of the tape (relative to time) is sent at representative intervals in a full sized frame of 10 bytes.

There are three types of full frame message. These carry real-time data, binary user group data associated with the IEC code, and finally set-up messages.

Full-frame messages are a minimum of 10 bytes, and are prioritized and so may be sent out in the middle of a message stream.

Synchronization with other Sources

The problem with synchronizing MIDI to external synchronization signal sources is in converting tempo-related messages into real-time-related messages.

The two main approaches are to use a synchronizer with an integral timecode to MTC converter, or to use a dedicated synchronizer to convert timecode into MIDI clock or MTC, thus enabling the operation of a sequencer to control the remainder of the MIDI set-up.

4.4.6  Conclusion to Synchronization and Timecode

Unsurprisingly, none of the standard mechanisms for synchronization contain any form of addressing, since synchronization is a matter of distributing information about time. Therefore, distributing a synchronization signal is generally a matter of checking with a source reference, in the same way that weights and other standard units of measurement (including time) all relate to a source for reference. As a system of measurement, these sources are considered infallible and not subject to change, and all other references to the base measurement must be made against the source reference. All other references to that unit of measurement (such as rulers, weights, and clocks) can eventually be traced back to the source reference.

This can also be done with measurements of time, although this is obviously impractical when many electrical devices are distributed over a great distance. Instead, all the devices swap and compare their understanding of time, and negotiate amongst themselves to establish a common understanding. This is not as complex as it sounds and the method is frequently employed in both the computer industry and the audio industry.

The important concept is that the digital information is discrete, which in this case means that the process for encoding the analogue information is clocked against an assumed time. If the information is then decoded using a different measurement of time, the mechanism will not necessarily error (unless programmed to do so), but will instead assume that the time is the same. This provokes thoughts of playing back a record at a different speed, but the intention is to illustrate that if the two clocks are a tiny fraction of a second out of time with each other, then the result will be a less than perfect time relation during playback. In multi-track and A/V situations, this may result in audio tracks that are out of synchronization with the visual experience. However, if two machines are forced to compare clock signals with each other, then a compromise between the two can be reached. In audio production, one machine is generally assigned as the master or source reference and the other machines must follow.

In computer networks, another method used for achieving an understanding of time results in devices voting on what the time is. In this case, certain rules arbitrate the voting system, meaning that not all the attached devices get to vote. For instance in some networks, segments of the network that are partitioned from the rest of the network by a slow speed link will have their votes discarded, or may not be entitled to vote at all (Novell Netware 4+ Operating System). The agreed time will be equal to the average of the votes. If only two devices were attached to the network, then the agreed time would lie halfway between the times reported by the two machines.
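
A trivial sketch of the averaging just described, using two invented clock readings:

def agree_time(clock_readings):
    """Agree a common time as the simple average of the participating clocks."""
    return sum(clock_readings) / len(clock_readings)

# Two machines 0.4 s apart settle on a time 0.2 s from each of them.
print(agree_time([100.0, 100.4]))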

Voting schemes are used in many instances within computer networks to reach agreements on various aspects of network operation. In token ring networks, for instance, the first device to identify that the token is not being transmitted will send out a request to vote on which device should become the token master.

4.5  Conclusion to Audio Communications

The international standardization of data structures and encoding methods (such as PCM) in digital audio data is useful since it represents a solid final format that is readily understood and utilized. The advantage is that when all the transport information and encoding is discounted, the audio data remain in their internationally agreed digital form. This is useful since the data can be surrounded and encoded in a new way to be transmitted from one device to another using a different interface. A simple form of this processing would be in a patchbay, designed to strip one protocol and add another, thus translating from one interface to another. This functionality is similar to network bridges and routers, which allow data to be transmitted over different kinds of medium.

This has been used to some effect with matrix routers using the MADI recommendations. Although this standard does not specify any addressing information within the frame structure, the identity of an audio channel can be ascertained by its position within the super-frame, and so routing can occur, since the source of the channel can be identified in this way.

Notes and Further Reading

AES-24–1–1999. 4.1 Function, page 16. 1999–03–02–1 print. Audio Engineering Society, Inc., 60 East 42nd Street, New York 10165, USA. http://www.aes.org.

AES-24–1–1999. 0.8 Transport networks, page 6. 1999–03–02–1 print Audio Engineering Society, Inc., 60 East 42nd Street, New York 10165, USA. http://www.aes.org.

Audio Engineering Society, Inc. (1999) Trial-Use Release of Proposed Sound System Control Codes.

Audio Engineering Society, Inc. International Headquarters, 60 East 42nd Street, Room 2520, New York 10165–2520, USA. AES recommended practice for digital audio engineering – serial transmission format for linearly represented digital audio data.

ESTA Administrative Office. 875 Sixth Avenue, Suite 2302, New York 10001, USA. http://www.etsa.org.

Held, Gilbert (1998). Voice over Data Networks. McGraw-Hill.

IEC60268–12 (1987–03). Sound system equipment. Part 12: Application of connectors for broadcast and similar use. Central Office of the IEC. 3, rue de Varembé, PO Box 131, CH-1211, Geneva 20, Switzerland. http://www.iec.ch.

IEC 60958–1 (1999–12) Digital audio interface – Part 1. General Central Office of the IEC, 3, rue de Varembé, PO Box 131, CH-1211 Geneva 20, Switzerland.

MIDI Manufacturers Association Incorporated. PO Box 3173, La Habra, CA 90632–3173. http://www.midi.org.

MKPE Consulting. 23679 Calabasas Road 519, Calabasas, CA 91302–1502, USA.

Ratcliffe, John (1996) Timecode: A User's Guide. 2nd edition. Focal Press.

SC-06–02-C task group on Synchronisation in IEEE 1394. Report of Synchronisation Task Group to SC-06–02 regarding project AES-X60. 9 September 1998.

SMPTE. 595 West Hartsdale Avenue, White Plains, NY 10607, USA. http://www.smpte.org.
