CHAPTER 21

Surround Sound and Control Rooms

Cinema surround – its origins. TV surround – the differences. Music-only surround – its needs. Stereo and surround – the compromises. Perception and surround. Fold-down complications. Rear channels – the different concepts. Low frequency options. Close-field options. Study of an actual design. Dubbing theatres. Room compatibility issues. The X-curve. Room equalisation: the concept and the problems.

At least as far as the music-only market is concerned, the concept of surround sound seems to have slowly descended into a mire. In 2011 it has become very difficult to find high-fidelity surround recordings in any commercially available format. What exists has largely become only limited-interest, special-order-only items. Whatever is currently available as soundtracks to video and film releases is almost invariably data-compressed in a lossy manner (meaning that all the original quality of the recording cannot be reproduced), and hence does not comply with the concept of high fidelity. What is more, professionally, the market for music-only surround control rooms has largely disappeared. Whatever surround rooms are currently being built in recording studios are mainly post-production rooms, with arbitrarily placed loudspeakers, and often with no means of even reproducing SACDs or DVD-As (96 kHz, audio-only DVDs). Sub-woofer positioning and calibration are haphazard, and response uniformity is almost non-existent. Remote gain controls are even found for sub-woofers in professional facilities, so that low-frequency levels can be ‘adjusted to taste’ between one recording and another, or to ‘reference’ recordings which, themselves, were probably mixed in equally arbitrarily calibrated rooms. Attitudes have degenerated to a point where there is almost no reference for making surround mixes, and domestic listeners are left to their own devices to get whatever they can out of a recording.

Nevertheless, this chapter will look at the different surround concepts available for music-with-picture, general cinema, and music-only use, and an attempt will be made to understand what could be achieved if only the music industry were interested. It should become obvious that perhaps one reason for the currently sad state of affairs is that many of the concepts were over-simplistic in the way that they were ‘sold’ to both the industries and the public. The reality of trying to achieve true high fidelity surround sound is something which is very difficult and expensive to put into practice. Unfortunately, this did not coincide well with a post-2000 world which was heading in the direction of asking for ‘more, and cheaper’ rather than for ‘better quality’, even if the latter cost only a little more.

Historically, it was the cinema industry which led the way in the serious use of surround sound. Walt Disney’s animation film ‘Fantasia’ featured multi-channel surround audio 30 years before the record companies made their first, albeit short-lived, forays into quadrophonics in the early 1970s.

21.1 Surround in the Cinemas

Dolby did the industry a great service many years ago by clearly specifying their intentions for their cinema systems and issuing adequate guidelines to mixing personnel. They did not try to establish a system that would do all things for all people. Instead, they chose systems that would, within a restricted range of use and in reasonably controlled acoustics, give a good representation to the cinema-goers of the film directors’ creative wishes. They defined the reasonable limits of surround reproduction at an early stage, and history has shown them to have been remarkably insightful given the chaos of quadrophonics from which they developed their early systems.

A few important facts regarding mixing for Dolby cinema surround are given below.

1.  The most important signals will usually be coming from the front, because the action is on the screen, and the directors do not want people to be distracted from it by important sounds coming from rear locations. This would tend to excite the human reflex response of turning to face the sound, which may be warning of danger in real life, but it is not much use in a cinema, except in IMAX, perhaps.

2.  The mixes will be played back in theatres which comply with predetermined acoustic and electroacoustic requirements (although the degree of compliance is somewhat variable).

3.  Due to the sheer cost of film production, mixing will usually be carried out by skilled, experienced, and knowledgeable personnel, who will almost invariably limit their creative exuberance to getting the very best out of what they have to work with. They will not get too carried away creating wonderful effects in the mixing room which may be detrimental to the experience in the theatres. The success of a film tends to ride on how many people see the first screenings in the cinemas and recommend the experience to their friends. Creating effects which are not universally obvious for all the audience, or at worst are confusing, does not help this situation.

4.  The objective is usually to create a one-off, big performance, and it generally means that high-quality programme material and equipment are available to achieve the high standards that are normally expected.

5.  The whole concept of the various Dolby surround formats (and others which follow generally similar guidelines) is to deliver a balanced programme to a group of people, with no particular ‘sweet spots’. This they have proved themselves very capable of doing, over many years and around the world.

21.2 TV Surround

TV and video, at first consideration, seem to need the same general requirements as cinema because once again the final audience will be sat in front of a screen with moving pictures, and will be surrounded by (usually) five loudspeaker channels. However, there are other things to consider with regard to domestic ‘viewing’.

1.  Reproduction will usually not take place with SPLs of over 90 dB.

2.  Background noises in the listening environment will usually be higher than in cinemas.

3.  Because of the two above points, dynamic ranges will be much more restricted.

4.  Many broadcasting companies have their own set of standards, and sometimes are forced by international agreements to follow standards that are not always beneficial for the sound quality.

5.  Reproduction will be presumed to be largely on poor-to-medium quality domestic loudspeaker systems, and getting a reasonably good sound for all is fundamental if the revenue of advertisers is what keeps the TV channels in business. The high impact of cinema audio is neither a requirement nor a practicable concept for TV mixing.

The above, and many other points, generally mean that rooms designed for TV or domestic video production will be much more compromised in terms of overall sound quality. There is no doubt that it is the existence of so many restrictions which often leads to a lamentable attitude about sound quality from many people in the world’s TV industries.

One therefore cannot easily mix for surround TV in a room designed for big-screen, high dynamic-range productions. Neither would it seem wise to try to mix audiophile quality music recordings in rooms designed for TV/video sound mixing, where many of the necessary subtleties for music mixing would almost certainly not have been considered during their design. This does highlight the fact that surround room design becomes very specific to the goals of the reproduction circumstances, and gives some insight into why the variations shown in Figures 14.10 to 14.23 have come into existence.

21.3 Music-Only Surround

Now we come to the big question: rooms for mixing music-only, high fidelity surround. Unlike in the cinema, TV and video worlds, there are no set standards. In rooms for high-quality stereo music mixing there are several schools of thought. There are the ‘Live-End, Dead-End’ types of rooms (see Chapter 17), and earlier designs by people such as Jensen, which were intended to give a clean first-pass to the sound from the loudspeakers, with a diffuse room decay from the rear, and to varying degrees from the sides. The ‘Non-Environment’ concept (see Chapter 16) attempts to make the room as anechoic as possible to the loudspeakers. Only the floor and front wall are reflective in order to give life to speech and actions in the room, thus avoiding the creation of an uncomfortably dead ambience in which to work. Sam Toyoshima and Eastlake Audio both opt for something that is rather similar to the Non-Environment concept, but with a careful distribution of reflective surfaces (principally for the higher frequencies) scattered about the room to add a touch of extra ambience. The rear walls are highly absorbent in both cases. It has to be said that well-designed and well-constructed types of all of the above rooms, and others besides, in the hands of skilled recording staff can produce stereo recordings of the very highest quality. None of them are perfect though.

As designers have their own individual hierarchies of priorities, each type of room has its own special strengths, such as the resolution of ultra-fine detail, the ambient ‘feel’ of the room, the low frequency behaviour of short duration sounds, or other aspects of performance. Nevertheless, all of the better rooms of each type, when the recording staff know the rooms, can produce results which travel well to domestic situations. However, what they all have in common is that they are bi-directional: the front halves differ from the rear halves. In no case can one simply transfer the monitor loudspeakers to the rear wall and still have a room that sounds as good. Yet, this is exactly where rear loudspeakers are needed in a surround room that is intended to have fully symmetrical five-channel monitoring. Therefore, if all of the very best stereo mixing rooms need bi-directional acoustics, this would seem to lead to only two possible conclusions.

1.  In surround rooms with bi-directional acoustics (like the good stereo rooms), the rear loudspeaker responses will not be as good as the front loudspeaker responses.

2.  In surround rooms that do have fully symmetrical acoustics and monitoring, the frontal stereo cannot be as good as in the best bi-directional stereo rooms.

It would seem that the first choice would be the better compromise for high-quality music mixing, because even in a surround mix the frontal sound stage is the most important in 95%, or more, of recordings. To compromise our current high-quality stereo for a more enveloping sound is to trade quality for quantity. The second option described above is reminiscent of an old comedy on British television many years ago, about the textile industry, called ‘Never Mind the Quality, Feel the Width’. Some surround room designs seem to be saying, ‘Never mind the quality, feel the space’. Whether or not that is a backward step is perhaps a subjective issue.

In the classical recording world, the orchestral layout is a fundamental part of the music, which is designed to deliver the composers’ emotions to the audience. Nobody within the orchestra hears the true, intended balance of instruments, so having the orchestra wrapped around the listener does not seem to be a worthwhile goal. This suggests that for most classical recordings the surround channels will be delivering purely ambience, and only rarely for special effects will they be reproducing the direct signals from musical instruments. In most classical works where off-stage sounds are used, they are usually intended to be ethereal in quality, so they will be of an ambient nature, which is well suited to surround in the bi-directional rooms. These rooms may not be appropriate for a close-mic’d wraparound orchestral recording though, because of the lack of ‘symmetrical’ monitoring.

Based on all the above, it would appear that the design of surround mixing rooms for music-only recordings should not differ fundamentally from the design of rooms for Dolby cinema mixing. The object should be to try to achieve the best that can be achieved from the rear channels without compromising the response of the frontal stereo, be it two, three, or five channels. The only significant difference in design may be in the choice of the front wall materials in the absence of a screen. In surround rooms, the front walls still need to be solid to act as baffle extensions for the loudspeakers, but they should also be irregular to break up any specular (discrete) reflexions of the sound striking them from the rear loudspeakers.

The most high fidelity (i.e. highly faithful to the original) surround sound reproduction can only really take place in anechoic surroundings, but even the craziest of surround audiophiles are unlikely to have anechoic chambers for domestic listening. Nevertheless, it would seem to be a professional attitude for the music industry to try to generally monitor in conditions that are better than could be expected in domestic reproduction because, eccentric as many audiophiles are widely deemed to be, it is hard to deny their right to expect that the recording standards should live up to their equipment standards. There is no justification for the widespread television attitude in a professional music recording industry, but there now seems to be ominous pressure (such as from MP3) to lower many of the general quality aims. This is a risky situation though, because if compromise is forced on the end result, enthusiasm will be lost at the production end of the chain, and that would be ruinous for creativity. Standards erosion must be resisted!

To maintain the quality levels of stereo hi-fi in commercial surround recordings requires vigilance and discipline. Producers and engineers are likely to be disappointed if they expect to be able to do whatever they want with the distribution of instruments whilst hoping to produce results of a quality equal to the current quality of the best stereo. (And to be even more disappointed if they expect equally balanced reproduction in most people’s homes.)

There is a huge carrot dangling in front of studio designers to produce ‘the ultimate quality’ in symmetrical surround mixing rooms. Time will surely find the rooms lacking, though. This is because achieving fully symmetrical five-channel monitoring with no compromise to the quality of the frontal two-channel stereo, while also having the rooms usable for multi-format mixing (cinema/music/TV, etc.), basically cannot be done. Control room design for surround mixing is therefore a very format-based concept. Until we get more rationale than the situation shown in Figures 14.10 to 14.23, the difference in the concepts of surround control rooms will lead to great confusion, with people being tempted to do the wrong things in the wrong rooms; or even the wrong things in any room.

21.4 An Interim Conclusion

From many points of view it would seem that the best answer for the future of surround is for everybody to follow the basic five-channel cinema format. It has many things going for it.

1.  It is proven.

2.  It is widespread.

3.  It was created by professionals after a great deal of knowledgeable and rational thought.

4.  It can do (or is capable of doing with little adaptation) almost all that surround can reasonably be expected to do.

5.  Much of what it will not do is not possible anyway, which becomes clearer when the finer points of surround are better understood.

6.  It requires no separate systems for music and cinema.

7.  It is more room-tolerant than many other systems.

8.  It does allow some flexibility in the choice and location of the rear/side loudspeakers.

It is worth remembering there are many things that one can dream of doing on other systems, but there is not much that one can actually do well which cannot be supported by the current cinema formats.

21.5 The Psychoacoustics of Surround Sound

The relatively stable phantom images of conventional stereo only function well when they subtend an angle of about 60° centred on our noses. This fact was grossly under-appreciated during the quadrophonic era of the 1970s. An enormous number of people (the author included) attempted to pan musical signals to all points around the room on quadrophonic pan-pot ‘joysticks’. It was a great source of mystery to many people why the front–left/rear–left panning only stabilised when at fully front or fully rear positions, with all points in between yielding images which flip-flopped from one extreme to another at the slightest movement of the head. In fact, it is only in the horizontal frontal listening arc where we generally have good localisation. Over this arc, it can be very good, with many people being able to resolve differences in position to an accuracy of 1° for some sounds. When a loudspeaker is placed behind the head, the accuracy of localisation is reduced, and phantom images between two or more loudspeakers cannot be stable. The perceived frequency response will also be different from that perceived from similar loudspeakers in front of the listener.

In surround listening, whether to live instruments or recordings via loudspeakers, it is a fact that perceived tonality will change according to source position. However, in many people’s heads, there seems to be an idea that if symmetrical monitoring conditions can be achieved, then this will lead to more ideal surround sound monitoring. A big problem arises when surround mixes monitored under such circumstances are heard in stereo or mono. What was perceived as correct when heard from behind may be inappropriate in both level and frequency balance when folded down to fewer channels and heard from a frontal direction.1 In such cases, it is hard to see why distributed rear loudspeakers, such as those used in the cinema world, would be at any great disadvantage to discrete loudspeakers, because neither relate perfectly to the perception of a folded-down mix.
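The fold-down problem can be made concrete with a minimal sketch. The text does not specify any fold-down coefficients, so the −3 dB gains for the centre and surround channels used here are an assumption (a commonly used value), and the function names are hypothetical.

```python
import math

# Assumed fold-down gain of -3 dB (1/sqrt(2)) for centre and surround channels.
# This is a commonly used value, not a figure taken from the text.
MINUS_3_DB = 1.0 / math.sqrt(2.0)


def fold_down_to_stereo(l, r, c, ls, rs, g_centre=MINUS_3_DB, g_surround=MINUS_3_DB):
    """Fold one five-channel sample frame down to a (left, right) pair."""
    left = l + g_centre * c + g_surround * ls
    right = r + g_centre * c + g_surround * rs
    return left, right


def fold_down_to_mono(l, r, c, ls, rs):
    """Fold the stereo fold-down to mono by averaging the two channels."""
    left, right = fold_down_to_stereo(l, r, c, ls, rs)
    return 0.5 * (left + right)


# A sound mixed only into the left surround channel now arrives from the
# front-left loudspeaker, at a level fixed by g_surround rather than by ear.
print(fold_down_to_stereo(0.0, 0.0, 0.0, 1.0, 0.0))
```

The arithmetic is fixed, so a level and frequency balance that was judged by ear from behind is simply carried to the front unchanged, which is where the mismatch described above arises.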

It has been recognised that the ambient or discrete-instrument type of surround mixes are better served by correspondingly distributed or discrete loudspeaker sources at the rear, but the existence of these conflicting concepts is largely due to the fact that it has also been widely recognised that five channels are not enough to provide the best of both worlds. Ten channels are seen to be a minimum for truly flexible surround sound systems, but it is deemed too unwieldy to be commercially acceptable.

When used for ambient mixes, the two-discrete-loudspeaker option for the rear channels often does not work as well as some of the other rear channel loudspeaker options (see also next section). Therefore, the whole concept of discrete rear loudspeakers does not appear to have much in its favour (except for the fact that market pressures are pushing for it), but as we shall see later, it can sometimes win by default.

Dolby cleverly recognised these conceptual weaknesses in the 1970s when they introduced ‘Dolby Stereo’, which was really a four-channel surround format (see Figure 14.11). In those days of analogue-only technology, the quadrophonic vinyl-disc systems used rather poor phase encoding techniques to try to put four channels on a stereo disc. Dolby saw the nonsense of it all, but then effectively used a similar phase encoding technology to produce not the left–front/right–front/left–rear/right–rear of quadrophonics, but a left–front/centre–front/right–front/single-channel-rear format. They put one of the loudspeaker channels at the centre–front location, which stabilised things immensely, and split the mono rear signal between several loudspeakers, widely distributed, which could give quite a spacious ambience. For what it was, the system worked well. Once again, therefore, the cinema people moved one jump ahead of the music industry on the subject of surround. The advent of this system could have saved quadrophonics, but the music business was already largely fed up with it, and a generation was to pass before they again attempted to re-invent the rather useless ‘square wheel’ under a new name – surround. With digital technology, the cinema world soon moved on to three front channels and distributed stereo surround, which represented yet another march forward.
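As a rough illustration of how four channels can be carried on two, the following is a deliberately simplified sketch of a left/centre/right/surround matrix. It is not the actual Dolby encoder or decoder: it represents the surround as a plain polarity difference between the two transmitted channels, whereas real systems apply phase shifts and further processing. The −3 dB gains and all names are assumptions made for the example.

```python
import math

G = 1.0 / math.sqrt(2.0)  # assumed -3 dB matrix gain


def encode_lcrs(l, c, r, s):
    """Fold L, C, R, S into two transmitted channels (simplified, polarity only)."""
    lt = l + G * c + G * s   # surround added in phase to the left channel
    rt = r + G * c - G * s   # surround added in anti-phase to the right channel
    return lt, rt


def decode_passive(lt, rt):
    """Recover L, C, R, S estimates by simple sum and difference."""
    left = lt
    centre = G * (lt + rt)    # in-phase content steers to the centre
    right = rt
    surround = G * (lt - rt)  # anti-phase content steers to the surround
    return left, centre, right, surround


# A surround-only signal decodes at full level in the surround output, but
# also leaks at -3 dB (with opposite polarities) into left and right.
print(decode_passive(*encode_lcrs(0.0, 0.0, 0.0, 1.0)))
```

The leakage between adjacent outputs visible here is inherent in any passive two-channel matrix, which is one reason why a single, distributed rear channel suited the format better than an attempt at four discrete corners.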

21.6 Rear Channel Concepts

It has been shown that multi-channel surround is more realistic than two channel surround (surround here meaning the channels other than the front channels), although two surround channels can be made more effective as ambience channels by using diffuse source techniques rather than two, simple, discrete loudspeakers. Tomlinson Holman proposed another concept for domestic use; single dipole sources to each side and slightly to the rear of the listener (see Figure 14.17).2 These sources presented their nulls to the listener, who would receive only reflected energy from the room loudspeakers, and hence a more diffuse sense of spaciousness. However, this concept will only work in rooms with reasonably reflective surfaces. In a dead room, the surround channels would almost disappear.

David Bell successfully used single discrete loudspeakers as the rear channels in small post-production rooms by hanging them from the ceiling, facing away from the listening position, but pointing directly at proprietary, wall-mounted diffuser panels to reflect energy back into the listening area.3 The author successfully used distributed mode loudspeakers, which are naturally diffuse sources, in a film laboratory screening room that met Dolby specifications.4

These things are all ultimately subjective, but a large body of opinion believes that discrete instruments occasionally played through the above diffuse surround systems suffer less loss of realism than two-channel ambience played through a pair of discrete rear loudspeakers. The consumer market may ultimately dictate the de facto ‘standards’ to be used, but the diffuse sources referred to in the previous paragraph could well be more tolerant of listening room acoustics than the fully discrete, five identical loudspeaker option which seeks to achieve ostensibly identical monitoring from five locations, whether the sources are perceived to sound identical, or not.

Unfortunately, it seems that the consumer end is becoming ever more chaotic. The majority of people just assemble their systems and hear whatever they hear. In some ways, the concept of domestic ‘fidelity’ to an original recording quality appears to be vanishing. In fact, some of this is probably due to the lack of reliable guidelines for the production end of the industry. Just as the public got tired of the format wars of quadrophonic vinyl discs, it seems to be getting equally tired of the lack of clear leadership from the (now dwindling) surround-music-production industry.

21.7 Perceived Responses

Clearly, the five different surround-channel systems so far discussed will all tend to produce different responses, both measured and perceived, at the listening position(s). So, let us now consider the responses of the various approaches in different acoustical conditions.

21.7.1 The Simple Discrete Source

Taking the simple, single loudspeaker first, the response at the listening position will be dependent upon the mounting conditions and the nature of the surfaces that face the loudspeaker.

Let us assume that we have put a full, five-channel system of identical loudspeakers in an anechoic chamber, mounted at the recommended points on the perimeter of a circle. An omni-directional measuring microphone at the listening position would pick up the same frequency response from each source. On the other hand, a listener at the same position as the microphone would perceive less high-frequency content from the rear loudspeakers than from the front loudspeakers, due to the pinnae (outer ears) being more responsive to high frequencies from a frontal direction. As there would be no reflexion of sound, the rear loudspeakers could be perceived as single discrete sources. Nevertheless, these conditions would be the ones under which the overall flattest response could be expected from all the loudspeakers.

In a stereophonic type of control room with a dead front end and a live rear end, the situation would be somewhat different. At high frequencies, the frontal loudspeakers would project directly into the ears of the listener, and excite the reflected field from the rear, but the rear loudspeakers would do neither. The perception of the rear loudspeakers would tend to be dull and lifeless. The situation at low frequencies would depend on the mounting conditions. If all the loudspeakers were flush mounted, the front loudspeakers would, hopefully, face a rear wall with enough low frequency diffusion to ameliorate the effect of standing wave resonances, but the rear loudspeakers would face a front wall which would typically be quite solid and reflective at low frequencies. Response disturbances could be expected due to the strongly reflected waves interfering with the direct waves.

If the loudspeakers were free-standing, then many of the same results would obtain, except that a less uniform bass response could be expected. The omni-directional low frequency radiation would travel behind the front loudspeakers and reflect off the front wall, again causing response irregularities due to the interference of the direct and reflected waves. The rear radiation from the rear loudspeakers would hopefully have a less disturbing effect due to the presence of the rear wall diffusers, but the forward moving radiation would still suffer interference from the front wall reflexions.
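The interference between the direct wave and the front-wall reflexion from a free-standing loudspeaker can be estimated with a simple sketch. It assumes an idealised case that the text does not spell out: a rigid wall, omni-directional low frequency radiation, and a listener on the loudspeaker axis, so that the reflected path is longer than the direct path by twice the loudspeaker-to-wall distance. The function name and the speed of sound value are assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed (air at about 20 degrees C)


def wall_cancellation_frequencies(distance_m, count=3, c=SPEED_OF_SOUND):
    """Frequencies (Hz) at which the wall reflexion arrives half a cycle late.

    The reflected path is longer by 2 * distance_m, so cancellation occurs
    when that extra path equals an odd number of half wavelengths, giving
    f = (2n + 1) * c / (4 * distance_m).
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return [(2 * n + 1) * c / (4.0 * distance_m) for n in range(count)]


# Hypothetical case: acoustic centre 1 m in front of the wall.
print(wall_cancellation_frequencies(1.0))
```

Moving the loudspeaker closer to the wall pushes the first cancellation upwards in frequency, and flush mounting removes the rearward path altogether, which is consistent with the less uniform bass response expected from the free-standing case.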

Somewhat differently, in a room with a wideband reflecting front wall and absorbent rear wall, such as described in Chapter 16, the rear loudspeakers would subjectively sound brighter than in either the anechoic or the dead front-end conditions, due to the high frequencies reflecting from the front wall and back to the listener from a forward direction. A measurement microphone at the listening position would show a less flat mid-range response than in a room with an absorbent front end, due to the interference of the direct and reflective waves. At low frequencies, the mounting conditions would give rise to different responses from front and rear loudspeakers, due to the rear absorber wall providing a less effective baffle extension to the loudspeakers than that enjoyed by the front loudspeakers (see Chapter 11). This could be legitimately equalised to some degree, but not precisely. For free-standing loudspeakers, the rear channels may be perceived to be flatter at low frequencies than in a room with a reflective/diffusive rear wall, due to the relatively non-reflective, adjacent rear surface.

Tomlinson Holman, and others, have proposed that rooms for surround should be of generally lower decay time than conventional listening rooms, both to avoid colouration of the recorded surround and to help to ameliorate the variable low frequency response problem. Less ambient ‘help’ from the room is needed for surround sound because the ambience is usually already recorded in the surround channels. Holman and others have also suggested that the room should be made more acoustically uniform by the careful and appropriate distribution of reflective/diffuse materials on the surfaces of the walls and ceilings. These are eminently sensible suggestions for a more uniform perceived response from all of the loudspeakers, but they cannot avoid the criticism that the frontal loudspeaker response, which is usually of prime importance in the vast majority of music recordings, cannot be as good as in the finest, acoustically bi-directional stereo rooms. However, the proponents of the ideas were talking about surround listening rooms, and not about stereo compatible control rooms. The compromising of the frontal responses to the benefit of rear responses may well be valid in the context of some surround-only rooms, but it must be accepted that a trade-off exists. The degree to which this trade-off is beneficial or otherwise may be heavily dependent upon programme material and personal tastes. We are also being driven towards mono sub-woofers.

21.7.2 The Multiple Distributed Source

This is the Dolby cinema approach. In anechoic conditions, the main difference between this and the discrete source concept is that it would still be perceived as a distributed source, because that is exactly what it is (although the precedence effect could have a tendency to pull the image towards the nearest loudspeaker of a group). Depending upon the precise distribution of the loudspeakers, it could be the case that the distributed sources would sound brighter than the single discrete rear sources, because some of the loudspeakers could be expected to be more directly pointing towards the ear canals, as can be seen from Figure 14.13. The response of a measuring microphone at the listening position in anechoic conditions would be less flat than from the single discrete source because of the constructive and destructive interference due to even minor path length differences from the different sources. Subjectively, however, this may not be a problem at all, but a response down to 20 Hz could not be expected from the use of multiple, smaller loudspeakers.
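The effect of a small path length difference on the measured response can be sketched for the simplest possible case: two identical sources fed with the same signal and arriving at equal level at the microphone. This two-source, equal-level model is an assumption made for illustration; a real array has more sources at differing levels. The function names are hypothetical.

```python
import math


def comb_gain_db(freq_hz, path_diff_m, c=343.0):
    """Level of two summed, equal arrivals relative to one arrival alone (dB).

    The second arrival lags by path_diff_m / c seconds, so the summed
    magnitude is |1 + exp(-j*2*pi*f*t)| = 2 * |cos(pi * f * path_diff_m / c)|.
    """
    magnitude = abs(2.0 * math.cos(math.pi * freq_hz * path_diff_m / c))
    return 20.0 * math.log10(max(magnitude, 1e-12))  # floor avoids log(0)


def first_null_hz(path_diff_m, c=343.0):
    """Lowest frequency at which the two arrivals cancel."""
    return c / (2.0 * path_diff_m)


# Hypothetical case: one loudspeaker 10 cm further from the microphone.
print(first_null_hz(0.10))
print(round(comb_gain_db(100.0, 0.10), 2))
```

At low frequencies the two arrivals add almost fully, while towards the first null they progressively cancel, so even a few centimetres of path difference places irregularities well within the audible range of a single-microphone measurement.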

In the case of either extreme of stereo control room design, whether the front walls be reflective at high frequencies or not, the response from multiple distributed sound sources would tend to be less different from that which could be expected from a discrete rear source in the different rooms. Surface mounting of the multiple distributed sources is normal, because flush mounting tends to be rather structurally complicated, and multiple free-standing sources tend to be too much of an obstruction to everyday work and general activities.

Subjectively, the distributed system seems to work very well, especially for ambient and special effect surround. When the rear channels are fed via a signal delay, as in the Dolby system, the effects can be very lifelike, because for any sounds in all the channels the precedence effect will tend to ensure that front-originating sounds cannot be pulled back into the surrounds. This can be effective even when a listener is closer to one of the surround loudspeakers than to the most distant frontal loudspeaker, as can be the case in many cinema and home theatre installations, especially where an audience of more than one person can often be the norm. For home use, though, the multiple distributed surround loudspeaker concept is cumbersome.
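The delay needed for the frontal sound to arrive first can be estimated from the path lengths alone. The sketch below is an assumption-laden illustration rather than a description of any particular processor: it takes the distances from one seat to the most distant front loudspeaker and to the nearest surround loudspeaker, and returns the smallest surround delay that makes the front arrival lead. The optional margin parameter and the function name are hypothetical.

```python
def min_surround_delay_ms(front_distance_m, surround_distance_m,
                          margin_ms=0.0, c=343.0):
    """Smallest surround delay (ms) for the front arrival to reach the seat first.

    If the surround loudspeaker is nearer than the front loudspeaker, its
    sound would otherwise arrive early by (front - surround) / c seconds.
    """
    early_ms = (front_distance_m - surround_distance_m) / c * 1000.0
    return max(0.0, early_ms) + margin_ms


# Hypothetical seat: 10 m from the far front loudspeaker, 2 m from a surround.
print(round(min_surround_delay_ms(10.0, 2.0), 2))
```

Because the required delay depends on the seat, a single delay setting has to be chosen for the least favourable positions in the room, with the remaining seats simply enjoying a larger lead for the frontal sound.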

21.7.3 Dipole Surround Loudspeakers

This option is shown diagrammatically in Figure 14.17. Clearly, in anechoic conditions this choice would be somewhat of a nonsense, because with the nulls facing the listener (see Figure 11.3(a)) and no reflexions returning, not much would be heard. In a ‘stereo’ style of control room, the side of the dipole facing the hardest surface would give rise to most of the high frequencies. Obviously, therefore, the most suitable type of room for using such a system would be one with relatively evenly spread reflective surfaces, such as tends to be found in many domestic rooms. This is not too surprising, because Tom Holman initially proposed this technique for domestic use, where it does have a lot of potential, but it is hard to see how a flat frequency response could be expected at the listening position from such an arrangement using the purely reflected energy from arbitrary boundary conditions.

21.7.4 Diffuse Sources

For the purpose of this discussion, the true diffuse sources such as the DML (Distributed Mode Loudspeaker) and the approach of a discrete loudspeaker pointing at a wall-mounted diffuser can be lumped together (see Figure 14.21). They could both be expected to deliver a relatively flat response to the listening position, almost despite the nature of the room acoustics. The DML behaves rather differently from other loudspeakers in that the initial SPL drop with doubling of distance tends to be more like 3 dB rather than the conventional 6 dB. (Due to the source area.)
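The difference between the two fall-off rates is easily put into numbers. The sketch below simply applies a fixed number of decibels per doubling of distance; the 6 dB figure corresponds to the conventional compact source and the 3 dB figure to the initial behaviour quoted above for the DML. Treating the 3 dB rate as constant over the whole distance is a simplification, since the text describes it only as the initial rate, and the function name is hypothetical.

```python
import math


def level_drop_db(distance_m, ref_distance_m=1.0, db_per_doubling=6.0):
    """Level drop (dB) at distance_m relative to the level at ref_distance_m."""
    doublings = math.log2(distance_m / ref_distance_m)
    return db_per_doubling * doublings


# Going from 1 m to 4 m is two doublings of distance.
print(level_drop_db(4.0))                       # conventional source
print(level_drop_db(4.0, db_per_doubling=3.0))  # DML-like initial rate
```

Over those two doublings the conventional source has lost twice as many decibels as the DML-like source, which helps to explain why the latter can cover a spread of listening positions more evenly.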

Diffuse sources have wide radiation patterns over an extended range of frequencies, and tend to suffer less from the effects of room resonance and standing wave interference. The DMLs do tend to suffer from a rather curtailed low frequency response, but they can be so advantageous as ambient sources that the extra effort involved in adding a common sub-woofer to them would seem to be well worthwhile.5 Essentially, however, they consist of a radiating surface made from a material which exhibits a very dense modal activity spread throughout its surface. They are energised by moving coils, but not in the sense of a conventional magnet and chassis system. Despite being a mass of resonances, the early part of the impulse response is remarkably rapid. Since the late 1990s these loudspeakers have been causing some reassessment of conventional thinking, and in many areas they have been well received, especially in the creation of ambience effects.

Obviously, the discrete loudspeaker pointing at a wall-mounted diffuser will behave more like a conventional source at low frequencies, where the radiation pattern tends to be that of an omni-directional compact source. Its response in this region may differ from that of the DML, but the low frequency responses of surround systems in general are something of a minefield, so perhaps that is what we should now look at in more detail.

21.8 Low Frequencies and Surround

Figure 21.1 shows a typical layout for a Dolby Digital theatre. This is the archetypal 5.1 system, where the ‘point-one’ (0.1), or low frequency effects channel, is fed to a dedicated sub-woofer system. In cinema mixing for Dolby Digital, DTS (Digital Theatre Systems) and SDDS (Sony Dynamic Digital Sound), what goes to this channel is determined by the mixing engineer. However, in the Dolby Stereo analogue system (which despite its name is a matrix surround format), the sub-woofer is fed from a low frequency management system in the processors, somewhat like in domestic ‘home theatre’ systems.

It will be noted from Figure 21.1 that the sub-woofer is set off-centre. This is done to try to avoid driving the room symmetrically, where the tendency would be to drive fewer modes more strongly due to the equal distance from a centre loudspeaker to the two side walls. The off-centre arrangement tends to produce more response peaks and dips but of lesser magnitude than would be the case for a symmetrical drive. The degree to which the off-centre location of a sub-woofer can be detected by ear is usually a function of the upper frequency limit. Below 80 Hz it is generally very difficult to detect the source position, but as the cut-off frequency rises, the low frequency source position can become more noticeable. Below 50 Hz, localisation is impossible.

Figure 21.1: Three-dimensional conceptualisation of a Dolby Digital theatre.

Figure 21.2: Dolby recommendation for siting two sub-woofers; one of them one-fifth of the room width from one side wall, the other one-third of the room width from the other side wall. This not only avoids the localisation of a single sub-woofer towards one side of the room, but also avoids the symmetrical driving of room modes by the central placement of the sub-woofer(s).

To eliminate noticing the off-centre source location whilst maintaining an asymmetrical room drive, Dolby now recommend the use of two sub-woofers, fed from the same electrical signal. One should be placed one-third of the distance across the room from one side wall, and the other placed one-fifth of the distance across the room from the opposite side wall, as shown in Figure 21.2. The use of one or two large sub-woofers in such cinema installations takes into account the fact that cinemas, whilst being designed to at least reasonable acoustic criteria, are not so heavily controlled as music control rooms. Fewer low frequency sources are therefore more practical than three or five full-range sources. They also address the need for a large area, relative to the size of the room, to be covered by a respectably even sound-field, so that the paying customers all receive their money’s worth. The existence of sweet seats and poor seats, acoustically speaking, would not be commercially viable. Large sub-woofers also offer the power required for explosions, and other sound effects that would tax full-range loudspeakers to their limits in larger theatres.
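The effect of symmetrical versus asymmetrical drive on the lateral room modes can be sketched with a very simple model. The sketch assumes an idealised, rigid-walled, one-dimensional room in which the pressure shape of lateral mode n is cos(nπx/W); the function names and the averaging of the two sources are illustrative assumptions, and damping, the other two room dimensions and the loudspeaker responses are all ignored.

```python
import math


def mode_coupling(position_fraction, order):
    """Relative coupling of a point source to lateral axial mode `order`.

    Uses the idealised rigid-wall mode shape cos(n * pi * x / W), where
    position_fraction is x / W measured from one side wall.
    """
    return math.cos(order * math.pi * position_fraction)


def combined_coupling(position_fractions, order):
    """Mean coupling of several sub-woofers fed from the same signal."""
    total = sum(mode_coupling(p, order) for p in position_fractions)
    return total / len(position_fractions)


centre = [0.5]             # single sub-woofer on the centre line
pair = [1 / 3, 1 - 1 / 5]  # one-third from one wall, one-fifth from the other

for n in range(1, 7):
    print(n, round(combined_coupling(centre, n), 2), round(combined_coupling(pair, n), 2))
```

In this simplified model the central position gives essentially no drive to the odd-order lateral modes and full drive to the even-order ones, whereas the one-third/one-fifth pair gives intermediate, non-zero values for each of the first six orders.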

There is thus little in common between the dedicated ‘low frequency effects’ (LFE) channels of digital cinema and the low frequency extension (LFE) sub-woofers used in Dolby Stereo systems. In fact, just to confuse matters further, the high definition television standards have another LFE definition – Low Frequency Enhancement! The first and third of these LFEs are discrete channels; the second is not. However, the first is obligatory for reproduction because it may contain signals which are both important and not appearing in the other channels, whereas the second and the third do not contain ‘essential’ material (just some extended LF) and so their reproduction is optional. None of this confusion helps the general acceptance of surround sound. Somewhat disgracefully, there is no compatibility between the three LFE concepts. This is an insult to the general public!

Domestically, the single sub-woofer is a commercial necessity resulting from the great reluctance of many households to accept the presence of five large full-range loudspeakers in domestic living rooms. (Even if they could find sonically good places to put them that did not block a doorway or cut half the light from a window.)

21.8.1 Music-Only Low Frequencies

In the mixing of surround sound for music only there is no such luxury as specifying the means of playback and the environment in which it will be heard, let alone any means of enforcing compliance with any specifications. Sub-woofers, therefore, tend to add a complication to the mixing environment if traditional, full range monitors are not used. They add another crossover point, and mono sub-woofers lack the undoubted benefit which stereo bass can add to certain music, especially in terms of the spaciousness, which is, somewhat perversely, surround sound’s raison d’être.

Nevertheless, the situation still exists that a mix done on one concept of system is likely to sound quite different when played back in a different control room, when the number of sub-woofers can vary from zero to four, and the full range loudspeakers (if used) can vary (normally) between two, three and five. In fact, the four-channel (no centre-front) option is another possibility, and does still see use, albeit in a slightly modified geometry from the old quadrophonic (square) layout. (See Figures 14.10 and 14.11.) At the time of writing (2011) some people are still mixing to a four-channel format which resembles the five-channel format of Figures 14.19 and 14.20, but without the centre-front loudspeaker.

It would seem that the optimum arrangement for a music-only control room for the highest quality monitoring would use five full-range loudspeakers and no sub-woofer. However, Chapter 14 discussed the behaviour of multiple loudspeakers in rooms, and from that discussion it can be understood that one must be careful not to route individual low frequency signals to any more channels than absolutely necessary if playback compatibility problems in other rooms and on other systems are to be minimised. Moreover, only in rooms with well-controlled acoustics can the full-range, five-channel option be heard to be superior to the single (or double) sub-woofer option. In rooms with less ideal low frequency control, the single sub-woofer option can reduce the variability of a sound as it is routed to different locations (which at low frequencies would tend to drive different modes to different degrees). Whether such poorly controlled rooms should be in use for serious surround mixing is a moot point, but the fact is that many of them are used for such purposes. It should also be added, here, that attempting to mix on a sub-woofer/satellite system using bass-management can be very risky. These systems can often be so far from the reality of what is on the recording medium that they could hardly be considered to be ‘monitoring’ anything other than their own idiosyncratic sound.

21.8.2 Processed Multiple Sub-Woofers

Some of the larger manufacturers of professional monitoring loudspeakers have accepted the fact that there is now a tendency for many people to mix all forms of music recordings in rooms with little acoustic treatment, especially at low frequencies where treatment can be both expensive and space consuming. Companies such as JBL and Genelec have put much effort into the design of low frequency loudspeaker systems which are intended to deal with room problems in such a way that, whilst not reaching the standards that can be achieved in well-controlled rooms, certainly improve the conditions of monitoring when compared to conventional loudspeakers in poorly treated rooms.

One such system, described by Toole,6 was headlined on the cover of the AES Journal as ‘Adapting to Acoustic Anarchy in Small Spaces’. In the paper, he described a system of multiple sub-woofers, spaced in various ways around the room and equalised both individually and globally. By this means, some unwanted modes can actually be cancelled, thus improving the time response of the room. The pressure amplitude flatness and distribution around the room can also be greatly improved, even in relatively poor rooms.

In most cases, perhaps only one resonant mode will be particularly troublesome below 100 Hz. One of Toole’s techniques for dealing with such relatively simple problems involves locating a pressure node (a point of minimum sound pressure) of the offending mode, and placing the sub-woofer at that point, where it will only weakly couple to the mode. (This of course presumes the use of a monopole sub-woofer, which is a volume/velocity source. A dipole sub-woofer would require being located at an antinode [a point of maximum sound pressure], as dipoles are pressure gradient sources.) Obviously, some phase adjustment will have to be available because moving the sub-woofer nearer to or farther away from the listener will affect the overall time response of the system. The other technique involves using two sub-woofers, one placed on each side of the node, because on one side of it the pressure is falling whilst on the other side the pressure is rising. Therefore, two woofers, one on each side of the node and connected in parallel will destructively drive the mode, thus greatly reducing its effect.
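The two placement techniques described above can be illustrated for a single axial mode. This is a minimal sketch under stated assumptions: an idealised rigid-walled room, a speed of sound of 343 m/s, and a hypothetical room dimension of 6 m. None of these values comes from Toole's paper, and the function names are invented for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def axial_mode_frequency(order, dimension_m):
    """Frequency in Hz of axial mode `order` between two parallel rigid walls."""
    return order * SPEED_OF_SOUND / (2.0 * dimension_m)


def pressure_nodes(order, dimension_m):
    """Distances from one wall (m) at which the mode has a pressure minimum."""
    return [(2 * k + 1) * dimension_m / (2 * order) for k in range(order)]


def mode_shape(order, dimension_m, x_m):
    """Idealised pressure mode shape cos(n * pi * x / L) at position x."""
    return math.cos(order * math.pi * x_m / dimension_m)


length = 6.0  # hypothetical room dimension in metres
print(axial_mode_frequency(1, length))  # frequency of the first axial mode
print(pressure_nodes(1, length))        # node position for a single monopole

# Two in-phase monopoles placed equally either side of the node: their
# contributions to the mode are equal and opposite, so the sum is near zero.
print(mode_shape(1, length, 2.0) + mode_shape(1, length, 4.0))
```

In this idealised case the first axial mode has its node at the mid-point of the dimension, and two sub-woofers placed symmetrically about that point and fed in parallel contribute equal and opposite drive to the mode.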

The next step involves using four sub-woofers below 80 Hz, and Toole claims that research has shown that there seems to be no benefit in using more than four sub-woofers. These are fed via signal processors which, as mentioned previously, both individually and globally equalise the loudspeakers. Obviously, with systems such as those being described here, the low frequency signal must be mono. Nevertheless, Toole claims with considerable justification that if first class acoustic conditions are neither available nor achievable for practical reasons, then fast, flat, mono bass below 80 Hz may be greatly preferable to non-flat, resonant bass in stereo. He also makes a strong point of the fact that any type of response correction by sub-woofer can only be effective below about 100 Hz and when the troublesome modes are reasonably well separated. Above 300 Hz the loudspeaker dominates the response, but in the gap from 100 Hz to 300 Hz acoustic treatment tends still to be the only viable solution. Fortunately, in this range, control measures tend not to be so bulky as treatment below 100 Hz, but it definitely requires something more than merely sticking some foam on the walls.

The fact is that it still remains difficult to achieve ‘true’, compatible low frequencies for surround mixes, which is the reason why so many palliative measures are offered from so many sources. Concepts which may approach reality are so far from end-user playback environments that mixing often becomes something of a lottery. Different styles of mixing also suit different low frequency loudspeaker arrangements, which adds yet another set of variables. The optimum way of dealing with the low frequencies for high fidelity, five-channel music systems is a problem that still has not been solved at audiophile level.7

21.9 Close-Field Surround Monitoring

Not unlike the way that many people resorted to the use of small, close-field loudspeakers in an attempt to escape from the problems of stereo monitoring variability from studio to studio, the use of satellite loudspeakers, on stands in the close-field, and a common sub-woofer, has found widespread use for multi-channel mixing. In this case, though, one of the driving forces behind the choice was the lack of purpose-designed surround control rooms with adequate full-range monitoring. The reasons for this dearth of facilities have been:

1.  Lack of clear guidelines/standards for the design of music-only surround rooms.

2.  Because of 1 (above), there has been a corresponding lack of people willing to invest in the building of dedicated surround rooms, which may be short-lived in use if the ‘wrong’ layout is chosen.

3.  As the ideal needs of surround rooms and stereo rooms are not entirely compatible if the highest performance is required, people have not been willing to compromise their stereo rooms whilst surround sound has remained only a challenger to the market supremacy of stereo.

4.  Good dedicated surround rooms require the commitment to a considerable and long-term investment, and the recording market has shown no clear intention of being prepared to pay significantly more for surround mixing than for stereo facilities. For many studio owners, only earning the same rate as for stereo recording does not warrant taking the risk of investing on such a shaky basis.

Thus many commercial surround mixing rooms are, in fact, nothing more than stereo mixing rooms in which a satellite/sub-woofer system has been installed, perhaps with the addition of a few acoustic contrivances to help to (or at least to appear to) control a few acoustic irregularities. The fact that this can be passed off as professional mixing is due partly to the fact that the mixes will be expected to pass through a surround mastering facility in order to make them sound like the perceived, accepted, current norm. However, it can also be got away with because of the fact that the situation in the domestic playback circumstances is variable to the point of absurdity. If the mixes are not up to the highest standard, then who is going to know? If nobody knows, then nobody is likely to complain, but is this a professional attitude? The impression given is more that it is all a bit shoddy.

Manufacturers of domestic equipment and programme material have done little to help the situation. Ludicrous situations have arisen whereby audiophile quality DVD ‘A’ discs have only been found to be viably marketable by making relatively cheap DVD video players read them digitally, whilst only passing the output through D to A converters of the lower resolution used for the DVD videos. In many cases it has been almost impossible to tell apart the compressed audio of the ordinary DVD audio channels and the high sampling rate/high bit-rate of the potentially vastly superior DVD A when passed through the cheap converters.

It may well be that this is not too different from using common master recordings/mixes for Compact Cassette and CD release. Those who choose to buy the appropriate playback equipment get the appropriate results … hopefully. On the other hand, Compact Cassettes never claimed to be superior to CDs, but the marketing of the surround formats certainly contained many implicit suggestions that a whole step forward was to be expected from DVDs vis-à-vis stereo CDs. This is certainly not the case for DVD A when played on DVD video systems. In fact, it would take a whole step forward in the world economy before people would be able to afford five loudspeakers, five amplifier channels and all the associated processors and converters which were of equal quality to the two of everything required for stereo. The development and specification of professional music surround facilities has been greatly hampered by the badly conceived marketing hype that has tried to force a new medium on a public who were not exactly crying out for it.

21.10 Practical Design Solutions

Figure 21.3 shows a control room which was designed by the author for high-quality, music-only surround use. By discussing some of the design options and choices it will be possible to highlight some of the concepts and compromises that have been touched on in the previous sections of this chapter.

The room was required to be principally a stereo room that could be used for high-quality surround recordings and mixing. It was considered to be very important for surround use to have three, full-range, flush mounted monitor loudspeakers, all at the same height. The reasons for setting the mid-range loudspeaker drivers of all three loudspeakers at a height of about 147 cm above the ground were discussed in Subsection 20.7.1 and illustrated in Figure 20.8. The intention was to make no compromise which could reduce the quality of the two-channel stereo. The wall on the right-hand side of the room contains a window with a view of the local woodland, and at the rear of the left-hand wall is a glass door which leads to the studio rooms. However, the directivity of the loudspeakers was considered to be narrow enough not to expect problems from these surfaces.

Figure 21.3: A music-only surround room: Producciones Silvestres, Catalonia, Spain: (a) The empty room; (b) With a view of the forest.

As the stereo was of great importance, no effort was made to change the bi-directionality of the room acoustics, and so the rear wall was made maximally absorptive, in accordance with the chosen control room philosophy. The side walls were made relatively absorbent, except for the two windows and the glass door. The glass was all of the 12 mm laminated variety, and the windows were angled quite steeply upwards, to attempt to persuade any reflected energy to head in the direction of the absorbent ceiling. It must be remembered that even plane surfaces give rise to a certain degree of scattering at high frequencies, but from these positions it was considered improbable that too much energy would return to the listening position from any of the loudspeakers. Had the rear loudspeakers not been needed, the windows would also have been angled towards the back of the room, to tend to direct the reflexions from the front loudspeakers into the rear trap. This was really the only significant compromise that was made to the room design for the benefit of the surround performance. The front wall was made with an irregular surface of stone, to help to reduce any tendency for specular reflexions with the rear loudspeakers, although this in no way compromises the frontal stereo performance.

The centre loudspeaker, it should be noted, should be connected to powered-up (switched-on) amplifiers, even when not in use during stereo recording and mixing, to prevent loudspeaker resonances from affecting the response of the other loudspeakers. If the amplifier is not connected to the low frequency drivers, or is connected but switched off, the loudspeaker cones and the tuning ports would be free to resonate at their natural frequencies. With power to the amplifier, the very low output impedance acts as a brake on the movement of the loudspeaker cones, holding them rigidly. The port resonance is less of a problem because the receiving/radiating area is smaller, and the excitation tendency (at the sub-20 Hz tuning frequency) is less likely. In general, loudspeakers should never be left in control rooms if the amplifiers to which they are connected are not switched on, because they can affect the sound from nearby loudspeakers both by absorption and coloration, due to their resonant tendencies.

Anyhow, what we have described so far is really a ‘three-channel stereo’ room with full range monitoring from 20 Hz to 20 kHz, built to the principles discussed in Chapter 16. To convert this into a surround room we therefore need to add a system of suitably chosen and mounted rear loudspeakers, and, in this case, the general consensus was to set the rear loudspeakers at around 120° either side of centre-front.

One hundred and ten degrees, or less, would have obstructed doors and windows to an unacceptable degree, which again highlights the fact that ideal surround mixing rooms are best built as such, and should not be compromised by access to studio rooms or views of the forest. Again, the cinema people have a better approach – dedicated mixing rooms – but the tight budgets of the music industry tend to require rooms of more flexibility in use. Nonetheless, in this case, nobody really believed that positioning surround loudspeakers 10° aft of ‘normal’ would significantly alter the perception of the sound in such a room.

21.10.1 The Choice of Rear Loudspeakers

Here, the concepts discussed in Section 21.6 can be considered again in the context of a specific room. The overall room design was as shown in Figure 21.3. It should be obvious that symmetrical monitoring would not be possible. The front loudspeakers are set into a solid, very rigid front wall, whereas the rear loudspeakers, even if identical to the front cabinets, would perform differently because they would be set in absorbent surroundings (see Figures 16.1 and 16.2). They would not enjoy the low frequency loading provided by the front wall acting as a baffle extension. The effects of such loading differences were discussed in Chapter 11, and whilst it is true that the reduced low frequency loading could be equalised in the feeds to the rear loudspeakers, this could require up to four times the amplifier power to do so. This may or may not be a great problem, but the intransigent problem is that of the first reflexions from the opposing wall.
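The ‘four times the amplifier power’ figure follows from the decibel relationship for power. As a minimal sketch, assuming that the equalisation needed to replace the lost baffle loading is a boost of up to about 6 dB (an assumption made here because it corresponds to the factor of four; the function name is illustrative):

```python
def power_ratio_from_db(boost_db):
    """Ratio of amplifier power needed to deliver a given boost in dB."""
    return 10.0 ** (boost_db / 10.0)


for boost in (3.0, 6.0):
    print(f"{boost:.0f} dB boost needs about {power_ratio_from_db(boost):.2f} times the power")
```

Every 3 dB of low frequency boost roughly doubles the power demand, so a 6 dB correction approaches a fourfold increase.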

The rear absorbent trap is designed to prevent the reflected energy from the front loudspeakers from interfering with the direct signal. With the rear-mounted loudspeakers facing a solid front wall, nothing can effectively be done to prevent the response irregularities caused by the reflected wave, and no conventional equalisation could flatten the response. The mid and high frequencies would also face different terminations at the front and rear of the room. Therefore, even notwithstanding the differences in perception in terms of the frequency balance of signals arriving at the ears from the front or from the rear, the loudspeaker/room combination itself could not deliver identically balanced signals to a measuring microphone at the listening position.

To enable such a symmetrical monitoring condition to exist would, in the opinion of all the people concerned with the design of this room, have required unacceptable compromises to be made to the stereo performance. This also applied to the frontal sound-stage performance of a surround mix, which was also considered to take precedence over the rear channel performance. The option of identical loudspeakers all round was therefore abandoned.

The dipole option, as described in Subsection 21.7.3, in this case would result in virtually zero sound coming from the rear because of the highly absorbent rear wall. Little would arrive directly from the loudspeaker to the listening position because of the null in the plane of the baffle. Almost all of the audible output from the loudspeakers would therefore be by reflexion from the front wall, which would produce no surround sound at all, only confused stereo. Not surprisingly, this option was also rejected.

The multiple loudspeaker (cinema) choice was rejected because this was a music-only studio. Despite the fact that the cinema technique has much to offer to music-only mixing, the music industry in general has not woken up to this fact. The option was therefore rejected on the basis of lack of acceptance by the clients, but not from a system-engineering viewpoint.

The single diffuse arrangements, such as the DMLs or loudspeakers pointing towards wall-mounted diffusers, were considered carefully. The DML option was finally rejected due to the lack of low frequency response unless very large panels were used. The option of using smaller DMLs with a common sub-woofer was rejected on the grounds of unnecessary complexity. The much larger panels were rejected because of worries about large reflecting surfaces at the rear of the room creating problems with the frontal stereo, although their hemispherical directivity over a wide frequency range could have allowed the panels to be angled such as to minimise this effect. Unfortunately, this would also have taken up valuable space in a smallish control room of just under 50 m², but this option remained under discussion until the final choice was made. It was certainly a serious contender.

As the rear of the room was relatively dead, acoustically, and the designated mixing/listening area was so small (6 m² or thereabouts) not much benefit was seen from the option of pointing a loudspeaker at a diffuser panel on each sidewall. Ultimately, the decision was made to use a pair of single, conventional, discrete loudspeakers, but effectively only by the rejection of all the other options. (Although see note at end of this section.)

The owner then consulted his clients, and nobody seemed to be intending to use the rear channels for bass guitar or bass drums. Therefore, in order to minimise any interference with the pure stereo use of the room, it was convenient to use relatively small loudspeakers that could be mounted on stands at the rear of the room. The actual model of loudspeaker was chosen for its sonic compatibility with the front monitors. Their ability to produce around 108 dB SPL at 1 m, down to 70 Hz, was considered sufficient in a room where nobody listening seriously would be more than 4 m from them, hence around 100 dB at the listening position from each surround loudspeaker would be guaranteed. The peak response was about 6 dB higher.
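As a rough cross-check of these levels, the free-field inverse-square law can be applied to the 108 dB figure. This is only a sketch: the free-field assumption is not stated in the text, it ignores any contribution from the room, and the function name is illustrative.

```python
import math


def free_field_spl(spl_at_1m_db, distance_m):
    """SPL at a distance, assuming inverse-square (6 dB per doubling) decay."""
    return spl_at_1m_db - 20.0 * math.log10(distance_m)


for distance in (1.0, 2.5, 4.0):
    print(f"{distance:.1f} m: {free_field_spl(108.0, distance):.1f} dB SPL")
```

On this simplified basis, 100 dB corresponds to a distance of about 2.5 m, whilst at the 4 m limit the free-field estimate is nearer to 96 dB before any room contribution or the extra peak capability is taken into account.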

It was also acknowledged that certain clients might wish to use their own choice of satellite and sub-woofer systems; and that the studio may in the future purchase its own such system. In this case, the rear loudspeakers could be moved on their stands to a closer position in order to serve as the rear loudspeakers for the satellite system. The thinking behind the two choices of monitor system was that, as in much stereo recording, the large, full range monitors could be used during the recording process, to track down any distortions or noises and to check the low frequency balances. The satellite systems could be used for the ‘domestic reference’ and for those who desired to mix on them. The large system could also be used to ‘vibe’ the musicians (or other personnel) when necessary.

Although no claims are being made that the system described above is definitive, the description and the discussion about the thinking process that led to its final design can perhaps be useful to help to outline the options and typical compromises that go into the design of surround sound control rooms. (In 2006, two years after the opening of the studio, the rear loudspeakers were changed to relatively large DMLs [60 cm × 40 cm], mounted in the rear corners of the room.)

21.11 Other Compromises, Other Results

Figure 21.4 shows a small screening room in a film laboratory using DML panels for the surround. In this instance the compromises produced different results, because the needs were different. The room had to meet the specifications for Dolby Digital, and hence the surround sound-field over the area of 26 seats could not be allowed to vary by more than 3 dB. The 3 dB drop per doubling of distance in the close-field of the DMLs, together with their wideband hemispherical directivity pattern, made them an excellent engineering choice. Their diffuse nature, and the relative inability of the audience to localise them audibly, made them a good psychoacoustic choice. The fact that they were readily received for the overall natural impression of the surround tracks also made them a good subjective choice.

In the case of this screening room, the typical choice of multiple surround loudspeakers, as normally used in cinemas, was rejected because it was considered that it would be difficult to prevent the audience from localising the surround sound on the nearest loudspeaker. This was due to the narrower directivity angles of typical cinema surround loudspeakers and their extremely close proximity to the seats in this room. It was also considered problematical to get the required evenness of coverage unless a very great number of loudspeakers were used, again because of the directivity of the typical surround loudspeakers.

Also, in this room, the seating was all much closer to the surround loudspeakers than to the front loudspeakers, and the precedence effect suggests that the source of any sound routed to the front and the surround loudspeakers (which in fact is a relatively rare thing in cinema mixes) would be localised in the surrounds. However, as mentioned earlier, in the Dolby processors there is always up to 100 ms of delay to the surround feeds. This ensures that in any theatre where the difference in the distance to the listener from the surround and front loudspeakers is less than about 30 m, the sound will always be localised in the front loudspeakers.
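The relationship between the surround delay and the path-length difference that it can cover is simple arithmetic. A minimal sketch, assuming a speed of sound of 343 m/s (the function names are illustrative, and the margin that the precedence effect needs in practice is not modelled):

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def path_difference_covered_m(delay_ms):
    """Extra front-loudspeaker path length that a surround delay compensates."""
    return SPEED_OF_SOUND * delay_ms / 1000.0


def delay_needed_ms(path_difference_m):
    """Surround delay needed to match a given path-length difference."""
    return 1000.0 * path_difference_m / SPEED_OF_SOUND


print(path_difference_covered_m(100.0))  # metres covered by a 100 ms delay
print(delay_needed_ms(10.0))             # delay for a 10 m path difference
```

A 100 ms delay corresponds to a little over 34 m of travel, which is consistent with the figure of about 30 m once some margin is allowed for the surround signal to arrive clearly after the frontal sound.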

Figure 21.4: Front and surround monitoring in the Dolby Digital screening room at the Tobis film laboratories, Lisbon, Portugal: (a) Front monitor distribution; (b) The DML panels, high on the side walls, used as diffuse surround sources.

Once again, the cinema people did their homework and came up with a carefully conceived standard. However, the delay, which works well in the cinema, could limit the options for music-only surround if lead instruments were to be put in the rear channels. With ambience in the surrounds, the effect of the delay can be advantageous, but obvious timing difficulties would be encountered if an ensemble were to be split across front and rear sources.

Figure 21.5: Non-Environment concept using multiple rear sources. Versatile 5.1 channel, full-range surround monitoring system, with multi-format monitoring capability. Left and right surround channels split between two groups of four loudspeakers, or, alternatively, perhaps with diffuse radiators, such as distributed mode loudspeakers. All loudspeakers full-range, except the optional ‘effects’ sub-woofer.

Nevertheless, a room for ambient surround mixing could be extremely effective if built according to Figure 21.5. The only precaution necessary would be to ensure that any reverberation fed to the surround loudspeakers was fed via a short delay, otherwise it may arrive before the direct sound, but as this is relatively standard practice, it should cause no extra complications.

In the design of a surround-sound control room, therefore, many things need to be taken into account; it is not a simple exercise. However, the process is exacerbated by the lack of consensus or standards in the music industry about precisely what the goal is intended to be. The quality of the frontal stereo, be it two or three channels, is still of prime importance in most cases, so most rooms used for surround mixing are unlikely to compromise this merely to benefit the rear channels, for which our hearing is anyhow much less discriminating. The question of ‘how to surround’ then becomes a function of the likely type of programme and the basic philosophy of the room acoustics. It is a very complex subject, which is made no easier by the electroacoustic complications created by multiple sources, as described in Chapter 14.

The fact is that the psychoelectroacoustics of surround is a far more complicated subject than is the case with stereo, not least because of the much increased number of possibilities for making poor choices based on lack of a thorough understanding of the problems. Furthermore, the choice of ‘only’ five channels is not sufficient to provide the optimum ability to compromise between the multiple and often conflicting requirements without significant loss of performance. An interesting development of the concept is shown in Figure 19.20, which can take the clarity of perception to an entirely new level. Very significant reductions in the level of intermodulation distortion are also reported, which is hardly surprising.

21.12 Dubbing Theatres

Currently, the most impressive surround environment is probably the cinema. Traditionally, rooms for mixing cinema soundtracks (‘dubbing theatres’ as they are usually referred to in British English) have borne little relationship to music mixing rooms because the perception of sound whilst watching a picture on a large, distant screen is different from both the perception of sound in a small room whilst watching a small screen and the perception of sound without picture in any room. Some of the reasons for this are explained in more detail in the following sections, but there is a tendency these days for cinema theatres (i.e. the public performance theatres) to become smaller and acoustically drier. As a consequence of this, cinema dubbing theatres, whilst still being large, are now often very close in size to some of the smaller theatres, so this offers the prospect of excellent compatibility between the mixing and reproduction environments.

In these smaller, drier cinemas, provided that they have good quality loudspeaker systems, the perception of detail in the soundtracks is considerably greater than it has tended to be in larger, more reverberant theatres. In any professional industry it would seem to be a requirement that the professionals stay one step ahead of the consumers, and many older dubbing theatres can now be shown to be more coloured in their acoustics than many of the newer theatres for public screening. This is leading to a trend for more precise dubbing theatres, although the cinema industry is still very conservative, so changes tend to happen rather slowly.

Figure 21.6 shows the acoustic control measures inside a low decay-time dubbing theatre (Cinemar Films, in Milladoiro, Spain). The ceiling consists of three layers of materials supported on two independent systems of wooden beams. Progressively, from the inside, each subsequent layer is less absorbent and more reflective, so the inner layers absorb more and the outer layers isolate more. The inner layers, below the first set of beams, consist of felts and deadsheets, the intermediate layers are of plasterboard and deadsheets, and the outer layers are of plasterboard, heavier deadsheets and plywood. In the case shown, the inner ceiling is rather low in terms of conventional width/length/height ratios for Dolby theatres, but in acoustic terms, if the ceiling is very absorbent, then it is really not there, so its height is somewhat irrelevant. Dolby now generally have no problem with this concept.

image21.6

Figure 21.6:

The dubbing theatre at Cinemar (Milladoiro, Spain) under construction: (a) showing the front monitor wall; and (b) the distribution of surround loudspeakers.

Figure 21.7 shows the level versus distance of the above-mentioned room in one-third octave bands, measured at 1 m intervals from a point 2 m from the screen to a point 40 cm from the acoustically transparent fabric covering the rear absorber. As can be seen, the modal activity is minimal, and the tendency towards a low frequency build up in the vicinity of the rear wall only begins to become evident below the 31.5 Hz band. Despite its size, this room is still only intended for use in the zone behind the mixing console, but, as mentioned earlier, perceptions of sound can change with the size of the picture and the distance to the screen (as described in Subsection 21.12.3), so for this reason the room needs to be big. The completed room at Cinemar is shown in Figure 21.8. The owners had chosen this concept specifically because of their own experiences in a similar room which they had frequently rented (which was already fully approved by Dolby) in which they could make fast and reliable decisions during the mixing process, and the subsequent excellent translation of the work from the studio to the cinemas. That room, which differs principally in its provision of variable acoustics in the front half of the room, is shown in Figure 21.9.

image21.7

Figure 21.7:

The measurement at 10 m is only 40 cm from the face of the rear ‘trap’. Except for a small rise around 100 Hz, all of the frequency bands continue to fall into the trap above 40 Hz, showing how the rear wall of the room acoustically disappears.

image21.8

Figure 21.8:

The completed dubbing theatre at Cinemar Films, Milladoiro, Spain (2006).

image21.9

Figure 21.9:

The dubbing theatre at Soundtrack (Sala 15), Barcelona, Spain, showing the rotating acoustic control devices in the front half of the room, as outlined in Figure 7.9(a)(ii) and (b). The front half can be made more acoustically bright by rolling up the carpet and rotating the triangular panels to expose a reflective or a diffusive surface. In this mode, up to about 25 string instruments can be recorded to picture.

Relatively large rooms tend to give rise to mixes which are more compatible with large-screen cinemas compared to those made in smaller rooms. However, without the limitations imposed by the need for a relatively even coverage over a large audience area, a dubbing theatre with few seats can take advantage of the fact that only a limited area of the room requires an even sound coverage. This allows better optimisation of the response over the critical listening area, yielding more detailed perception of the sounds being mixed, and therefore enabling the decision-making process to be both faster and more reliable. Furthermore, without the need for the main loudspeakers to exhibit very wide directivity for the even coverage of the audience, lower-powered and more precise mid and high frequency loudspeakers can be used.

There are some people who worry that a lower decay time in the mixing theatre, compared to a typical commercial cinema, would lead to incorrect decisions being made about the appropriate levels of reverberation to add to a soundtrack, but experience has shown that the problem does not exist in practice. Once again, the extra precision has been shown to make decision-making much easier, hence saving time and effort, easing the work-load on all concerned and allowing them to concentrate more on other important matters. These concepts were controversial in the world of music recording in the 1980s, but experience again showed that low decay times and fast transient responses were very necessary if things such as low-level coding artefacts and digital processing errors were to be noticed early enough to prevent the public release of flawed recordings. No effect was noticed regarding decisions about the application of artificial reverberation.

The surround loudspeakers shown in Figure 21.6(b) are 10 JBL 8340A, standard cinema surround loudspeakers. They are each driven by separate amplifiers, which allows not only their precise level settings, but also the simple switching by the Dolby processor between 5.1, 6.1 and 7.1 modes of operation. Obviously, where little reverberant integration of the loudspeaker outputs is occurring within the room itself, any imbalances in the sounds arriving from each loudspeaker would be more noticeable than normal, so individual amplifiers help considerably here.

21.12.1 Room-to-Room Compatibility

In general, once reverberant colouration is added by the room to the output from the loudspeakers, many changes take place in the perceived sound. Although some degree of reverberation is often considered to be desirable in public cinema theatres, in order to give a more uniform sound character over the entire audience area, in the mixing rooms the reverberation can mask low level details and other problems and noises on the recorded soundtrack. Room sizes, also – surprisingly to many people – can affect the perceived frequency responses. Therefore, if a film is mixed in a room of different size and reverberation/decay time to a room in which it is subsequently shown, the frequency response of the loudspeaker systems may need to be adjusted if the most similar perception of frequency balance is to be achieved. However, this is a totally separate and additional issue to the one of reverberant colouration changing the sound character.

Acknowledging this fact, in the 1970s, the cinema industry began investigations into room-to-room compatibility, which later led to the standardisation of the ‘X-curve’ as the frequency response characteristic to which the soundtracks would be mixed and shown. The X-curve (described further in Subsection 21.12.2) is not a clearly defined line: it is a line which is banded by considerable tolerance. Consequently, when a dubbing theatre is being certificated by Dolby, the approximate curve is achieved by measurement, but the final response adjustment is made by ear, with the Dolby engineers using a range of well-known test recordings to make the definitive decisions about frequency balance. There are too many variables in this equation to work solely to prescribed measurements but unfortunately, in public cinemas, such care and attention in the set-up is rarely applied.

Usually, as the room sizes increase, the high frequency response of the loudspeaker system must be reduced if a flat perception of the sound is the goal. Also, as the reverberation or decay time of the room increases, the high frequency response once again tends to need to be reduced. Consequently, a large reverberant room will tend to need a significantly more attenuated high frequency response as compared to a small, dry room. But there is also a level-related question which affects cinema studios or theatres of different sizes: ‘Why does 85 dBC at the listening position in a small room tend to sound louder than 85 dBC at the listening position in a large room?’ After all, SPL is SPL; or is it? The question will be discussed further in Section 21.12.3.

As yet, there have been no definitive answers to this question, but there are some very different small/large room characteristics which may give rise to such perceptual differences. Imagine that we are in a small room, listening 4 m from the loudspeakers and 3 m from each side wall; unless the walls are anechoic, they will by definition produce some reflexions. The early reflexion paths will probably differ from the direct signal paths by less than 5 m – perhaps much less – so they will be heard about 10 ms after the direct signal and with a frequency balance which will depend solely on the nature of the reflective surfaces. Air absorption at high frequencies tends to be around 1 dB at 10 kHz for every 5 m travelled, and so for short distances it can be ignored for first-order reflexions.

If we now go into a larger room where we are listening at 12 m distance from the loudspeakers and 8 m from each side wall, the situation becomes very different. The first reflexions may arrive from the surfaces (presumed here to be of a similar nature to those in the aforementioned small room) with a delay of around 25 ms, arriving via pathways of 8 or 10 m more than the direct path. So, even the very first of the first order reflexions may be around 2 dB down at 10 kHz compared to the direct-signal/reflected-signal balance in the smaller room. The reflexions in a larger room will also be separated more in time, both from the direct signal and from each other, than the reflexions in a smaller room.
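The figures quoted for the two rooms can be checked with a little geometry. The sketch below is only illustrative: it assumes a speed of sound of 343 m/s, a single side-wall reflexion modelled as a mirror-image source, and a loudspeaker placed at the same distance from the side wall as the listener, none of which is specified in the text. The air-absorption figure of about 1 dB per 5 m at 10 kHz is the one quoted above.

```python
import math

SPEED_OF_SOUND = 343.0          # m/s, assumed room-temperature value
AIR_LOSS_DB_PER_M = 1.0 / 5.0   # approximate loss at 10 kHz, as quoted in the text


def side_wall_reflexion(listen_dist_m, wall_dist_m):
    """Return (extra path in m, delay in ms, extra 10 kHz loss in dB)
    for a first-order side-wall reflexion.

    Assumption (hypothetical geometry): loudspeaker and listener are both
    wall_dist_m from the wall, so the mirror-image source lies
    2 * wall_dist_m to the side of the real loudspeaker.
    """
    reflected_path = math.hypot(listen_dist_m, 2.0 * wall_dist_m)
    extra_path = reflected_path - listen_dist_m
    delay_ms = 1000.0 * extra_path / SPEED_OF_SOUND
    hf_loss_db = extra_path * AIR_LOSS_DB_PER_M
    return extra_path, delay_ms, hf_loss_db


small = side_wall_reflexion(4.0, 3.0)    # small-room distances from the text
large = side_wall_reflexion(12.0, 8.0)   # large-room distances from the text

for name, (extra, delay, loss) in (("small", small), ("large", large)):
    print(f"{name}: +{extra:.1f} m, {delay:.1f} ms, {loss:.1f} dB at 10 kHz")
```

Under these assumptions the small room gives an extra path of a little over 3 m and a delay of just under 10 ms, whilst the large room gives 8 m and about 23 ms, which is in line with the approximate values given in the text.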

We therefore have a situation where the reflexion density is higher in a smaller room (i.e. the reflexion arrival times are more closely spaced) and the reflexion levels will be higher relative to the direct signal because they have travelled less distance, and so have expanded less (that is to say, their intensity will be higher than for comparable reflexions in a large room). What is more, the reflexions in a small room will be brighter sounding because the longer the reflexion path the more air absorption will take place at higher frequencies, so the reflexions in a larger room will tend to suffer a greater high frequency roll-off relative to the direct signal.

To briefly recapitulate the situation:

1.  In a small room, the reflexion density will be greater than in a larger room; that is, the reflexions will arrive closer in time to the direct signal, and to each other, tending more to reinforce the perceived loudness of the direct signal.

2.  In a small room, the relative reflexions (i.e. first-order, second-order, third-order, etc.) will arrive more closely in level, with less difference between their relative levels than in a larger room.

3.  In a small room, the reflexions will arrive with a greater high frequency content due to suffering less air absorption than the respective reflexions in a larger room of similar acoustic treatment.

A consequence of all of this is that the direct-to-reflected level difference in a small room will tend to be less than the respective proportions in a larger room. Therefore, if we set a level of 85 dBC at the listening position in a small room and in a large room, with both rooms having similar surfaces and absorption characteristics, there will be a higher proportion of reflected energy in the small room (compared to the direct energy) than would be encountered in a larger room, and the reflexions would also exhibit more high frequency content, giving rise to the differing sensations of loudness in the large and small rooms. The differences in reflexion density, reflexion level, and reflexion frequency response are powerful cues which human auditory systems can use to extract information about the distance to the source of the sound. Anthropologically speaking, a loud sound which is perceived to be close tends to suggest a greater potential danger than a sound which is equally loud but perceived to be more distant. The closer event requires a more immediate reaction, hence a greater perception of loudness can be a greater stimulus to act, and thus in some cases a benefit to survival. The above characteristics of a sound-field undoubtedly give rise to different sensations of loudness, but what has so far not been produced in any standardised form is a correction adjustment for level versus room size. A suggestion for a possible solution to this problem will be discussed in Section 21.12.3.

In anechoic chambers, the effect of room size on perceived sound level with distance should be zero, at least within the frequency ranges where the rooms are truly anechoic. However, it will be shown that when we listen to sounds which are associated with a moving picture, the visual cues can indeed introduce perceptual differences which can affect our opinions of precisely how loud a sound may be.

21.12.2 The X-Curve

Cinema theatres, for the public performance of feature films, tend to be rather different from domestic rooms which are used for listening to the high fidelity reproduction of music. Almost anybody who has ever been to a large cinema will know that the sound in the cinema is rarely reproducible at home. In large rooms, the perception of the sound is different from the perception in small rooms, so the cinema industry has, since its earliest days, tended to mix film soundtracks in rooms which were representative of, or in effect were, cinema theatres.

Optical soundtracks have been the ever-present means of recording sound to film, and even Dolby Digital film soundtracks still have optical, analogue tracks alongside them, for back-up in the event of a digital system failure. Aspects of film reproduction such as the projector slit height, electrical noise-filters, negative/positive print-loss filters, loudspeaker characteristics, losses through the perforated projection screen, and even air absorption losses in the theatres (almost 1 dB for every 5 m at 10 kHz) all meant that the maintenance of a given frequency response from a soundtrack mixed in one room and played back in a significantly different room would not be a likely outcome. Therefore, if the dubbing theatres (the mixing studios) were generally similar to the public screening environments, the mixing engineers could at least mix and equalise the soundtracks in a reasonably representative environment in order to achieve the most natural or most desirable audio quality. This historically meant using significant high frequency boost, to compensate for all the losses in the optical reproduction chain, but care had to be taken to avoid the boost leading to distortion. The high frequency roll-off of the reproduction did, however, serve a useful function as a form of noise (hiss) reduction in the days before Dolby noise reduction came on to the scene. Perhaps somewhat surprisingly, the significant high frequency roll-off in the overall reproduction did not sound as dull in the typical, large, reverberant cinema theatres of those days as a visual inspection of the frequency response would suggest. In practice, the ‘Academy Characteristic’ as shown in Figure 21.10 was the general response curve for cinema reproduction systems. A roll-off began around 1.5 kHz, which fell to about −18 dB at 8 kHz, so the HF attenuation was quite severe.
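To put the severity of the Academy roll-off into perspective, the two points quoted above can be converted into an average slope in dB per octave. This is only a rough sketch: it treats the roll-off as a straight line on a logarithmic frequency axis between 1.5 kHz and 8 kHz, which is an assumption and not the actual shape of the published curve. The function name is purely illustrative.

```python
import math


def average_slope_db_per_octave(f_start_hz, f_end_hz, drop_db):
    """Average slope between two frequencies, assuming a straight line
    on a logarithmic frequency axis (an illustrative simplification)."""
    octaves = math.log2(f_end_hz / f_start_hz)
    return drop_db / octaves


# Figures from the text: roll-off begins around 1.5 kHz, about 18 dB down at 8 kHz.
academy_slope = average_slope_db_per_octave(1500.0, 8000.0, 18.0)
print(f"Average Academy roll-off: {academy_slope:.1f} dB/octave")
```

The span from 1.5 kHz to 8 kHz is a little under two and a half octaves, so the average slope works out at roughly 7.5 dB per octave.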

It has been traditional to split the sound reproduction chain into the A-chain, from the playback head to the output of the projector; and the B-chain, from the output of the projector to the audience. What we are principally dealing with when we discuss modern cinema equalisation (as modern A-chains are essentially flat) is the B-chain, but historically the A + B chains could even be 20 dB down at 8 kHz, and this could lead to excessive distortion when equalised to sound natural in the cinemas, or it could lead to excessive noise if the recorded levels were reduced in order to avoid the distortion. There was a very fine line between excessive noise and excessive distortion.

image 21.10

Figure 21.10:

1948: ‘Academy Characteristic’ for Altec Lansing system

In 1971, during the mixing of ‘A Clockwork Orange’, experiments were made in Elstree Film Studios, near London, to work with a wider range response. This film was the first to use the Dolby A-type noise-reduction on all its pre-mixes and mixes, and ways were being sought to exhibit this to its full advantage. The state of affairs then current is shown in Figure 21.11, which shows the responses of nine Hollywood dubbing theatres in 1974, normalised to the response at the 500 Hz crossover point of their loudspeaker systems. Also, magnetic A-chains tended to have a flat response, whereas optical A-chains had a high frequency roll-off, and usually no compensation was made in the B-chain for the A-chain differences. The situation was obviously open to improvement.

image 21.11

Figure 21.11:

1974: Nine Hollywood dubbing stages – B-chain only (normalised at 500 Hz).

In the 1971/2 Elstree experiments, some large KEF loudspeakers with respectably flat responses were placed about 2 m from the listening position, and whilst listening to a flat soundtrack their responses were indeed perceived to be acceptably flat. Reproduction was then switched to the main theatre loudspeakers, some 12 m away, and their response was equalised in order to achieve the maximum compatibility with the sound from the KEFs. To the surprise of the experimenters, a slope of around 3 dB/octave from 2 kHz upwards appeared to give the best compatibility with the flat, close-field system. The experimental curve (hence the X-curve) which achieved this compatibility is shown in Figure 21.12, as drawn in 1972. Repeating the experiments in different sized rooms led to the family of curves shown in Figure 21.13, and still there is no absolute explanation of why this should be so, although Ioan Allen8 has suggested that it could be due to some psychoacoustic phenomenon involving far away sound and picture, or perhaps the result of increasing reverberation which generally follows room size. No doubt the reflexion density also plays a part, as mentioned in Section 21.12.1.

The tendency was for the roll-off to be greater both in large rooms and in rooms with longer reverberation times. A small, low decay time room tends towards needing a flat response, whereas a large room with a generous reverberation time would need considerable high frequency roll-off in order to sound subjectively similar in equalisation to the small, dry room. Remember, though, that in the world of cinema, dialogue is pre-eminent, and its intelligibility is all-important. The short sounds of dialogue do not receive reverberant enhancement in terms of loudness, so the natural sound via a relatively flat direct response from the loudspeakers may be more important than the flatness of the combined loudspeaker/room response, because it is the direct response which gives rise to the clear perception of dialogue. Hence, in drier rooms of any size, the overall curve tends to be flatter as it does not have to take into account the normally sloping reverberant response which tends to fall as the frequency rises. In large reverberant rooms, it was considered that the direct response may have to be made excessively bright if attempts were made to achieve a flat response to pink noise in a naturally bass-heavy room, and obviously the general tonal nature of the dialogue needs to reasonably match the rest of the soundtrack. The X-curve is therefore not a fixed curve, but it is a curve with upper and lower tolerance limits, which allow it to be tailored to different rooms in order to maintain the overall perceptual compatibility. Figure 21.14 shows the recommendations published in 1998, as extended to 16 kHz and with a second knee at 10 kHz, but the final adjustment still needs to be made by ear by experienced people. Figure 21.15 shows the typical equalisation for the surround loudspeakers in order to give good compatibility with the X-curve response of the behind-the-screen system. Note how with increasing room size (distance from listeners) and increasing reverb, it is the turnover frequency that is adjusted, not the slope. Subjective assessment had shown that the surround loudspeakers tend to need a brighter characteristic than the screen channels (i.e. less HF roll-off), which could be due to the public being closer to the surround loudspeakers, and hence more in their direct field.
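The nominal high-frequency shape of the curve can be sketched as a simple piecewise function. The 3 dB/octave slope from 2 kHz and the second knee at 10 kHz are taken from the text; the 6 dB/octave slope above 10 kHz is an assumption based on the commonly quoted form of the 1998 recommendation, and the low-frequency end and the tolerance bands are ignored altogether. The function and parameter names are illustrative only.

```python
import math


def x_curve_db(freq_hz, knee1_hz=2000.0, knee2_hz=10000.0,
               slope1_db_oct=3.0, slope2_db_oct=6.0):
    """Nominal level in dB relative to the flat mid-band.

    slope2_db_oct (the slope above the second knee) is an assumed value.
    Low-frequency roll-off and tolerance limits are not modelled.
    """
    if freq_hz <= knee1_hz:
        return 0.0
    if freq_hz <= knee2_hz:
        return -slope1_db_oct * math.log2(freq_hz / knee1_hz)
    level_at_knee2 = -slope1_db_oct * math.log2(knee2_hz / knee1_hz)
    return level_at_knee2 - slope2_db_oct * math.log2(freq_hz / knee2_hz)


for f in (1000.0, 2000.0, 4000.0, 8000.0, 10000.0, 16000.0):
    print(f"{f:7.0f} Hz: {x_curve_db(f):6.2f} dB")

# Moving the turnover frequency rather than the slope, as described for
# Figure 21.15 (the 4 kHz value here is purely hypothetical):
print(f"8 kHz with a 4 kHz turnover: {x_curve_db(8000.0, knee1_hz=4000.0):.2f} dB")
```

With these values the curve is 3 dB down at 4 kHz, 6 dB down at 8 kHz and about 7 dB down at the 10 kHz knee. Raising the turnover frequency whilst keeping the slope gives a brighter characteristic at any given frequency, which is the type of adjustment described for the surround loudspeakers.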

image 21.12

Figure 21.12:

1972: The first wide-range B-chain characteristic – later called the X-curve.

image 21.13

Figure 21.13:

SMPTE 202M – 1982 – Corrections for theatre size.

image 21.14

Figure 21.14:

SMPTE 202M – 1998 – X-Curve extended to 16 kHz with second break point at 10 kHz.

The set up of cinema monitoring systems is clearly not trivial, and as Allen pointed out ‘if material were to be mixed in a small room, in a large theatre it would have to have information about the content as it varied between short-duration (speech) signals, and long duration (music and possibly effects) to have perfect playback translation’.8 Large dubbing theatres are therefore still the only viable mixing environments for high-quality cinema soundtracks, because the perception of the sound between large rooms and small rooms varies so much.

image 21.15

Figure 21.15:

Adjustments for room size and distance to surround loudspeakers – adjusting the turnover frequency as opposed to adjusting the slope.

Of course, human perception of the relative balance of highs and lows also depends upon level, as shown in Figure 2.1, so it is essential in film production that the mixing and the public performances should take place at the same levels if the equalisation compatibility is to be maintained. After the introduction of digital soundtracks, 18 dB of headroom became available above the normal 85 dBC reference level. Many film directors began to abuse this by using it throughout the film (if the plot is weak, turn the level up!), which led to many cinema owners turning down the volume from the standard setting. The level reduction in the cinemas was partly due to the avoidance of distortion from some marginal reproduction systems, but also because of complaints from the cinema-goers about excessive level. Unfortunately, one result of turning the level down in the cinemas is that the quieter parts of the dialogue can become unintelligible at the reduced levels. The general situation is still rather arbitrary, and discussion about the standards continues, especially in the light of the arrival of digital cinema.

21.12.3 Sound Level versus Screen Size

It was mentioned previously that in anechoic chambers there should be no difference in the perception of a sound from sources at different distances, at least not until those distances became so great as to introduce significant, high frequency, air absorption losses. Nevertheless, in 2008, at the University of Vigo, Spain, experiments were carried out in a hemi-anechoic chamber which did show that people expected to hear a lower sound level as a screen was moved further away from them, step by step. This test was part of a series of tests relating to screen size and distance versus the ‘appropriate’ sound level.9 The conclusions were that:

1.  At a fixed distance, as the screen size increases, the sound level necessary for a realistic combination of sound and picture also increases.

2.  For a fixed screen size, as the distance from the viewer increases, the sound level necessary for a realistic combination of sound and picture decreases.

3.  For a fixed viewing angle, where the screen size increases as the distance from the viewer increases, the sound level necessary for a realistic combination of sound and picture also increases. This concept is shown diagrammatically in Figure 21.16.

4.  For a fixed sound level, as the screen size increases, the amount of low frequencies necessary for a realistic combination of sound and picture also increases.

5.  Little evidence was found of any connection between the screen distance or dimensions and the appropriate level of high frequencies.

One of the authors of the paper (Christian Beusch) proposed the curve shown in Figure 21.17. This was developed from a curve that had been proposed in an earlier paper.10

image 21.16

Figure 21.16:

Same relative picture size for the viewer; same viewing angle; different distances from the screen. It should be self-evident that the appropriate sound levels for a ‘natural’ perception would be different in each case.

A look back at Figure 21.16 will put into a ‘common sense’ perspective the findings of points 3 and 4, above. It is not difficult to imagine a battle scene in an action film, taking place on a 12 m wide screen at 20 m distance in a big room, and for explosions to reach a very exciting 115 dBC. However, if, as shown in the figure, the scene were to be repeated at a viewing distance of 85 cm on a 50 cm (20 in) television screen, nobody in their right minds would wish to watch it with 115 dBC of accompanying low-frequency rumble down at 20 Hz. In nature, small things do not make loud noises. What is more, as mentioned towards the end of Section 21.12.1, loud noises at close distances are usually a sign of danger, and so can be psychologically disturbing. The curve shown in Figure 21.17 has been experimentally tested and it does tend to suggest that there is no one calibration level that suits all sizes of screens at all distances. As the screen gets closer to the viewer, the ‘natural’ sound level begins to fall off more rapidly. This is a clear indication of why many soundtracks sound ‘too loud’ in small cinema rooms. It also suggests several reasons why the cinema industry has always tended to mix in large rooms:

image 21.17

Figure 21.17:

Proposed curve relating screen distance to calibrated sound level for a 45° viewing angle.

1.  In smaller rooms the soundtrack may be mixed with insufficient level for large cinemas.

2.  In smaller rooms the soundtracks may be mixed with insufficient low frequency content for large cinemas.

3.  Because of the reflexion density question, mentioned in Subsection 21.12.1, the dialogue levels may receive greater reflective support in smaller rooms, and hence may be mixed with insufficient dialogue level for large cinemas.

4.  There is also a question of dynamics. As the background noise in most cinema rooms will be around 30 dBA, if the soundtrack is mixed at a lower average level in a small room, the difference between the average level and the noise floor will be less than in a large room, and so the dynamic range of the mix will be reduced.

The accumulation of these factors relating to human perception suggests that it may always be necessary to mix in big rooms for big rooms; which is exactly what the cinema industry has always tried to do, although without ever clearly explaining the underlying reasons why. So, it seems that large dubbing theatres will still be required for as long as feature films are shown in large, public cinemas.
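The ‘same viewing angle, different distance’ idea of Figure 21.16 can be checked for the two examples given earlier in this subsection: the 12 m wide screen at 20 m and the 50 cm television at 85 cm. The sketch below assumes a flat screen viewed on-axis and treats both quoted sizes as screen widths (the 50 cm figure may in fact be a diagonal, so this is an assumption). The function names are illustrative.

```python
import math


def viewing_angle_deg(screen_width_m, distance_m):
    """Horizontal angle subtended by a flat screen viewed on-axis."""
    return math.degrees(2.0 * math.atan(screen_width_m / (2.0 * distance_m)))


def width_for_angle_m(angle_deg, distance_m):
    """Screen width needed to subtend angle_deg at distance_m."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)


cinema = viewing_angle_deg(12.0, 20.0)       # large screen from the text
television = viewing_angle_deg(0.50, 0.85)   # small screen, size taken as width

print(f"cinema: {cinema:.1f} degrees, television: {television:.1f} degrees")
print(f"width for the cinema angle at 0.85 m: {width_for_angle_m(cinema, 0.85):.2f} m")
```

Both cases subtend roughly 33 degrees, so the picture occupies almost the same portion of the viewer's field of vision even though the screen widths differ by a factor of more than twenty.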

21.12.4 Room Acoustics and Equalisation

When the X-Curve was proposed in the early 1970s, not only the public cinemas but also the dubbing theatres where the soundtracks were mixed were in a rather poor state of inter-room compatibility. The introduction of the X-Curve as a response goal, and a standard means of measuring it within the rooms, certainly brought a measure of order to the chaos, but what was known in 1971 was a long way short of what is known in 2011. Measurement and analysis techniques have advanced greatly, as have the knowledge of psychoacoustics and the means of introducing corrective measures. In 1971 there was no such thing as a parametric equaliser. What were just becoming available however were third-octave real-time analysers and third-octave equalisers. There was also a school of thought developing which held that the ear ‘heard’ in critical bands of one-third-octave width. If that were the case, then if the energy in each one-third-octave band was equal, each band would be perceived as being equally loud. We now know that this is not the case, but in the early 1970s, to many people, one-third-octave analysis and equalisation appeared to be the solution to the room-to-room compatibility question. In fact, the use of third-octave equalisation became very widespread, not only in cinema rooms but also in recording studios and live concert sound, but it has since been largely abandoned by all except the cinema world. The problems with loudspeaker equalisation in general are discussed in Section 23.1.1, but in any case, one-third-octave equalisation, at least for the purpose of sound system alignment, is simply too coarse to function as intended.

The simple fact is that rooms cannot be equalised. In any room other than an anechoic chamber, the response at any position will be different, and in most cases significantly different. If we begin with a flat source, such as an excellent omni-directional loudspeaker radiating wideband pink noise, the response will be different at all places in the room. The acoustic waves expand three-dimensionally and subsequently interact with their own reflexions and the room resonances. If, therefore, the response is different at all places, it must follow that there is no equalisation setting which can correct for anything except for one point in space at a time. This can only be considered to be local equalisation, and not room equalisation.
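The point about local equalisation can be illustrated with a deliberately over-simplified model: a flat source, the direct sound, and one single reflexion. Everything in the sketch below is hypothetical (the path lengths, the reflexion gain of 0.7, the speed of sound of 343 m/s and the third-octave frequency spacing); a real room has many reflexions and resonances, which only makes the situation worse.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def response_db(freq_hz, direct_m, reflected_m, refl_gain=0.7):
    """Level of direct sound plus one reflexion, in dB relative to the
    direct sound alone (toy model; refl_gain is a hypothetical value)."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    direct = cmath.exp(-1j * k * direct_m) / direct_m
    reflected = refl_gain * cmath.exp(-1j * k * reflected_m) / reflected_m
    return 20.0 * math.log10(abs(direct + reflected) * direct_m)


def spread_db(levels):
    """Difference between the highest and lowest level in a response."""
    return max(levels) - min(levels)


freqs = [100.0 * 2.0 ** (n / 3.0) for n in range(16)]  # 100 Hz to 3.2 kHz

pos_a = (4.0, 5.2)  # (direct path, reflected path) in metres, hypothetical
pos_b = (4.3, 6.1)  # a second position a short distance away, hypothetical

resp_a = [response_db(f, *pos_a) for f in freqs]
resp_b = [response_db(f, *pos_b) for f in freqs]

# 'Room equalisation' measured at position A: the inverse of the response there.
eq = [-level for level in resp_a]

corrected_a = [r + e for r, e in zip(resp_a, eq)]
corrected_b = [r + e for r, e in zip(resp_b, eq)]

print(f"spread at A after EQ: {spread_db(corrected_a):.2f} dB")
print(f"spread at B before EQ: {spread_db(resp_b):.2f} dB")
print(f"spread at B after EQ: {spread_db(corrected_b):.2f} dB")
```

The equalisation makes the response perfectly flat at position A, by construction, but at position B the ‘corrected’ response still deviates by several decibels, because the interference pattern there is different. The correction is therefore local to one point in space.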

It cannot be over-stressed that a measuring microphone and analyser combination has, in many cases, very little similarity to an ear/brain combination. There is now a very strong body of opinion which asserts that human beings tend to ‘hear through’ rooms. This is best explained by considering the case where the source of sound is ‘natural’, which for discussion purposes could be someone playing a baritone saxophone, which is a harmonically rich instrument covering a wide range of frequencies. If the instrument was being played at the front of the room, then to a listener walking around the room it would be evident that it was the same instrument playing. Taking measurements at many places around the room would show that the responses at each place would be different, and in some cases grossly different. The degree of difference would be totally out of proportion with the degree of perceived difference by the listener, who would merely hear the same instrument, but modified by a different room acoustic from place to place. Never would the listener be fooled into thinking that a different instrument was playing, not even if listening whilst blindfolded. In all cases, the first sound heard for any new note or change in tonality would be the first arriving sound (the direct sound), which by definition must arrive anechoically because it arrives straight from the instrument, whereas all reflexions and resonances suffer arrival delays. It is the first arrival of any sound that the brain ‘locks on to’, and that sound has a characteristic ‘fingerprint’ to which all the subsequent sound is referenced. This is why acoustic, natural sounds sound natural. A listener could say that an instrument was perhaps not sounding as good as it could in a given room, but it could never be said that the instrument sounded unnatural.

If we now change the situation to one in which the source of sound is a loudspeaker playing a flat, anechoic recording of the same baritone saxophone, and given that the loudspeaker has roughly similar directivity characteristics to the instrument, there is no reason to expect much difference in the overall assessment of the sound in the room. However, if we were to now ‘correct’ the response for the room, by playing pink noise through the loudspeaker and measuring the response from our ‘prime’ listening position, everything would begin to change. First of all, it is almost unthinkable that the analysing equipment would show a flat response from the unadjusted system given all the reflexions and resonances involved. The tendency would be (given current thinking and practices) to equalise the system for a flat response. The application of the equalisation would distort (change) the frequency response of the loudspeaker system. As the loudspeaker would be the source of all the sound, it follows that the direct sound first arriving at the listener would change; the frequency balance of the reflexions would therefore also obviously change; the relative balance of the room resonances would change; and the overall reverberation characteristic would change. What is more, the relative effect of all of these changes would be different at all places in the room, and none of them would be the same as when the baritone saxophone was being played live. How, one could well ask, could this ‘corrected’ sound be more natural than the instrument itself? Patently, it cannot be so.

Let us now consider taking the saxophone on a tour of ten cinema theatres, all with rather different acoustics. In each room, the overall sound will be different, but the fact that the saxophone was always the same one (or at least a very similar one) would be very evident. If we then made a second tour with the excellent loudspeaker reproducing the recording of the saxophone, we could to a large degree experience a similar result. If, however, we then made a third tour, before which the loudspeaker had been equalised to give a flat response at a designated measuring/listening position in each room, the result would be that the recording of the saxophone sounded significantly more different than when it was reproduced flat. The fact is that with ten different equalisations in the ten different rooms we would be listening to ten different sources. When the sources are different, it is virtually impossible for the in-room responses to be the same. What is more, when the ten sources are different, the ear/brain has ten different references, and so perceives ten different sounds, and not the same sound in ten different rooms, which is what it should be doing. Brains understand this, even if only sub-consciously, but spectrum analysers do not! The situation addressed in Section 11.5 is not equalisable. Rooms are not equalisable.

If room-to-room compatibility is the goal, the sources must all be the same. If rooms have gross acoustic problems, they must be addressed by acoustic solutions. If the sources are the same, and the room problems are not gross, then room-to-room compatibility is almost assured. Equalisation is unnecessary except for any desired changes to the overall response.

A study of twenty rooms around Europe in 2010, all Dolby certified, indicated the damage being done by the current one-third-octave equalisation regime.11 Figure 21.18 shows the results of the measurements at 2 m from the screens, and at two-thirds of the distance from the screen to the rear wall (which is the standard calibration position), for 11 dubbing theatres. Figure 21.19 shows similar measurements for nine commercial cinema theatres of various sizes. In effect, the measurements at the 2 m distance broadly represent the direct sound from the loudspeakers. It can be seen very clearly from the figures that an enormous amount of damage is being done to the integrity of the direct sounds in order to try to force the far reverberant field measurements to conform to a predetermined standard by means of ‘room’ equalisation. Given the differences in the 2 m measurements in Figures 21.18 and 21.19, there is absolutely no hope that such different direct-sound responses could sound similar. As has already been explained, if the direct sounds are not the same, then all hope of overall compatibility of the sounds from room-to-room is lost. The problem is that the pink noise measurements are simply lumping together the direct sound with all of the reflexions and resonances and general reverberation. The analyser cannot separate these components, but the ear can!

Figure 21.18: Overlaid, third-octave responses of 11 Dolby Digital dubbing theatres.

Figure 21.19: Overlaid, third-octave responses of nine Dolby Digital public cinemas.

By contrast, Figure 21.20 shows the responses at 24 different positions in a music club.12 In all cases the source was the same loudspeaker, which had a relatively flat response. Although there is no great difference in the degree of variability between the responses shown in Figures 21.18, 21.19 and 21.20, the tendency is for all the responses of the first two figures to sound different, but for those of Figure 21.20 to sound very recognisably as coming from the same source, which indeed they were. This relates back to the aforementioned discussion about the baritone saxophone. If the source is the same, the perception in any part of any reasonably controlled room is that of the same source being heard in different acoustic conditions. The brain knows what part of sound is the source, and what is the room.

Figure 21.20: Third-octave responses at 24 different places in a music club. In all cases, the source was the same loudspeaker mounted in the same position. The responses appear to vary considerably but the general impression of the sound was the same at all locations. The positions of the plots on the drawing relate to their positions in the club, the shape of which is outlined.

Conversely, even when the reverberant-field measurements are reasonably similar, if the frequency balances of the sources are different, the perception may be of different sources, which, in effect, is what they are. If the baritone saxophone excites disagreeable resonances in certain rooms, then for every place where there may be an unpleasant ‘honk’ on certain notes, there would be other places in the room where those same notes were weak. This situation applies whether we are referring to the real instrument or its reproduction via our appropriate loudspeaker. In the latter case, one could equalise the loudspeaker to reduce the resonance, but by doing so it would virtually kill the note at other places where it was already weak, and also upset the attack of the instrument at all locations. This is not a correction!

The situation is that for room-to-room compatibility, the direct sounds must be the same and rooms must be reasonably well damped, and not excessively reflective. If the rooms are not reasonably well controlled, equalisation cannot make them much more compatible, one with another. The only truly viable use of equalisation is to compensate for loading effects, such as where the left and right loudspeakers receive more low frequency loading from the side walls than do the centre loudspeakers, or to introduce desired corrections to the direct response from the loudspeakers. The acoustic losses due to the screen being in front of the loudspeakers can also be legitimately compensated for by equalisation, but it is customary to consider the screens as a part of the loudspeaker systems because they can have an effect on the overall directivity of a loudspeaker. In other words, the directivity of the loudspeaker, alone, cannot be presumed to be the same when placed behind a perforated projection screen. Obviously (or at least it should be obvious), the loudspeakers should exhibit a smooth and uniform directivity pattern which is adequate for the intended coverage area. If this fact is not respected, the probability is that the direct sound will not be flat in all the areas of the room and that the reflexions from the surfaces of the room will not have smooth frequency responses. It is this failure which gives rise to the generally uneven spectrum of the reverberant responses in many rooms, and which in turn calls for the ‘absurd’ equalisation which is customarily found to have been applied in so many cinemas after ‘aligning’ the rooms by means of pink noise measured in the far, reverberant fields of the rooms.

A more complete argument can be found in the papers referred to in the above text.11,12

21.12.5 Dialogue Levels and Room Equalisation

As previously mentioned, it has been traditional to mix cinema soundtracks in large mixing theatres in order that the many variables involved in transposing a mix from the studios to the public cinemas could be minimised. However, the more recent needs for compatibility with smaller theatres and subsequent DVD releases have further highlighted the problems of the compatibility of playback in rooms of different sizes. The X-curve has long been used for cinema loudspeaker playback, and has also (at least in principle) employed different high frequency roll-offs depending on room size and decay time, but another room-size related problem – that of the perceived level of the dialogue varying relative to the music and effects – has led many people to ask whether a further compensation needs to be defined and applied.

In general, as rooms become larger they tend to exhibit a decay time which rises at low frequencies. They also tend towards their first reflexions arriving later than those in small rooms, and the subsequent reflexions are more separated in time. It has been noted by many people that the relative level of the dialogue in a film soundtrack which has been mixed in a large room can seem excessively loud when reproduced in a small room. The situation is complicated to define, due to the number and the interaction of the variables involved. However, Allen, in his 2006 paper,8 demonstrated how the reverberation in a room would develop in response to pink noise, and three figures from his paper are reproduced as Figures 21.21(a), (b) and (c). Given a reverberation characteristic as shown in Figure 21.21(a), it can be seen from Figures 21.21(b) and 21.21(c) how the first arriving signal, having a flat response, is subsequently subjected to a reverberant build up which is moderate at mid frequencies but greater at low frequencies and less at high frequencies. The signal is effectively amplified, and the spectrum is modified by the reverberation characteristics of the room.

Figure 21.21: (a) Typical medium to large size theatre reverberation characteristic. (b) Pink noise build up over time in medium to large size theatres. (c) Frequency response changes with duration of signal.

The short sounds of the spoken word are essentially in the range of 100 Hz – kHz, but their duration is too short to exhibit signs of reverberant build up. They are not steady-state signals like the pink noise, and so do not inject energy into the reverberant field for long enough to drive the reverberation to its full level. They are also above the frequency band that would suffer the greatest reverberation in a complex, wideband soundtrack. Consequently, in such cases, the low frequency build up could tend to mask the short sounds of dialogue and reduce the intelligibility as compared to dialogue at the same level but with less ambient or musical accompaniment. The natural tendency, therefore, is for the person making a mix for a soundtrack to elevate the dialogue levels when mixing in a room with a longer and low frequency-dominant reverberation time, as compared to mixing in a less reverberant room with less low frequency build up. This is because, despite the large and small rooms both being equalised flat (but only, of course, to the steady-state pink noise signal), the time-smeared reverberant response exhibits a greater masking effect. As a result of this, when such a soundtrack is played back in a smaller, drier room, the dialogue levels may seem to be excessively high in terms of their relative balance with the rest of the soundtrack because the anticipated low frequency reinforcement does not occur. This is yet another complication to the large-room/small-room compatibility issue.
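
The difference between a steady-state signal and a short sound can be sketched with the simplest textbook model of a diffuse reverberant field, in which the reverberant energy approaches its steady-state value exponentially with a time constant of RT60/(6 ln 10), or about RT60/13.8. The decay times and sound durations below are hypothetical values chosen only for illustration; they are not taken from Allen's paper, and real rooms and real speech are considerably more complicated.

```python
import math


def buildup_fraction(duration_s, rt60_s):
    """Fraction of the steady-state reverberant energy reached after duration_s."""
    tau = rt60_s / (6 * math.log(10))  # energy time constant, about RT60 / 13.8
    return 1 - math.exp(-duration_s / tau)


def shortfall_db(duration_s, rt60_s):
    """Reverberant level reached, in dB relative to the steady-state level."""
    return 10 * math.log10(buildup_fraction(duration_s, rt60_s))


# Hypothetical decay times for a large room with a rising low frequency RT.
RT60_BY_BAND = {"125 Hz": 1.2, "1 kHz": 0.6}

# Hypothetical durations: a short speech-like burst and a sustained sound.
DURATIONS = {"20 ms burst": 0.02, "2 s sustained": 2.0}

for band, rt60 in RT60_BY_BAND.items():
    for label, duration in DURATIONS.items():
        level = shortfall_db(duration, rt60)
        print(f"{band:>6}, {label:>13}: {level:6.2f} dB re steady state")
```

Under these assumptions the sustained sound reaches essentially the full reverberant level in both bands, whereas the short burst falls well short of it, and falls furthest short in the band with the longest decay time.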

When a room is ‘equalised’ to be flat in response to a pink noise signal, with the equalisation of the loudspeaker system compensating for the reverberant build up at low frequencies, the time-history of the response would tend to be as shown in Figure 21.22, also taken from Allen’s paper. The tendency would be for a direct signal to be reduced in bass and increased above around 2 kHz, leading to a generally thinner, harsher sound, which is exactly what has been reported by many people as being the nature of some of the sounds when auditioned in small rooms after being mixed in larger, more reverberant rooms. However, from the previous discussions it can be seen that the degree to which these effects may occur is highly dependent on the nature of the individual components of a soundtrack. The situation is by no means either clear or simple.
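
A minimal energy-sum sketch of this effect follows. It assumes that the steady-state level in each band is the power sum of the direct sound and a reverberant contribution, and that the equaliser is set so that the steady-state pink noise response is flat. The reverberant-to-direct ratios are hypothetical figures for a room whose decay time rises at low frequencies; they are not values from Allen's paper.

```python
import math

# Hypothetical reverberant-to-direct energy ratios (dB) at the measuring position.
REVERB_TO_DIRECT_DB = {125: 6.0, 1000: 2.0, 8000: -3.0}


def steady_state_gain_db(reverb_to_direct_db):
    """Rise of the steady-state level above the direct sound alone (power sum)."""
    return 10 * math.log10(1 + 10 ** (reverb_to_direct_db / 10))


def direct_after_eq_db(ratios_db, ref_band=1000):
    """Direct-sound response once the steady-state response has been equalised flat.

    The equaliser removes the steady-state rise in every band, so the direct
    sound is left with the inverse of that rise. Results are normalised to ref_band.
    """
    eq = {band: -steady_state_gain_db(r) for band, r in ratios_db.items()}
    return {band: gain - eq[ref_band] for band, gain in eq.items()}


direct = direct_after_eq_db(REVERB_TO_DIRECT_DB)
for band, level in sorted(direct.items()):
    print(f"{band:>5} Hz: direct sound {level:+.1f} dB relative to 1 kHz")
```

With these assumed ratios the direct sound ends up lower in the bass and higher in the treble than at 1 kHz, which is the same tendency as the first-arrival response described above.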

Figure 21.22: What would happen to the first-arrival signal if pink noise were tuned for a flat steady-state response.

Nevertheless, in real life, if a human being were to speak in rooms of different sizes, the characteristics of the direct sound would not change. Therefore, to linearly distort (upset the frequency balance of) the direct sound in order to compensate for room effects, which although affecting more steady-state sounds do not make a significant change to the perception of the spoken word, can only be detrimental to the uniformity of the perception in different rooms. That is to say, once a large room has been equalised to the X-curve with pink noise, such that the direct response from the loudspeakers is similar to that shown for the first arrival in Figure 21.22, the re-recording (film soundtrack) mixers will probably equalise the dialogue to give it a more natural characteristic – in effect adding the inverse of the first-arrival response. When the soundtrack is played back in a smaller or drier room, in which the loudspeakers have been equalised with a flatter general characteristic, the dialogue may then tend to be perceived with a much ‘heavier’ sound. The concept is described more fully in Section 11.5. It follows from this that the older, larger, more reverberant dubbing theatres would not seem to be the ideal rooms in which to mix complete soundtracks for the generally drier acoustic conditions of modern cinemas.

A further indication from the above discussions is that even if the X-curve, or any other standardised equalisation, continues to be used, it would be better applied in the close field of the loudspeakers so that the direct sounds, in all cases, are more similar from room to room. It is in the similarity of the direct sounds where the compatibility lies; not in the similarity of the highly complex, far-reverberant-field responses.

Therefore, taking a mix from a large room, with a reverberation characteristic similar to that shown in Figure 21.21(a), and playing it back in a smaller room with a drier, flatter decay characteristic, we could observe the following:

1.  The overall response of a large, more reverberant room with a significant rise at low frequencies (in the RT) will consist of a direct signal which is bass light and, perhaps, slightly treble heavy. People mixing in such rooms will probably compensate for this effect by adding equalisation to the dialogue, and other short-lived sounds, which will restore the natural frequency balance of the direct signal. Subsequently, on playback in a smaller room which has been equalised with a flatter direct response, the dialogue will sound as if it has been boosted by the applied equalisation, which would not have been necessary in the flatter direct monitoring conditions of the smaller, drier room, and which therefore sounds excessive.

2.  The dialogue will receive no significant reverberant support in either a large or a small room, so the effect of reverberation can be largely discounted when considering the dialogue only. However, the music and low frequency ambience would not receive as much reverberant support in a small, dry room as in a larger, more reverberant room, and so may sound weaker. Note that even if the levels of these signals had been boosted at low frequencies by the flatter direct signal from the loudspeakers in the smaller room, the faster overall decay may well still leave the dialogue more exposed.

3.  The dialogue may receive more support from early reflexions in the smaller room due to the close proximity of reflective surfaces, because the reflexions arriving within 40 ms of the direct sound will reinforce its level. This would perhaps be noticed more on the short sounds of the dialogue than on the longer sounds of the music and ambience, because the latter two had received reverberant support during the mix whilst the dialogue had not.

4.  Air absorption at high frequencies would be less in the smaller room, so any reflexions would tend to be brighter than in a larger room, once again giving a boost to signals which were being supported by the higher reflexion density.
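
The 40 ms window mentioned in point 3 corresponds to an extra path length of roughly 13.7 m at a speed of sound of 343 m/s. The sketch below uses a first-order image source to estimate the delay of a single side-wall reflexion in two hypothetical rooms; the dimensions, positions and function name are assumptions made purely for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room temperature
WINDOW_MS = 40.0        # reinforcement window quoted in the text


def side_wall_delay_ms(source, listener, wall_x):
    """Delay (ms) of a first-order side-wall reflexion relative to the direct sound.

    Positions are (x, y) in metres; the wall is the plane x = wall_x.
    The reflexion path equals the distance from the mirror-image source.
    """
    image = (2 * wall_x - source[0], source[1])
    direct = math.dist(source, listener)
    reflected = math.dist(image, listener)
    return 1000 * (reflected - direct) / SPEED_OF_SOUND


# Hypothetical small room: side wall 3 m from the source, listener 5 m away.
small_ms = side_wall_delay_ms((0.0, 0.0), (0.0, 5.0), wall_x=3.0)

# Hypothetical large theatre: side wall 15 m from the source, listener 15 m away.
large_ms = side_wall_delay_ms((0.0, 0.0), (0.0, 15.0), wall_x=15.0)

for name, delay in (("small room", small_ms), ("large theatre", large_ms)):
    verdict = "inside" if delay <= WINDOW_MS else "outside"
    print(f"{name}: side-wall reflexion delayed {delay:.1f} ms ({verdict} the 40 ms window)")
```

In the small room of this example the side-wall reflexion falls well inside the window, whereas in the large theatre it arrives too late to reinforce the direct sound.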

From the differences in perception described above, it would appear to be self-evident that no simple application of a predetermined family of third-octave curves could lead to any uniform sound quality in the different rooms.

It should be noted that one of the main reasons why these points are so pertinent to cinemas, as opposed to most other forms of public performance via amplified sound systems, is that the cinema situation is unusual in that there is no operator to adjust the sound level and equalisation for each performance. These characteristics are ‘frozen’ in the dubbing theatre, where the mixing personnel have only a preset monitor level control. Mixing is carried out at the (typically) Dolby reference level, which means that all mixing is carried out at the sound levels that would be expected in similarly calibrated cinemas. In principle, the projectionist in the public cinemas should leave the gain control at the calibrated setting (No 7) for all performances of all films, and he or she has no equalisation available if the dialogue lacks presence or is screaming at the audience. Conversely, in a discothèque, or other music venue, the DJ or sound operator will have level and equalisation controls available to adjust to taste any piece of music being played. It is the fixed nature of cinema projection which puts so much emphasis on the need for standardisation from the post-production to the public screening environments.

21.13 Summary

The lack of consensus about what format music-only surround should take has not been a help to the design of appropriate studios.

The most practical solutions for surround sound came from the cinema industry.

TV and video surround formats take into account their own sets of requirements.

Surround for music-only recordings would seem to serve little purpose if the frontal stereo channels are compromised.

The need for monitoring systems using five identical loudspeakers seems to be questionable, because the application and perception problems do not lead to symmetrical performance.

Five channels is a commercially imposed limit, which is only part way between two-channel stereo and what surround should be. Things get much better with ten channels, but it seems to be commercially impossible to implement.

Many surround (rear/side) formats have been proposed and tested. All tend to give different results, with their strengths and weaknesses suiting different music and circumstances. No outright winner has emerged.

In rooms where the acoustics have been designed to be as uniform as possible for the surround monitoring, the frontal stereo performance invariably seems to be compromised.

Conversely, in the better performing stereo rooms, the different concepts yield different responses and perception of the surround (rear) loudspeakers, and so provide no standard reference.

It is generally accepted that surround sound control rooms should have short decay times – perhaps the shorter the better until a point of ambient discomfort is reached.

The choice of separate low frequency sources or a single sub-woofer can be very dependent on the room acoustics. Perhaps the greatest degree of reality can be achieved by separate sources in highly controlled rooms.

Processed multiple sub-woofers, in mono below 80 or 100 Hz, can offer considerable improvement in response uniformity when highly controlled room acoustics cannot be provided.

‘Three-channel stereo’ rooms can be built with realistic rear loudspeaker responses which do not compromise the stereo performance. For ‘stereo and ambience’ surround mixes they can give excellent results.

Cinema dubbing theatres tend to need to be large because perception of the subjective sound character can vary greatly with room size.

As cinemas become smaller than they have been in earlier times, the decay times of both the cinemas and the dubbing theatres have been tending to go down.

The perception of high frequencies tends to increase as both reverberation time and room size increase, giving rise to the need for high frequency roll-offs in the reproduction systems of large, reverberant rooms.

The X-curve is now the recommended standard for the equalisation of cinemas and dubbing theatres, being appropriately modified to room size and decay time.

The surround loudspeakers follow a different equalisation curve to the behind-the-screen systems, changing their turnover frequencies rather than their rates of roll-off.

Sources must be as similar as possible to each other if room-to-room compatibility is the goal.

Rooms cannot be equalised.

Cinema mixing is intended to take place at the same sound pressure levels as it will be heard by the public in the theatres.

Different room sizes can give rise to the perception of different ‘ideal’ sound levels for any given picture.

References

1  Newell, Philip, ‘A Load of Old Bells’, Acoustics Bulletin (Journal of the UK Institute of Acoustics), Vol. 26, No. 3, pp. 33–7 (2001)

2  Holman, Tom, ‘New Factors in Sound for Cinema and Television’, Journal of the Audio Engineering Society, Vol. 39, No. 7–8, pp. 529–39 (July/August 1991)

3  Chase, Jason, ‘Hi-Fi or Surround. Part Two’, Audio Media (European Edition), Issue 92, pp. 122–6 (July 1998)

4  Newell, P. R., Holland, K. R. and Castro, S. V., ‘An Experimental Screening Room for Dolby 5.1’, Proceedings of the Institute of Acoustics, Reproduced Sound 15, Vol. 21, Part 8, pp. 157–66 (1999)

5  Bank, Graham, ‘The Distributed Mode Loudspeaker (DML)’ in Borwick, John (ed) The Loudspeaker and Headphone Handbook, 3rd Edn, Chapter 4, Focal Press, Oxford, UK (2001)

6  Toole, Floyd E., ‘Loudspeakers and Rooms for Sound Reproduction – A Scientific Review’, Journal of the Audio Engineering Society, Vol. 54, No. 6, pp. 451–476 (June 2006)

7  Newell, Philip R., Holland, Keith R., ‘Surround-Sound – The Chaos Continues’, Proceedings of the Institute of Acoustics, Vol. 26, Part 8, pp. 135–147, Reproduced Sound 20 conference; Oxford, UK (2004)

8  Allen, Ioan, ‘The X-Curve: Its Origins and History’, SMPTE Journal, Vol. 115, Nos 7 & 8, pp. 264–275 (2006)

9  Newell, Philip R.; Holland, Keith R.; Neskov, Branko; Castro, Sergio; Desborough, Matthew; Torres-Guijarro, Soledad; Pena, Antonio; Valdigem, Eliana; Suarez-Staub, Diego; Newell, Julius P.; Harris, Lara; and Beusch, Christian, ‘The Effects of Visual Stimuli on the Perception of ‘Natural’ Loudness and Equalisation’, Proceedings of the Institute of Acoustics, Vol. 30, Part 6, pp. 15–26, Reproduced Sound 24 conference, Brighton, UK (2008)

10  Newell, Philip R.; Holland, Keith R.; Neskov, Branko; Castro, Sergio; Desborough, Matt; Pena, Antonio; Torres, Marisol; Valdigem, Eliana; and Suarez, Diego, ‘The Perception of Dialogue Loudness Levels Within Complex Soundtracks at Similar Overall Sound Pressure Levels in Rooms of Different Sizes and Decay Times’, Proceedings of the Institute of Acoustics, Vol. 29, Part 7, pp. 125–139, Reproduced Sound 23 conference, Gateshead, UK (2007)

11  Newell, Philip; Holland, Keith; Torres-Guijarro, Soledad; Castro, Sergio; and Valdigem, Eliana, ‘Cinema Sound: A New Look at Old Concepts’, Proceedings of the Institute of Acoustics, Vol. 32, Part 5, Reproduced Sound 26 conference, Cardiff, UK (2010)

12  Newell, Philip; Holland, Keith; Newell, Julius; and Neskov, Branko, ‘New Proposals for the Calibration of Sound in Cinema Rooms’, Presented to the 130th Convention of the Audio Engineering Society, Preprint No 8383, London, UK (May 2011)

Bibliography

Holman, Tomlinson, 5.1 Surround Sound – Up and Running, Focal Press, Boston, USA and Oxford, UK (2001)

Toole, Floyd E., Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms, Focal Press, Oxford, UK (2008)
