10
Functions of Intonation in Discourse

ANNE WICHMANN

Introduction

Prosody is an integral part of spoken language. It conveys structure and meaning in an individual utterance, and it also contributes to the structuring and meaning of discourse. It is this latter aspect that is increasingly being seen as an important dimension of language learning. According to Levis and Pickering (2004: 506), there is “growing recognition that traditional sentence-level approaches may not be able to meet the needs of language teachers and learners”. Indeed, there are several studies, as reported in Piske (2012), which suggest that “learners profit to a larger extent from instruction that focuses on suprasegmental aspects of pronunciation” (2012: 54). The purpose of this chapter is therefore to outline some of the ways in which prosody, and intonation in particular, serves to structure spoken texts, manage interaction, and convey pragmatic meaning.

Theoretical and methodological frameworks

There are different approaches to the study of prosody and the results are often contradictory. Prosody research is driven not only by different theories of language and human interaction but also by different goals. Early studies, especially in the nineteenth century and before, focused on speech as performance. Speaking was thought of as an art, a rhetorical skill that was crucial for success in politics, in the Church, and in the theatre. A central part of this art was known as modulation – described in impressionistic terms, with little clear indication of what the speaker should actually do, other than to “establish a sympathy” with the audience (Brewer 1912: 83). More recent, twentieth-century analyses of English intonation were pedagogical in focus, driven by the needs of non-native rather than native speakers; this pedagogical tradition persists, for example, in the work of John Wells (2006), and is clearly of continued importance wherever English is being learnt as a second or foreign language.

In recent decades, with advances in technology, a new motivation for speech research has emerged. This is the desire to design computers that can synthesize human-sounding speech and also understand human speech. Applications of such work are, of course, limited to certain styles of speech: spoken monologue (“reading aloud” continuous text) and goal-oriented dialogue, such as service encounters. Casual conversation, on the other hand, is the main focus of work in interactional linguistics (derived from conversation analysis), with especial interest in how conversation is managed by the participants, reflecting the fundamentally cooperative nature of human communication.

For each of these approaches to discourse prosody there is a range of phonetic features that are thought to be important. Early voice-training manuals refer impressionistically to pace, pitch, and loudness in rather global terms, as the properties of stretches of discourse. The British pedagogical literature, on the other hand, and the British system of intonation in general, describes intonation (and it is usually only intonation and not the other prosodic components) in terms of localized contours – holistic movements such as fall, rise, and fall-rise. These pitch movements are the property of accented syllables and associated unstressed syllables, and it is the choice of contour, its placement, and its phonetic realization that makes an important contribution to discoursal and pragmatic meaning. The American autosegmental system describes the same local pitch movements, not in terms of holistic contours but in terms of their component pitch targets. Thus a rising contour is decomposed into a low target point followed by a high target point, and what is perceived holistically as a rising contour is the interpolation of pitch between those two points. The autosegmental theory of intonation (Pierrehumbert 1987; Pierrehumbert and Hirschberg 1990) has become the standard in most areas of prosody research. In addition, however, the advances in signal processing and the automatic analysis of the speech signal mean that there is a renewed interest in more “global” features, i.e., phonetic features that are the property of longer stretches of speech. These include the average pitch of an utterance or sequence and also long-term variation in tempo and amplitude.
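
To make the notion of “global” features more concrete, the following is a minimal sketch (my own illustration, not taken from the work discussed here) of how an average pitch and a simple measure of long-term amplitude variation might be estimated from a recording using the Python library librosa; the file name, pitch limits, and sample rate are illustrative assumptions.

```python
# Sketch: estimating simple "global" prosodic features from a recording.
# The audio file and parameter values are hypothetical.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical file

# F0 track via the pYIN algorithm; unvoiced frames are returned as NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=75, fmax=400, sr=sr)

mean_f0 = np.nanmean(f0)                   # "average pitch" of the stretch
f0_range = np.nanmax(f0) - np.nanmin(f0)   # a crude measure of pitch range

# Long-term amplitude variation: frame-level RMS energy and its spread.
rms = librosa.feature.rms(y=y)[0]
amp_variation = np.std(rms)

print(f"mean F0: {mean_f0:.1f} Hz, range: {f0_range:.1f} Hz, "
      f"RMS s.d.: {amp_variation:.4f}")
```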

Speakers clearly have a wide range of prosodic resources at their disposal: pitch, loudness, tempo, and voice quality, and can exploit them in various ways. Misunderstandings or loss of intelligibility can arise from errors related to both the phonological inventory and its phonetic implementation, and from choices at both local and global levels. Research in all of these many areas and in a variety of theoretical frameworks therefore has the potential to reveal how we use prosody, and thus raise awareness of its importance among teachers and learners.

Sentence types and speech acts

Although native speakers are rarely conscious of the intonational choices they make, they can certainly tell if something is unusual and does not correspond to what they perceive to be the norm. This can be illustrated by a high-profile pattern in current use. Over the last 30 years a pattern of intonation has been spreading in English that is a source of great annoyance to older speakers (always a good indication of language change!). This is often called “uptalk” and refers to the use of a rising contour at the end of a statement, instead of the expected falling contour.

The fact that this innovation is so controversial tells us something about the default intonation contours relating to different kinds of sentence types (specifically statements and questions). The traditional pedagogical literature on English intonation makes simple claims about canonical forms: statements and Wh-questions terminate in a falling contour while a yes-no question terminates in a rise. Wh-questions can be used with a rise, but this then has a softening, sometimes patronizing, effect. The validity of these claims is sharply contested by those who study conversation from an interactional (conversation-analytic) perspective, but they provide a useful baseline, not only in teaching and in clinical contexts (see, for example, Peppé and McCann 2003) but especially in experimental and large-scale corpus studies geared towards improving speech technology.

While human beings generally have no great difficulty in assessing what a speaker intends with a given utterance – statement, request, greeting, etc. – machines are less adept at doing this. Much research effort has been, and continues to be, invested in modeling human speech (production and recognition) in order to develop speech technology. This includes speech synthesis, automatic speech recognition, and human–machine interaction systems. Any utterances that can only be understood in context pose a challenge to automatic analysis. Shriberg et al. (1998) found it particularly difficult to distinguish automatically between backchannels, e.g., uhuh, and agreements, e.g., yeah, not least because some of these are lexically ambiguous (see the section below on backchannel). They found that agreements had a higher energy level than backchannels, which is assumed to be because emotional involvement is greater in an agreement than in a simple “continuer”, and greater emotional involvement tends to bring greater energy and often higher pitch. However, the attempt to disambiguate presupposes a single function for each utterance, although linguists have shown that speech acts can be multifunctional: an “agreement” might well also function as a “backchannel”.

A good example of multifunctionality is the act of thanking. Aijmer (1996) shows that thanking goes beyond the expression of gratitude. It can be dismissive (e.g., I can do it myself thank you), ironic (thank you, that’s all I needed), and can also initiate a closing sequence, acting simultaneously as an expression of gratitude and a discourse organizer. According to Aijmer, gratitude is expressed differently and has a different intonational realization, depending on the size of the favor: /thank you with a rising tone sounds casual – it is used in situations where the “favor” is minimal, as in buying a train ticket (e.g., A: Here you are. B: /Thank you). Where more gratitude is being expressed, a falling tone is used (e.g., A: I’ll look after the children for the day if you like. B: Oh that’s so kind of you – \thank you) (see also Archer et al. 2012: 263 for further examples). This is consistent with the view of Wells (2006: 66), who suggests that the difference between using a rising and a falling tone is the difference between “routine acknowledgment” (/thank you) and “genuine gratitude” (\thank you).

The pragmatic consequences of different intonational realizations of the same utterance can be seen in Wichmann (2004), a corpus-based study of please-requests. Such requests occurred with either a falling or a (falling-)rising final contour. For example: Can you open the door \please versus Can you open the door /please (i.e., with a final rise on please). It was found that those requests with a falling contour were generally used in asymmetrical situations, such as service encounters, where the imposition was socially licensed, while the rising contour was used where compliance could not be taken for granted. Thus, a request at a ticket office for “a ticket to Lancaster, please” assumes that the hearer’s role is to comply. On the other hand, a request to borrow something from a friend generally does not make that assumption, and “could I borrow your |pen /please” would be more likely. These “default” realizations can, of course, be used strategically regardless of the context: a falling tone might be used to sound “assertive”, while the more tentative rising tone might be used to express politeness by suggesting that the hearer has the option to refuse, even if this is not actually the case. In other words, such patterns can be used to create the symmetry or asymmetry desired by the speaker, and not just to reflect existing relationships. However, if used unwittingly, these choices can also be the source of misunderstandings, particularly in conversation with a native speaker. If the “assertive” version is used innocently in a situation where the speaker does not have the right to demand compliance, it can cause offence. Similarly, a casual-sounding thank you (with a rising tone) might offend a hearer who believes that greater gratitude should be expressed. Whether these pragmatic inferences are likely to be drawn in conversation between non-native speakers (i.e., in an English as a lingua franca situation) is a matter for future research.

Information structure

A feature of some varieties of English and other Germanic languages is that they use patterns of weak and strong (stressed) syllables to structure the speech in a rhythmic way, both at word level and at utterance level (known as stress-timed rhythm). Vowel quality depends on stress patterns: unstressed syllables tend to be realized with a schwa, or reduced even further, while an accented syllable will contain a full vowel. Deterding (2012: 21) claims that a syllable-timed rhythm (with consequent absence of reduced syllables) may actually enhance intelligibility, and an insistence that learners acquire a stress-based rhythm may be inappropriate.

This may be true in relation to word stress, which is not part of prosody but part of the lexicon – information that is to be found in a dictionary. Sentence stress, on the other hand, is manipulated by the speaker, and is strongly related to the structuring of information in discourse. Processing is no longer a matter of word recognition but of understanding “the flow of thought and the flow of language” (Chafe 1979). The placement of sentence stress reflects what a speaker assumes is in the consciousness of the hearer at the time, and thus is an example of how discourse is co-constructed.

The default position for “sentence”-stress in English is the last potentially stressed syllable in a prosodic group, but this “norm” can be exploited strategically to indicate that an item is already “given” (or accessible in the mind of the hearer). “Givenness” can relate to a single lexical item that has already been referred to: the plain statement, She’s got a KITten, will have the sentence accent in the default position, namely on the last lexical item. However, in the following exchange, the ‘kitten’ is given: e.g., Shall we buy her a KITten? She’s already GOT a kitten. Givenness can also be notional rather than lexical: e.g., Shall we buy her a KITten? No – she’s alLERgic to cats. Here, the word cats subsumes kitten: an allergy to adult animals can be taken as including an allergy to kittens.

Research into the brain’s response to accentuation patterns has shown that these patterns are important for the hearer in the processing of ongoing discourse. Baumann and Schumacher (2011) maintain that prosodic prominence (at least in Germanic languages such as English and German) influences the processing of information structure: “information status and prosody have an independent effect on cognitive processing … More precisely, both newness and deaccentuation require more processing effort (in contrast to givenness and accentuation)” (Baumann, personal communication). Similar results have been shown by other researchers. Dahan, Tanenhaus, and Chambers (2002) used eye-tracking technology to establish that if an item was accented, the hearer’s gaze was directed towards non-given items, but towards given items if it was unaccented. A similar eye-tracking experiment by Chen, Den Os, and De Ruiter (2007) showed that certain pitch contours also biased the listener towards given or new entities: a rise-fall strongly biased towards a new entity, while a rise or unaccentedness biased towards givenness.

These experiments might lead one to expect that the deaccentuation of given items is universal. This is not the case – many languages, and some varieties of English (e.g., Indian English, Caribbean English, and some East Asian varieties), do not follow the pattern of Standard British English or General American. It therefore remains to be seen what the processing consequences would be for a speaker of such a language. Taking the production perspective, Levis and Pickering (2004) claim that learners tend to insert too many prominences and that these can obscure the meaning of the discourse. They suggest that practising prominence placement at sentence level, i.e., with no discourse context, might exacerbate this tendency to overaccentuate.

One way of raising awareness of prosodic prominence is to use signal processing software to visualize speech. We know something about the phonetic correlates of perceptual prominence thanks to the seminal work of Fry (1955, 1958). An accented syllable generally displays a marked excursion (upwards or downwards) of pitch, measured as fundamental frequency (F0), together with an increase in duration and amplitude. Cross-linguistic comparisons, such as that carried out by Gordon and Applebaum (2010), provide evidence of the universality of the parameters, even if they are weighted differently in different languages.
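
As an illustration of such visualization, the sketch below plots an F0 track and frame-level energy so that the pitch excursions and amplitude peaks associated with accented syllables can be inspected by eye; it is a hypothetical example using librosa and matplotlib, not a tool referred to in the chapter.

```python
# Sketch: visualizing two correlates of prominence (F0 and amplitude);
# duration can be judged from the time axis. File name is hypothetical.
import librosa
import matplotlib.pyplot as plt

y, sr = librosa.load("sentence.wav", sr=16000)          # hypothetical file
f0, _, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)    # pitch track (NaN = unvoiced)
rms = librosa.feature.rms(y=y)[0]                       # frame-level amplitude

t_f0 = librosa.times_like(f0, sr=sr)
t_rms = librosa.times_like(rms, sr=sr)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(t_f0, f0)                  # F0 excursions mark accented syllables
ax1.set_ylabel("F0 (Hz)")
ax2.plot(t_rms, rms)                # amplitude peaks accompany prominence
ax2.set_ylabel("RMS")
ax2.set_xlabel("Time (s)")
plt.show()
```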

Finally, it is important to note that classroom discourse itself may not be the best style of speaking to illustrate the prosody of “given” and “new”. In contrast to most research findings, Riesco-Bernier and Romero-Trillo (2008) found that in some classroom discourse the distinction between “given” and “new” was not evident in the prosody. However, they chose a very particular kind of discourse: “Let’s see/ milk/ does milk come from/ plants/ or animals? Animals/ Animals/ that’s right/ from the cow.” Although the authors do not say so, this suggests that the speaking style of pedagogical situations may in fact be very different from the naturally occurring prosody that students are being prepared for.

Text structure

A printed page provides the reader with far more information than the words alone. Typographical conventions, such as punctuation, capitalization, bracketing, and change of font, help the reader to recover the internal structure at the level of the clause and sentence. Paragraph indentation, blank lines, and headings (and subheadings) help the reader to group sequences of sentences into meaningful units. In some kinds of text, bullet points and numbered lists are also an aid to organizing the information on the page. Of course, none of this information is available when a text is read aloud, and the listener is reliant on the reader’s voice – pauses and changes in pitch, tempo, and loudness – to indicate the structure of the text.

The idea of “spoken paragraphs” was addressed by Lehiste (1979), who established not only that readers tended to mark these prosodically but also that listeners used the prosodic information to identify the start of a new topic. In read speech, the position of pauses suggests breaks in a narrative, with longer pauses being associated with paragraph breaks. However, the most reliable prosodic correlate of topic shift is a pitch “reset”, an increase in pitch range. This observation – that an increase in pitch range accompanies a major shift in the discourse – has been made for both read-aloud speech and spontaneous conversation, and in languages other than English (Brazil, Coulthard, and Johns 1980; Brown, Currie, and Kenworthy 1980; Nakajima and Allen 1993; Yule 1980).
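
A toy sketch of how such a pitch “reset” might be identified automatically is given below; the per-utterance F0 peaks and the threshold are invented for illustration and are not taken from the studies cited.

```python
# Sketch: flagging pitch "resets" (a jump in peak F0 relative to the
# preceding utterance) as candidate topic-shift points. Values are invented.
f0_peaks_hz = [210, 195, 180, 172, 240, 225, 205, 190]   # peak F0 per utterance

RESET_RATIO = 1.25   # assumption: a peak >25% above the previous one counts as a reset

resets = []
for i in range(1, len(f0_peaks_hz)):
    if f0_peaks_hz[i] > RESET_RATIO * f0_peaks_hz[i - 1]:
        resets.append(i)

# Utterance 4 (240 Hz after 172 Hz) would be flagged as a likely topic shift.
print("possible topic shifts before utterances:", resets)
```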

While there is some agreement that the boundaries between units of text are prosodically marked, there is less agreement as to whether there are any internal features that operate across a “paragraph”. Sluijter and Terken (1993) claimed that a paragraph was not only marked at its boundaries but that each successive sentence within it displayed a narrower pitch range. The idea is that there is a kind of “supra-declination”, mirroring at the level of the paragraph the declination (the tendency for pitch to fall gradually and for the pitch envelope to narrow) observed across a single sentence. This was certainly true of their experimental data, but it is less evident in naturally occurring data, mainly because of the many competing discourse effects on pitch range, such as parenthesis, reported speech, and cohesive devices (see Wichmann 2000).

While speakers intuitively use prosodic text-structuring devices in conversation, they do not do so consistently when reading aloud. Their use depends very much on the skill of the reader, and many readers are simply not very skilled. Some readers, such as newsreaders, are highly paid professionals, but experimental studies of read speech sometimes have to rely on readers recruited from the general public or from student groups – whoever is prepared to offer their time. Kong (2004), who looked at topic structuring in Korean, found that her female speakers marked the structure of their spontaneous narratives much more consistently than when they read aloud a subsequent transcription of them.

It is important to remember, however, that paragraph divisions in written texts are typographical conventions, and do not necessarily map on to meaningful text units. Some texts, especially literary texts, have a very fluid topic structure, shifting gradually from one “scene” to the next. Orthographic paragraphs “indicate not the boundary between one clearly definable episode and another, but a point in a text where one or more of the coherent scenes, temporal sequences, character configurations, event sequences and worlds … change more or less radically” (Chafe 1979: 180). Since much of the research into prosodic text segmentation has been carried out with Automatic Speech Recognition (ASR) in mind, such complex texts are rarely used, and the focus is generally on texts in which orthographic divisions map consistently on to meaningful units.

An awareness of the effective prosodic structuring of spoken discourse, particularly spoken monologue such as lectures, is thought to be important in teaching. Thompson (2003) claims that the awareness of intonational “paragraphs” is as important for understanding lectures as it is for performing them, and the training of lecturers in speaking skills should therefore also include awareness of phonological structuring. She compared five English for Academic Purposes (EAP) training texts (for listening skills) with six authentic undergraduate lectures. In the authentic data she found longer phonological paragraphs but fewer metatextual cues (first, next, in conclusion, etc.). The EAP training texts, on the other hand, appeared to focus on metatextual comment with little reference to phonological structuring. Thompson suggests that students are not well served by these texts and that learning to “hear” the structure of authentic lectures might help them. She concedes that some EAP teachers avoid intonation as “difficult to teach” but suggests that broad topic shifts can be pointed out and consciousness raised without a lot of technical detail about intonation.

Interaction management: turn-taking in conversation

Spontaneous speech displays many of the same structuring devices as prepared speech, including the kind of pitch resets discussed above. If someone is telling a story, the shifts in the narrative will be marked prosodically, just as they are in read-aloud speech. There will be some differences, however, depending on whether the speaker is “licensed” to take an extended turn, or whether other speakers are waiting to take a turn at speaking at the first opportunity. A licensed narrative gives the speaker the space to pause and reflect without risking interruption. This is the case in a lecture, for example, or in a media interview. In casual conversation there is an expectation that all participants have equal rights to the floor, and speakers are especially vulnerable to interruption when they are ending a topic and wanting to start another. Pauses are therefore not reliable topic cues in spontaneous narrative. Speakers frequently launch new topics by omitting a pause and accelerating from the end of one topic into the next. This is known as “rush through” (Couper-Kuhlen and Ford 2004: 9; Local and Walker 2004). It is particularly evident in political interviews, when the interviewee hopes to control the talk and thereby avoid further questions that might raise new and possibly uncomfortable topics.

This is just one of the devices used in the management of turn-taking, which is an important aspect of conversation, and one in which prosody, along with gaze, gesture, and other nonverbal phenomena, plays a part. It is remarkable how smoothly some conversations appear to run, and it has been claimed (Sacks, Schegloff, and Jefferson 1974) that while there is overlap and also silence, there are frequent cases of no-gap-no-overlap, often referred to as “latching”. These are of course perceptual terms, and recent acoustic analysis (Heldner 2011) has shown that a gap is not perceived until a silence exceeds 120 ms, an overlap is not perceived until the overlapping speech exceeds 120 ms, and no-gap-no-overlap is perceived when the silence or overlap is less than 120 ms (Wilson and Wilson (2005) had already predicted a threshold of less than 200 ms). It seems that smooth turn-taking is less common than has been assumed, applying to fewer than one-fifth of the turns analyzed. However, as Edelsky (1981) showed, we cannot assume that any speech overlap at a turn exchange is necessarily an interruption: some overlapping speech is intended to support the current speaker, and she therefore distinguishes between competitive and collaborative overlap.
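
The following small sketch applies the 120 ms perceptual threshold reported by Heldner (2011), as summarized above, to classify the join between two turns; the turn boundary times are invented for illustration.

```python
# Sketch: classifying turn transitions using Heldner's (2011) 120 ms threshold.
THRESHOLD_S = 0.120   # 120 ms

def classify_transition(prev_turn_end: float, next_turn_start: float) -> str:
    """Classify the join between two turns from their boundary times (seconds)."""
    offset = next_turn_start - prev_turn_end
    if offset > THRESHOLD_S:
        return "gap"                     # perceived silence between turns
    if offset < -THRESHOLD_S:
        return "overlap"                 # perceived simultaneous speech
    return "no-gap-no-overlap"           # perceived as 'latching'

# Hypothetical boundary times (in seconds):
print(classify_transition(10.00, 10.35))   # gap
print(classify_transition(22.40, 22.20))   # overlap
print(classify_transition(31.00, 31.05))   # no-gap-no-overlap
```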

The prosodic characteristics of the end of a turn are generally thought to be a lowering of pitch and a slowing down. It is clear, however, that these features alone cannot account for smooth turn-taking, nor can they function as reliable cues. Work in the conversation analysis framework (e.g., Szczepek Reed 2011) finds too little regularity in the shape of turns to justify any generalizations about the prosody of turn-ceding or turn-holding. The smoothness of transition at turn exchanges suggests that participants cannot be waiting for the other speaker to be silent before taking a turn, or even for the final pitch contour, but must have some way of projecting and preparing for an upcoming transition relevance place (TRP) in advance. The cues used in projecting a TRP have been widely discussed (see references in Wilson and Wilson 2005) and include semantic, syntactic, prosodic, and body-movement- or gaze-related cues. However, as Wilson and Wilson (2005) point out, there may be many cues that indicate an upcoming TRP but which nonetheless do not indicate its exact timing. They suggest an alternative, cognitive, account of what appears to be universal behavior, despite some cultural differences. They propose that conversation involves “a fine tuned coordination of timing between speakers” (2005: 958). In other words, the timing of turn-taking is governed by mutual rhythmic entrainment, possibly on the basis of syllable rate, despite wide variation in syllable length; speakers converge in their speech rate rather like relay runners getting into step before taking the baton. This notion of “entrainment” or “accommodation” as applied to speech will be discussed in more detail in the final section below.

Backchannel

The successful management of conversation depends not only on smooth turn-taking but on the successful elicitation of small responses, sometimes known as “continuers” or “backchannels”. A simple test for this is to consciously withhold any verbal or nonverbal response when another person is speaking to you. They will very soon stop and ask what is wrong. Speakers of a second language therefore must not only be intelligible themselves, they must also be able to indicate to an interlocutor the degree to which they are following a conversation.

A very early study (Yngve 1970) referred to short responses as “getting a word in edgewise”. The pervasiveness of these responses in conversation is confirmed by Jurafsky et al. (1997) (cited in Ward and Tsukahara 2000), who find that short responses constitute 19% of all utterances in a corpus of American English conversation. Studying short responses, however, is complicated by the number of different words or nonword vocalizations that can be used as a backchannel: Benus, Gravano, and Hirschberg (2007), in their study of an American English Games corpus, found that mmhm, uhhuh, okay, and yeah were the most common, followed by right, yes/yep, and alright. While vocalizations such as mmhm and uhhuh are easily recognizable as backchannels, both okay and yeah are multifunctional. Okay, for example, can be used to signal agreement and to mark a topic shift, in addition to functioning as a backchannel response, although Benus, Gravano, and Hirschberg (2007), in an attempt to disambiguate, found that backchannels have “higher pitch and intensity and greater pitch slope than affirmative words expressing other pragmatic functions” (2007: 1065).

Backchannel responses are not randomly produced, but occur at points that seem to be cued by the current speaker; in other words, speakers “ask” for backchannel. Ward and Tsukahara (2000) present clear evidence that backchannel feedback is in most cases cued by the speaker. A possible cue is a period of low pitch, while Benus, Gravano, and Hirschberg (2007: 1065) identify “phrase-final rising pitch as a salient trigger for backchanneling”. This accounts for the interpretation of “uptalk” as a trigger (Hirschberg and Ward 1995). Even these cues, however, do not explain the precision timing of backchannel responses, and Wilson and Wilson’s (2005) notion of “entrainment” may offer an explanation here too.

It is important for language learners to know that there are cross-cultural differences in turn-taking behavior, including backchanneling. For example, there are cultural differences in backchannel frequency, and this difference alone has the potential to cause problems: too few backchannels and a speaker appears unengaged, too many and they seem impatient. However, what is “too few” or “too many”? Maynard (1989, 1997) and Ward and Tsukahara (2000) claim that, even allowing for individual speaking styles, backchanneling is more frequent in Japanese than in English. There also appear to be differences not only in the frequency of responses but in what kind of cue can elicit backchannel responses. A phenomenon that typically elicits a response in one language does not necessarily do so in another language. For example, in studies of turn-taking cues in Dutch (Caspers 2001) and in English (Wichmann and Caspers 2001), it was found that a contour that appears to cue backchannel in Dutch (a high-level tone) blocks backchannel in English. Such differences have implications for cross-cultural communication. A backchannel response elicited but not forthcoming, and also a response that is unsolicited and unexpected, can be perceived as “trouble” and interpreted negatively.

Attitude/interpersonal meaning

Brewer (1912) was not wrong in telling performers to establish sympathy with their audience, and the same is true for conversation. The expression of interpersonal meaning is crucially important to the success of communication. Mennen (2007) points out that the inappropriate use of intonation, and in particular its cumulative effect, can have negative consequences for the non-native speaker. Unlike segmental errors, suprasegmental errors are rarely recognized as such by native listeners, but simply misinterpreted as attitudes that the speaker did not intend. Pickering, Hu, and Baker (2012) claim rightly that “prosody contributes significantly to interactional competence and serves to establish a crucial collegial bond between speakers”, and they conclude that “prosody in the English language classroom is key” (2012: 215). However, “attitude” remains the most elusive of meanings to capture analytically. What is it exactly about a speaker’s “tone of voice” that can make an utterance sound “friendly” or “impolite”?

There are broadly two approaches to studying the correlates of perceived attitudes: the first is to look for features of an individual utterance that cause it to be perceived as “friendly, brusque, condescending”, or any other of the many labels that can be used. The second is to focus on sequential relationships between utterances, and look for the meanings constructed by the similarity or differences between (usually consecutive) utterances rather than any features of an utterance itself. I will look at each approach in turn.

“Attitude” in utterances

Early work on English intonation, such as that of O’Connor and Arnold (1961), suggested that individual contours – falls, rises, fall-rises, and so on – carry independent meanings in conjunction with certain sentence types. However, intonation contours were ascribed so many “attitudinal” meanings that it became clear that the contour meant none of them. O’Connor himself noted that the topic of attitudinal intonation was “bedevilled by the lack of agreed categories and terms for dealing with attitudes” (1973: 270). A more abstract, reductive approach to the meaning of pitch contours is that of Cruttenden (1997), who sees falls and rises as “closed” and “open” contours, and Wichmann (2000), who refers to the same distinction in terms of “final” and “non-final”. The rising tone of a yes-no question is consistent with the “open” meaning of a rise, while the “closed” meaning of a falling nucleus is consistent with the syntactic completeness of a statement. This underlying meaning is used in Wichmann (2004) to explain why please-requests with low endpoints imply little optionality (the matter is final/closed) while a request ending high suggests that the matter is still open, giving the addressee greater optionality. Gussenhoven (2004), building on earlier work of Ohala (1994), has suggested that this distinction is ethological in origin, in other words it goes back to animal behavior, and that low pitch is associated with big (and therefore powerful) animals, while high pitch is associated with small and therefore less powerful animals. The big/small association has, he suggests, become encoded in prosody. But how does this relate to “attitude”?

I have argued in the past (e.g., Wichmann 2000) that some perceived (usually negative) attitudes arise simply because there is a mismatch between the hearer’s expectations and what the speaker actually does. On the assumption that the speaker intends to convey something relevant to the conversation, the hearer will endeavor to infer what this meaning is. A please-request uttered with a falling contour assumes compliance, but the hearer may not feel that assumed compliance is appropriate and may infer something resembling an insult. Similarly, if an expression of gratitude such as thank you sounds casual when the hearer believes that greater gratitude is due, they will perceive the speaker as “rude” or “offhand”. While such a choice may be intentional, with the speaker aware of the implicature generated and prepared to deal with the consequences, it may also be an unintended mistake, which will disrupt the communication until the misunderstanding is resolved. In other words, if things go wrong, participants interpret prosodic “mistakes” as intentional messages and infer meaning accordingly.

Perceived “mismatches” – prosodic behavior that appears to diverge from the hearer’s expectations, especially in cross-cultural situations – also arise in other areas of prosody. Some cultures, for example, tolerate silences between turns, while others value the apparent “enthusiasm” of overlapping speech. Cultural rules for turn-taking behavior are unconscious, and if they are broken, the participants assume that this reflects some intentional behavior – reticence, aggressiveness, enthusiasm, and so on – rather than a simple error. Tannen (1981) notes the differing attitudes to turn-taking of New York Jewish speakers and non-New Yorkers. Overlap is “used cooperatively by New Yorkers, as a way of showing enthusiasm and interest, but it is interpreted by non-New Yorkers as just the opposite: evidence of lack of attention”. In some cases, divergent behavior can be responsible for national stereotypes, such as “the silent Finn”, which arises from the Finnish tolerance for long silences in conversation. Eades (2003) points to a problem arising from similar discrepancies in interactional behavior between Australian English and Australian Aboriginal cultures. In Australian English interaction a long silence is unusual and can cause discomfort, but Aborigines value silence, and do not regard it as a sign that the conversation is not going well (2003: 202–203). Eades is particularly concerned with the disadvantage for Aborigines in the context of the courtroom, where silences “can easily be interpreted as evasion, ignorance, confusion, insolence, or even guilt” (2003: 203).

“Attitude” through sequentiality

The second, very different, approach to the prosodic expression of attitude has been suggested by research into prosodic entrainment or accommodation. The idea of “entrainment” goes back to seventeenth-century observations of the behavior of pendulums, which gradually adapt to each other’s rhythm. Conversation between adults frequently displays accommodation or convergence in both verbal and nonverbal behavior. Gestures, posture, and facial expressions can all mirror those of the interlocutor, while accommodation in speech includes changes to pronunciation and, at the prosodic level, to pitch range, pausing, and speech rate. Whether this tendency to converge or accommodate is an automatic reflex or a socially motivated behavior is still a matter of debate (see the discussion in Wichmann 2012). There is no doubt an element of both, and the degree of accommodation may to some extent depend on the affinity felt between interlocutors (Nilsenova and Swerts 2012: 87). By mirroring the other’s verbal and nonverbal signals it is possible both to reflect and to create greater rapport. Conversely, a failure to accommodate may reflect, or create, a distance between interlocutors.

We have already seen that this kind of rhythmic entrainment or adaptation may account for the timing of turns and backchannel responses. There is also evidence to suggest that a similar accommodation occurs in the choice of pitch “register”. An early model of English intonation that contained an element of sequentiality is the discourse intonation model of David Brazil (e.g., Brazil, Coulthard, and Johns 1980), in particular his idea of “pitch concord”, which involves matching pitch level across turns (see also Wichmann 2000: 141–142). An interactional account of pitch matching is also to be found in Couper-Kuhlen (1996), who suggests that when a speaker response echoes the previous utterance using the same register (i.e., relative to the speaker’s own range), the response is perceived as compliant, whereas if it copies the pitch contour exactly it can be perceived as mimicry. A more recent longitudinal study by Roth and Tobin (2009) showed that prosodic accommodation between students and teachers correlated with lessons perceived as “harmonious”.
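
One rough way of quantifying such pitch matching across turns is sketched below: each turn’s median F0 is expressed relative to the speaker’s own register (echoing Couper-Kuhlen’s register-relative account), and adjacent turns by different speakers are then compared. The turn values are invented and the measure itself is only illustrative, not one proposed in the studies cited.

```python
# Sketch: a crude index of cross-speaker register matching ("pitch concord").
import numpy as np

# (speaker, median F0 in Hz) for consecutive turns - hypothetical values
turns = [("A", 190), ("B", 120), ("A", 205), ("B", 128),
         ("A", 178), ("B", 112)]

# Normalize within each speaker, so that "same register" means the same
# position in the speaker's own range rather than the same absolute pitch.
by_speaker = {}
for spk, f0 in turns:
    by_speaker.setdefault(spk, []).append(f0)
stats = {spk: (np.mean(v), np.std(v)) for spk, v in by_speaker.items()}

z = [(spk, (f0 - stats[spk][0]) / stats[spk][1]) for spk, f0 in turns]

# Mean absolute difference in normalized register between adjacent turns:
# smaller values suggest closer pitch matching across speakers.
diffs = [abs(z[i][1] - z[i - 1][1]) for i in range(1, len(z))]
print("mean register mismatch between adjacent turns:", round(np.mean(diffs), 2))
```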

It is this matching across turns, in addition to the phonological choices made within an utterance, which can generate – intentionally or unintentionally – a perceived “attitude”. Conversational participants are expected to be “in time” and “in tune” with one another; failure to do so may suggest a lack of affinity, whether or not it was intended. The “attitude” that is then perceived by the hearer is a pragmatic inference that depends on the context of situation.

As Nilsenova and Swerts (2012) rightly point out, an awareness of accommodation behavior, and the signals it can send, may be important for learning situations. Above all, it reminds us that human communication does not consist of isolated utterances but that meaning is made jointly: as Tomasello puts it: “(h)uman communication is … a fundamentally cooperative enterprise” (2008: 6).

REFERENCES

  1. Aijmer, K. 1996. Conversational Routines in English: Convention and Creativity, London: Longman.
  2. Archer, D., Aijmer. K., and Wichmann, A. 2012. Pragmatics. An Advanced Resource Book for Students, London: Routledge.
  3. Baumann, S. and Schumacher, P.B. 2011. (De-)accentuation and the processing of information status: evidence from event-related brain potentials. Language and Speech 55(3): 361–381.
  4. Benus, S., Gravano, A., and Hirschberg, J. 2007. The prosody of backchannels in American English. In: Proceedings ICPhS XVI Saarbrücken, 1065–1068.
  5. Brazil, D., Coulthard, M., and Johns, C. 1980. Discourse Intonation and Language Teaching, London: Longman.
  6. Brewer, R.F. 1912. Speech. In: Voice, Speech and Gesture: A Practical Handbook to the Elocutionary Art, R.D. Blackburn (ed.), Edinburgh: John Grant.
  7. Brown, G., Currie, K., and Kenworthy, J. 1980. Questions of Intonation, Baltimore: University Park Press.
  8. Caspers, J. 2001. Testing the perceptual relevance of syntactic completion and melodic configuration for turn-taking in Dutch. In: Proceedings of Eurospeech.
  9. Chafe, W. 1979. The flow of thought and the flow of language. Syntax and Semantics, vol. 12: Discourse and Syntax, Academic Press Inc.
  10. Chen, A., Den Os, E., and De Ruiter, J.P. 2007. Pitch accent type matters for online processing of information status. The Linguistic Review 24(2): 317–344.
  11. Couper-Kuhlen, E. 1996. The prosody of repetition: on quoting and mimicry. In: Prosody in Conversation, E. Couper-Kuhlen and M. Selting (eds.), Cambridge: Cambridge University Press.
  12. Couper-Kuhlen, E. and Ford, C.E. 2004. Sound Patterns in Interaction, Amsterdam: John Benjamins.
  13. Cruttenden, A. 1997. Intonation, 2nd edition, Cambridge: Cambridge University Press.
  14. Dahan, D., Tanenhaus, M.K., and Chambers, C.G. 2002. Accent and reference resolution in spoken language comprehension. Journal of Memory and Language 47: 292–314.
  15. Deterding, D. 2012. Issues in the acoustic measurement of rhythm. In: Pragmatics and Prosody in English Language Teaching, J. Romero-Trillo (ed.), 9–24, Dordrecht: Springer.
  16. Eades, D. 2003. The politics of misunderstanding in the legal system: Aboriginal speakers in Queensland. In: Misunderstandings in Social Life, J. House, G. Kasper, and S. Ross (eds.), 199–226, London: Longman.
  17. Edelsky, C. 1981. Who’s got the floor? Language and Society 10: 383–421.
  18. Fry, D.B. 1955. Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America 27(4): 765–768.
  19. Fry, D.B. 1958. Experiments in the perception of stress. Language and Speech 1: 126–152.
  20. Gordon, M. and Applebaum, A. 2010. Acoustic correlates of stress in Turkish Kabardian. Journal of the International Phonetic Association 40(1): 35–58.
  21. Gussenhoven, C. 2004. The Phonology of Tone and Intonation, Cambridge: Cambridge University Press.
  22. Halliday, M.A.K. and Hasan, R. 1976. Cohesion in English, London: Longman.
  23. Heldner, M. 2011. Detection thresholds for gaps, overlaps, and no-gap-no-overlaps. Journal of the Acoustical Society of America 130(1): 508–513.
  24. Hirschberg, J. and Ward, N. 1995. The interpretation of the high-rise question contour in English. Journal of Pragmatics 24: 407–412.
  25. Jurafsky, D., Bates, R., Coccaro, N., Martin, R., Meteer, M., Ries, K., Shriberg, E., Stolcke, A., Taylor, P., and Van Ess-Dykema, C. 1997. Automatic detection of discourse structure for speech recognition and understanding. In: Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding.
  26. Kong, E. 2004. The role of pitch range variation in the discourse structure and intonation structure of Korean. In: Proceedings of Interspeech, 3017–3020.
  27. Lehiste, I. 1979. Perception of sentence and paragraph boundaries. In: Frontiers of Speech Research, B. Lindblom and S. Ohman (eds.), London: Academic Press.
  28. Levis, J. and Pickering, L. 2004. Teaching intonation in discourse using speech visualisation technology. System 32: 505–524.
  29. Local, J. 1992. Continuing and restarting. In: The Contextualization of Language, P. Auer and A. di Luzio (eds.), 273–296, Amsterdam: Benjamins.
  30. Local, J. and Walker, G. 2004. Abrupt joins as a resource for the production of multi-unit, multi-action turns. Journal of Pragmatics 36(8): 1375–1403.
  31. Maynard, S.K. 1989. Japanese Conversation. Norwood, NJ: Ablex.
  32. Maynard, S.K. 1997. Analyzing interactional management in native/non-native English conversation: a case of listener response. International Review of Applied Linguistics in Language Teaching 35: 37–60.
  33. Mennen, I. 2007. Phonological and phonetic influences in non-native intonation. In: Non-native Prosody: Phonetic Descriptions and Teaching Practice, J. Trouvain and U. Gut (eds.), 53–76, The Hague: Mouton de Gruyter.
  34. Nakajima, S. and Allen, F.A. 1993. A study on prosody and discourse structure in cooperative dialogues. Phonetica 50: 197–210.
  35. Nilsenova, M. and Swerts, M. 2012. Prosodic adaptation in language learning. In: Pragmatics and Prosody in English Language Teaching, J. Romero-Trillo (ed.), 77–94, Dordrecht: Springer.
  36. O’Connor, J.D. 1973. Phonetics, Harmondsworth: Penguin Books.
  37. O’Connor, J.D. and Arnold, G.F. 1961. Intonation of Colloquial English, London: Longman.
  38. Ohala, J.J. 1994. The frequency code underlies the sound-symbolic use of voice pitch. In: Sound Symbolism, L. Hinton, J. Nichols, and J.J. Ohala (eds.), 325–347. Cambridge, Cambridge University Press.
  39. Peppé, S. and McCann, J. 2003. Assessing intonation and prosody in children with atypical language development: the PEPS-C test and the revised version. Clinical Linguistics and Phonetics 17(4/5): 345–354.
  40. Pickering, L., Hu, G., and Baker, A. 2012. The pragmatic function of intonation: cueing agreement and disagreement in spoken English discourse. In: Pragmatics and Prosody in English Language Teaching, J. Romero-Trillo (ed.), 199–218, Dordrecht: Springer.
  41. Pierrehumbert, J.B. 1987. The Phonology and Phonetics of English Intonation. PhD thesis 1980, Indiana University Linguistics Club.
  42. Pierrehumbert, J.B. and Hirschberg, J. 1990. The meaning of intonation contours in the interpretation of discourse. In: Plans and Intentions in Communication and Discourse, P.R. Cohen, J. Morgan, and M.E. Pollack (eds.), 271–311, Cambridge, MA: MIT Press.
  43. Piske, T. 2012. Factors affecting the perception and production of L2 prosody: research results and their implications for the teaching of foreign languages. In: Pragmatics and Prosody in English Language Teaching, J. Romero-Trillo (ed.), 41–59, Dordrecht: Springer.
  44. Riesco-Bernier, S. and Romero-Trillo, J. 2008. The acoustics of ‘newness’ and its pragmatic implications in classroom discourse. Journal of Pragmatics 40(6): 1103–1116.
  45. Roth, W.-M. and Tobin, K. 2009. Solidarity and conflict: aligned and misaligned prosody as a transactional resource in intra- and intercultural communication involving power differences. Cultural Studies of Science Education. doi: 10.1007/s11422-009-9203-8.
  46. Sacks, H., Schegloff, E.A., and Jefferson, G. 1974. A simplest systematics for the organisation of turn-taking in conversation. Language 50: 696–735.
  47. Shriberg, E., Bates, R., Stolcke, A., Taylor, P., Jurafsky, D., Ries, K., Coccaro, N., Martin, R., Meteer, M., and van Ess-Dykema, C. 1998. Can prosody aid the automatic classification of dialog acts in conversational speech? Language and Speech 41(3–4): 443–492.
  48. Sluijter, A.M.C. and Terken, J.M.B. 1993. Beyond sentence prosody: paragraph intonation in Dutch. Phonetica 50: 180–188.
  49. Szczepek Reed, B. 2011. Beyond the particular: prosody and the coordination of actions. Language and Speech 55(1): 13–34.
  50. Tannen, D. 1981. New York Jewish conversational style. International Journal of the Sociology of Language 30: 133–149.
  51. Thompson, S.E. 2003. Text-structuring metadiscourse, intonation and the signalling of organisation in academic lectures. Journal of English for Academic Purposes 2: 5–20.
  52. Tomasello, M. 2008. Origins of Human Communication, London: MIT Press.
  53. Ward, N. and Tsukahara, W. 2000. Prosodic features which cue back-channel responses in English and Japanese. Journal of Pragmatics 32: 1177–1207.
  54. Wells, J. 2006. English Intonation, Cambridge: Cambridge University Press.
  55. Wichmann, A. 2000. Intonation in Text and Discourse, London: Longman.
  56. Wichmann, A. 2004. The intonation of please-requests: a corpus-based study. Journal of Pragmatics 36: 1521–1549.
  57. Wichmann, A. 2005. Please – from courtesy to appeal: the role of intonation in the expression of attitudinal meaning. English Language and Linguistics 9(2): 229–253.
  58. Wichmann, A. and Caspers, J. 2001. Melodic cues to turntaking in English: evidence from perception. In: Proceedings, 2nd SIGdial Workshop on Discourse and Dialogue, Aalborg, Denmark.
  59. Wilson, M. and Wilson, T.P. 2005. An oscillator model of the timing of turn-taking. Psychonomic Bulletin and Review 12(6): 957–968.
  60. Yngve, V. 1970. On getting a word in edgewise. In: Papers from the Sixth Regional Meeting of the Chicago Linguistic Society, 567–577.
  61. Yule, G. 1980. Speakers’ topics and major paratones. Lingua 52: 33–47.