Conclusion

Our aims in this study were twofold: (1) to propose the first system for irony detection in French social media content, and (2) to assess the portability of this system to other languages. Our work is intended to contribute to a particularly active area of natural language processing, driven notably by the importance of irony and sarcasm detection for improving the performance of opinion analysis systems.

Our first task was to establish a comprehensive state of the art covering linguistic and computational approaches to figurative language and its detection. While our work focused specifically on irony and sarcasm, we also described other authors' contributions on related phenomena such as humor, satire, metaphor and comparison, since the boundaries between these phenomena are somewhat permeable. Our literature review led to two main observations:

  1) Research in the field of linguistics has approached figurative language from a semantic and pragmatic perspective, concentrating on the mechanisms through which it is expressed, such as hyperbole, rhetorical questions and false assertions. Work in this area tends to focus on literary texts such as novels or poetry.
  2) In computational work, irony has mostly been treated as a generic term, extended to cover sarcasm and, in some cases, satire. Studies in this area have made extensive use of social networks such as Twitter, where hashtags explicitly marking irony or sarcasm make the data extremely valuable. The proposed approaches rely on feature-based supervised learning, using lexical, syntactic and, more rarely, pragmatic features.

We adopted a mixed approach, combining elements of existing linguistic and computational methods; it would be difficult to treat complex phenomena such as figurative language using an automatic approach without building on a detailed study of these phenomena in a corpus setting. Our chosen approach consisted of three steps.

First, we analyzed the pragmatic phenomena used to express irony. Our main aim was to verify whether the different types of irony identified in linguistics are present in specific corpora collected from social networks such as Twitter. To do this, we proposed a multilevel annotation scheme that records whether or not a tweet is ironic, the type of irony involved (explicit/implicit), the category of irony used, and the linguistic cues revealing this irony (such as emoticons, punctuation and opinion words). The scheme was applied in an annotation campaign covering a corpus of 2,000 French tweets. The quantitative results, along with an analysis of the correlations between the different levels of the scheme, showed that in most ironic tweets, irony is triggered either by implicit contradictions involving false assertions or by explicit contradictions in the form of an oxymoron or paradox. As for cues, negation proved to be a particularly frequent marker in both ironic and non-ironic tweets.
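The annotation levels listed above map naturally onto a simple record type. The sketch below is a hypothetical encoding, not the format used in the actual annotation campaign: the class and field names, the free-text category values and the `cue_rate` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class IronyType(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


@dataclass
class TweetAnnotation:
    """One annotated tweet (hypothetical encoding of the multilevel scheme)."""
    text: str
    ironic: bool
    irony_type: Optional[IronyType] = None  # set only for ironic tweets
    category: Optional[str] = None          # e.g. "false assertion", "oxymoron/paradox"
    cues: List[str] = field(default_factory=list)  # e.g. "emoticon", "punctuation", "negation"

    def __post_init__(self) -> None:
        # Irony type and category only make sense for ironic tweets;
        # cues, by contrast, may also be annotated in non-ironic tweets.
        if not self.ironic and (self.irony_type is not None or self.category is not None):
            raise ValueError("irony_type and category apply only to ironic tweets")


def cue_rate(annotations: List[TweetAnnotation], cue: str, ironic: bool) -> float:
    """Share of ironic (or non-ironic) tweets annotated with the given cue."""
    subset = [a for a in annotations if a.ironic == ironic]
    if not subset:
        return 0.0
    return sum(cue in a.cues for a in subset) / len(subset)
```

Comparing `cue_rate(corpus, "negation", True)` with `cue_rate(corpus, "negation", False)` is one simple way to check whether a cue such as negation is frequent in both classes.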

Next, building on our observations from the annotated corpus, we developed an automatic detection system for French tweets. Three models were proposed: (1) SurfSystem, a model based on surface features found in the state of the art; (2) PragSystem, a model using pragmatic features extracted from the linguistic content of tweets alongside new features, notably opposition patterns, which proved the most successful, with an accuracy of 87.7%; and (3) QuerySystem, a query-based method applied to the tweets containing false assertions with negation that PragSystem had misclassified. Testing showed that this final method improves classification when applied to non-personal tweets, increasing accuracy to 88.51%.
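As a rough illustration of the kind of cue-based features involved, the sketch below extracts a handful of counts from a tweet. It is not the PragSystem feature set: the lexicons are tiny hypothetical placeholders, the emoticon pattern is simplified, and the `opposition` flag (a positive and a negative opinion word in the same tweet) is only a crude stand-in for the opposition patterns mentioned above.

```python
import re
from typing import Dict

# Hypothetical placeholder lexicons; a real system would use full French resources.
POSITIVE = {"super", "génial", "bravo", "merci"}
NEGATIVE = {"nul", "horrible", "raté", "triste"}
NEGATIONS = {"ne", "n", "pas", "jamais", "rien", "aucun"}
# Simplified emoticon pattern: ":)", ":-(", ";D", ":P", etc.
EMOTICON = re.compile(r"[:;]-?[()DPp]")


def extract_features(tweet: str) -> Dict[str, int]:
    """Map a tweet to a small dictionary of cue-based feature counts."""
    tokens = re.findall(r"\w+", tweet.lower())  # Unicode-aware, so accents are kept
    n_pos = sum(t in POSITIVE for t in tokens)
    n_neg = sum(t in NEGATIVE for t in tokens)
    return {
        "n_emoticons": len(EMOTICON.findall(tweet)),
        "n_exclamations": tweet.count("!"),
        "n_questions": tweet.count("?"),
        "has_negation": int(any(t in NEGATIONS for t in tokens)),
        "n_positive": n_pos,
        "n_negative": n_neg,
        # Crude proxy for an explicit opposition: both polarities in one tweet.
        "opposition": int(n_pos > 0 and n_neg > 0),
    }
```

Feature dictionaries of this form could then be passed to any standard supervised classifier; the choice of learner is deliberately left open here.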

Finally, we studied the portability of both the annotation scheme and the computational models used to detect irony in a multilingual context (Italian, English and Arabic). We evaluated our annotation scheme on Italian and English, and our feature-based automatic detection model on Arabic. These experiments showed our scheme to be fully applicable to Italian and English, which display the same tendencies as French. By applying a subset of PragSystem features to a corpus of Arabic tweets, we also demonstrated the portability of these features, obtaining an accuracy of 72.76%. Although lower than the result obtained for French, this score is encouraging for the development of irony detection approaches for Arabic tweets, which combine both standard and colloquial forms of the language.

Our work opens up a number of interesting pathways for future research. The first concerns improving automatic polarity detection for ironic and sarcastic tweets in the context of sentiment analysis. To this end, we proposed three tweet analysis tasks for the DEFT@TALN 2017 evaluation campaign on opinion analysis and figurative language (Benamara et al. 2017), which we co-organized with LIMSI: (1) classification of non-figurative tweets by polarity (objective, positive, negative or mixed); (2) identification of figurative language (irony, sarcasm or humor); and (3) classification of figurative and non-figurative tweets by polarity (objective, positive, negative or mixed). For the challenge, the FrIC was expanded to include 7,724 French tweets on news topics (politics, sports, movies, TV shows, artists, etc.) posted between 2014 and 2016, selected on the basis of keywords (Hollande, Valls, #DSK, #FIFA, etc.) and/or specific hashtags indicating the presence of figurative language (#ironie, #sarcasme, #humor, #joke). Twelve teams participated. The best macro F-measures were 0.650 for task (1), 0.783 for task (2) and 0.594 for task (3). These results, in particular the drop from task (1) to task (3) once figurative tweets are included, clearly show that figurative language makes opinion analysis considerably more difficult.
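The macro F-measure averages per-class F-scores, so each class counts equally regardless of its frequency. The sketch below implements one common definition (the unweighted mean of per-class F1); the official DEFT scorer may differ in details, for example by computing F from macro-averaged precision and recall.

```python
from typing import Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        raise ValueError("no labels to score")
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

With the four polarity classes of tasks (1) and (3), a system that performs well on most classes but fails on any single one is penalized heavily under this averaging.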

The second pathway concerns ways in which our scheme may contribute to a clearer definition of the boundary between irony and sarcasm. Recent work has addressed this question: Sulis et al. (2016) propose a means of automatically distinguishing irony from sarcasm in tweets. It would be interesting to examine the relationship between the fine-grained pragmatic phenomena linked to irony proposed in this book and the higher-level distinction between irony and sarcasm.

Our third and final pathway concerns the development of an automatic irony detection system for multilingual corpora. In this context, we intend to evaluate the performance of a classifier trained on a corpus in one language and tested on a corpus in another. This would enable us to identify the best combination of features for irony detection, independently of language. Furthermore, we believe that automatic irony/sarcasm detection may be improved by deep learning models based on neural networks. Work in this area is already under way, in collaboration with the University of Turin, Italy, and the University of Valencia, Spain.
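The cross-lingual protocol envisaged here can be sketched as a loop over ordered language pairs: train on one corpus, test on each of the others. This sketch assumes the corpora share a common, language-independent feature representation; `majority_baseline` is a deliberately trivial, hypothetical placeholder for whatever classifier is actually evaluated.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
Example = Tuple[Features, int]  # (features, label) with 1 = ironic, 0 = not ironic
Model = Callable[[Features], int]


def majority_baseline(train_set: Sequence[Example]) -> Model:
    """Placeholder learner: always predicts the most frequent training label."""
    label = Counter(y for _, y in train_set).most_common(1)[0][0]
    return lambda features: label


def cross_lingual_accuracy(
    corpora: Dict[str, List[Example]],
    train: Callable[[Sequence[Example]], Model],
) -> Dict[Tuple[str, str], float]:
    """Accuracy for every (train language, test language) pair of distinct languages."""
    models = {lang: train(data) for lang, data in corpora.items()}
    results = {}
    for src, tgt in permutations(corpora, 2):
        test_set = corpora[tgt]
        correct = sum(models[src](x) == y for x, y in test_set)
        results[(src, tgt)] = correct / len(test_set)
    return results
```

Replacing `majority_baseline` with a real learner and repeating the loop over different feature subsets is one way to look for feature combinations that transfer well across languages.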
