Chapter 15

The ASW Thesaurus

 

15.1. Introduction

We have already stressed, many times, the central role of the thesaurus in the ASW metalinguistic device. Among other things, its importance lies in the fact that it enables us to maintain the meta-lexicon of conceptual terms* (of objects of analysis* and activities of analysis*) at a fairly high level of generality and to keep it from growing to a great many terms, while still being able to give an account not only of the referential specificities of the universe of discourse* of this-or-that archive, but also of the viewpoints and therefore the various classifications (the various “folksonomies”) of objects of analysis in a given universe of discourse.

In addition, the thesaurus is an indispensable tool for the procedure of controlled description* which, as we know, constitutes one of the two procedures for the basic description of an audiovisual text or corpus.

Since we have already examined the most important aspects of the ASW thesaurus (in particular, see Chapter 10 as well as sections 11.4 and 14.4), here we shall content ourselves with giving a general overview of its organization and operation.

Section 15.2 reiterates the place and function of the thesaurus in the ASW metalinguistic system.

Section 15.3 gives a more in-depth treatment of a central aspect of the construction and internal functioning of the ASW thesaurus: the facet. A facet is interpreted here as a semantic feature – a classeme, to borrow a concept used by Greimas in his structural semantics [GRE 66] – which constitutes one possible dimension of the meaning (the signified) of a conceptual term and which is interpreted by one or a series of standardized expressions, also known as descriptors.

15.2. General presentation of the ASW thesaurus

In the ASW system of metalinguistic resources, the controlled vocabularies play an essential role:

– on the one hand, they enable us to keep to a bare minimum the nucleus of the ASW metalinguistic system which is formed by the meta-lexicon of conceptual terms representing the objects of analysis* in the ASW universe of discourse;

– and on the other, they offer an excellent opportunity to accommodate the habits, traditions and intellectual or ideological specificities of the most diverse users and groups of users in relation to a pre-constituted domain of knowledge, a “field work” or a body of heritage.

The ASW thesaurus was primarily conceived to provide values for only a few conceptual terms* such as [Country] (i.e. the name of a particular country), [Geopolitical region] (the name of a geopolitical region), [Language] (the name of a language) or [Era] (the name of a historical era). This is a typical function of the thesaurus, which consists of offering lists (alphabetical, structured, etc.) of predefined values for a concept or conceptual configuration, i.e. a set of concepts positioned in relation to one another in a schema or sequence of description, a model of description* (see Chapter 10 for further explanations).¹

Even if we were to adopt a very broad theoretical vision, one that let us include a particular country, language or geopolitical region in the meta-lexicon of conceptual terms* representing the analytical objects* in a given universe of discourse*, this would not, in fact, be a solution. After all, one could argue, “France” could be considered a specialized instance of the more general conceptual term, [Country]. “France”, considered as a conceptual term, could even have very different referential values: on one occasion, it could signify the constitutional definition of France as a national and independent state; on another, it could signify a particular social group’s cognitive representation of what France is; on a different occasion, it could signify a popular holiday destination, and so on.

Adopting this point of view entails the risk of making an ontology burgeon in size, if only from a purely quantitative standpoint – to say nothing of the fact (which in our opinion is far more important) that the metalanguage of description would be reduced, in such a scenario, to the simple substitution of the lexicon of a natural language for a so-called metalinguistic lexicon.

If we were to implement such a “radical” policy, the metalanguage of description would lose one of its major advantages, i.e. being a tool of reasoned classification and reasoning (problem-solving) based on the descriptions of an audiovisual collection produced using a library of models of description which make up the metalanguage of description* peculiar to the universe of discourse* of an audiovisual archive.

In more concrete terms, in adopting such an approach, we would have to add to the ASW meta-lexicon of analytical objects, under the branch [Country], the 190 countries currently recognized in the world (to say nothing of the countries and other territories which may have existed as politically independent entities in the past); under the conceptual term [Language], we would have to add the six or seven thousand languages currently spoken in the world, and so on.

However, there are also clear limitations to the use of a thesaurus. We believe the two main ones to be its limited empirical exhaustivity and the fact that it imposes a terminological organization upon the analyst, which may not necessarily be the one he wishes to use. Given that every thesaurus is, to a certain extent, rigid, none escapes the pitfall of being empirically limited.

In addition, given that every thesaurus is an artifact, a tool designed to deal with a certain type of problem in classifying all sorts of documents or objects (realia), its internal organization may correspond to the expectations and needs of an analyst – but also may not.

In any case, having accepted that the use of thesauruses is still the order of the day in the context of the “semantic web” as well, we have assigned this tool an important place in the general economy of the metalinguistic resources we use to define and create models for describing an audiovisual text or corpus belonging to the collection of an archive.

Figure 15.1 shows the general organization of the ASW thesaurus developed as part of the ASW-HSS research project. We distinguish three main parts:

Figure 15.1. Overall view of the ASW thesaurus

1. The part called ASW shared thesaurus. As its name suggests, it is made available to all analysts of audiovisual corpora belonging to the ASW universe of discourse*. This part is itself divided into a specialized (and highly developed) thesaurus devoted to the domain of analysis (i.e. to the objects and domains of knowledge in the ASW universe of discourse*) and a specialized thesaurus dedicated to analysis of the textual object.

2. The part called Thesauruses private to a group of ASW users. In this part, we find the thesauruses created to fulfill the specific needs of a given audiovisual archive. In particular, these include facets, i.e. ranges of predefined values, which classify (standardized) expressions in accordance with the viewpoint adopted by a group of users for analyzing audiovisual corpora.

3. The part called Library of terminologies peculiar to an ASW external reference. This brings together the expressions from the various thesauruses, terminologies (glossaries, etc.) which we use directly (i.e. through the ASW working interface) to index a conceptual term or configuration of conceptual terms.
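
By way of illustration, the three parts listed above could be sketched as a small Python data structure. This is only a sketch under our own assumptions, not the ASW implementation: the class names `Thesaurus` and `AswThesaurus`, the method `facets_for`, and the use of "AICH" as the key of a user group are hypothetical, and the rule that a group sees the shared facets plus its own private ones is our reading of the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """A set of facets, each mapping a facet title to its standardized expressions."""
    name: str
    facets: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class AswThesaurus:
    # Part 1: shared thesaurus, split into its two specialized thesauruses.
    shared_domain: Thesaurus
    shared_textual_object: Thesaurus
    # Part 2: private thesauruses, keyed by group of users (hypothetical keys).
    private: dict[str, Thesaurus] = field(default_factory=dict)
    # Part 3: expressions taken from external terminologies, keyed by source name.
    external: dict[str, list[str]] = field(default_factory=dict)

    def facets_for(self, group: str) -> dict[str, list[str]]:
        """Facets available to one group: the shared ones plus its private ones."""
        available = dict(self.shared_domain.facets)
        available.update(self.shared_textual_object.facets)
        if group in self.private:
            available.update(self.private[group].facets)
        return available


asw = AswThesaurus(
    shared_domain=Thesaurus(
        "domain of analysis",
        {"Ancient Civilization of the Middle East": ["Babylonian civilization"]},
    ),
    shared_textual_object=Thesaurus("textual object"),
    # Descriptors of the private facet are omitted in this sketch.
    private={"AICH": Thesaurus("AICH", {"Provinces of Cuzco": []})},
)
print(sorted(asw.facets_for("AICH")))
```

A group without a private thesaurus would simply receive the shared facets.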

15.3. Facets and lists of standardized expressions

The first two parts of the ASW thesaurus (Figure 15.1) are constructed in the same way. The shared thesaurus and the private thesaurus of a particular group of users are made up of a set of facets and a (hierarchical) list of standardized expressions (“descriptors”).

Figure 15.2. The shared thesaurus – facets and lists of terms

Figure 15.2 shows the organization of the shared thesaurus in the form of several collections of facets and a set of lists of standardized terms or expressions.

As already stated, a facet semantically classifies a list of standardized expressions called descriptors. Figure 15.3 offers a concrete example of this. It shows a specific facet entitled ASW facet for the CT “Ancient Civilization of the Middle East”.

This facet has associated with it a list of expressions (descriptors) which identify different ancient civilizations of the Middle East: <Babylonian civilization>, <Elamite civilization>, <Hebrew civilization>, and so on. When carrying out his description, the analyst can use this list of standardized expressions as a specific element of the procedure of controlled description*, whose function is to enable the analyst to identify and make explicit all the subjects* relating to one or more ancient civilizations of the Middle East.
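
The relation between a facet and its descriptors might be sketched in Python as follows. The class `Facet` and its method `select` are hypothetical names introduced for illustration; the sketch only assumes that controlled description means choosing among the facet's predefined expressions and rejecting anything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facet:
    title: str
    descriptors: tuple[str, ...]

    def select(self, *chosen: str) -> list[str]:
        """Return the chosen expressions, refusing any that the facet does not list."""
        unknown = [c for c in chosen if c not in self.descriptors]
        if unknown:
            raise ValueError(f"not in facet {self.title!r}: {unknown}")
        return list(chosen)


# Facet title and descriptors as given in the text.
civilizations = Facet(
    title="Ancient Civilization of the Middle East",
    descriptors=(
        "Babylonian civilization",
        "Elamite civilization",
        "Hebrew civilization",
    ),
)

# The analyst "ticks" one of the predefined values.
subjects = civilizations.select("Elamite civilization")
print(subjects)
```

An expression absent from the list raises an error in this sketch; in practice it would instead be a matter for the procedure of free description.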

That said, as we have already pointed out (see Chapter 10, and sections 11.4 and 14.4), a standardized expression may belong to several facets. This means that a standardized expression which belongs to more than one facet is considered to possess different meanings in the ASW universe of discourse*.

Figure 15.3. Example of a facet made up of a list of standardized expressions

Figure 15.4 shows the concrete example of a person’s name: <Augé, Marc>, which is part of a long list of people’s names that we need for describing the audiovisual corpora analyzed in the ASW-HSS project’s experimentation workshops.²

Figure 15.4. Example of an expression belonging to several facets

In the ASW universe of discourse*, the name in its standardized form <Augé, Marc> has three accepted uses, three different meanings: the fact of being a French personality, the fact of being an anthropologist and the fact of being an ethnologist. It is very probable that this same name has a whole range of other significations – outside the ASW universe of discourse. However, in that universe, it has precisely these three meanings.
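
One way to picture an expression that belongs to several facets is to invert the facet lists into an index from expression to facets. The following Python sketch does this; the facet titles are hypothetical labels for the three meanings named above, and `index_by_expression` is a name of our own.

```python
from collections import defaultdict

# Facet titles are hypothetical labels for the three meanings named in the text.
FACETS = {
    "French personality": ["Augé, Marc"],
    "Anthropologist": ["Augé, Marc"],
    "Ethnologist": ["Augé, Marc"],
}


def index_by_expression(facets):
    """Invert facet -> expressions into expression -> sorted facet titles."""
    index = defaultdict(list)
    for title, expressions in facets.items():
        for expression in expressions:
            index[expression].append(title)
    return {expression: sorted(titles) for expression, titles in index.items()}


meanings = index_by_expression(FACETS)["Augé, Marc"]
print(meanings)  # ['Anthropologist', 'Ethnologist', 'French personality']
```

Each facet title returned for an expression corresponds to one of its meanings in the universe of discourse.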

As we can see in Figure 15.2, the element ASW facet for the specialized CT “Ancient Civilization of the Middle East” is part of a whole series of collections of facets. As explained in section 11.4, each facet represents a dimension of the content (the signified), i.e. a specific semantic axis of a conceptual term or configuration of conceptual terms, or indeed of an instantiated conceptual term.³

In the current version of the shared thesaurus, we have classed the different facets we need for analyzing the objects belonging to the ASW universe of discourse in a way which is identical to the taxonomic structure of the two meta-lexicons of conceptual terms denoting, on the one hand, the analytical objects, and on the other, the specific activities for analyzing said objects.

Thus, as shown in Figure 15.2, we distinguish collections of facets relating to the conceptual term [Object “Endurant”], collections of facets relating to the conceptual term [Object “Perdurant”] and indeed collections of facets relating to the conceptual term [Procedure of structural analysis of the textual object]. Each collection may, in principle, be made up of even more specialized collections of facets.

Two major avenues for future research and development emerge. The first is that of updating and enriching the ASW thesaurus with the various existing resources, terminological and otherwise, which are external to the ASW environment, while tailoring this process of enrichment to the needs and expectations of analysts working for this-or-that specific archive.

Here, we think first of the resources from an indexing language such as RAMEAU⁴ from the Bibliothèque Nationale de France (French National Library) or a thesaurus such as MOTBIS⁵ from the CNDP-CRDP, made up of a whole series of specialized micro-thesauruses which correspond, grosso modo, to the taxonomic domains according to which the ASW meta-lexicon of conceptual terms is organized.

However, we also think of (open-ended) lists of names of places, people, institutions, works, etc. which the analyst of a particular archive might need and which would greatly simplify his task (the simplification consisting essentially of the fact of “ticking” this-or-that value for a conceptual term to be embellished, instead of producing a free, verbal description/indexation of it).

Of course, such a process of enrichment cannot just be done “manually”. It must rely on (at least partially) automatic processes, matching the lists of expressions and facets making up the ASW thesaurus with the metalinguistic data from resources outside the ASW system.
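
The text does not specify how such matching would be carried out. Purely as an illustration, the Python sketch below compares expressions after normalizing case, diacritics and spacing, and separates the external terms already covered by the thesaurus from those that are candidates for enrichment. The functions `normalize` and `match_external` and the sample external list are our own assumptions; a real process would need far more than exact matching on normalized strings.

```python
import unicodedata


def normalize(expression: str) -> str:
    """Casefold, strip diacritics and collapse spaces so that variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", expression)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def match_external(descriptors, external_terms):
    """Split external terms into those already present and candidates for enrichment."""
    known = {normalize(d): d for d in descriptors}
    matched, candidates = {}, []
    for term in external_terms:
        key = normalize(term)
        if key in known:
            matched[term] = known[key]  # external spelling -> existing descriptor
        else:
            candidates.append(term)     # to be reviewed before being added
    return matched, candidates


asw_descriptors = ["Babylonian civilization", "Elamite civilization"]
external = ["babylonian  civilization", "Hebrew civilization"]  # hypothetical external list
matched, candidates = match_external(asw_descriptors, external)
print(matched)
print(candidates)
```

Candidates are only proposals here: whether they enter the thesaurus would still depend on the needs of the analysts of a given archive.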

A second avenue – just as important as the first – concerns the reuse of free indexations (verbal or other forms of descriptions carried out by way of the procedure of free description*) produced by the community of analysts working with the ASW tools and resources for a particular audiovisual archive.

Such “recycling” would entail offering the analyst of an audiovisual text or corpus controlled suggestions of expressions produced freely by other analysts in the “ASW community” beforehand. These freely-produced expressions could form lists of predefined values – in just the same way as the standardized expressions interpreting the meaning of a conceptual term (see the examples above in Figures 15.3 and 15.4). The analyst would then have the option of reusing the expressions produced by other members of the community of analysts (by ticking them in the list of available expressions) or “devising” a new formula to explicitize a conceptual term or configuration of conceptual terms.
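
Such recycling might be sketched in Python as follows. The sketch assumes, hypothetically, that free indexations are available as pairs of conceptual term and freely typed expression, and that suggestions are ranked by how often other analysts used them; neither the function `suggest` nor this ranking rule comes from the ASW tools.

```python
from collections import Counter


def suggest(free_indexations, conceptual_term, limit=5):
    """Rank the expressions that analysts typed freely for one conceptual term."""
    counts = Counter(
        expression.strip()
        for term, expression in free_indexations
        if term == conceptual_term and expression.strip()
    )
    return [expression for expression, _ in counts.most_common(limit)]


# Hypothetical log of (conceptual term, free expression) pairs from earlier descriptions.
log = [
    ("Language", "Quechua"),
    ("Language", "Quechua"),
    ("Language", "Aymara"),
    ("Country", "Peru"),
]
suggestions = suggest(log, "Language")
print(suggestions)  # ['Quechua', 'Aymara']
```

The analyst could then tick one of the suggested expressions or, if none fits, type a new one, which would in turn be added to the log.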


1 For instance, the ASW thesaurus has a facet called “Authors of French literature” which contains a fairly well-populated and open-ended list (meaning it can be added to at any moment) of the names of people who fulfill the function of being an author of French literature. From the conceptual point of view, “author of French literature” is a configuration comprising the generic conceptual terms [Author] and [Literature by country], as well as the referential term (i.e. possessed of a single referential value) [Country: France].

2 See http://semiolive.ext.msh-paris.fr/asa-shs/.

3 An instantiated term is a conceptual term with a specific value. Remember that the conceptual term [Country] is said to be a generic term; the term [Country: <Peru>] is called a specialized term or, even better, an instantiated term. Thus, the element AICH Facet “Provinces of Cuzco” is a facet (a dimension of the meaning) of a configuration composed of the generic term [Province] and the specialized term [Territory: <Cuzco>].

4 See http://rameau.bnf.fr/.

5 See http://www.cndp.fr/thesaurus-motbis/site/.
