component of the user study evaluation here. Without a reliable ground truth measure, the entire
evaluation study would collapse.
In addition to the ground truth problem, there is a paradox in meta-evaluation research that often forces researchers to make compromises and tough choices in study design: meta-evaluation of evaluation metrics needs to be conducted on a relatively large, session-level dataset that usually includes both search behavior data and annotations from users and/or external assessors. However, in controlled laboratory studies, it is often hard to recruit a large group of participants to perform complex search and annotation tasks. Consequently, researchers usually find it difficult to definitively answer an IIR evaluation question with user study data.
The most common compromise made to resolve this paradox is asking participants to complete more search sessions (described by the subfacet called amount of tasks), usually in a within-subjects design (included in the experimental design facet), to ensure the richness of search and annotation data. For instance, in Luo et al. (2017), each participant conducted 20 tasks within 2 hours in a controlled lab environment. Although this data collection strategy can improve the richness of data and help control individual differences, as a compromise it comes with a series of limitations, such as potential learning effects and user fatigue. Fortunately, a variety of countermeasures can partially mitigate the negative effects of these limitations. For example, when doing complex search and annotation tasks, participants were asked to take a break between tasks to reduce possible fatigue (recorded under the quality control facet) (e.g., Jiang, He, and Allan, 2017). Also, the learning effects in completing a sequence of search tasks can be mitigated by randomizing task order (characterized under the task facet) (e.g., Mao et al., 2016).
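The randomization described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the function name and seed are assumptions, not part of any cited study): each participant receives an independent random permutation of the task set, so learning effects do not systematically align with task position, while a fixed seed keeps the assignment replicable.

```python
import random

def randomized_task_orders(task_ids, n_participants, seed=42):
    """Assign each participant an independent random task order.

    A fixed seed makes the assignment reproducible for replication,
    one of the goals of the faceted evaluation approach.
    """
    rng = random.Random(seed)
    # rng.sample with k == len(task_ids) yields a full permutation
    return [rng.sample(task_ids, k=len(task_ids))
            for _ in range(n_participants)]

orders = randomized_task_orders(["T1", "T2", "T3", "T4"], n_participants=3)
```

For stricter counterbalancing, a Latin-square assignment (each task appearing once in each position across participants) could replace the per-participant shuffle, at the cost of constraining the number of participants.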
In sum, for the evaluation of meta-evaluation user studies, one should also start with the specific research foci (e.g., system-oriented features, search behavior features, physiological and emotional indicators), deconstruct the study design into facets, and reorganize them as a facet map consisting of multiple levels of interrelated facets and factors. In faceted evaluation, the core facets directly related to the major goals and study design compromises should be identified (e.g., behavior and experience, task) and analyzed based upon their roles in addressing the proposed research problems. Then, other relevant subfacets can be added around the core facets in the facet map. The facet values, as well as the potential connections among them, should be identified and evaluated based on the extent to which the design decisions and manipulations in these subfacets jointly help answer the research questions and/or control the "damages" caused by the aforementioned study design decisions.
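A facet map of the kind just described can be represented as a simple nested structure. The sketch below is illustrative only: the facet and factor names are drawn from the examples in this section, but the grouping into "core" and "supporting" levels is an assumption about how one might encode the map, not a fixed taxonomy.

```python
# Hypothetical encoding of a facet map: core facets tied to the major
# research goals and design compromises, plus supporting subfacets
# added around them. Names mirror the examples discussed in the text.
facet_map = {
    "core": {
        "behavior and experience": ["search behavior data", "annotations"],
        "task": ["amount of tasks", "randomized task order"],
    },
    "supporting": {
        "experimental design": ["within-subjects"],
        "quality control": ["breaks between tasks"],
    },
}

def core_facets(fm):
    """Return the core facets, which faceted evaluation identifies first."""
    return sorted(fm["core"])
```

Evaluation would then proceed facet by facet, asking how each factor (and the connections between factors across facets) helps answer the research questions or offsets the damage from a design compromise.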
5.4 SUMMARY
This chapter explained and illustrated the faceted evaluation approach in three different lines of
user studies. Particularly, we proposed a multi-faceted, replicable evaluation method, facet mapping,