As shown above, IIR user studies in this group (system/interface feature evaluation) go beyond the classical system-oriented evaluation approach by taking into account users' perceptions, evaluations, and behavioral patterns in search interaction. The design and evaluation of search systems have been strongly influenced by computer science and algorithmic perspectives for decades. In contrast to traditional IR evaluation work, IIR evaluation takes a cognitive, user-centered perspective, puts the user (instead of the explicit query or document) at the center of search interaction, and evaluates an IR system based on the extent to which it successfully represents users' knowledge states and supports their work and search tasks (Belkin, 2000; Belkin et al., 2004; Ruthven, 2008). To accomplish this goal, researchers need to go beyond explicit queries and ranked documents to capture and represent multiple user-focused facets (e.g., users' knowledge states, task stage, search task difficulty) in search system evaluation.
2.4 METAEVALUATION OF EVALUATION METRICS
In addition to the two major types of IIR user studies discussed above, some studies take a step back from specific systems and problems and seek to evaluate the user-oriented measures applied in IR system evaluation. In this type of study, researchers usually measure user behaviors (e.g., query formulation, search result browsing and examination, eye movements) and experience (e.g., search satisfaction, task difficulty) with different sets of evaluation metrics (e.g., in-situ and post-search evaluation metrics, traditional search behavioral metrics and neuro-physiological metrics) and evaluate the effectiveness of these measures against a set of predefined user-oriented ground truths (e.g., search satisfaction and usefulness judgments). The major goal of this line of studies is to find or design reliable measures that can be applied in future standard IIR evaluations and search interaction studies.
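To make this methodology concrete, the following Python sketch illustrates the basic meta-evaluation step: given per-session values of several candidate metrics and users' post-search satisfaction ratings as the ground truth, rank the candidates by how strongly each correlates with satisfaction. All data, metric names, and values here are hypothetical placeholders for illustration, not results from any study cited in this chapter.

    # A minimal meta-evaluation sketch (hypothetical data and metric names):
    # correlate each candidate evaluation metric with users' post-search
    # satisfaction ratings, the ground truth in studies of this type.

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical per-session data for 8 search sessions.
    satisfaction = np.array([5, 2, 4, 1, 3, 5, 2, 4])  # ratings on a 1-5 scale

    candidate_metrics = {
        # mean relevance grade of clicked documents (TREC-style judgments)
        "avg_relevance":  np.array([2.1, 1.8, 2.0, 1.5, 1.9, 2.3, 1.7, 2.2]),
        # mean in-situ usefulness rating of clicked documents
        "avg_usefulness": np.array([4.5, 2.0, 3.8, 1.2, 2.9, 4.8, 2.2, 4.0]),
        # a simple behavioral signal: total dwell time on results (seconds)
        "total_dwell":    np.array([310, 95, 240, 60, 150, 330, 120, 280]),
    }

    # Spearman's rank correlation suits the ordinal satisfaction scale.
    for name, values in candidate_metrics.items():
        rho, p = spearmanr(values, satisfaction)
        print(f"{name:15s} rho={rho:+.2f} (p={p:.3f})")

In an actual study, such correlations would be computed over many more sessions and validated across task types and user groups before a metric is recommended for standard IIR evaluation.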
For example, to address the limitations of traditional TREC-style relevance judgments, Jiang, He, and Allan (2017) explored two improvements (i.e., collecting in-situ judgments to make relevance judgments contextual, and collecting multidimensional assessments to address different aspects of relevance and usefulness) and evaluated the new relevance framework using six different user experience measures as ground truth. Mao et al. (2016) designed a laboratory study in which they compared relevance annotations with document usefulness measures and demonstrated that a measure based on usefulness rather than annotated relevance correlates better with user satisfaction. In addition, they found that external assessors can provide high-quality usefulness annotations when additional search context information is provided to them. Chen et al. (2017) meta-evaluated a series of online and offline metrics (e.g., online behavioral features, offline relevance judgments) to study the extent to which these metrics can infer actual search satisfaction in different task scenarios.
As these examples show, this line of research sheds light on the connection between user characteristics and IIR evaluation measures and often proposes innovative user-oriented evaluation measures.
2.4 METAEVALUATION OF EVALUATION METRICS
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.226.87.83