to be necessary for a good result (Robertson, 2008, p. 447). This balance among different facets is much more difficult to reach in user studies than in classical system-focused IR evaluations, primarily because of various individual differences and the dynamic nature of human cognition, perception, and associated behaviors during the course of information seeking episodes. Therefore, the aims of this work on IIR user studies evaluation are:
• identify the facets/dimensions of user study, especially the easily ignored user-side factors which may significantly alter the results of a study; and
• evaluate the strengths and limitations of different types of designs and compromises made in user studies with respect to answering their respective research questions.
Indeed, there is no single best toolkit for user studies which can be universally applied to all IIR-related research problems. However, it is still beneficial for researchers to be clear about: (1) given the research problem(s), what kind of balance they hope and need to reach among different dimensions or facets of user study; and (2) what potential impacts the decisions and compromises made in user study design may have on the associated results and findings.
1.3 SYSTEMATIC REVIEW AND USER STUDY EVALUATION
In this work, our approach to developing knowledge and addressing the research problems discussed above is to build a faceted framework and apply it in deconstructing, characterizing, and evaluating various types of IIR user studies. To develop and implement the faceted approach, we first systematically reviewed and collected 462 IIR user study papers published in multiple high-quality IR venues. Then, we developed an initial faceted framework from a small portion of the selected papers and revised and updated it when new factors and facets emerged or were extracted from the paper coding process.
In terms of the criteria of paper selection, we only included the user studies where researchers
proposed IIR-related research questions, recruited participants (instead of using simulated users),
and clearly articulated the major components of their study design (e.g., experimental system, task
design, test collection and corpus, interface). Hence, user studies where researchers only used search
behavior datasets from the existing user study collections (e.g., Text Retrieval Conference (TREC) test collections,¹ NII Testbeds and Community for Information access Research (NTCIR) test collections²), or large-scale search logs (e.g., Bing search logs, Yandex search logs) without designing any new study were excluded from our analysis, as they did not really pertain to the dilemma of balancing different facets and making difficult compromises in user study design (e.g., Deveaud et al., 2018; Feild and Allan, 2013; Jiang et al., 2017; Luo, Zhang, and Yang, 2014; Spink et al., 2004).
¹ https://trec.nist.gov/
² http://research.nii.ac.jp/ntcir/index-en.html