Figure 3.1: Paper selection and coding scheme. (Paper Selection: identify sources; develop and evaluate inclusion/exclusion criteria; manually search and select papers. Coding Scheme: establish the initial coding scheme; multi-stage coding from core facets to relevant facets and factors; revise the coding scheme when new factors emerge.)
3.1 PAPER SELECTION: SIGIR, CHIIR, CHI, JASIST, IP&M, AND TOIS
The primary purposes of our paper survey are to: (1) carefully examine the facets of recent IIR user
studies and their combinations in respective research contexts; and (2) develop a faceted evaluation
framework which can be applied to a variety of research problems and to future meta-evaluation of
study design and reporting practices.
With respect to paper selection criteria, this work includes IIR user studies in which researchers
recruited real participants and designed systems and/or interventions in order to evaluate experimental
systems and/or examine user behavior and experience in online information seeking. We did not
restrict our selection to IIR evaluation research, as many other types of human-focused user studies also
contribute to the understanding of evaluation-related fundamental concepts (e.g., information need,
uncertainty, anomalous state of knowledge). Additionally, we excluded user studies where researchers
directly employed existing behavioral datasets for evaluation or meta-analysis and provided very
limited information about study designs and research contexts.
To ensure the accuracy of including and excluding papers based on our criteria, instead
of using keyword searching we manually reviewed the titles and abstracts of all papers published
in the selected venues during the target time period. In cases where the paper type was not clear from the
title and abstract, we read the main text until we figured out whether the paper actually met our criteria.
To implement the paper selection criteria defined above, we carefully examined the methodology
section of each paper recently published in the selected venues and filtered out the research papers that (1) clearly
indicated that the experimental data were extracted from existing large-scale datasets and test collections (e.g.,
TREC, NTCIR, CLEF, AOL search logs), or (2) showed no clear evidence confirming that the
authors actually carried out a new user study to collect fresh data on user behavior, evaluation, and/
or interaction experience.
Similar to the ndings from Kelly and Sugimoto (2013), during paper selection we found
many cases where IIR researchers published multiple papers using one dataset from the same user
study. For these user study papers, to avoid repetition and potential bias we only included the one
that oers the most sucient details about the user study design. In this way we can better focus
on the diverse facets of user studies reported in dierent communities, and also obtain more details
regarding IIR research topics and focuses, specic methods and techniques, and result reporting
styles for faceted user study evaluation.
Overall, we obtained 462 IIR user study papers that jointly cover a wide range of research
problems and topics, methods, and findings in the area of IR. In terms of the focuses of the selected
venues, while SIGIR has been publishing papers on both IIR evaluation and understanding
user behavior and experience in recent years, its main focus is still on evaluating search
system features, such as interface components, search assistants, and underlying ranking algorithms.
For example, Singla, White, and Huang (2010) evaluated and compared the performance of
different trail-finding algorithms in web search and demonstrated the value of mining and recommending
search trails based on users' search logs collected via a widely distributed browser plugin.
Wan, Li, and Xiao (2010) developed a multi-document summarization system to automatically
extract easy-to-understand English summaries (EUSUM) for non-native readers. They ran a user
study to evaluate the novel system, and their results indicated that the EUSUM system can produce
more useful, understandable summaries for non-native readers than the baseline summarization
techniques.
As illustrated in these two representative cases, system-evaluation-focused studies
published in SIGIR (as well as other reputable IR venues) often start with a practical IR problem
and present an innovative system, or a new component of a system, as a solution to the highlighted
problem. The role of the user study is then to evaluate the system with users and to empirically
demonstrate its value in terms of a series of predefined facets or dimensions, such
as usefulness, relevance, usability, perceived workload, and effectiveness. Despite this longstanding
focus on IR evaluation, there has been growing research effort on understanding search behavior
and experience within the IR community in recent years (e.g., Edwards and Kelly, 2017; Lagun and
Agichtein, 2015; Ong et al., 2017). It is worth noting that the newly opened Human Factors
and Interfaces track at the SIGIR conference can offer more opportunities for this top-tier IR community to
accept more diverse information seeking and search studies that do not focus entirely on
system evaluation.
Compared to SIGIR, CHIIR, as a community dedicated to user-centered IR research, has
offered much room for all three lines of IIR studies. In particular, more information seeking and
search studies are gaining visibility on the CHIIR platform. For instance, He and Yilmaz (2017)
characterized users' search tasks that emerged in naturalistic settings and discussed the connections
among different facets of tasks. This work sheds light on task characteristics in real-life information
seeking episodes and can offer useful insights for designing and evaluating task-specific IR systems
and recommendations. Wang, Sarkar, and Shah (2018) investigated how people judge the adequacy,
accuracy, relevance, and trustworthiness of different types of impersonal and interpersonal information
sources and how task type affects individuals' evaluations of information sources. These studies
explore different aspects of user and search context (e.g., task) in information seeking and search
and thus contribute valuable insights for identifying and characterizing a variety of user-oriented
facets of study design.
IIR studies published in CHI mainly cover user behavior research and search interface
evaluation. In fact, many of the published papers start by introducing and exploring the hidden
problems in search interaction and then focus on actionable interface design solutions. This
approach to developing knowledge on IIR fits well with the HCI community's focus on user
interfaces of different types, and also offers more diverse perspectives that have largely enriched our
understanding of user-centered IR problems. For example, Morris et al. (2018) studied the needs
of information searchers with dyslexia and the obstacles they encounter, and demonstrated that factoring
readability into search engine ranking and result presentation on interfaces can benefit both dyslexic
and non-dyslexic searchers. In this case, the mission of the conducted user study is to build a
bridge between users' needs and actionable implications for search interface and algorithm design.
Vtyurina and Fourney (2018) explored the role of implicit conversational cues (e.g., intonation) in
guided task support with virtual assistants and demonstrated that many user-assistant interactions
in a guided task process were initiated by implicit conversational cues rather than by plain,
explicit questions. The associated findings generate important implications for detecting triggers
and inferring user intents in conversational search interaction. Kim et al. (2017) investigated the
diverse Internet search roles of adults in everyday-life information seeking (ELIS) and explained
how in-home contextual factors affect search roles and how systems and interfaces can be designed
to support roles of different types.
With respect to the journals (i.e., JASIST, IP&M, and TOIS), they each cover all three
streams and major aspects of IIR research (e.g., supporting search tactics, Xie, Joo, and Bennett-Kapusniak,
2017; understanding the role of query auto-completion in search sessions, Smith, Gwiz-