science research in general. However, a clean, manipulable independent variable at the operationalization level is not always available in all research contexts. In many experimental designs, it is very difficult, if not entirely impossible, to reasonably manipulate some of the variables or factors of interest (especially the factors related to user characteristics and perceptions) due to the nature of these variables as well as the unpredictable impact of various contextual factors. Therefore, as an alternative, many IIR researchers choose to include quasi-independent variables in their user study designs and modeling.
Quasi-independent variables are often associated with individual traits and may be treated as statistically independent, but they are not subject to random assignment and direct manipulation, as independent variables are (Kelly, 2009). Since many user characteristics (e.g., task familiarity, topic knowledge, domain knowledge, perceived task difficulty) cannot be directly manipulated or randomly assigned through any special recruitment design, they are often represented by quasi-independent variables in statistical models. For instance, Zhang et al. (2015) studied the effect of users' domain knowledge on their search behaviors and adopted participants' self-reported level of domain knowledge in building statistical and prediction models. In this example, the level of domain knowledge was treated as independent in the regression models. However, as a quasi-independent variable, participants' domain knowledge was not subject to any form of random assignment or predetermined manipulation in the user study design. Consequently, the actual distribution of different domain knowledge levels was not fully controlled by the researchers.
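To see how such a variable enters an analysis in practice, the following minimal sketch (not taken from Zhang et al. (2015); the variable names domain_knowledge, task_type, and query_count are illustrative assumptions) shows a self-reported quasi-independent variable used as a regression predictor alongside a manipulated factor.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical user-study data: each row is one participant-session.
# domain_knowledge is a self-reported 1-7 rating (quasi-independent),
# task_type is a manipulated factor, and query_count is a behavioral outcome.
data = pd.DataFrame({
    "domain_knowledge": [2, 5, 7, 3, 6, 4, 1, 5],
    "task_type":        ["known-item", "exploratory"] * 4,
    "query_count":      [3, 9, 6, 4, 11, 7, 2, 8],
})

# The quasi-independent variable enters the model like any other predictor,
# but its distribution was observed rather than assigned by the researchers.
model = smf.ols("query_count ~ domain_knowledge + C(task_type)", data=data).fit()
print(model.summary())
```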
Similarly, Liu et al. (2010) employed a series of self-reported task features and user characteristics (e.g., task difficulty, topic knowledge) and investigated how these quasi-independent variables affect users' querying behavior, browsing and result examination strategies, and eye movements in task-based search sessions. Aula, Khan, and Guan (2010) studied how users' search tactics vary across different levels of (self-reported) task difficulty. Again, in these examples, quasi-independent variables were measured mainly through self-reported metrics (e.g., 5-point or 7-point Likert scales) and were not directly designed or manipulated by researchers. Thus, compared to independent variables (e.g., task type, task product, task goal, system/interface features), quasi-independent variables are often more difficult to control and are subject to the influence of a variety of, sometimes unknown, contextual factors. This is one of the reasons why it is critical to control seemingly irrelevant contextual variables and hidden factors in data analyses when quasi-independent variable(s) are involved in user study design.
4.1.2 PARTICIPANT AND TASK
In IIR user studies (especially research focusing on understanding user behavior and experience), participant and task are two of the central facets that fundamentally determine the scope and direction of the data collection process (e.g., what type of behavioral data can be collected, from whom should we collect data, what is the nature of the tasks that contextualize participants' interactions
with search systems). As is shown in Figure 4.1, we identified a series of subfacets under these two facets and found that these two main facets and the associated subfacets not only affect the study procedure, but also partially determine the quality of the data collected, the reliability of statistical results, and the generalizability of the findings that emerge from the empirical evidence. This subsection explains the two main facets and the related subfacets in detail, with examples from the research paper collection.
Participant recruitment is the starting point of the user study procedure and determines the source of available data (e.g., search behavioral data, self-reported and neuro-physiological data on interaction experience, search evaluation data). Different recruitment methods are often associated with different experimental setups. From our collection of user study papers, we summarized two main recruitment approaches: (1) widely used small-scale user study recruitment methods, such as flyers, in-class recruitment, personal social networks, and internal mailing lists; and (2) large-scale crowdsourcing techniques, such as recruitment within certain institutions (e.g., Microsoft's large-scale user studies) or via professional crowdsourcing and survey platforms (e.g., Amazon Mechanical Turk, SurveyMonkey). In small-scale user studies, researchers usually expect high commitment from participants and ask them to engage in relatively complex tasks (e.g., complex search and work tasks, relevance and usefulness judgments, post-search in-depth interviews) that need to be carried out through specific experimental systems and/or within certain controlled lab settings. With a variety of predefined constraints and conditions, researchers are more likely to collect a variety of detailed, reliable data on users' search interactions, cognition, and experience.
Let us examine a couple of examples. Cole et al. (2013) invited participants to complete predefined search tasks in their controlled lab and collected data on their search interactions. In this case, researchers had relatively tight control over a series of contextual factors (e.g., task type, task topic, search system and interface) and successfully collected several types of data on users' search interactions, including search behavioral data, eye movement data, and qualitative data regarding search experience and obstacles from post-search interview transcripts. Moshfeghi, Triantafillou, and Pollick (2016) conducted an fMRI study on information needs and demonstrated that users' knowledge states (i.e., information need, anomalous state of knowledge) can be inferred from certain neuro-physiological signals. Clearly, at least part of the data collected in these small-scale user studies would not have been available if the participants had been recruited and had taken part in the study via a crowdsourcing platform.
Turning to the limitations, the recruitment methods used in small-scale (often lab-based) user studies are inefficient, time consuming, and expensive. To reduce the cost and expedite participant recruitment and the data collection process, many researchers in research institutions tend to adopt convenience sampling and recruit students on campus as their participants. However, studying the information seeking and search behaviors of students from a single campus alone may result in a lack of variation in participants' knowledge backgrounds and eventually
limit the generalizability of the findings. Hence, for small-scale user studies, researchers need to go beyond students and universities and investigate non-student users' search interactions in naturalistic settings.
Compared to small-scale recruitment methods, a crowdsourcing platform is more efficient for participant recruitment in that (1) it can easily recruit large numbers of participants who meet certain predefined requirements (e.g., education level, language, experience and performance in previous studies); and (2) researchers do not need to directly engage in the recruitment process or worry about space and time constraints. Participants can be recruited automatically via crowdsourcing tools across different time zones, even while the researchers are asleep (Kittur, Chi, and Suh, 2008). For example, Zhang et al. (2014) explored the multidimensionality of document relevance and collected human relevance judgment data via Amazon Mechanical Turk. Five hundred and two participants were recruited from the crowdsourcing platform and provided rich relevance judgment data for the researchers to conduct both Exploratory Factor Analysis and Confirmatory Factor Analysis.
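As a rough illustration of this analysis step (a sketch over assumed data, not the procedure or questionnaire items used by Zhang et al. (2014)), an exploratory factor analysis of multi-item relevance ratings might look like the following, using the factor_analyzer package.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor_analyzer

# Hypothetical multi-item relevance ratings (1-7) from crowd workers; the
# columns stand in for questionnaire items tapping different relevance dimensions.
rng = np.random.default_rng(0)
items = ["topical", "novel", "reliable", "understandable", "timely", "scope"]
ratings = pd.DataFrame(rng.integers(1, 8, size=(200, len(items))), columns=items)

# Exploratory factor analysis with a varimax rotation; in a real study the
# number of factors would be chosen from eigenvalues or parallel analysis,
# not fixed a priori as it is here.
efa = FactorAnalyzer(n_factors=2, rotation="varimax")
efa.fit(ratings)
print(pd.DataFrame(efa.loadings_, index=items, columns=["factor_1", "factor_2"]))
```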
In contrast to crowdsourcing research, in small-scale user studies IIR researchers (including the authors) often find it difficult to recruit more than forty participants within several months. Note that some small-scale user studies do have their own recruitment and registration systems that allow participants to sign up for the study by themselves. However, in these situations, researchers still need to actively promote their studies through different channels, and the recruitment efficiency is usually quite low compared to crowdsourcing studies.
Despite the advantages discussed above, crowdsourcing recruitment has limitations. First, researchers often find it challenging to control the quality of data collected through crowdsourcing platforms, as participants complete the entire study "in the wild" and the incentives/payments often turn out to be the only reason that motivates people to participate (in other words, researchers cannot ensure that participants are really paying attention to the tasks and assignments in their studies). To address this issue, many researchers set up filters in their tasks and surveys (e.g., timers, trap questions) to help them filter out low-quality data before conducting analysis. Second, with respect to the associated study procedure, it is difficult for researchers to incorporate complex tasks (involving many steps and a variety of actions) into crowdsourcing study settings and to expect good-quality data to emerge automatically from such studies.
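A minimal sketch of this kind of post hoc quality filtering is shown below; the column names, time threshold, and trap-question answer are illustrative assumptions rather than recommendations from the literature.

```python
import pandas as pd

# Hypothetical crowdsourcing responses: completion time in seconds plus an
# attention-check ("trap") question whose correct answer is known in advance.
responses = pd.DataFrame({
    "worker_id":       ["w1", "w2", "w3", "w4", "w5"],
    "seconds_spent":   [512, 43, 380, 620, 55],
    "trap_answer":     ["blue", "blue", "red", "blue", "green"],
    "relevance_score": [5, 7, 4, 6, 1],
})

MIN_SECONDS = 120       # assumed minimum plausible completion time
TRAP_CORRECT = "blue"   # assumed correct answer to the trap question

# Keep only submissions that pass both quality filters before analysis.
clean = responses[
    (responses["seconds_spent"] >= MIN_SECONDS)
    & (responses["trap_answer"] == TRAP_CORRECT)
]
print(clean)
```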
Given these limitations, many user study researchers have developed multiple approaches to
increase participants’ engagement and curiosity in crowdsourcing studies in order to improve data
quality (Law et al., 2016; Zhao and Zhu, 2014).
Sample size is another critical subfacet as it is highly relevant to the validity of statistical results and the associated conclusions. On the readers' side, sample size is also one of the important dimensions upon which reviewers judge the quality of a given piece of IIR user research. Sakai (2016) discussed the importance of power analysis in IR studies (especially in statistical significance tests) and argued that researchers need to determine an appropriate sample size according
to the study setup (e.g., number of separate groups) and the specific analysis methods adopted (e.g., parametric vs. nonparametric methods), and to avoid both overpowered and underpowered situations in statistical modeling. In addition to sample size and recruitment methods, demographic characteristics (e.g., gender and age composition, educational background, occupation), as important user characteristics, are also frequently reported in the form of descriptive statistics in IIR research papers.
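To make the idea of prospective power analysis concrete, the sketch below computes the required per-group sample size for a two-group comparison with statsmodels; the effect size and power targets are assumptions for illustration, not values prescribed by Sakai (2016).

```python
from statsmodels.stats.power import TTestIndPower

# Assumed design targets: a medium standardized effect (Cohen's d = 0.5),
# a 5% significance level, and 80% statistical power for a two-group t-test.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                    alternative="two-sided")
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 64
```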
Task design is a major component of many IIR user study designs as it contextualizes users' interactions with search systems. In most task-based IR studies, researchers assign predefined search tasks to participants and ask them to search for information that is relevant to the task topic or useful for addressing the tasks. There are also a few field studies in which researchers examine the characteristics and distributions of users' self-reported, authentic tasks of different types (e.g., He and Yilmaz, 2017). Note that we included search task source as a subfacet in the framework because the source of a task may significantly affect users' motivations, search strategies, and performance in task-based search sessions.
To deconstruct tasks and represent their different aspects in empirical studies, task features and facets are often measured via separate independent variables in user-centered IR research (e.g., Liu et al., 2010; Zhang et al., 2015). According to Li and Belkin (2008), a task (i.e., a work task or an information seeking and search task) can be deconstructed into several facets and dimensions (e.g., task product, task goal, task stage, time length, urgency) that can be operationalized as different variables; a small illustration of such an operationalization follows this paragraph. In addition, work tasks and search tasks should be treated as different levels of the overall task context, because a work task, as the broader context, often motivates people to engage in a series of information seeking and search tasks and to interact with search systems (Byström and Hansen, 2005; Li, 2009; Li and Belkin, 2010). Therefore, defining task context in an IIR user study not only involves the design of specific search tasks, but also includes the construction of associated work tasks or search scenarios as the background of the search tasks at hand (cf. Borlund, 2016; Mitsui et al., 2017). Unfortunately, many recently published IIR user studies merely defined and presented search tasks to participants and left out the related work task or search scenario descriptions, resulting in a lack of broader context or "cover stories" and reduced realism relative to natural search activities.
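As a small illustration of how such facets can be operationalized as variables (a sketch only; the facet levels below are loosely inspired by, not quoted from, Li and Belkin's scheme), consider the following encoding of search tasks as categorical study variables.

```python
from dataclasses import dataclass, asdict
import pandas as pd

# Hypothetical encoding of faceted task descriptions as study variables;
# facet names follow the discussion above, while the level labels are assumed.
@dataclass
class SearchTask:
    task_id: str
    product: str      # e.g., "factual" vs. "intellectual"
    goal: str         # e.g., "specific" vs. "amorphous"
    stage: str        # e.g., "beginning", "middle", "end" of the work task
    time_length: str  # e.g., "short-term" vs. "long-term"
    urgency: str      # e.g., "urgent" vs. "non-urgent"

tasks = [
    SearchTask("T1", "factual", "specific", "beginning", "short-term", "urgent"),
    SearchTask("T2", "intellectual", "amorphous", "middle", "long-term", "non-urgent"),
]

# Each facet becomes a categorical column that can serve as an independent
# variable (or be dummy-coded) in subsequent statistical models.
task_variables = pd.DataFrame([asdict(t) for t in tasks]).set_index("task_id")
print(task_variables)
```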
In addition to Li and Belkin (2008)'s faceted scheme, researchers have also proposed other task typologies that focus on one of the core aspects or facets of task context. For example, Kelly et al. (2015) developed a cognitive complexity framework to characterize tasks of different types and demonstrated that tasks of varying complexity can lead to different web search strategies. Capra et al. (2018) focused on the factors affecting a priori task determinability and indicated that manipulating the dimensions and information items of tasks in different ways results in statistically significant differences in some aspects of search behaviors and outcomes. These typologies and the associated empirical works enhanced our understanding of certain aspects of tasks and shed light on the design and manipulation of task features in IIR user studies.