28 4. FACETED FRAMEWORK OF INTERACTIVE IR USER STUDIES
with search systems). As is shown in Figure 4.1, we identified a series of subfacets under these two
facets and found that the two main facets and their associated subfacets not only affect the study
procedure but also partly determine the quality of the data collected, the reliability of statistical results,
and the generalizability of the findings that emerge from the empirical evidence. This subsection
explains the two main facets and the related subfacets in detail, with examples from the research
paper collection.
Participant recruitment is the starting point of the user study procedure and determines the
source of available data (e.g., search behavioral data, self-reported and neuro-physiological data on
interaction experience, search evaluation data). Different recruitment methods are often associated
with different experimental setups. From our collection of user study papers, we summarized two
main recruitment approaches: (1) widely used small-scale user study recruitment methods, such as
flyers, in-class recruitment, personal social networks, and internal mailing lists; and (2) large-scale
crowdsourcing techniques, such as recruitment within certain institutions (e.g., Microsoft's large-
scale user studies) or via professional crowdsourcing and survey platforms (e.g., Amazon Mechan-
ical Turk, SurveyMonkey). In small-scale user studies, researchers usually expect high commitment
from participants and ask them to engage in relatively complex tasks (e.g., complex search and
work tasks, relevance and usefulness judgments, post-search in-depth interviews) that need to be
carried out on specific experimental systems and/or within controlled lab settings. With a variety of
predefined constraints and conditions, researchers are more likely to collect more detailed and
reliable data on users' search interactions, cognition, and experience.
Let us examine a couple of examples. Cole et al. (2013) invited participants to complete pre-
defined search tasks in their controlled lab and collected data on their search interactions. In this
case, the researchers had relatively tight control over a series of contextual factors (e.g., task type,
task topic, search system and interface) and successfully collected several types of data on users'
search interactions, including search behavioral data, eye movement data, and qualitative data
regarding search experience and obstacles from post-search interview transcripts. Moshfeghi,
Triantafillou, and Pollick (2016) conducted an fMRI study on information needs and demonstrated
that users' knowledge states (i.e., information need, anomalous state of knowledge) can be inferred
from the activity of certain neuro-physiological signals. Clearly, at least part of the data collected in
these small-scale user studies would not be available if the participants had been recruited and had
participated in the study via a crowdsourcing platform.
Turning to the limitations, the recruitment methods used in small-scale (often lab-
based) user studies are inefficient, time-consuming, and expensive. To reduce costs and expedite
the participant recruitment and data collection process, many researchers at research institutions
tend to adopt convenience sampling and recruit students on campus as their participants.
However, studying only the information seeking and search behaviors of students from the same
campus may result in a lack of variation in participants' knowledge background and eventually