CHAPTER 4

Data Collection

You can have data without information, but you cannot have information without data.

—Daniel Keys Moran

Route Guidance: The Importance of Data Collection

Data collection is the cornerstone of measurement and evaluation. Without it, there are no data, no results, and no information describing a program's potential for success or the success occurring at any level. Reaction and learning data serve as indicators of the potential use of knowledge and skills. These early indicators of success are important because they give insight into a program's effectiveness in enabling learning, as well as the opportunity for success with program implementation. Application and impact data describe the extent to which people are doing something with what they know and the consequences that result from their doing it. Collecting these data is important if you seek a more direct connection between a program and its ultimate value to the organization.

Suggested Route: Collecting Data at Multiple Timeframes

Real world evaluation requires data collection to occur at multiple timeframes, depending on the purpose and objectives of a program or project. Characteristics such as value of information, customer focus, frequency of use, and difficulty of the data collection process are important considerations when planning an evaluation. For example, reaction data offer the least valuable information in terms of the impact the information will have on the client’s decision making. Participants, facilitators, and other consumers of a program appreciate reaction and learning data; but clients, or those funding the program, have a greater appreciation for application, impact, and ROI data. Evaluation occurs more frequently at the reaction and learning levels, and its use decreases as we move toward application, impact, and ROI. Figure 4-1 summarizes these differing characteristics.

FIGURE 4-1. EVALUATION AT DIFFERENT LEVELS

While the characteristics of evaluation at each level differ, each level provides an important input into describing the success of and opportunity to improve a program, project, or initiative. That importance depends on the purpose and objectives of the program, as well as the purpose of the evaluation. Clear purpose and measurable objectives can lead to the appropriate data collection approach. How you approach data collection also depends on factors such as cost, utility, time required for participants and supervisors, amount of disruption, and other practical matters that influence the balance between the science of data collection and the art of addressing real world issues.

Methods and Instruments

Table 3-4 lists the different instruments you can use to collect data at the reaction, learning, application, and impact levels. Regardless of the method or technique you use to collect data, the instruments must be valid, reliable, simple, economical, easy to administer, and easy to analyze. The following provides a description of some common methods to capture data at the different levels of evaluation.

Surveys and Questionnaires

Surveys and questionnaires are the most common data collection methods. They come in all types, ranging from short reaction forms to detailed, multipaged instruments addressing issues important to all five levels of evaluation.

While both instruments are considered “self-administered,” the fundamental difference between a survey and a questionnaire is in the type of questions and the purpose behind the instruments. Surveys attempt to measure changes in attitude toward the program, work, policies, procedures, the organization, and even people. Pre- and post-measurements demonstrate changes in attitude. Attitude surveys tend to use dichotomous or binary questions and Likert-type scales. Figure 4-2 provides a sample of attitude survey questions.

FIGURE 4-2. SAMPLE ATTITUDE SURVEY QUESTIONS

Questionnaires provide deeper insight into the opinions and perceptions of respondents than a simple attitude survey. They represent a more comprehensive journey toward understanding what participants think about a program, what they plan to do with what they learn, how they are applying what they learned, and the impact their use of the skills acquired from the program has on the business. Questionnaires may include similar types of questions to those found on attitude surveys, but they also include numerical scales, rank-ordered questions, checklists, multiple-choice questions, and open-ended questions (see Figure 4-3). They allow us to collect data about reaction, learning, application, and impact. Questionnaires are also useful in collecting data describing the alignment between a program and improvement in business measures, along with the monetary benefits of a program. In some circumstances, questions are added to a questionnaire to help isolate the effects of a program on improvement in business measures.

FIGURE 4-3. SAMPLE QUESTIONS ON QUESTIONNAIRE

ROI at Level 1

An end-of-course questionnaire is an opportunity to collect data that forecast the improvement in an organization, including the ROI. By adding the series of questions below to your end-of-program evaluation, not only do you have the opportunity to forecast the ROI, but you also encourage participants to look beyond the event.

The questions ask participants to think about specific actions they plan to take with what they learn from the program and to identify the specific measures that will improve as a result of that knowledge acquisition. Because the input is an estimate of contribution, an adjustment is made for the error in the estimate by asking the confidence question. While this series of questions merely estimates the potential ROI, it provides an early indicator of the opportunity the group has to influence the organization. A brief sketch of how the responses roll up into a forecast follows the list of questions below.

Supplemental Questions to Ask on Feedback Questionnaires

• As a result of this program, what do you estimate to be the increase in your personal effectiveness (expressed as a percentage)? _________ percent

• Indicate what you will do differently on the job as a result of this program. (Please be specific.)

• What specific measures will improve? _________________________________

• As a result of any change in your thinking, new ideas, or planned actions, please estimate (in monetary values) the benefit to your organization (e.g., reduced absenteeism, reduced employee complaints, better teamwork, or increased personal effectiveness) over a period of one year. $ _____________

• What is the basis of this estimate? ________________________

• What confidence, expressed as a percentage, can you place in your estimate? (0 percent = no confidence; 100 percent = certainty) _______ percent
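To illustrate how the responses might be combined, here is a minimal sketch in Python. The participant responses and the per-participant program cost are hypothetical values used only for illustration; the adjustment logic (estimated benefit multiplied by confidence, summed, and compared with total cost) follows the approach described above.

# Minimal sketch of a Level 1 ROI forecast from end-of-program questionnaire data.
# Each participant's confidence percentage discounts their estimated annual benefit;
# the adjusted benefits are summed and compared with the total program cost.
responses = [
    # (estimated annual monetary benefit, confidence 0.0-1.0) -- hypothetical answers
    (12000, 0.80),
    (5000, 0.60),
    (30000, 0.50),
    (0, 0.00),   # participants who cannot estimate are counted as zero benefit
]
program_cost_per_participant = 1500   # assumed fully loaded cost, for illustration only
total_cost = program_cost_per_participant * len(responses)
adjusted_benefit = sum(benefit * confidence for benefit, confidence in responses)
roi_forecast = (adjusted_benefit - total_cost) / total_cost * 100
print(f"Adjusted forecast benefit: ${adjusted_benefit:,.2f}")
print(f"Total program cost:        ${total_cost:,.2f}")
print(f"Forecast ROI:              {roi_forecast:.0f}%")

The confidence adjustment discounts each estimate for its own uncertainty, which keeps the forecast conservative.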

While technically the two instruments are different, in the real world the words survey and questionnaire are used interchangeably. Both are self-administered instruments, and both present challenges in terms of design, distribution, and analysis. One of the biggest challenges is asking the right questions. On any given follow-up questionnaire (Level 3 and Level 4), for example, you can include content issues such as:

• progress with objectives

• action plan implementation

• relevance or importance

• perception of value

• use of materials

• knowledge or skill enhancement

• skills use

• changes with work actions

• linkage with output measures

• barriers or enablers

• management support

• recommendations for other audiences or participants

• suggestions for improvements

• other benefits, solutions, or comments.

Additionally, if your interest is in impact and ROI, you can add the following content items to your questionnaire:

• improvement or accomplishments

• defined measure

• provide the change

• unit value

• basis for input

• total impact

• list other factors

• improvement linked with program

• confidence estimate.

These content issues represent plenty of data to tell the complete story of program success, but they add little benefit if they are unnecessary. Using program objectives as the basis for the questions (see chapter 2) can help ensure that you ask what you should, not what you can.

Along with asking the right questions is asking them the right way. Write questions in such a way that people can and will answer them. Some of the most common challenges in writing survey questions include focusing a question on a single issue, keeping the question brief and to the point, writing the question clearly so respondents know how to answer it, and asking questions so respondents will answer objectively, rather than biasing the question with leading terms. With regard to the response choices, one scale does not necessarily fit all questions. One consideration when developing the response choices is how much variance the scale requires to capture an appropriate measure. Another is discrimination between response choices: if you use a three-point scale, consider whether enough variance exists to give respondents the opportunity to respond as accurately as possible. Labeling responses is also important. If your question asks about effectiveness, the response choice descriptors should represent effectiveness, not agreement. Finally, symmetry is a consideration. Is there a balance between the choices? Do they reflect a continuum that demonstrates the direction of response choices (such as lowest to highest or best to worst)?

Many books exist to help you design and administer surveys and questionnaires as well as analyze the data. Survey Basics (ASTD Press, 2013) describes the fundamentals of writing survey questions. Success with these data collection methods begins in the planning phase. Figure 4-4 presents a summary of steps you can take to design useful surveys and questionnaires.

FIGURE 4-4. STEPS FOR WRITING SURVEYS AND QUESTIONNAIRES

Although less of a challenge when collecting Level 1 Reaction and Level 2 Learning data, one of the greatest challenges when using surveys and questionnaires for follow-up evaluation (Level 3 Application and Level 4 Impact) is getting a good response rate. Planning for high response is a critical success factor in collecting data after program implementation. Chapter 3 describes a planning process for ensuring high response.

Interviews and Focus Groups

Interviews and focus groups are useful data collection methods and can be particularly powerful when capturing data after implementation of a program or project. Interviews are helpful when it is important to probe for detail about an issue. They can help the evaluator gain clarity on understanding the link between the program and its outcomes, as well as a monetary value for a particular measure.

A major disadvantage of the interview process is that it is time-consuming. A one-hour interview involving an interviewee and the interviewer equals two hours of time to the organization. Interviewing a large number of people can be very expensive. For example, in one major study, 36 people were interviewed within a four-week timeframe. With two interviewers per session (one scribe and one interviewer), 36 interviewees, and interviews that lasted 90 minutes each, the total time requirement was 9,720 minutes (three people × 90 minutes × 36 interviews), or 162 hours of interview time. That was a lot of time and cost for data collection. Considering the scope of the project, the cost was justifiable and necessary to capture the data. But this will not always be the case. So, while powerful, interviews are expensive and should be used selectively.
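A quick calculation such as the sketch below can make the time and cost of an interview plan visible before you commit to it. The counts and duration mirror the example above; the loaded hourly rate is an assumed figure added only for illustration.

# Sketch: estimating the time and cost of an interview-based data collection plan.
# Counts and duration match the example above; the hourly rate is an assumption.
interviews = 36
minutes_per_interview = 90
people_per_interview = 3          # interviewer + scribe + interviewee
loaded_hourly_rate = 75.0         # assumed average fully loaded cost per hour
total_minutes = interviews * minutes_per_interview * people_per_interview
total_hours = total_minutes / 60
estimated_cost = total_hours * loaded_hourly_rate
print(f"Total interview time: {total_minutes:,} minutes ({total_hours:.0f} hours)")
print(f"Estimated labor cost: ${estimated_cost:,.2f}")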

Focus groups are helpful when trying to obtain in-depth feedback, particularly when it is also important for the group to hear from others. A focus group involves a small group discussion among the participants (or another source of data), facilitated by a person with experience in the focus group process. The operative word in focus groups is focus. It is important to keep the process on track; otherwise, the group can derail the process, and you are likely to leave without the information you need. For example, in working with an organization to determine the ROI for an absenteeism reduction program, you could conduct a focus group with supervisors of the people who do not show up for work. This would help you determine their perception of the cost of an unexpected absence. Invite five supervisors to participate in a 60-minute focus group and, after introductory comments, ask each of the five supervisors to answer three questions using a round-robin format, giving each supervisor two minutes to answer each question, plus time for discussion. This should give you data that are reliable enough to use in an ROI calculation.
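As a quick sanity check on the agenda, the short sketch below confirms that the round-robin portion fits within the scheduled hour; the figures come directly from the example above.

# Sketch: checking that the focus group agenda fits the scheduled hour.
supervisors = 5
questions = 3
minutes_per_answer = 2
session_minutes = 60
answer_time = supervisors * questions * minutes_per_answer   # round-robin answers
remaining = session_minutes - answer_time                     # introductions and discussion
print(f"Round-robin answer time: {answer_time} minutes")
print(f"Left for introductions and discussion: {remaining} minutes")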

Observation and Demonstration

Observation and demonstration are useful techniques because they allow you to capture real-time data. By watching participants and taking note of their actions and behaviors, you can quickly tell if they are ready to apply what they know back on the job. Formal observation at Level 2 Learning is also known as a form of performance testing. The skill being observed can be manual, verbal, analytical, or a combination of the three. For example, new computer science engineers complete a course on systems engineering. As part of their Level 2 assessment, they demonstrate their skill by designing, building, and testing a basic system. The facilitator checks out the system, then carefully builds the same design and compares the results with those of the participants. The comparison of the two designs provides evidence of what participants learned in the program.

A useful tool that can help measure success with knowledge, skill, and information acquisition through observation is the observation checklist, which allows the observer to check off whether or not the individual is following a procedure. Checklists with yes or no response choices are useful when there is either a right or wrong way to perform a skill. But much of the time, success with performance is based on a gradient. Sharon Shrock and Bill Coscarelli offer an observation tool (or performance test) they refer to as a behaviorally anchored numerical scale (see Table 4-1). This scale requires observers to rank the behavior of participants using a five-point scale that is anchored in descriptions of good and poor behaviors.

TABLE 4-1. SAMPLE BEHAVIORALLY ANCHORED NUMERICAL SCALE

Behavior: I. Response to directory assistance request

Performance                                              Rating
1. Curt voice tone; listener is offended                   1
2. Distant voice tone; listener feels unwelcome            2
3. Neutral voice tone; listener is unimpressed             3
4. Pleasant voice tone; listener feels welcome             4
5. Warm, inviting voice tone; listener feels included      5

Source: Shrock and Coscarelli (2000)

Observation at Level 3 Application can also provide an objective view of an individual’s performance with knowledge and skills. For example, in retail sales, the classic approach to observation at Level 3 is the use of mystery shoppers, who go into a store and pose as customers, observing the way salespeople perform. After the shopping experience they write up an after-action report, describing their experience and rating the salesperson on behaviors identified by the client organization. Another example of observation that serves as a credible approach to assessing application of knowledge and skill is the process that monitors call center representatives. In this process, observers listen to conversations between the call center representative and the customer, rating them on a series of behaviors. The customer survey is another form of observation. Through the customer survey process, the customer rates a sales representative or service provider on performance. Unlike an ex-post-facto, self-administered questionnaire in which the respondents reflect on what they remember happening in the past, this form of observation takes place in real time, capturing data in the moment. In some retail stores, for example, at the point of checkout a system is in place that asks customers whether or not the cashier greeted them as they walked up to the checkout line. By simply answering yes or no, the customer has been placed in the role of observer, providing a rating of the cashier’s performance.

There is a fine line between observation at Level 2 Learning and observation at Level 3 Application. That line is drawn at the point where observation influences performance. For example, if your supervisor brings a checklist to your workplace, asks you to perform five tasks, and then checks each task off the checklist, that observation has likely influenced your performance. This form of observation is a Level 2 observation. On the other hand, when the observer is unknown or invisible to the person being observed, the data are usable to assess Level 3. The benefit of observation is that data are collected in real time, ideally by an objective observer. To enhance this objectivity, protocols, checklists, and other tools are provided to the observer to improve the reliability of the observation. The downside of observation, however, is that multiple observers can see different things; hence the need for protocols and checklists.

Tests and Quizzes

Tests and quizzes are typical instruments used in evaluating learning. Tests are validated instruments that meet specific requirements of validity and reliability. Quizzes are less formal and, while they may or may not meet tests of validity and reliability, they do provide some evidence of knowledge acquisition.

There are a variety of different types of tests. The work of Shrock and Coscarelli, along with others, can provide more detail on the mechanics of good test design. But for purposes of this book, the following is a brief overview of the different types of tests.

Objective tests are those for which there is a right or wrong answer. True-false, matching, multiple-choice, and fill-in-the-blank items are the types of questions found on objective tests.

A criterion-referenced test (CRT) is an objective test with a predetermined cutoff score. The CRT measures performance against carefully written objectives for a program. In a CRT, the interest lies in whether or not participants meet the desired minimum standards. The primary concern is to measure, report, and analyze participant performance as it relates to the instructional objectives. The CRT is a gold standard in test design; the challenge is in determining the cutoff score.

Norm-referenced tests compare participants with each other or to other groups rather than to a specific cutoff score. They use data to compare the participants’ test scores with the “norm” or average. Although norm-referenced tests have limited use in some learning and development evaluations, they may be useful in programs involving large numbers of participants where average scores and relative rankings are important.

Performance testing is a form of observation assessment. It allows the participant to exhibit a skill (and occasionally knowledge or attitudes) that has been learned in a program. Performance testing is used frequently in job-related training, where the participants are allowed to demonstrate their ability to perform a certain task. In supervisory and management training, performance testing comes in the form of skill practices or role plays.

ROI at Level 2

Test scores connected to improvement in business impact measures not only validate the usefulness of the test, but also serve as the basis for forecasting an ROI using the test itself.

For example, a large retail store chain implemented an interactive selling skills program. The program manager developed a test to predict sales performance based on the knowledge and skills taught in the program. At the end of the program, participants took the comprehensive test. To validate the test, the learning team correlated the test scores with the associates' actual sales. Results showed that a strong and significant correlation existed.

When a second group of participants took the test, the average test score was 78, which correlated with a 17 percent increase in weekly sales. The average sales per week, per associate at the beginning of the program was $20,734. The profit margin was 4 percent and the cost of the program was $3,500 per person. The company considered a working year to include 48 weeks. To forecast the ROI, the program manager calculated the profit on the predicted increase in sales, annualized the change in performance, and compared the results to the program cost.

• Average sales for the group prior to the program: $20,734.00 per week

• Predicted increase in sales (a test score of 78 predicts a 17 percent increase): $3,524.78 per week

• Profit on increased sales (4 percent margin): $140.99 per week

• Annual increase in profit ($140.99 × 48 weeks): $6,767.52 annually

• Forecasted ROI (($6,767.52 − $3,500) ÷ $3,500): 93%

By using the predictive validity of a test and comparing test scores to increase in weekly sales, the learning department team predicted that this new group of participants could achieve a 93 percent ROI from increased sales resulting from the program.
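The forecast above reduces to a few lines of arithmetic. The sketch below reproduces the calculation using the figures from the example; rounding at each step to match the example's intermediate values is the only implementation choice added here.

# Sketch of the Level 2 ROI forecast described above, using the example's figures.
# A test score of 78 was correlated with a 17 percent increase in weekly sales.
avg_weekly_sales = 20734.00      # average sales per associate per week, pre-program
predicted_increase_pct = 0.17    # increase associated with a test score of 78
profit_margin = 0.04             # profit as a share of sales
weeks_per_year = 48              # working weeks the company counts in a year
program_cost = 3500.00           # program cost per participant
weekly_sales_increase = round(avg_weekly_sales * predicted_increase_pct, 2)   # $3,524.78
weekly_profit_increase = round(weekly_sales_increase * profit_margin, 2)      # $140.99
annual_profit_increase = round(weekly_profit_increase * weeks_per_year, 2)    # $6,767.52
roi_forecast = (annual_profit_increase - program_cost) / program_cost * 100   # about 93 percent
print(f"Weekly sales increase:  ${weekly_sales_increase:,.2f}")
print(f"Weekly profit increase: ${weekly_profit_increase:,.2f}")
print(f"Annual profit increase: ${annual_profit_increase:,.2f}")
print(f"Forecast ROI:           {roi_forecast:.0f}%")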

Simulations

Another technique to measure learning is a job simulation. Simulation involves the construction and application of a procedure, process, behavior, or task that simulates or models the performance the program is designed to teach. The simulation is designed to represent, as closely as possible, the actual job situation. Simulations may be used during the program, at the end of the program, or as part of the follow-up evaluation. There are a variety of types of simulations:

Electrical and technical simulations use a combination of electronic and mechanical devices to simulate real-life situations. Programs to develop operational and diagnostic skills are candidates for this type of simulation. Expensive examples include the simulator for a nuclear plant operator and the simulator used to train operators who guide large boats and ships through the Panama Canal. Other, less expensive simulators have been developed to simulate equipment operation.

Task simulation involves performance with a specific task. For example, aircraft technicians are trained on the safe removal, handling, and installation of a radioactive source used in a nucleonic oil-quantity indicator gauge. These technicians attend a thorough training program on all of the procedures necessary for this assignment. To become certified, they are observed in a simulation where they perform all the necessary steps on a check-off card. After they have demonstrated that they possess the skills necessary for the safe performance of this assignment, they become certified by the instructor.

Business games have grown in popularity in recent years. They represent simulations of a part or all of a business enterprise, where participants change the variables of the business and observe the effect of those changes. The game not only reflects the real world situation, but also the content presented in programs. One of the earlier computer-based business games that is still used today is GLO-BUS: Developing Winning Competitive Strategies, developed by Arthur A. Thompson Jr. at The University of Alabama. This business game is ideal for high-potential employees who are likely to drive organization strategy at some point in their careers.

Case studies are a popular approach to simulation. A case study represents a detailed description of a problem and usually contains a list of several questions posed to the participant, who is asked to analyze the case and determine the best course of action. The problem should reflect the conditions in the real world setting and the content of a program. There are a variety of types of case studies that can help determine the depth of a person’s knowledge, which include exercises, situational case studies, complex case studies, decision case studies, critical incident case studies, and action maze case studies. Readers of case studies must be able to determine conclusions from the text, discern the irrelevant from the relevant portions of the case, infer missing information, and integrate the different parts of the case to form a conclusion.

Role plays, sometimes referred to as skill practices, require participants to practice a newly learned skill or behavior. The participant under assessment is assigned a role and given specific instructions, which sometimes include an ultimate course of action. Other participants observe and score the participant’s performance. The role and instructions are intended to simulate the real world setting to the greatest extent possible.

The assessment center method is a formal procedure. Assessment centers are not actually centers in a location or building; rather, the term refers to a procedure for evaluating the performance of individuals. In a typical assessment center, the individuals being assessed participate in a variety of exercises that enable them to demonstrate a particular skill, knowledge, or ability (usually called job dimensions). These dimensions are important to the on-the-job success of the individuals for whom the program was developed.

Simulations in the Virtual World

Simulations represent an interesting and interactive way in which to evaluate learning. Given the rise in technology-enabled learning, virtual reality as a form of simulation is taking on an important role in the evaluation process. Using 3-D animation, participants can be put into their role without leaving their computers. While virtual reality can be used to complement any type of program, it is most frequently used in game-based learning and product support training. Skills2Learn is among several providers of e-learning and virtual reality simulation. Their website, www.skills2learn.com, provides examples of how organizations use virtual simulation to support the development of their team members.

Action Plans

Action plans are an excellent tool to collect both Level 3 and Level 4 data. Action plans can be built into a program so that they are a seamless part of the process, rather than an add-on activity.

Using action plans, program participants can identify specific actions to take as a result of what they are learning. When using action plans for Level 4 data collection, participants come to the program with specific measures in mind and target those actions toward improving their pre-defined measures. Table 4-2 is an example of an action plan completed for a coaching project.

TABLE 4-2. COMPLETED ACTION PLAN

As you see in the action plan, Caroline Dobson has identified the specific measure that she wants to improve and set the objective for improvement in that measure. On the left-hand side of the action plan, she has listed the specific actions that she plans to take to improve the measure. She completes items A, B, and C—which represent the specific measure of interest, the monetary value of that measure, and how she came to that value—prior to leaving the program. She completes items E, F, and G—which describe the improvement in the measure due to the program—six months after the program.

Each additional participant in the coaching experience completed an action plan similar to that completed by Caroline. The output of the process is a table such as that shown in Table 4-3. Caroline Dobson's results are listed as Executive #11. By reducing turnover by four, she saved the organization $215,000 for the year. She estimates that 75 percent of the savings is due to the program. Because she is estimating the contribution, she adjusts that estimate by providing a confidence factor of 90 percent. The estimated monetary contribution of the coaching she received is therefore $215,000 × 0.75 × 0.90 = $145,125. Results from each executive are summed in order to calculate the total monetary benefit of the program.
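The adjustment applied to Caroline's estimate generalizes to every action plan in the group: each reported improvement is multiplied by the portion attributed to the program and by the participant's confidence, then the adjusted values are summed. The sketch below shows that logic; Caroline's row matches the example, and the other rows are hypothetical.

# Sketch: adjusting and summing action plan estimates as described above.
# Each annual improvement is multiplied by the percentage attributed to the
# program (isolation) and by the participant's confidence in that attribution.
action_plans = [
    # (annual improvement $, isolation 0-1, confidence 0-1)
    (215000, 0.75, 0.90),   # Executive #11 (Caroline Dobson) -> $145,125
    (60000, 0.50, 0.80),    # hypothetical participant
    (98000, 0.40, 0.70),    # hypothetical participant
]
adjusted_values = [value * isolation * confidence
                   for value, isolation, confidence in action_plans]
total_benefit = sum(adjusted_values)
for value in adjusted_values:
    print(f"Adjusted contribution:  ${value:,.2f}")
print(f"Total monetary benefit: ${total_benefit:,.2f}")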

The key to successful implementation of action plans lies in the following steps:

• Prior to the intervention:

○ Communicate the action plan requirement early.

○ Have participants identify at least one impact measure to improve with the program.

• During the intervention:

○ Describe the action planning process at the beginning of the program or project.

○ Teach the action planning process.

○ Allow time to develop the plan.

○ Have the facilitator approve the action plan.

○ Require participants to assign a monetary value for each improvement.

○ If time permits, ask participants to present their action plans to the group.

○ Explain the follow-up process.

• After the intervention at a pre-determined time:

○ Ask participants to report improvement in the impact measure.

○ Ask participants to isolate the effects of the program.

○ Ask participants to provide a level of confidence for estimates.

○ Collect action plans, summarize the data, and calculate the ROI.

This data collection approach is not suitable for every program or for every audience. But when a program represents a cross-functional group of participants whose business needs vary, and the participants are familiar with those measures, action planning can be a powerful tool resulting in credible output.

TABLE 4-3. ACTUAL DATA REPORTED: BUSINESS IMPACT FROM COACHING

Performance Contracts

Similar to an action plan is a performance contract, which is a tool used to document agreement between multiple parties as to what they will do in order to achieve an outcome. Organizations use performance contracts with their external contractors on a routine basis. They can also be used within the organization as people become involved in programs, projects, and initiatives.

A performance contract specifically states the impact measure or measures that need to improve. Parties charged with implementation agree to their role, the actions they will take, and dates by which they will take those actions. Those involved in the process tend to be the participant of the program, the supervisor, and sometimes the facilitator or project leader. Although the steps can vary according to the specific kind of performance contract and the organization, the following is a common sequence of events:

1.   With the approval of a supervisor, the participant makes the decision to attend a program or be involved in a project.

2.   The participant and supervisor mutually agree on a measure or measures for improvement.

3.   The participant and the supervisor set specific, measurable goals.

4.   The participant attends the program and develops a plan to accomplish the goals.

5.   After the program, the participant works toward completing the contract requirements against a specific deadline.

6.   The participant reports the results of their effort to the supervisor.

7.   The supervisor and the participant document the results and forward a copy to the program lead or department along with appropriate comments.

The supervisor, participant, and any other party involved in the process mutually select the actions to be performed or improved prior to the beginning of the program. The process of selecting the area for improvement is similar to that used in the action planning process. The topics can cover one or more areas, including routine performance, problem solving, innovative or creative applications, or personal development.

Program Follow-Up Sessions

In some situations, programs include a series of follow-up sessions. For example, if you are working with a major leadership program, you may offer multiple opportunities for participants to get together in person over a period of time. Each follow-up session is an opportunity to collect data regarding the use of knowledge and skills acquired from the initial or previous follow-up session. For example, how could you evaluate a comprehensive leadership program with three in-person sessions over the period of one year, along with e-based modules that serve as the basis for the upcoming in-person session? At each follow-up session, have the facilitator ask the participants to provide information on success with implementation, including the barriers and enablers. Program follow-up sessions are an ideal way to build evaluation into the program and capture meaningful data without incurring additional costs.

Performance Records

Performance data are available in every organization. Monitoring performance data enables management to measure performance in terms of output, quality, cost, time, and customer satisfaction. The need for a program is usually driven by a problem or opportunity with measures found in these records. Measures such as sales, safety violations, rejects, inventory turnover, and customer satisfaction, among many others, are found in performance records. Examples of performance records that can serve you well in measuring the improvement in key measures include sales records, safety records, quality records, scorecards, operating reports, production records, inventory records, timekeeping records, and technology that may capture data on a routine basis.

While there is no one best way to collect data, performance records can prove to be the simplest, most cost-effective, and most credible approach.

Sources of Data

A variety of sources of data exist. As you have read in earlier chapters describing the standards that we set to ensure reliable implementation of evaluation practices, credibility is the most important consideration in selecting a source of data. Credibility is based on how much a source knows about the measure being taken and the reliability of the method used to collect the data. Sometimes it may be important to go to multiple sources, but you always weigh the benefits against the costs of doing so. The following are a variety of sources of data.

Participants

Participants are likely the most credible source of any level of data, particularly reaction, learning, and application data. Because they are the audience that engages with the content and will apply it, they have the best perspective on the relevance and importance of program content. Participants are also the key source when it comes to learning. Their ability to perform a task, take action, or change their behavior is evident through the different techniques presented earlier. In terms of Level 3 Application data, participants know best what they do every day with what they learn. The challenge is to find an effective, efficient, and reliable way to capture these data. Occasionally, when collecting application data, another perspective enhances the credibility and reliability of results; however, you should not discount the importance of participant input in capturing data at Level 3.

Participants’ Managers

Another source of data is the immediate supervisor, manager, or team leader of the participant. This audience often has a vested interest in evaluation, because they have a stake in the process due to the involvement of their employees. In many situations, they observe the participants as they attempt to apply the information, knowledge, and skills acquired in the program. Consequently, they can report on successes as well as difficulties and problems associated with the process, and can assess the relevance of the content and capability of the employee as a result of the program.

Direct Reports

Direct reports are an excellent source of data, particularly when the participants' behaviors affect their work. For example, with leadership development, the participants can give insights into the opportunities they have had to apply what they learned, the barriers that prevented application, and their own view of how effectively they are applying it; but it is often the direct report who feels the change in leadership behavior acquired through a program. The challenge with obtaining data from direct reports, however, is the potential bias that can occur if they fear that providing negative data will affect their jobs. So, as with any data collection approach, you want the individual to feel comfortable providing the information; this is where confidentiality and anonymity come into play. Assuring the direct reports that their data will be held in confidence and that the assessment is of the program and not the individual will often alleviate any fear of repercussion if negative input is provided.

Peer Groups

Peers of participants can provide good input into the extent to which a participant is using knowledge, skill, and information. This is especially true when that performance affects the peer's work. Peers are often relied on to provide input into the use of knowledge, skill, and information when collecting data through 360-degree feedback. While peers are not always in a position to provide comprehensive, objective data, if they engage closely with the participant, they can give insights into behavior changes and performance on the job. With regard to Level 4 data, a peer may or may not be the best source.

Internal Customers

The individuals who serve as internal customers of the program are another source of data when the program directly affects them. In these situations, internal customers provide reactions to perceived changes linked to the program. They report on how the program has (or will) influence their work or the service they receive. Because of the subjective nature of this process and the lack of opportunity to fully evaluate the application of skills of the participants, this source of data may be limited.

Facilitators

In some situations, the facilitator may provide input on the success of the program. The input from this source is usually based on observations during the program. Data from this source have limited use because of the vested interest facilitators have in the program’s success. While their input may lack objectivity, particularly when collecting Level 1 and Level 2 data, it is sometimes an important consideration.

Sponsors and Clients

The sponsor or client group, usually a member of the senior management team, is an important source of reaction data. Whether an individual or a group, the sponsor’s perception is critical to program success. Sponsors can provide input on all types of issues and are usually available and willing to offer feedback.

External Experts

Occasionally, external experts can provide insights that we cannot obtain from participants, their supervisors, peer groups, or direct reports. External experts are people who observe behaviors from a distance. They may capture these data through conversations or engagement in the workplace. An expert may also be someone who works closely with the participant on a project, providing coaching and support as the project is completed. Typically, experts have a set of standards they follow to assess performance and can provide insight often missed when the supervisor or direct reports are unfamiliar with a process or project.

Performance Records

As mentioned earlier in the discussion on performance monitoring, performance records are an excellent source of data, particularly when collecting Level 4 Impact data. Performance records house the information that we need, making it accessible and inexpensive. Because performance records come from the systems already in place, for all practical purposes they are perceived as a credible source of data.

Learning and Development Staff

A final source of data is the learning and development staff. On occasion, the learning and development staff can have an objective view of the success participants are having with new skills and behavior change. They can also, occasionally, be a source of data in determining how much impact the program has had on key measures. This is particularly true when the measures are housed in a system or monitored within the learning function. The challenge with using learning and development staff is that, because the evaluation is of their own program, the perception could exist that the input on improvement is inflated. While this bias may or may not exist, it is important to manage that perception, particularly when a program has had a significant impact on the measures of importance.

Ideal Versus Real

Triangulation is a process where multiple methods, sources, or theories are used to validate results. The concept comes from the land surveying and navigational processes used to determine location. It is most useful when applying qualitative data collection approaches. Triangulation is a valuable approach, and in the ideal world, organizations would expend the resources to triangulate data for every evaluation project. However, triangulation is not always an option, nor is it always necessary. Time constraints and convenience, as well as the purpose of the evaluation, dictate the number of sources and methods used to collect data.

Timing of Data Collection

Collecting data at multiple timeframes results in information that provides a complete road map of success and the alternative routes you need to take to improve it if you did not reach the intended outcome. As you have read, reaction and learning data are early indicators of program success. These data provide information you need to make immediate adjustments to the program. They also help you discern the actions you can take to further support application of knowledge, skills, and information. This is why reaction and learning data are collected during program implementation.

Application data describe how people are using what they know on a routine basis, the barriers that prevent them from using it, and the enablers that support them in using it. Therefore, application data are collected post-program. Improvement in business measures, or Level 4 data, result as a consequence of the application of what people know. Therefore, these data are also collected post-program. As mentioned in chapter 3, when you are planning your data collection, your objectives will lead you toward the right timing for data collection. Other considerations are the availability of data, the ideal time for behavior change and impact, stakeholder needs for the data, convenience of data collection, and constraints on data collection.

Availability of Data

The evaluation approach described in this book has a set of standards and guideposts that ensure consistent and reliable implementation of the process. Data availability is critical in the evaluation process. If the data are not available, the timing at which you collect data will reflect the time in which the data become available. For example, if you are evaluating a program and you plan to collect your application data in three months, but the impact data are not available at that three-month mark, you will not be able to collect the impact data. Therefore, during your planning process you must identify the time in which the data will become available and collect them at that point. If the data are nonexistent, a system must be put in place to create the data, making them available for tracking and accessing at the time needed. Another alternative is to use a proxy. Proxy measures are good alternatives when the actual data are unavailable or too costly to collect.

Ideal Time for Behavior Change

Level 3 Application measures the extent to which people are applying what they know on a routine basis or that behavior is changing. The time at which routine application occurs depends on what the program or process offers in terms of knowledge, skill, and information. For example, if you are evaluating a program that offers specific job skills required for individuals to do their jobs, routine behavior change will occur soon after the program. If, however, you are evaluating a leadership development program, in which there are a variety of participants and not all of them have the opportunity to immediately apply what they have learned, then behavior change may be delayed until they do have that opportunity.

For many of the case studies that we have published through ATD, Level 3 data collection occurs at approximately three months after the program. This three-month timeframe represents the typical time at which most programs should lead to behavior change. It is also a point in time when stakeholders want to see the data. But this is not intended to suggest that all Level 3 data collection occurs at the three-month mark. You must look at the program; the knowledge, skill, and information being deployed through the program; and the opportunity for participants to apply what they learn on a routine basis.

Ideal Time for Impact

Just like behavior change, the ideal time for impact depends on a variety of issues. One issue is the type of data with which you are working. For some data you can see quick impact and capture that impact at any point in time. This is true when working with data that are in a system and tracked routinely. Other measures take time to respond to behavior change. For example, if you are implementing a new purchasing process, the purchasing agents may be applying what they learn on a routine basis soon after the program. But, because of the nature of contracting, you will not see the reduction in cost for six months because it takes that long for the contracts to close. Many times impact data are captured at the same time as the application data. This again gets back to the nature of the measure, but it is also driven by the desires of stakeholders.

Convenience of Data Collection

Another consideration when determining timing for data collection is convenience. How easily can you access the data? Convenience depends on many of the issues just covered, including availability of data, ideal time for behavior change, and ideal time for impact. But it also considers convenience from the perspective of those who are providing the data. Your data collection process should be easy for everyone. If it is not convenient, you are less likely to obtain the information you need from the number of people you need it from. Making it easy for people to respond to your data collection process is important if you want to capture enough usable data.

If you want people to do something, make it easy; if you want them to stop doing something, make it hard.

Constraints on Data Collection

A final consideration when determining timing for data collection is constraints. Constraints are roadblocks to the ideal timeframe. For example, in conducting an evaluation of a tax fundamentals course, the ideal timeframe for collecting the information was at multiple three-month intervals over the period of one year following the program. Unfortunately, one collection interval occurred the second week in April, which is peak tax season in the United States. Because it was not possible to get data at that point in time, the collection interval had to be adjusted around that particular constraint.

Timing for data collection is important. Collecting data at the right time, from the right people, and using the right approach will ensure you obtain the most credible and reliable data possible. However, you cannot always collect data at what you believe is the right time, from the right people, and using the right technique. Data collection is a balance between accuracy and cost, cost and benefit, art and science, and ideal and real.

Detour: You Need More Than One Method

In the ideal world you have clear, specific measures that you can transfer from objectives to a data collection instrument, such as a follow-up questionnaire. From there, you administer the questionnaire, analyze data, report results, and take action based on results. But in the real world, a single data collection instrument for a given level of evaluation may not be enough to get the job done. For example, you may lack specific measures and have to use multiple data collection techniques to ensure you take the right measures so the results are meaningful. This is where a mixed method approach to data collection can help you.

Mixed methods research (MMR) design is frequently used in organization research and is growing in use when evaluating programs of all types. This requires using both a qualitative and quantitative approach to collect data, and integrating results of one approach with the other to give credible, reliable output. Two commonly used techniques are sequential exploratory design and sequential explanatory design.

Sequential Exploratory Design

While it sounds difficult, in practice, sequential exploratory design is likely a technique you already use. It is an excellent approach to creating new questionnaires. The process requires that you capture initial data using a qualitative approach, use the findings to inform the quantitative data collection portion of your study, then interpret the findings.

For example, sequential exploratory design was used with a project in which a large pharmaceutical organization wanted to demonstrate the value of a leadership development program. When the program owners initially set out to implement this program, they focused on specific behaviors in their leadership competency model. Unfortunately, there was a misalignment between the desires of the program owners and those of the decision makers—the decision makers wanted to see the impact of the program and ultimately the ROI.

The participants represented a cross-functional group of team leaders, supervisors, and managers. The target audience for the evaluation totaled just over 100 participants. Given the size of the audience and the importance of the program, the team knew a questionnaire would be a primary mode of data collection. Because the participant group was cross-functional, any number of business measures could have improved as a result of the program. Rather than make their own inferences about which business measures should be included on the questionnaire, the project team held three focus groups of eight people, which was enough to reflect a perspective from each functional area represented in the program. Focus group members described how they used the leadership skills, as well as the impact the skills had on business measures. They helped the project team define measures that would likely represent those important to the entire participant group. After analyzing the findings from the focus groups, the project team designed a more valid questionnaire than they would have if they had simply assumed the measures on their own. Figure 4-5 depicts the process.

FIGURE 4-5. SEQUENTIAL EXPLORATORY MMR DESIGN

This simple approach of holding focus groups (qualitative) to define valid measures to include on the questionnaire (quantitative) resulted in more meaningful measures and more credible results.

Sequential Explanatory Design

Another technique frequently employed is sequential explanatory design. As with the previous approach, it sounds complicated, but is likely an approach with which you are already familiar. In this approach, the quantitative data collection comes first. The results are then followed up by a qualitative approach that seeks explanation for the quantitative findings.

For example, a senior leader within the U.S. Department of Defense applied this design when assessing the department’s leadership and management succession planning process. The quantitative portion of the project, a questionnaire, addressed issues such as how leadership and management practices contribute to agency performance: employee satisfaction with leadership policies and practices, work environment, rewards and recognition, opportunities for growth, and opportunities for achieving the organization’s mission. This questionnaire was then followed up with a set of interview questions to further assess how the organization ensures alignment of leadership and management development with strategic plans for investment in skill development, and how the organization capitalizes on those strategic investments. Results of the questionnaire led to specific interview questions, which then gave greater insight into the results (Jeffries 2011).

The process flow for the sequential explanatory design is shown in Figure 4-6.

FIGURE 4-6. SEQUENTIAL EXPLANATORY MMR DESIGN

The key to using this approach is to ensure you have a clear understanding of the results derived from the quantitative approach and let those results inform the questions you ask in the qualitative follow-up.

There are a variety of ways you can mix data collection approaches at any level of evaluation to get the most robust information possible. However, you must keep in mind the costs versus benefits of doing so. Using a mixed method approach costs more, so you must consider the value of the information coming from the approach. Sometimes, while multiple data collection techniques may give you more robust data, one technique may provide you with good-enough data to serve the purpose of the evaluation.

Guideposts

When collecting data, consider the following guidelines:

Consider all of your options when selecting data collection methods. While it is easy for us to default to electronic surveys and questionnaires as the data collection technique of choice, it is important to consider all your options.

Consider multiple approaches to collecting data. Multiple methods of data collection can give you multiple views of the same issue. Mixed methods research approaches can help ensure you are asking the right questions if your measures are unclear and can provide additional insight into data at hand.

When planning for higher levels of evaluation, don’t be so comprehensive at the lower levels. It is important to remember that while the data you collect at the lower levels are important, when stakeholders want results at higher levels of evaluation, the investment should be allocated to those levels.

Go to the most credible source of data. The importance of identifying the most credible sources of data cannot be reinforced enough. Keep in mind that when collecting Level 3 Application data, your participants’ perspectives are critical. You can always supplement responses from one source with those from another, if the value of their input is worth the cost of collecting it.

Build it in. When possible, build data collection into the program. By doing so, you make data collection a seamless part of the program and avoid additional data collection costs.

Point of Interest: Technology and Data Collection

There are a variety of technologies available to help with the data collection process. These technologies offer organizations the opportunity to collect data on a more routine basis than in the past. While some organizations are implementing new and interesting technologies that, to some people, may seem a bit creepy, others are having great success with the tried, true, and traditional.

New and Interesting

Sensors and monitoring software are much like Big Brother these days, particularly in organizations leading the way in big data and human capital analytics. Casinos, call centers, airlines, and freight companies have long used sensors and monitoring technology to keep track of and monitor the behavior of their people. One example is Epicenter, a new hi-tech office block in Sweden, where RFID (radio frequency identification) chips are embedded under employees' skin to give them access to doors, photocopiers, and other services (Cellan-Jones 2015). Want to monitor behavior change? This is one way to do it.

Other companies are taking advantage of technologies to help their employees get off email, get out of meetings, and get to work. VoloMetrix is a people analytics company that offers an interesting tool that analyzes email headers and online calendars (the tool does not necessarily read your email). Mining these data helps organizations identify employees' unproductive time, as well as the culprits causing the problems. According to the Wall Street Journal (Shellenbarger 2014), Joan Motsinger, vice president of global operations strategy for Seagate Technology, worked with VoloMetrix to study how the company's employee teams use time and work together. Analysis of 7,600 Seagate employees showed that some work groups were devoting more than 20 hours a week to meetings. It also found that one consulting firm was generating nearly 3,700 emails and draining 8,000 work hours annually from 228 Seagate employees.

FIGURE 4-7. COMMON EMAIL MISTAKES

Chantrelle Nielsen, head of customer solutions at VoloMetrix, reported in the article that in studying more than 25 companies, they found executives who consume more than 400 hours a week of their colleagues’ time, “the equivalent of 10 people working full-time every week just to read one manager’s email and attend his or her meetings.” Imagine having access to these data when evaluating the impact of an effective meeting skills program.

Of course, the Fitbit (and other wearable fitness-tracking devices) is well on its way to helping organizations help their employees become healthy and fit by monitoring steps, sleep, stairs, and other indicators of good health. Since 2007, Fitbit has sold roughly 20.5 million of its fitness-tracking devices, with more than half sold in 2014. This small tracking device can help organizations save money by reducing sick leave, reducing insurance premiums, and keeping employees engaged. For example, Appirio is a global services company that uses cloud technology and a community of technical experts to give partner companies, including Salesforce.com, Google, Workday, and Cornerstone OnDemand, new ways to solve problems. As part of its corporate wellness program, called CloudFit, Appirio supplied 400 employees with Fitbits. While the Fitbit was only one element of the CloudFit program, Appirio was able to convince its insurance company to reduce premiums by 5 percent, saving the organization $280,000 (Koh 2015).

Collecting data using tracking technology and devices may seem intrusive to some individuals, but others see it as an opportunity to improve performance. For example, Cornerstone OnDemand’s 2014 State of the Workplace Productivity report indicates that 80 percent of survey respondents would be motivated to use company-provided wearable technology that tracks health and wellness and provides their employers with data. Of those who use wearable technology, 71 percent think it has helped them be more productive.

Technology is creating new opportunities to assess needs and evaluate results of learning and development programs in ways we could only imagine a few years ago. Using sensors and monitoring devices helps keep the cost of data collection down and increases the reliability of the data. As organizations build capacity in human capital analytics, access to such technologies will become much more available to the learning and development function. But for those who do not have the resources to invest in monitoring devices, or whose culture won’t support it, the tried and true platforms that support data collection are ever present and just as useful.

Tried and True

There is a plethora of technologies that support data collection—from survey tools to qualitative data analysis tools to sample size calculators. While there are many tools available, here are a few with which you might want to become familiar:

Phillips Analytics by ROINavigator is a robust analytics tool that allows you to plan your program evaluation from beginning to end. The tool uses the five-level framework described in this book as the basis for clarifying stakeholder needs and developing your objectives. The output of the planning is a document similar to that described in chapter 3 of this book. Additionally, the tool allows you to develop a data collection instrument suitable for measuring the objectives of your program. One of the key elements of the system is the reporting tool. Reporting in Phillips Analytics is simple and flexible, to ensure you report the information to your stakeholders in a format that is meaningful to them.

Metrics-that-Matter by CEB has been on the market for several years. This system is one of the most robust in terms of collecting data and housing those data in such a way that users can benchmark against each other. The system is ideal for those organizations just getting started with measurement and evaluation because it provides a cookie-cutter approach. More advanced users can customize the tool.

PTG International’s TEMPO is the new kid on the block for some organizations. It has been on the market for some time, but its use is most recognized among government organizations, although it also has applicability in the private sector. The tool follows concepts similar to those presented in this book and is useful for any type of learning and development program. It is a survey tool with a question pool and customizable reporting and dashboard tools, and the PTG team will help as you integrate it into your processes.

Qualtrics is one of the most robust survey tools on the market. Used extensively in academic research, Qualtrics has also seen an uptick in its use to evaluate learning and development programs. The tool will allow you to ask any type of survey question you need to ask, given the objectives you are trying to measure. Additionally, it has an excellent language translator and will allow you to insert elements that make your survey instrument more interactive than the classic, static, online survey.

SurveyMonkey is far from a new tool. Originally known for its free surveys, SurveyMonkey has grown to become an excellent platform for those who want more from their online survey tool without breaking the bank. Its enhancements have made SurveyMonkey one of the top survey platforms available to program evaluators.

RaoSoft is a web-based survey platform that has been around for many years. We use it primarily for its sample size calculator, but it offers much more, including a variety of other tools that can support you as you plan your evaluation, develop your instrument, and analyze the data. It is worth a look.

Selecting the Right Technology

ROI Institute’s director of ROI Implementation suggests following a method based on Kepner-Tregoe’s problem-solving and decision-making process when selecting the best technology to support your evaluation practice. First, answer the following questions about the information you will obtain from the systems:

1.   What do you want to know?

2.   How are you going to use it?

3.   Who are the intended users of the information?

Once you ask these questions and understand the answers, the next step is to identify and prioritize the intended uses of the information. These answers and actions will form the basis for developing the criteria to make a decision about the right technology for your organization. Examples of criteria might include keeping costs to a minimum, easy reporting, good technical help desk, access to raw data, advanced analytics, and easy implementation. Establishing the criteria is the beginning of your decision-making process. You will use these criteria as you assess the different technologies available to you.

Once you have identified the technologies of interest, classify your criteria into musts and wants. Which of the elements must you have, and which of the elements do you want to have? A criterion is a must if it is a mandatory requirement, if it is measurable with a limit, and if it is realistic. For example, the criterion “keep costs to a minimum” cannot be a must because it is not measurable with a limit; the word minimum is too vague. A criterion such as “access to raw data” is measurable with a limit; either you can download the raw data or you cannot. Once you have classified your criteria into musts and wants, the next step is to weight the wants.

Wants do not all have the same importance, so you have to attach a relative numerical weight to each one. Determine your most important wants and give them a weight of 10 (you can have more than one 10). The other wants are weighted relative to the 10s. For example, if another want is half as important as a 10, you will weight it a 5. Table 4-4 provides an example of weighted criteria. In the left-hand column you see the weight given to each criterion; the criterion with an M next to it represents a must. With your weighted criteria in hand, all that is left to do is identify the different technologies and rate them against these criteria, as shown in the sketch that follows the table.

TABLE 4-4. WEIGHTED CRITERIA

Weight    Criteria
7         Keep costs to a minimum
6         Easy reporting
7         Good technical support
10        Customizable questionnaires
5         Robust analytics
M         Access to raw data
6         Easy implementation
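
To make the scoring mechanics concrete, here is a minimal sketch in Python showing how the must criteria screen out candidates and how the weighted wants produce a comparable score. The weights and criteria come from Table 4-4; the two candidate tools and their 1-10 ratings are hypothetical and included only for illustration.

```python
# Minimal sketch of the weighted-criteria (Kepner-Tregoe-style) scoring
# described above. Weights and criteria are taken from Table 4-4; the
# candidate tools and their ratings are hypothetical examples.

# Weighted "wants" (criterion: weight)
wants = {
    "Keep costs to a minimum": 7,
    "Easy reporting": 6,
    "Good technical support": 7,
    "Customizable questionnaires": 10,
    "Robust analytics": 5,
    "Easy implementation": 6,
}

# "Musts" are pass/fail screens; a candidate that fails any must is dropped.
musts = ["Access to raw data"]

# Hypothetical candidates with must results and 1-10 ratings on each want
candidates = {
    "Survey Tool A": {
        "musts": {"Access to raw data": True},
        "ratings": {"Keep costs to a minimum": 8, "Easy reporting": 6,
                    "Good technical support": 7, "Customizable questionnaires": 9,
                    "Robust analytics": 5, "Easy implementation": 8},
    },
    "Survey Tool B": {
        "musts": {"Access to raw data": False},  # fails the must, so it is screened out
        "ratings": {"Keep costs to a minimum": 9, "Easy reporting": 8,
                    "Good technical support": 6, "Customizable questionnaires": 7,
                    "Robust analytics": 9, "Easy implementation": 7},
    },
}

for name, info in candidates.items():
    # Screen on the musts first
    if not all(info["musts"].get(m, False) for m in musts):
        print(f"{name}: eliminated (fails a must)")
        continue
    # Weighted score = sum of (weight x rating) across the wants
    score = sum(wants[c] * info["ratings"][c] for c in wants)
    print(f"{name}: weighted score = {score}")
```

The candidate with the highest weighted score among those that pass every must becomes the leading choice, subject to the risk review described next.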

Once you have identified the technology, there is one last issue to address: identify any risks associated with that particular technology. Risks are those things that could go wrong in the short or long term if you implement that technology; for example, the technology may have a particular disadvantage, or the organizational culture may not support its implementation. Once you have thought through the risks and decided you can live with them, you can make your final decision and invest in the tool that will make data collection easy, cost effective, and successful.

Refuel and Recharge

This stretch of highway may have been long, but hopefully you gained some new insights into how you can collect data for your evaluation projects. Now that you have been introduced to these concepts, start completing the data collection plan from chapter 3 using what you learned in this chapter.

Travel Guides

Aldrich, C. 2009. Learning Online With Games, Simulations, and Virtual Worlds: Strategies for Online Instruction. San Francisco: Jossey-Bass.

Byham, W.C. 2004. “The Assessment Center Method and Methodology: New Applications and Technologies.” www.ddiworld.com/DDIWorld/media/white-papers/AssessmentCenterMethods_mg_ddi.pdf?ext=.pdf.

Cellan-Jones, R. 2015. “Office Puts Chips Under Staff’s Skin.” BBC News, January 29. www.bbc.com/news/technology-31042477.

Creswell, J.W., and V.L.P. Clark. 2011. Designing and Conducting Mixed Methods Research. Thousand Oaks, CA: Sage Publications.

Ellet, W. 2007. The Case Study Handbook: How to Read, Discuss, and Write Persuasively About Cases. Boston: Harvard Business Press.

Jeffries, R.A. 2011. Investments in Leadership and Management Succession Planning at a Department of Defense Organization in the Southeastern United States: A Review of Strategic Implications. Doctoral dissertation. Human Capital Development: The University of Southern Mississippi.

Koh, Y. 2015. “Fitbit Files to Go Public.” Wall Street Journal, May 8. www.wsj.com/articles/fitbit-files-to-go-public-1431029109?KEYWORDS=fitbit&cb=logged0.11325690126977861.

Phillips, P.P., and C.A. Stawarski. 2008. Data Collection: Planning for and Collecting All Types of Data. San Francisco: Pfeiffer.

Phillips, P.P., and J.P. Phillips. 2013. Survey Basics. Alexandria, VA: ASTD Press.

Povah, N., and G.C. Thornton. 2011. Assessment Centres and Global Talent Management. Surrey, UK: Gower.

Shellenbarger, S. 2014. “Stop Wasting Everyone’s Time: Meetings and Emails Kill Hours, But You Can Identify the Worst Offenders.” Wall Street Journal, December 2. www.wsj.com/articles/how-to-stop-wasting-colleagues-time-1417562658.

Shrock, S., and W. Coscarelli. 2000. Criterion-Referenced Test Development, 2nd ed. Silver Spring, MD: International Society for Performance Improvement. www.ispi.org.
