CHAPTER
13
Star Schema Designs for Human Resources

As in the previous chapter, the design of another proposed departmental data warehouse will be discussed. The EEOC (Equal Employment Opportunity Commission) Compliance Division of the Human Resources Department of ABC Corporation has been hearing how happy the eastern regional sales manager is with the data mart produced for that group. The division has heard that the data mart has drastically reduced the time and energy required to get reports on strategically important data and that with some appropriate DSS tools, it is fairly easy to analyze trends effectively.

Not wanting to be behind the times and having a real need, the EEOC division manager asks the IS department for assistance in producing a small data warehouse for the division. Elated that the data mart concept is catching on, the IS manager is more than willing to assign some people to the task. After a series of interviews, the IS team determines what questions the EEOC team is most interested in answering and develops a couple of star schemas for the data marts to specifically address those questions. This chapter will examine the structure for these data marts.

As stated before, the purpose of the models in this chapter is to provide examples that demonstrate the concepts of building a departmental data warehouse. This model may be tailored to fit a specific enterprise's needs; however, it should be understood that the design of this departmental data warehouse is highly dependent on the enterprise implementing it.

This model will, however, allow the DSS analyst in the EEOC division to answer questions such as the following:

  • How many programmer/analysts are African-American or Hispanic?
  • What is their annual salary in comparison to that of those who are white?
  • What is the average salary for female versus male workers?
  • Is there any group that has a higher or lower rate of salary increase compared to others?
  • How do annual salaries compare with respect to years of experience or years employed by the company?
  • How many minority workers are there in total? What percentage does this group represent?
  • How many employees of different statuses are there and what are their characteristics? For, instance how many part-time employees are there, and what is their average annual pay?
  • How many employees for what types of positions have been employed during what lengths of service? How many employees have been employed more than 10 years? More than 20 years?

The specific schemas presented in the following pages include the following:

  • Human resources star schema
  • Human resources star schema at a higher level of granularization

Human Resources Star Schema

Given the results of the interviews with the EEOC staff, the IS department determined that initially it would be very useful to be able to look at human resource measures such as number of employees, the average age, the average years of experience, the average years employed, and the average annual pay for all employees, across all departments, and to group them by position, EEOC category (white, Hispanic, African-American, Asian, Native American, etc.), status, pay grade, length of service, and gender. Discussions with the staff indicated that daily information would not be needed but that a picture of the organization at month end would allow for observation of some meaningful trends.

The star schema depicted in Figure 13.1 was developed—based on the data in the enterprise data warehouse—to address these needs. The central fact table, HUMAN_RESOURCES_FACT, contains not only the keys to the various dimensions, but also the important measures to be analyzed. The dimensions for this star schema include ORGANIZATIONS, POSITION_TYPES, GENDERS, LENGTH_OF_SERVICES, STATUSES, PAY GRADES, EEOC_TYPES, and TIME_BY_MONTH. As indicated, the data for this schema will likely be loaded at the end of each month.

Figure 13.1 Employee pay history.

13.1

Human Resource Fact Table

The measures for this fact table are as follows:

Number_of_employees. The count of employees meeting the specified dimension values.

Average_age. The mean age for employees meeting the specified dimension values.

Average_years_experience. This indicates the total number of years of work experience both in and outside ABC Corporation. In other words, the mean number of years experience that the employee has working in the field.

Average_years_employed. The average number of years that the employee has been employed for the enterprise.

Average_annual_pay. This represents the annualized salary for a given employee as of the end of the month in question. In other words, at the point in time when the snapshot was taken, this is the amount of money that an employee would make over the period of one year if the current pay rate remained the same.

The average_years_employed is a calculated measure based on the from_date and thru_date of all the positions an employee has held for the company as indicated in the enterprise warehouse. Even though these columns have changing values for employees, the fact that month is part of the key provides a history of what these values were at various points in time (in this case, at each month end).

As measures, the analysts can find out such things as the average age of employees in the company or the total years of experience within a position type in a department (this is calculated by dividing the average_years_experience by the number_of_employees for the selected dimensions). Questions such as these can be asked:

  • What is the average salary for those between the ages of 55 and 60 compared to those between 25 and 30?
  • What is the salary for those with over 15 years of experience (or employment) within each position type or each department?

The notation in chapter 12 for star schemas explicitly showed the primary key to each fact table. Since the key to the remaining fact tables in this book will always be a combination of the primary keys from each dimension, the key to each of the remaining fact tables will not be explicitly stated. The reader should know that the primary key to each fact table will be the combination of the primary keys of each dimension.

Organizations Dimension

Information about the internal organizations is taken from the warehouse table INTERNAL_ORG_ADDRESSES. Because no geographic information was required for this data mart, only the unique organization identifiers and names were selected to be moved from the enterprise warehouse. In addition, there are numerous org_type columns (i.e. level1_org_type, level2_org_type, and so on) that were also extracted to allow analysts to select the organizations to be analyzed based on a type such as department, division, or branch.

The structure of this table is somewhat different from that of the other tables seen so far. It has been denormalized to store the organizational hierarchy that is represented in the main warehouse by the org_type columns. The levell_org_type information represents the data associated directly with the lowest level HUMAN RESOURCE FACT measures records that were extracted. Level2_org_type is the name and type of the parent organization to the “level 1” organization, for example the division to which a department is within. Level3_org_type is the parent of “level 2,” and so on. The columns with name such as levell_name and level2_name function in a similar manner as was just explained.

The “level 2” and higher levels do not include an ID as that information is not needed. Each dimension record is defined by its organization_id, which is a “level 1” or lowest level unit. For instance, a dimension record may show that the “accounting department” is a “level 1” that is within a “level 2” finance division that is within a “level 3” eastern region that is within a “level 4” ABC Subsidiary that is part of “level 5” ABC Corporation. The inclusion of only five levels of structure was a decision based on the levels of organization found in ABC Corporation. This model could easily be adapted to other organizations by adding or subtracting levels as required.

Table 13.1 gives examples of the data that may be found in the organization dimension. It is shown only to “level 3” for illustration purposes. The key point is that each organization dimension represents the lowest level in an enterprise structure, allowing analysis to be summarized to any level desired. The flat structure of this table serves to enhance performance and simplify queries, rather than imposing a recursive structure on end users.

Table 13.1 Organization Dimension

13.1

Position Types Dimension

This information is extracted directly from the POSITIONS table in the enterprise warehouse which corresponds to the POSITION and POSITION TYPE entities in the logical data models. This dimension represents a unique list of all types of positions (position_type) represented along with the position_class

associated with that position. It was assumed that for any given position there was one and only one position type associated with it throughout time. Using this dimension, an analyst can select from HUMAN_RESOURCE_FACT by position_type or position_class (from the POSITION TYPE CLASSIFICATION entity), which is a higher-level position classification. This is similar in construct to the PRODUCTS dimension (discussed in the previous chapter), which includes both product_description and category_description columns (see Figure 12.1).

Genders Dimension

The enterprise may want to do analysis to see how many positions are occupied by males or females, how much they are getting paid on the average, and how much experience they have. Many human resource enterprises are required to maintain five categories of GENDERS descriptions: “male,” “female,” “male to female,” “female to male,” and “not provided.”

Length of Services Dimension

This dimension allows the analysis of various ranges regarding lengths of employment with the enterprise. For instance, how many employees have worked for the range 10-15 years. and what is their average annual pay? This LENGTH_OF_SERVICEs dimension may be used in conjunction with the average_years_employed measure. The analyst could determine the average_years_employed measure for employees in the length of service of 5-9 years, thus further refining their knowledge of employment history.

Statuses Dimension

The status provides information on number of employees, average salaries, or other metrics for each status in the organization. Possible statuses may be “part time,” “full_time,” “exempt,” “temporary,” and so on. This allows the enterprise to conduct analysis about different types of employees.

Pay Grades Dimension

This dimension allows for querying human resource measures by various pay grades and this dimension corresponds to the PAY GRADE entity from chapter 9, Figure 9.7. If more specific information is needed such as SALARY STEP, this could also be included as a level of this dimension.

This dimension can help analyze how much different types of human resource positions are paid. For instance, how many employees are in pay grade 5, and what is their average annual pay?

EEOC Types Dimension

This dimension is built by gathering the unique eeoc_type_id and associated description values from the EMPLOYEES table in the enterprise warehouse. Based on the analysis requirements, it was critical that this information be included as a dimension because the project is sponsored by the EEOC (Equal Employment Opportunity Commission) Compliance Division of the Human Resources Department.

Time_By_Month Dimension

The TIME_BY_MONTH table used for this schema is the same table used in the sales analysis data mart discussed in Chapter 12. To review, it contains a month_id that uniquely represents a fiscal year and month combination. The data loaded in this table should represent all time periods covered by the data in the fact table. It provides the EEOC analysts with the ability to gather data not only by month, but by quarter and year as well. In this way they should be able to comply with their reporting requirements easily.

The month_id column acts as the link to the time dimension. This information then represents the human resource facts for each month the enterprise does business.

Human Resources Star Schema at a Higher Level of Granularization

As previously stated, another principle in data warehousing is that of accommodating different levels of granularity. As in the sales analysis model, interviews with EEOC compliance staff members did indeed indicate that they needed some common averages for reporting to the federal government. They needed to evaluate average annual salaries based on the position type, EEOC category, and gender, across the organization. To accommodate this need, the IS department developed the higher-level schema based around the HUMAN_RESOURCES_SUMMARY_FACT table (see Figure 13.2).

Figure 13.2 Human resources star schema—higher level of granularity.

13.2

How is this table loaded? It could be loaded from the enterprise warehouse or from the HUMAN_RESOURCE_FACT table. Because these two schemas will be used by the same division, it may be most efficient to load the summary data from the HUMAN_RESOURCE_FACT table after completing the monthly load process (this saves having to scan the entire warehouse—data in the details table is essentially preselected). If this is the case, then the table would be loaded by selecting the detail data for a month, then selecting each of the measures from the previous star schema. Next, those sums are summed and allocated across a broader range using the same organization_id, position_type_id, eeoc_type_id, and gender_id. The calculations are then loaded into the columns number_of_employees, average_age, average_years_experience, average_years_employed, and average_annual_pay. (Note that in most relational DBMSs, there will likely be a SQL function that calculates averages easily.)

With this data in hand, the EEOC division at ABC Corporation can easily answer such questions as these:

  • What is the average salary for Hispanic females in data modeling positions compared to that of white males?
  • How do average age and average salary compare between female supervisors and male supervisors?
  • Is there any obvious trend comparing years of experience, salary, and EEOC category within a particular division?

WHY CREATE ANOTHER SUMMARIZED VIEW OF THIS HUMAN RESOURCE INFORMATION?

Any of the questions that can be answered by the schema in Figure 13.2 can also be answered in the schema of Figure 13.1. With this in mind, why would there be a need to create a less granular schema such as the one presented in Figure 13.2? The schema in Figure 13.2 is optimized for performance and will result in much quicker queries than the last schema. There may be many other variations of higher-level summarizations, which may be used by specific departments that need only certain data. The other reason for a more summarized star schema, such as this one, is that some personnel may be allowed to view only certain human resources information, and one method for providing data security to different parts of the enterprise is by providing different views of overlapping data while restricting sensitive data to certain groups of individuals.

Summary

This chapter contained details of the design of two sample star schemas built to support analysis of human resources data. The resulting design demonstrated some additional denormalization and several levels of summarization to assist in answering the questions revealed during end-user interviews.

As previously noted, what has been presented are several of the many possible schemas that could be derived from an enterprise data warehouse. The structure of a departmental model is in a large part determined by the questions it is designed to answer. It cannot be over-emphasized that end users of the data must be interviewed to determine their real business and information needs if a data warehouse strategy is going to be successful.

As illustrated, it is not uncommon for one department to wait to see how successful another department's venture is before it is willing to commit time and money to the effort. Once the methodology and technology have been proven, by producing not only data marts but happy and satisfied users, then others will soon want to join in that success.

Please refer to Appendix C for a listing of tables and columns for these mart designs. SQL scripts to build tables and columns derived from these data star schema designs can be found on the full-blown CD-ROM, which is licensed separately.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.235.79