|
Chapter 5
|
In God we trust, all others bring data.
W. Edwards Deming
When you have completed this chapter, you will be able to:
• Define data, information, knowledge, and wisdom
• Categorize quantitative and qualitative data attributes
• Write effective definitions for data attributes
• Name data attributes
• Specify metrics and KPIs.
In Chapter 4, you became aware of the technical architecture of data warehousing and business intelligence. You gained perspective on the big picture of data warehousing and business intelligence.
In this chapter, you will add to your perspective by understanding the essential ingredient of data warehousing and business intelligence, the data itself. Data is the raw material through which we gain understanding. It is a critical element in data modeling, statistics, and data mining. Figure 05-01 shows data as the foundation of the pyramid that leads to wisdom and informed action.
Figure 05-01: Data in Perspective
Later in this book, you will learn more about gaining wisdom for making effective decisions. For now, keep in mind the definitions in Table 05-01 quoted from the DAMA Dictionary of Data Management (DAMA 2008).
Table 05-01: Data, Information, Knowledge, and Wisdom Defined (DAMA 2008)
Term |
Definition |
Data |
Facts represented as text, numbers, graphics, images, sound or video. Data is the raw material used to create information. |
Information |
Data in context. The interpretation of data based on its context, including the 1) the business meaning of data elements and related terms, 2) the format in which the data is presented, 3) the timeframe represented by the data, and 4) the relevance of the data to a given usage. |
Knowledge |
1. Understanding; cognizance; awareness of a situation. 2. Expertise; familiarity gained through experience and association. 3. Understanding the significance of information; information in perspective, integrated into a viewpoint based on the recognition of patterns (such as trends and causes) based on other information and experience. |
Wisdom |
1. Accumulated knowledge; deep understanding, keen discernment and a capacity for sound judgment. 2. Knowledge in context; knowledge applied in the course of actions. |
Raw data is a set of data points without the additional context that would result in information. For example, a set of raw data of weights in pounds for five year old children might look like this:
(40, 52, 41, 43, 38, 42, 46, 39)
A set of raw data can be used as input to analysis, which can be put to good use. See Chapter 15 for insights into the use of statistics applied to analyzing raw data. Analysis and understanding are typically obtained by bringing together multiple data points and putting them in context. For example, you may want to understand the roles of weight influencing factors such as the school district, child height, gender, parental education level, and birthdate. An expanded set of data points, Table 05-02, shows an example.
Table 05-02: Data Points in Context
# |
School District |
Birth Date |
Gender |
Weight |
Height |
Parent Education Level |
1 |
196 |
1/2/2006 |
F |
40 |
44 |
16 |
2 |
196 |
4/3/2006 |
M |
52 |
54 |
12 |
3 |
196 |
10/12/2005 |
F |
41 |
41 |
20 |
4 |
201 |
12/31/2005 |
M |
43 |
42 |
8 |
5 |
202 |
3/17/2006 |
F |
38 |
40 |
12 |
6 |
196 |
5/12/2006 |
M |
42 |
43 |
18 |
7 |
201 |
11/21/2005 |
M |
46 |
51 |
12 |
8 |
202 |
5/3/2006 |
F |
39 |
40 |
18 |
There are a number of words with the same or similar meaning, which apply to individual pieces of data. Some of the more frequently used words are listed here:
• Attribute
• Property
• Data Element
• Field
• Column
• Cell
• Data Point.
For now, this book uses the word “attribute” when describing an individual piece of data. An attribute is a characteristic of an entity (person, place, thing, or idea) or a relationship between entities. Attributes characterize entities and relationships by containing data values. Some examples of attributes are:
• Customer Order Number
• Account Balance Amount
• Marital Status Code
• Product Description
• Skill Level Code.
Attributes can be further described by drilling into their definition and usage. Attributes can have names, data types, domains, initial values, rules, and definitions. Each of these characteristics of an attribute is described in Table 05-03. Key Performance Indicators (KPIs) are attributes of special interest and will be described in more depth later in this chapter.
Table 05-03: Data Characteristics
Characteristic |
Description |
Name |
Each attribute has a name, such as Account Balance Amount. An attribute name is a string that identifies and describes the attribute. |
Datatype |
The datatype is the format of the data stored in the attribute. It specifies whether the value must be a string, a number, or a date (for example). In addition, it specifies the size of the attribute. Datatype is also known as the data format. Examples of datatype are decimal(12,4), character(50), and date. |
Domain |
A domain such as Currency Amounts is a categorization of attributes by the range of values that can be stored in the attribute. |
Initial Value |
An initial value, such as 0.0000, is the default value that an attribute is assigned when it is first created. |
Rules |
Rules are constraints that limit the values that an attribute can contain. For example, “the attribute must be greater than or equal to 0.0000”. The use of rules helps to improve the quality of our data. |
Definition |
An attribute definition is a narrative that conveys or describes the meaning of the attribute. For example, “Account balance amount is a measure of the monetary value of a financial account such as a bank account or an investment account.” |
Qualitative attributes are descriptive or categorical, rather than numeric. Mathematical operations like addition and multiplication do not apply to qualitative attributes. Examples are:
• Gender
• City name
• Nationality
• Brand preference.
Qualitative attributes can be further classified as nominal and ordinal. Nominal attributes are descriptors whose values imply no order, while ordinal attributes have order. Examples of nominal attributes are credit card number, brand preference, and state code. In contrast, a good example of ordinal attributes is the Likert scale that is often used in surveys: 1=Strongly disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree.
Quantitative attributes are numeric. Mathematical operations like subtraction and numeric comparison may be applied to quantitative attributes. Examples are:
• Inventory Count
• Unit Price
• Temperature
• Failure Rate.
Quantitative attributes can also be further classified into categories. For example:
• Interval – The interval category has no “true” zero, so division and multiplication do not apply. Examples of interval attributes include: time of day, credit scores, and temperatures measured in Celsius or Fahrenheit.
• Ratio – For the ratio category, there is a “true” zero, so division and multiplication do apply. Examples of ratio attributes include: time durations, counts, weights, and temperatures measured in Kelvin.
Attributes in each category have different properties and operations that can be performed with them. Distinctiveness specifies attributes which can be compared as equal or not equal. Order implies attributes are put in sequence – compared for less than or greater than. Additiveness specifies that attributes can be added and subtracted. And finally, multiplicativeness asserts attributes can be multiplied and divided. Table 05-04 shows how these properties and operations apply to nominal, ordinal, interval, and ratio type attributes.
Table 05-04: Qualitative and Quantitative Attribute Properties
Property |
Operation |
Nominal |
Ordinal |
Interval |
Ratio |
Distinctiveness |
= ≠ |
X |
X |
X |
X |
Order |
LT GT |
|
X |
X |
X |
Additiveness |
+ - |
|
|
X |
X |
Multiplicativeness |
* / |
|
|
|
X |
Attributes with descriptive, unique names greatly improve communication and make data models and associated databases much easier to understand and use. The goals are to:
• Make names understandable
• Make names consistent – the same name for the same thing (avoid synonyms)
• Make names distinct – different names for different things (avoid homonyms).
A proven method of creating meaningful and distinctive names is creating qualified names from modifier words, prime words, and class words, in that order. This pattern has proven very effective.
• Prime Word – The major noun that is the subject of a data attribute, such as customer, employee, asset, or product.
• Modifier Word – An optional adjective that modifies the subject of a data attribute, such as current, prior, or preferred.
• Class Word – A categorization of a data attribute such as: code, amount, or date.
The examples shown in Table 05-05 should help you to better understand the benefits of using qualified names with prime, modifier, and class word components.
Table 05-05: Attribute Naming Examples
Qualified Name (Do this) |
Unqualified Name (Avoid this) |
Last Received Amount |
Amount |
Order Issued Indicator |
Issued |
Open Order Count |
Open Orders |
Purchase Order Number |
Order Number |
There tends to be a small, fixed number of class words for each enterprise. A typical list of class words, based on my own experience, is provided in Table 05-06. Using a standard abbreviation for each class word will make attribute names more consistent.
Table 05-06: Class Words
Class Word |
Abbreviation |
Description |
---|---|---|
address |
addr |
A descriptor of the location of an entity, such as a street address or email address. |
amount |
amt |
A monetary measure of something, such as a premium amount. |
code |
code |
A word or abbreviation that classifies and/or describes a set of specific values. |
count |
cnt |
The number of something obtained by adding one for each occurrence of an item such as inventory count. |
date |
date |
A day identified in a calendar, typically including year, month, and day of month, such as 2007/02/14 (yyyy/mm/dd). |
description |
desc |
A narrative that provides an explanation about or definition of something. |
identifier |
id |
A reference that identifies a specific instance of an entity. It is often a logical or natural key. |
indicator |
ind |
A two valued descriptor such as: true/false and yes/no. |
name |
nme |
One or more words that describe what an entity is called, such as first name, product name, or city name. |
number |
nbr |
A string that identifies or describes an entity. Values are often determined by an external organization such as social security number or driver's license number. It is often but not necessarily numeric. For example, a license plate number could contain letters (DLZ 879). |
object identifier |
oid |
A system assigned identifier for an object. This type of identifier is used internally for efficient access and storage. It is often a physical key. It has meaning only within systems and is not used by/or known to the business. |
percent |
pct |
A number described in terms of parts per hundred; rates are excluded. |
quantity |
qty |
A numeric measure of something other than money, such as item order quantity. |
rate |
rate |
A quantity of something measured in terms of a fixed quantity of something else Employment rate and error rate are examples. |
text |
text |
A block of textual information other than a name or description. Customer comment text and correction procedure text are examples. |
time |
time |
A point in time on a specific day. |
KPIs are a type of data element that is of particular interest in the world of business intelligence and analytics. A KPI helps an organization define and measure their performance by indicating how well they are progressing towards a specific goal. KPIs have characteristics in common with other data elements including:
• Name
• Datatype
• Domain
• Initial Value
• Rules
• Definition.
Additional characteristics of metrics and KPIs are described in Table 05-07.
Table 05-07: KPI and Metric Characteristics
Characteristic |
Description |
---|---|
Responsibility Level |
Categorize the KPI as one of the following: • Operational – detailed information needed by supervisors to monitor operations and support immediate action, such as current customer orders, today's inspection results, and shipment backlogs. The data tends to be atomic and immediately actionable. • Tactical – department information needed by managers to improve processes through midterm efforts. It includes data such as weekly worker productivity, process breakdown trends, and unscheduled maintenance costs. • Strategic – enterprise information needed by executives to carry out long term plans such as improvement in market share or customer engagement. Strategic KPIs are often expressed as a percentage such as plant uptime percent, on time shipment percent, and asset availability percent. |
Responsible Business Roles |
Identifies the job roles that are measured by the KPI. This includes identification of drilldown levels of the KPI. |
Targets and Priority |
Specifies target levels of the metric for each business role, as well as its priority for each role. Rules that specify notifications and alerts for when a KPI deviates from the target should be specified. |
Align to Enterprise Scorecard Categories |
Relates the KPI to strategic categories such as those proposed in the Balanced Scorecard (BSC): • Customer Satisfaction • Financial Performance • Organizational Learning • Production and Innovation |
Frequency of Review |
Specifies how often the KPI should be reviewed and analyzed. Operational KPIs should be reviewed multiple times during the day, tactical KPIs should be reviewed daily or weekly, and strategic KPIs monthly or quarterly. |
Data Composition and Sources |
Specifies the data that makes up the KPI and from where the data should be obtained. Includes sources of drilldown details. |
Presentation Type |
Specifies the appropriate presentation type for the KPI, such as: • Bar Graphs • Broken Line Graphs • Pie Charts • Histograms • Gauges • Reports • Trees. |
Key Points • Data attributes are isolated facts that are stored in the form of numbers, letters, or images. • There is a hierarchy where data, information and knowledge build toward wisdom. • There are many words for pieces of data, including attribute, property, data element, field, column, cell, and data point. • Data has further characteristics that provide understanding and enable its management – name, data type, domain, initial value, rules, and definition. • Qualitative attributes are descriptive or categorical, rather than numeric, and are classified as nominal and ordinal. • Nominal attributes are descriptive without sequence, while ordinal attributes have order. • Quantitative attributes are numeric, subject to mathematical operations, and are classified as interval and ratio. • Assigning consistent, meaningful names to attributes makes attributes easier to understand and use. A proven pattern of attribute naming builds the attribute names from a combination of modifier words, prime word, and class word. |
Build your know-how in the area of data element basics using these resources.
Visit a website! |
Data Management International (DAMA) is a global association of professionals who are dedicated to the practice of data management: http://www.dama.org The KPI Library is a great place to learn about business performance, bench marking, and KPIs. http://www.kpilibrary.com/ |
Read about it! |
This book addresses data and its use in data mining and other analytics. Pyle, Dorian. Data Preparation for Data Mining. Morgan Kaufmann, 1999. |
3.142.69.168