Chapter 5
Data Attributes

In God we trust, all others bring data.

W. Edwards Deming

When you have completed this chapter, you will be able to:

  Define data, information, knowledge, and wisdom

  Categorize quantitative and qualitative data attributes

  Write effective definitions for data attributes

  Name data attributes

  Specify metrics and KPIs.

In Chapter 4, you became aware of the technical architecture of data warehousing and business intelligence. You gained perspective on the big picture of data warehousing and business intelligence.

In this chapter, you will add to your perspective by understanding the essential ingredient of data warehousing and business intelligence, the data itself. Data is the raw material through which we gain understanding. It is a critical element in data modeling, statistics, and data mining. Figure 05-01 shows data as the foundation of the pyramid that leads to wisdom and informed action.

Figure 05-01: Data in Perspective

Later in this book, you will learn more about gaining wisdom for making effective decisions. For now, keep in mind the definitions in Table 05-01 quoted from the DAMA Dictionary of Data Management (DAMA 2008).

Table 05-01: Data, Information, Knowledge, and Wisdom Defined (DAMA 2008)

Term

Definition

Data

Facts represented as text, numbers, graphics, images, sound or video. Data is the raw material used to create information.

Information

Data in context. The interpretation of data based on its context, including the 1) the business meaning of data elements and related terms, 2) the format in which the data is presented, 3) the timeframe represented by the data, and 4) the relevance of the data to a given usage.

Knowledge

1.       Understanding; cognizance; awareness of a situation.

2.       Expertise; familiarity gained through experience and association.

3.       Understanding the significance of information; information in perspective, integrated into a viewpoint based on the recognition of patterns (such as trends and causes) based on other information and experience.

Wisdom

1.       Accumulated knowledge; deep understanding, keen discernment and a capacity for sound judgment.

2.       Knowledge in context; knowledge applied in the course of actions.

 

Raw data is a set of data points without the additional context that would result in information. For example, a set of raw data of weights in pounds for five year old children might look like this:

(40, 52, 41, 43, 38, 42, 46, 39)

A set of raw data can be used as input to analysis, which can be put to good use. See Chapter 15 for insights into the use of statistics applied to analyzing raw data. Analysis and understanding are typically obtained by bringing together multiple data points and putting them in context. For example, you may want to understand the roles of weight influencing factors such as the school district, child height, gender, parental education level, and birthdate. An expanded set of data points, Table 05-02, shows an example.


Table 05-02: Data Points in Context

#

School District

Birth Date

Gender

Weight

Height

Parent Education Level

1

196

1/2/2006

F

40

44

16

2

196

4/3/2006

M

52

54

12

3

196

10/12/2005

F

41

41

20

4

201

12/31/2005

M

43

42

8

5

202

3/17/2006

F

38

40

12

6

196

5/12/2006

M

42

43

18

7

201

11/21/2005

M

46

51

12

8

202

5/3/2006

F

39

40

18

There are a number of words with the same or similar meaning, which apply to individual pieces of data. Some of the more frequently used words are listed here:

  Attribute

  Property

  Data Element

  Field

  Column

  Cell

  Data Point.

For now, this book uses the word attribute when describing an individual piece of data. An attribute is a characteristic of an entity (person, place, thing, or idea) or a relationship between entities. Attributes characterize entities and relationships by containing data values. Some examples of attributes are:

  Customer Order Number

  Account Balance Amount

  Marital Status Code

  Product Description

  Skill Level Code.

Attributes can be further described by drilling into their definition and usage. Attributes can have names, data types, domains, initial values, rules, and definitions. Each of these characteristics of an attribute is described in Table 05-03. Key Performance Indicators (KPIs) are attributes of special interest and will be described in more depth later in this chapter.

Table 05-03: Data Characteristics

Characteristic

Description

Name

Each attribute has a name, such as Account Balance Amount. An attribute name is a string that identifies and describes the attribute.

Datatype

The datatype is the format of the data stored in the attribute. It specifies whether the value must be a string, a number, or a date (for example). In addition, it specifies the size of the attribute. Datatype is also known as the data format. Examples of datatype are decimal(12,4), character(50), and date.

Domain

A domain such as Currency Amounts is a categorization of attributes by the range of values that can be stored in the attribute.

Initial Value

An initial value, such as 0.0000, is the default value that an attribute is assigned when it is first created.

Rules

Rules are constraints that limit the values that an attribute can contain. For example, the attribute must be greater than or equal to 0.0000. The use of rules helps to improve the quality of our data.

Definition

An attribute definition is a narrative that conveys or describes the meaning of the attribute. For example, Account balance amount is a measure of the monetary value of a financial account such as a bank account or an investment account.

Qualitative attributes are descriptive or categorical, rather than numeric. Mathematical operations like addition and multiplication do not apply to qualitative attributes. Examples are:

  Gender

  City name

  Nationality

  Brand preference.

Qualitative attributes can be further classified as nominal and ordinal. Nominal attributes are descriptors whose values imply no order, while ordinal attributes have order. Examples of nominal attributes are credit card number, brand preference, and state code. In contrast, a good example of ordinal attributes is the Likert scale that is often used in surveys: 1=Strongly disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree.

Quantitative attributes are numeric. Mathematical operations like subtraction and numeric comparison may be applied to quantitative attributes. Examples are:

  Inventory Count

  Unit Price

  Temperature

  Failure Rate.

Quantitative attributes can also be further classified into categories. For example:

  IntervalThe interval category has no “true” zero, so division and multiplication do not apply. Examples of interval attributes include: time of day, credit scores, and temperatures measured in Celsius or Fahrenheit.

  RatioFor the ratio category, there is a true zero, so division and multiplication do apply. Examples of ratio attributes include: time durations, counts, weights, and temperatures measured in Kelvin.

Attributes in each category have different properties and operations that can be performed with them. Distinctiveness specifies attributes which can be compared as equal or not equal. Order implies attributes are put in sequence – compared for less than or greater than. Additiveness specifies that attributes can be added and subtracted. And finally, multiplicativeness asserts attributes can be multiplied and divided. Table 05-04 shows how these properties and operations apply to nominal, ordinal, interval, and ratio type attributes.

Table 05-04: Qualitative and Quantitative Attribute Properties

Property

Operation

Nominal

Ordinal

Interval

Ratio

Distinctiveness

=

X

X

X

X

Order

LT

GT

 

X

X

X

Additiveness

+ -

 

 

X

X

Multiplicativeness

* /

 

 

 

X

Attributes with descriptive, unique names greatly improve communication and make data models and associated databases much easier to understand and use. The goals are to:

  Make names understandable

  Make names consistent – the same name for the same thing (avoid synonyms)

  Make names distinct – different names for different things (avoid homonyms).

A proven method of creating meaningful and distinctive names is creating qualified names from modifier words, prime words, and class words, in that order. This pattern has proven very effective.

  Prime WordThe major noun that is the subject of a data attribute, such as customer, employee, asset, or product.

  Modifier WordAn optional adjective that modifies the subject of a data attribute, such as current, prior, or preferred.

  Class WordA categorization of a data attribute such as: code, amount, or date.

The examples shown in Table 05-05 should help you to better understand the benefits of using qualified names with prime, modifier, and class word components.

Table 05-05: Attribute Naming Examples

Qualified Name

(Do this)

Unqualified Name

(Avoid this)

Last Received Amount

Amount

Order Issued Indicator

Issued

Open Order Count

Open Orders

Purchase Order Number

Order Number

There tends to be a small, fixed number of class words for each enterprise. A typical list of class words, based on my own experience, is provided in Table 05-06. Using a standard abbreviation for each class word will make attribute names more consistent.


Table 05-06: Class Words

Class Word

Abbreviation

Description

address

addr

A descriptor of the location of an entity, such as a street address or email address.

amount

amt

A monetary measure of something, such as a premium amount.

code

code

A word or abbreviation that classifies and/or describes a set of specific values.

count

cnt

The number of something obtained by adding one for each occurrence of an item such as inventory count.

date

date

A day identified in a calendar, typically including year, month, and day of month, such as 2007/02/14 (yyyy/mm/dd).

description

desc

A narrative that provides an explanation about or definition of something.

identifier

id

A reference that identifies a specific instance of an entity. It is often a logical or natural key.

indicator

ind

A two valued descriptor such as: true/false and yes/no.

name

nme

One or more words that describe what an entity is called, such as first name, product name, or city name.

number

nbr

A string that identifies or describes an entity. Values are often determined by an external organization such as social security number or driver's license number. It is often but not necessarily numeric. For example, a license plate number could contain letters (DLZ 879).

object identifier

oid

A system assigned identifier for an object. This type of identifier is used internally for efficient access and storage. It is often a physical key. It has meaning only within systems and is not used by/or known to the business.

percent

pct

A number described in terms of parts per hundred; rates are excluded.

quantity

qty

A numeric measure of something other than money, such as item order quantity.

rate

rate

A quantity of something measured in terms of a fixed quantity of something else Employment rate and error rate are examples.

text

text

A block of textual information other than a name or description. Customer comment text and correction procedure text are examples.

time

time

A point in time on a specific day.

KPIs are a type of data element that is of particular interest in the world of business intelligence and analytics. A KPI helps an organization define and measure their performance by indicating how well they are progressing towards a specific goal. KPIs have characteristics in common with other data elements including:

  Name

  Datatype

  Domain

  Initial Value

  Rules

  Definition.

Additional characteristics of metrics and KPIs are described in Table 05-07.

Table 05-07: KPI and Metric Characteristics

Characteristic

Description

Responsibility Level

Categorize the KPI as one of the following:

    Operational – detailed information needed by supervisors to monitor operations and support immediate action, such as current customer orders, today's inspection results, and shipment backlogs. The data tends to be atomic and immediately actionable.

    Tactical – department information needed by managers to improve processes through midterm efforts. It includes data such as weekly worker productivity, process breakdown trends, and unscheduled maintenance costs.

    Strategic – enterprise information needed by executives to carry out long term plans such as improvement in market share or customer engagement. Strategic KPIs are often expressed as a percentage such as plant uptime percent, on time shipment percent, and asset availability percent.

Responsible Business Roles

Identifies the job roles that are measured by the KPI. This includes identification of drilldown levels of the KPI.

Targets and Priority

Specifies target levels of the metric for each business role, as well as its priority for each role. Rules that specify notifications and alerts for when a KPI deviates from the target should be specified.

Align to Enterprise Scorecard Categories

Relates the KPI to strategic categories such as those proposed in the Balanced Scorecard (BSC):

    Customer Satisfaction

    Financial Performance

    Organizational Learning

    Production and Innovation

Frequency of Review

Specifies how often the KPI should be reviewed and analyzed. Operational KPIs should be reviewed multiple times during the day, tactical KPIs should be reviewed daily or weekly, and strategic KPIs monthly or quarterly.

Data Composition and Sources

Specifies the data that makes up the KPI and from where the data should be obtained. Includes sources of drilldown details.

Presentation Type

Specifies the appropriate presentation type for the KPI, such as:

    Bar Graphs

    Broken Line Graphs

    Pie Charts

    Histograms

    Gauges

    Reports

    Trees.

 

Key Points

  Data attributes are isolated facts that are stored in the form of numbers, letters, or images.

  There is a hierarchy where data, information and knowledge build toward wisdom.

  There are many words for pieces of data, including attribute, property, data element, field, column, cell, and data point.

  Data has further characteristics that provide understanding and enable its managementname, data type, domain, initial value, rules, and definition.

  Qualitative attributes are descriptive or categorical, rather than numeric, and are classified as nominal and ordinal.

  Nominal attributes are descriptive without sequence, while ordinal attributes have order.

  Quantitative attributes are numeric, subject to mathematical operations, and are classified as interval and ratio.

  Assigning consistent, meaningful names to attributes makes attributes easier to understand and use. A proven pattern of attribute naming builds the attribute names from a combination of modifier words, prime word, and class word.

Build your know-how in the area of data element basics using these resources.

Visit a website!

Data Management International (DAMA) is a global association of professionals who are dedicated to the practice of data management: http://www.dama.org

The KPI Library is a great place to learn about business performance, bench marking, and KPIs.

http://www.kpilibrary.com/

Read about it!

This book addresses data and its use in data mining and other analytics.

Pyle, Dorian. Data Preparation for Data Mining. Morgan Kaufmann, 1999.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.69.168