2.3. Logical Model: Star Schema

The way of modeling data in the data warehouse corresponds to the need for reporting of business data. When one makes a report, business measures are analyzed, such as sales quantity, in various contexts (i.e., dimensions), such as products, customers, and time. As Edgar F. Codd, a British computer scientist who invented the relational model for database management, observed, “in principle, there are a great number of different dimensions, which can be used to analyze a certain set of data. This complex perspective that is a multi-faceted, notional picture seems to be the way in which most business people perceive their enterprise.”5

Table 2.1 shows a sales report of personal computers (PCs) by a computer hardware distribution company. In this report such measures as sales amount and costs for the product (e.g., purchase costs and logistics costs) are analyzed by looking at the dimension time (May 2008) and product (PC ABC, PC XYZ, and PC QWE).

In order to determine the data that are necessary to generate such a report, three tables must be defined:

  1. A fact table (sales and costs; see Table 2.2)6
  2. A dimension table (product; see Table 2.3)
  3. A dimension table (date; see Table 2.4)7

A data warehouse model is able to link the separate data on sales, product, and date and then generate the report presented in Table 2.1. It should be pointed out that not all the data from the three tables were used to generate the report and that the calculation of the sales amount requires reference to the sales table (Sales_Quantity and Price). Every line in every table has an unambiguous identifier: the primary key (PK).8 In the product table, it is ID_Product; in the date table, it is ID_Date; and in the sales table, the pair ID_Product and ID_Date is the composite key. The sales table is connected with the product table by ID_Product. ID_Product in the sales table is then the so-called foreign key (FK).9 The foreign key is a referential constraint between two tables. The foreign key identifies a column or a set of columns in one (referencing) table that refers to a set of columns in another (referenced) table. And analogically the sales table is connected with the date table by ID_Date; ID_Date is also a foreign key in the sales table. It is worth stressing that to one line (row) in the product table, zero, one, or many rows (records) in the sales table may correspond, while one row in the sales table may correspond exactly to one row in the product table. The sales table and the date table share exactly the same relationship. This type of relationship, described as 1:N (one to many) is the basis of modeling data in a star schema (see Figure 2.2). Every rectangle corresponds to one table with data.

Table 2.1. Demonstration of a Sales Report for a Computer Hardware Distributor, May 2008

Note: Profitability (in %) = (qualitative sales – costs total) ÷ qualitative sales.

Source: author.

Table 2.2. A Demonstration Fact Table: Sales

Source: author.

Table 2.3. A Demonstration Table for a Dimension: Product

Source: author.

Table 2.4. A Demonstration Table for Dimension: Date (Time)

Source: author.

The schema presented in Figure 2.2 is called a multidimensional model, or popularly a star schema, because there are tables with dimensions connected by a relationship 1:N in the fact table. In formal terms, the schema from Figure 2.2 should be described as follows:

  • Sales (ID_Product, ID_Date, Sales_Quantity, Price, Cost)
  • Product (ID_Product, Name, Producer, Processor)
  • Date (ID_Date, Month, Quarter, Year)

Sales, product, and date are called entity sets. A single entity in a set presented in the table is simply a row in this table. Expressions in brackets are descriptions of attributes that are characteristic of a given set of entities. The bold selected attributes represent a key (an identifier) entity in the entity set. Owing to this convention, the schema in Figure 2.2 is called an entity relationship schema, and it is a classical way of modeling databases. In fact, having determined the format of data for specific attributes, one can already generate certain data structures in a selected database management system. The model in the form of a schema of entity sets is called a logical (conceptual) model, and its realization in a specific database is a physical model.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.188.178.181