Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 2

Building a Super Model

In This Chapter

Understanding the best data modeling practices

Leveraging Excel functions to deliver data

Creating smart tables that expand with data

One of Excel’s most attractive features is its flexibility. You can create an intricate system of interlocking calculations, linked cells, and formatted summaries that work together to create a final analysis. However, years of experience have brought me face to face with an ugly truth: Excel is like the cool gym teacher who lets you do anything you want — the freedom can be fun, but a lack of structure in your data models can lead to some serious headaches in the long run.

What’s a data model? A data model provides the foundation upon which your reporting mechanism is built. When you build a spreadsheet that imports, aggregates, and shapes data, you’re essentially building a data model that feeds your dashboards and reports.

Creating a poorly designed data model can mean hours of manual labor maintaining and refreshing your reporting mechanisms. On the other hand, creating an effective model allows you to easily repeat monthly reporting processes without damaging your reports or your sanity.

The goal of this chapter is to show you the concepts and techniques that help you build effective data models. In this chapter, you discover that creating a successful reporting mechanism requires more than slapping data onto a spreadsheet. Although you see how to build cool dashboard components in later chapters, those components won’t do you any good if you can’t effectively manage your data models. On that note, let’s get started.

Data Modeling Best Practices

Building an effective model isn’t as complicated as you may think. It’s primarily a matter of thinking about your reporting processes differently. Most people spend very little time thinking about the supporting data model behind a reporting process. If they think about it at all, they usually start by imagining a mock-up of the finished dashboard and work backward from there.

Rather than see only the finished dashboard in your head, try to think of the end-to-end process. Where will you get the data? How should the data be structured? What analysis will need to be performed? How will the data be fed to the dashboard? How will the dashboard be refreshed?

Obviously, the answers to these questions are highly situation-specific. However, some data modeling best practices will guide you to a new way of thinking about your reporting process. These are discussed in the next few sections.

Separating data, analysis, and presentation

One of the most important concepts in a data model is the separation of data, analysis, and presentation. The fundamental idea is that you don’t want your data to become too tied into any one particular way of presenting that data.

To wrap your mind around this concept, think about an invoice. When you receive an invoice, you don’t assume that the financial data on the invoice is the true source of your data. It’s merely a presentation of data that’s actually stored in a database. That data can be analyzed and presented to you in many other manners: in charts, in tables, or even on websites. This sounds obvious, but Excel users often fuse data, analysis, and presentation.

For instance, I’ve seen Excel workbooks that contain 12 tabs, each representing a month. On each tab, data for that month is listed along with formulas, pivot tables, and summaries. Now what happens when you’re asked to provide a summary by quarter? Do you add more formulas and tabs to consolidate the data on each of the month tabs? The fundamental problem in this scenario is that the tabs actually represent data values that are fused into the presentation of your analysis.

For an example more in line with reporting, take a look at Figure 2-1. Hard-coded tables like this one are common. This table is an amalgamation of data, analysis, and presentation. Not only does this table tie you to a specific analysis, but there’s little to no transparency into what the analysis exactly consists of. Also, what happens when you need to report by quarter or when another dimension of analysis is needed? Do you import a table that consists of more columns and rows? How does that affect your model?

Figure 2-1: Avoid hard-coded tables that fuse data, analysis, and presentation.

The alternative is to create three layers in your data model: a data layer, an analysis layer, and a presentation layer. You can think of these layers as three different spreadsheets in an Excel workbook: one sheet to hold the raw data that feeds your report, one sheet to serve as a staging area where the data is analyzed and shaped, and one sheet to serve as the presentation layer. Figure 2-2 illustrates the three layers of an effective data model.

Figure 2-2: An effective data model separates data, analysis, and presentation.

As you can see in Figure 2-2, the raw dataset is located on its own sheet. Although the dataset has some level of aggregation applied to keep it manageably small, no further analysis is done on the Data sheet.

The analysis layer consists primarily of formulas that analyze and pull data from the data layer into formatted tables commonly referred to as staging tables. These staging tables ultimately feed the reporting components in your presentation layer. In short, the sheet that contains the analysis layer becomes the staging area where data is summarized and shaped to feed the reporting components. Notice on the Analysis tab in Figure 2-2, the formula bar illustrates that the table consists of formulas that reference the Data tab.

There are a couple of benefits to this setup. First, the entire reporting model can be refreshed easily by simply replacing the raw data with an updated dataset. The formulas on the Analysis tab continue to work with the latest data. Second, any additional analysis can easily be created by using different combinations of formulas on the Analysis tab. If you need data that doesn’t exist in the Data sheet, you can easily append a column to the end of the raw dataset without disturbing the Analysis or Presentation sheets.

You don’t necessarily have to place your data, analysis, and presentation layers on different spreadsheets. In small data models, you may find it easier to place your data in one area of a spreadsheet while building staging tables in another area of the same spreadsheet.

Along those same lines, remember that you’re not limited to three spreadsheets, either. That is to say, you can have several sheets that provide the raw data, several sheets that analyze, and several that serve as the presentation layer.

Wherever you choose to place the different layers, keep in mind that the idea remains the same. The analysis layer should primarily consist of formulas that pull data from the Data sheets into staging tables used to feed your presentation. Later in this chapter, you explore some of the formulas that can be used in your analysis sheets

Starting with appropriately structured data

Not all datasets are created equal. Although some datasets work in a standard Excel environment, they may not work for data modeling purposes. Before building your data model, ensure that your source data is appropriately structured for dashboarding purposes.

At the risk of oversimplification, I assert that datasets typically used in Excel come in three fundamental forms:

The spreadsheet report
The flat data file
The tabular dataset

The punch line is that only flat data files and tabular datasets make for effective data models. I review and discuss each of these different forms in the next few sections.

Spreadsheet reports make for ineffective data models

Spreadsheet reports display highly formatted, summarized data and are often designed as presentation tools for management or executive users. A typical spreadsheet report makes judicious use of empty space for formatting, repeats data for aesthetic purposes, and presents only high-level analysis. Figure 2-3 illustrates a spreadsheet report.

Although a spreadsheet report may look nice, it doesn’t make for an effective data model. Why? The primary reason is that these reports offer you no separation of data, analysis, and presentation. You’re essentially locked into one analysis.

Although you could make charts from the report shown in Figure 2-3, it’d be impractical to apply any analysis outside what’s already there. For instance, how would you calculate and present the average of all bike sales using this particular report? How would you calculate a list of the top ten best-performing markets?

With this setup, you’re forced into very manual processes that are difficult to maintain month after month. Any analysis outside the high-level ones already in the report is basic at best — even with fancy formulas. Furthermore, what happens when you’re required to show bike sales by month? When your data model requires analysis with data that isn’t in the spreadsheet report, you’re forced to search for another dataset.

Flat data files lend themselves nicely to data models

Another type of file format is a flat file. Flat files are data repositories organized by row and column. Each row corresponds to a set of data elements, or a record. Each column is a field. A field corresponds to a unique data element in a record. Figure 2-4 contains the same data as the report in Figure 2-3 but expressed in a flat data file format.

Notice that every data field has a column, and every column corresponds to one data element. Furthermore, there’s no extra spacing, and each row (or record) corresponds to a unique set of information. But the key attribute that makes this a flat file is that no single field uniquely identifies a record. In fact, you’d have to specify four separate fields (Region, Market, Business Segment, and a month’s sales amount) before you could uniquely identify the record.

Flat files lend themselves nicely to data modeling in Excel because they can be detailed enough to hold the data you need and still be conducive to a wide array of analysis with simple formulas — SUM, AVERAGE, VLOOKUP, and SUMIF, just to name a few. Later in this chapter, you explore formulas that come in handy in a reporting data model.

Tabular datasets are perfect for pivot table–driven data models

Many effective data models are driven primarily by pivot tables. Pivot tables (which I cover in Chapter 6) are Excel’s premier analysis tools. For those of you who have used pivot tables, you know they offer an excellent way to summarize and shape data for use by reporting components, such as charts and tables.

Tabular datasets are ideal for pivot table–driven data models. Figure 2-5 illustrates a tabular dataset. Note that the primary difference between a tabular dataset, as shown in Figure 2-5, and a flat data file is that in tabular datasets the column labels don’t double as actual data. For instance, in Figure 2-4, the month identifiers are integrated into the column labels. In Figure 2-5, the Sales Period column contains the month identifier. This subtle difference in structure is what makes tabular datasets optimal data sources for pivot tables. This structure ensures that key pivot table functions, such as sorting and grouping, work the way they should.

The attributes of a tabular dataset are as follows:

The first row of the dataset contains field labels that describe the information in each column.
The column labels don’t pull double duty as data items that can be used as filters or query criteria (such as months, dates, years, regions, or markets).
There are no blank rows or columns — every column has a heading, and a value is in every row.
Each column represents a unique category of data.
Each row represents individual items in each column.

Avoiding turning your data model into a database

In Chapter 1, you might have read that measures used on a dashboard should absolutely support the initial purpose of that dashboard. The same concept applies to the back-end data model. You should only import data that’s necessary to fulfill the purpose of your dashboard or report.

In an effort to have as much data as possible at their fingertips, many Excel users bring into their spreadsheets every piece of data they can get their hands on. You can spot these people by the 40-megabyte files they send through email. You’ve seen these spreadsheets — two tabs that contain some reporting or dashboard interface and then six hidden tabs that contain thousands of lines of data (most of which isn’t used). They essentially build a database in their spreadsheet.

What’s wrong with utilizing as much data as possible? Well, here are a few issues:

Aggregating data within Excel increases the number of formulas. If you’re bringing in all raw data, you have to aggregate that data in Excel. This inevitably causes you to exponentially increase the number of formulas you have to employ and maintain. Remember that your data model is a vehicle for presenting analyses, not processing raw data. The data that works best in reporting mechanisms is what’s already been aggregated and summarized into useful views that can be navigated and fed to dashboard components. Importing data that’s already been aggregated as much as possible is far better. For example, if you need to report on Revenue by Region and Month, there’s no need to import sales transactions into your data model. Instead, use an aggregated table consisting of Region, Month, and Sum of Revenue.
Your data model will be distributed with your dashboard. In other words, because your dashboard is fed by your data model, you need to maintain the model behind the scenes (likely in hidden tabs) when distributing the dashboard. Besides the fact that it causes the file size to be unwieldy, including too much data in your data model can actually degrade the performance of your dashboard. Why? When you open an Excel file, the entire file is loaded into memory to ensure quick data processing and access. The drawback to this behavior is that Excel requires a great deal of RAM to process even the smallest change in your spreadsheet. You may have noticed that when you try to perform an action on a large, formula-intensive dataset, Excel is slow to respond, giving you a Calculating indicator on the status bar. The larger your dataset is, the less efficient the data crunching in Excel is.
Large datasets can cause difficulty in scalability. Imagine that you’re working in a small company and you’re using monthly transactions in your data model. Each month holds 80,000 lines of data. As time goes on, you build a robust process complete with all the formulas, pivot tables, and macros you need to analyze the data that’s stored on your neatly maintained tab. Now what happens after one year? Do you start a new tab? How do you analyze two datasets on two different tabs as one entity? Are your formulas still good? Do you have to write new macros?

These are all issues that can be avoided by importing only aggregated and summarized data that’s useful to the core purpose of your reporting needs.

Using tabs to document and organize your data model

Wanting to keep your data model limited to one worksheet tab is natural. In my mind, keeping track of one tab is much simpler than using different tabs. However, limiting your data model to one tab has its drawbacks, including the following:

Using one tab typically places limits on your analysis. Because only so many datasets can fit on a tab, using one tab limits the number of analyses that can be represented in your data model. This in turn limits the analysis your dashboard can offer. Consider adding tabs to your data model to provide additional data and analysis that may not fit on just one tab.
Too much on one tab makes for a confusing data model. When working with large datasets, you need plenty of staging tables to aggregate and shape the raw data so that it can be fed to your reporting components. If you use only one tab, you’re forced to position these staging tables below or to the right of your datasets. Although this may provide all the elements needed to feed your presentation layer, a good deal of scrolling is necessary to view all the elements positioned in a wide range of areas. This makes the data model difficult to understand and maintain. Use separate tabs to hold your analysis and staging tables, particularly in data models that contain large datasets occupying a lot of real estate.
Using one tab limits the amount of documentation you can include. You’ll find that your data models easily become a complex system of intertwining links among components, input ranges, output ranges, and formulas. Sure, it all makes sense while you’re building your data model, but try coming back to it after a few months. You’ll find you’ve forgotten what each data range does and how each range interacts with the final presentation layer. To avoid this problem, consider adding a Model Map tab to your data model. The Model Map tab essentially summarizes the key ranges in the data model and allows you to document how each range interacts with the reporting components in the final presentation layer. As you can see in Figure 2-6, the model map is nothing fancy — just a table that lists key information about each range in the model.

Figure 2-6: A model map allows you to document how each range interacts with your data model.

You can include any information you think appropriate in your model map. The idea is to give yourself a handy reference that guides you through the elements in your data model.

Speaking of documenting your data model …

Another way to document the logic in your data model is to use comments and labels liberally. It’s amazing how a few explanatory comments and labels can help clarify your spreadsheets. The general idea here is that the logic in your model should be clear to you even after you’ve been away from your data model for a long time.

Also, consider using colors to identify the ranges in your data model. Using colors in your data model enables you to quickly look at a range of cells and get a basic indication of what that range does. The general concept behind this best practice is that each color represents a range type. For example, you could use yellow to represent staging tables used to feed the charts and the tables in your presentation layer. You could use gray to represent formulas that aren’t to be altered or touched, or purple to represent reference tables used for lookups and drop-down lists.

You can use any color you want; it’s up to you to give these colors meaning. The important thing is that you have a visual distinction between the various ranges being used in your data model.

Testing your data model before building reporting components on top of it

This best practice is simple. Make sure your data model does what it’s supposed to do before building dashboard components on top of it. In that vein, here are a few things to watch for:

Test your formulas to ensure they’re working properly. Make sure your formulas don’t produce errors and that each formula outputs expected results.
Double-check your main dataset to ensure it’s complete. Check that your data table was not truncated when transferring to Excel. Also, be sure that each column of data is present with appropriate data labels.
Make sure all numeric formatting is appropriate. Be sure that the formatting of your data is appropriate for the field. For example, check to see that dates are formatted as dates, currency values are formatted properly, and the correct number of decimal places is displayed where needed.

The obvious goal here is to eliminate easily avoidable errors that may cause complications later.

Excel Functions That Really Deliver

As you discover in this chapter, the optimal data model for any reporting mechanism is one in which data, analysis, and presentation are separated into three layers. Although all three layers are important, the analysis layer is where the real art comes into play. The fundamental task of the analysis layer is to pull information from the data layer and then create staging tables that feed your charts, tables, and other reporting components. To do this effectively, you need to employ formulas that serve as data delivery mechanisms — formulas that deliver data to a destination range.

You see, the information you need lives in the data layer (typically, a table containing aggregated data). Data delivery formulas are designed to get that data and deliver it to the analysis layer so it can be analyzed and shaped. The cool thing is that after you’ve set up the data delivery formulas, the analysis layer automatically updates each time the data layer is refreshed.

Confused? Don’t worry — in this section, I show you a few Excel functions that work particularly well in data delivery formulas. As you complete the examples here, you’ll start to see how these concepts come together.

The VLOOKUP function

The VLOOKUP function is the king of all lookup functions in Excel. I’d be willing to bet you’ve at least heard of VLOOKUP, if not used it a few times yourself. The purpose of VLOOKUP is to find a specific value from a column of data where the leftmost row value matches a given criterion.

VLOOKUP basics

Take a look at Figure 2-7 to get the general idea. The table on the left shows sales by month and product number. The bottom table translates those product numbers to actual product names. The VLOOKUP function can help in associating the appropriate name to each respective product number.

Figure 2-7: In this example, the VLOOKUP function helps to look up the appropriate product name for each product number.

To understand how VLOOKUP formulas work, take a moment to review the basic syntax. A VLOOKUP formula requires four arguments:

VLOOKUP(Lookup_value, Table_array, Col_index_num, Range_lookup)

Lookup_value: The Lookup_value argument identifies the value being looked up. This is the value that needs to be matched to the lookup table. In the example in Figure 2-7, the Lookup_value is the product number. Therefore, the first argument for all the formulas shown in Figure 2-7 references column C (the column that contains the product number).
Table_array: The Table_array argument specifies the range that contains the lookup values. In Figure 2-7, that range is D16:E22. Here are a couple of points to keep in mind with this argument. First, for a VLOOKUP to work, the leftmost column of the table must be the matching value. For instance, if you’re trying to match product numbers, the leftmost column of the lookup table must contain product numbers. Second, notice that the reference used for this argument is an absolute reference. This means the column and row references are prefixed with dollar ($) signs — as in $D$16:$E$22. This ensures that the references don’t shift while you copy the formulas down or across.
Col_index_num: The Col_index_num argument identifies the column number in the lookup table that contains the value to be returned. In the example in Figure 2-7, the second column contains the product name (the value being looked up), so the formula uses the number 2. If the product name column were the fourth column in the lookup table, the number 4 would be used.
Range_lookup: The Range_lookup argument specifies whether you’re looking for an exact match or an approximate match. If an exact match is needed, you’d enter FALSE for this argument. If the closest match will do, you’d enter TRUE or leave the argument blank.

Applying VLOOKUP formulas in a data model

As you can imagine, there are countless ways to apply a VLOOKUP in all kinds of analyses. Let’s take a moment to walk through a scenario where using a VLOOKUP can help enhance your dashboard model.

With a few VLOOKUP formulas and a simple drop-down list, you can create a data model that not only delivers data to the appropriate staging table, but also allows you to dynamically change data views based on a selection you make. Figure 2-8 illustrates the setup.

Figure 2-8: Using the VLOOKUP function to extract and shape data.

To see this effect in action, get the Chapter 2 Samples.xlsx workbook from this book’s companion website. Open that workbook to see a VLOOKUP1 tab.

The data layer in the model shown in Figure 2-8 resides in the range A9:F209. The analysis layer is held in range E2:F6. The data layer consists of all formulas that extract and shape the data as needed. As you can see, the VLOOKUP formulas use the Customer Name value in cell C3 to look up the appropriate data from the data layer. So if you entered Chevron in cell C3, the VLOOKUP formulas would extract the data for Chevron.

You may have noticed that the VLOOKUP formulas in Figure 2-8 specify a Table_array argument of $C$9:$F$5000. This means that the lookup table they’re pointing to stretches from C9 to F5000. That seems strange because the table ends at F209. Why would you force your VLOOKUP formulas to look at a range far past the end of the data table?

Well, remember that the idea behind separating the data layer and the analysis layer is so that the analysis layer can be automatically updated when the data is refreshed. When you get new data next month, you should be able to simply replace the data layer in the model without having to rework the analysis layer. Allowing for more rows than necessary in your VLOOKUP formulas ensures that if the data layer grows, records won’t fall outside the lookup range of the formulas.

Later in this chapter, I show you how to automatically keep up with growing data tables by using smart tables.

Using data validation drop-down lists in the data model

In the example illustrated in Figure 2-8, the data model allows you to select customer names from a drop-down list when you click cell C3. The customer name serves as the lookup value for the VLOOKUP formulas. Changing the customer name extracts a new set of data from the data layer. This allows you to quickly switch from one customer to another without having to remember and type the customer name.

Now, as cool as this seems, the reasons for this setup aren’t all cosmetic. There are practical reasons for adding drop-down lists to your data models.

Many of your models consist of multiple analytical layers in which each shows a different set of analyses. Although each analysis layer is different, they often need to revolve around a shared dimension, such as the same customer name, the same market, or the same region. For instance, when you have a data model that reports on Financials, Labor Statistics, and Operational Volumes, you want to make certain that when the model is reporting financials for the South region, the Labor Statistics are for the South region as well.

An effective way to ensure this happens is to force your formulas to use the same dimension references. If cell C3 is where you switch customers, every analysis that is customer-dependent should reference cell C3. Drop-down lists allow you to have a predefined list of valid variables located in a single cell. With a drop-down list, you can easily switch dimensions while building and testing multiple analysis layers.

Adding a drop-down list is relatively easy with Excel’s Data Validation functionality. To add a drop-down list, follow these steps:

Select the Data tab on the Ribbon.
Click the Data Validation button.
Select the Settings tab in the newly activated Data Validation dialog box. (See Figure 2-9.)
In the Allow drop-down list, choose List.
In the Source input box, reference the range of cells that contain your predefined selection list.

In our example, this would be the list of customers you want exposed through the dashboard.
Click OK.

Figure 2-9: You can use data validation to create a predefined list of valid variables for your data model.

The HLookup function

The HLOOKUP function is the less popular cousin of the VLOOKUP function. The H in HLOOKUP stands for horizontal. Because Excel data is typically vertically oriented, most situations require a vertical lookup, or VLOOKUP. However, some data structures are horizontally oriented, requiring a horizontal lookup; thus, the HLOOKUP function comes in handy. The HLOOKUP searches a lookup table to find a single value from a row of data where the column label matches a given criterion.

HLOOKUP basics

Figure 2-10 demonstrates a typical scenario in which HLOOKUP formulas are used. The table in C5 requires quarter-end numbers (March and June) for 2011. The HLOOKUP formulas use the column labels to find the correct month columns and then locate the 2011 data by moving down the appropriate number of rows. In this case, 2011 data is in row 4, so the number 4 is used in the formulas.

Figure 2-10: HLOOKUP formulas help to find March and June numbers from the lookup table.

To get your mind around how this works, take a look at the basic syntax of the HLOOKUP function.

HLOOKUP(Lookup_value, Table_array, Row_index_num, Range_lookup)

Lookup_value: The Lookup_value argument identifies the value being looked up. In most cases, these values are column names. In the example in Figure 2-10, the column labels are being referenced for the Lookup_value. This points the HLOOKUP function to the appropriate column in the lookup table.
Table_array: The Table_array argument identifies the range that contains the lookup table. In Figure 2-10, that range is B9:H12. Like the VLOOKUP examples earlier in this chapter, notice that the references used for this argument are absolute. This means the column and row references are prefixed with dollar ($) signs — as in $B$9:$H$12. This ensures that the reference doesn’t shift while you copy the formula down or across.
Row_index_num: The Row_index_num argument identifies the row number that contains the value you’re looking for. In the example in Figure 2-10, the 2011 data is located in row 4 of the lookup table. Therefore, the formulas use the number 4.
Range_lookup: The Range_lookup argument specifies whether you’re looking for an exact match or an approximate match. If an exact match is needed, you’d enter FALSE for this argument. If the closest match will do, you’d enter TRUE or leave the argument blank.

Applying HLOOKUP formulas in a data model

HLOOKUPs are especially handy for shaping data into structures appropriate for charting or other types of reporting. A simple example is demonstrated in Figure 2-11. With HLOOKUPs, the data shown in the raw data table at the bottom of the figure is reoriented in a staging table at the top. When the raw data is changed or refreshed, the staging table captures the changes.

Figure 2-11: In this example, HLOOKUP formulas pull and reshape data without disturbing the raw data table.

The Sumproduct furnction

The SUMPRODUCT function is actually listed under the math and trigonometry category of Excel functions. Because the primary purpose of SUMPRODUCT is to calculate the sum product, most people don’t know you can actually use it to look up values. In fact, you can use this versatile function quite effectively in most data models.

SUMPRODUCT basics

The SUMPRODUCT function is designed to multiply values from two or more ranges of data and then add the results together to return the sum of the products. Take a look at Figure 2-12 to see a typical scenario in which the SUMPRODUCT is useful.

Figure 2-12: Without the SUMPRODUCT, getting the total sales involves multiplying price by units and then summing the results.

In Figure 2-12, you see a common analysis in which you need the total sales for the years 2011 and 2012. As you can see, to get the total sales for each year, you first have to multiply Price by the number of Units to get the total for each Region. Then you have to sum those results to get the total sales for each year.

With the SUMPRODUCT function, you can perform the two-step analysis with just one formula. Figure 2-13 shows the same analysis with SUMPRODUCT formulas. Rather than use 11 formulas, you can accomplish the same analysis with just 3!

Figure 2-13: The SUMPRODUCT function allows you to perform the same analysis with just 3 formulas instead of 11.

The syntax of the SUMPRODUCT function is fairly simple:

SUMPRODUCT(Array1, Array2, …)

Array: Array represents a range of data. You can use anywhere from 2 to 255 arrays in a SUMPRODUCT formula. The arrays are multiplied together and then added. The only hard-and-fast rule you have to remember is that all arrays must have the same number of values. That is to say, you can’t use the SUMPRODUCT if range X has 10 values and range Y has 11 values. Otherwise, you get the #VALUE! error.

A twist on the SUMPRODUCT function

The interesting thing about the SUMPRODUCT function is that it can be used to filter out values. Take a look at Figure 2-14 to see what I mean.

Figure 2-14: The SUMPRODUCT function can be used to filter data based on criteria.

The formula in cell E12 is pulling the sum of total units for just the North region. Meanwhile, cell E13 is pulling the units logged for the North region in the year 2011.

To understand how this works, take a look at the formula in cell E12, shown in Figure 2-14. That formula reads SUMPRODUCT((C3:C10="North")*(E3:E10)).

In Excel, TRUE evaluates to 1 and FALSE evaluates to 0. Every value in column C that equals North evaluates to TRUE or 1. Where the value is not North, it evaluates to FALSE or 0. The part of the formula that reads (C3:C10="North") enumerates through each value in the range C3:C10, assigning a 1 or 0 to each value. Then internally, the SUMPRODUCT formula translates to

(1*E3)+(0*E4)+(0*E5)+(0*E6)+(1*E7)+(0*E8)+(0*E9)+(0*E10).

This gives you the answer of 1628 because

(1*751)+(0*483)+(0*789)+(0*932)+(1*877)+(0*162)+(0*258)+(0*517)

equals 1628.

Applying SUMPRODUCT formulas in a data model

As always in Excel, you don’t have to hard-code the criteria in your formulas. Rather than explicitly use "North" in the SUMPRODUCT formula, you could reference a cell that contains the filter value. You can imagine that cell A3 contains the word North, in which case you can use (C3:C10=A3) instead of (C3:C10="North"). This way, you can dynamically change your filter criteria, and your formula keeps up.

Figure 2-15 demonstrates how you can use this concept to pull data into a staging table based on multiple criteria. Note that each of the SUMPRODUCT formulas shown here references cells B3 and C3 to filter on Account and Product Line. Again, you can add data validation drop-down lists to cells B3 and C3, allowing you to easily change criteria.

Figure 2-15: The SUMPRODUCT function can be used to pull summarized numbers from the data layer into staging tables.

The Choose function

The CHOOSE function returns a value from a specified list of values based on a specified position number. For instance, if you enter the formulas CHOOSE(3, “Red”, “Yellow”, “Green”, “Blue”) into a cell, Excel returns Green because Green is the third item in the list of values. The formula CHOOSE(1, “Red”, “Yellow”, “Green”, “Blue”) would return Red. Although this may not look useful on the surface, the CHOOSE function can dramatically enhance your data models.

CHOOSE basics

Figure 2-16 illustrates how CHOOSE formulas can help pinpoint and extract numbers from a range of cells. Note that instead of using hard-coded values, like Red, Green, and so on, you can use cell references to list the choices.

Figure 2-16: The CHOOSE function allows you to find values from a defined set of choices.

Take a moment to review the basic syntax of the CHOOSE function:

CHOOSE(Index_num, Value1, Value2, …)

Index_num: The Index_num argument specifies the position number of the chosen value in the list of values. If the third value in the list is needed, the Index_num is 3. The Index_num argument must be an integer between one and the maximum number of values in the defined list of values. That is to say, if there are ten choices defined in the CHOOSE formula, the Index_num argument can’t be more than ten.
Value: Each Value argument represents a choice in the defined list of choices for that CHOOSE formula. The Value> arguments can be hard-coded values, cell references, defined names, formulas, or functions. You can have up to 255 choices listed in your CHOOSE formulas.

Applying CHOOSE formulas in a data model

The CHOOSE function is especially valuable in data models in which multiple layers of data need to be brought together. Figure 2-17 illustrates an example in which CHOOSE formulas help pull data together.

Figure 2-17: The CHOOSE formulas ensure that the appropriate data is synchronously pulled from multiple data feeds.

In this example, you have two data tables: one for Revenues and one for Net Income. Each contains numbers for separate regions. The idea is to create a staging table that pulls data from both tables so that the data corresponds to a selected region.

To understand what’s going on, focus on the formula in cell F3, shown in Figure 2-17. The formula is CHOOSE($C$2,F7,F8,F9,F10). The Index_num argument is actually a cell reference that looks at the value in cell C2, which happens to be the number 2. As you can see, cell C2 is actually a VLOOKUP formula that pulls the appropriate index number for the selected region. The list of defined choices in the CHOOSE formula is essentially the cell references that make up the revenue values for each region: F7, F8, F9, and F10. So the formula in cell F3 translates to CHOOSE(2, 27474, 41767, 18911, 10590). The answer is 41,767.

Using Smart Tables That Expand with Data

One of the challenges you can encounter when building data models is a data table that expands over time. That is to say, the table grows in the number of records it holds due to new data being added. To get a basic understanding of this challenge, take a look at Figure 2-18. In this figure, you see a simple table that serves as the source for the chart. Notice that the table lists data for January through June.

Figure 2-18: The date in both the table and chart ends in June.

Imagine that next month, this table expands to include July data. You’ll have to manually update your chart to include July data. Now imagine you had this same issue across your data model, with multiple data tables that link to multiple staging tables and dashboard components. You can imagine it’d be an extremely painful task to keep up with changes each month.

To solve this issue, you can use Excel’s Table feature (you can tell they spent all night coming up with that name). The Table feature allows you to convert a range of data into a defined table that’s treated independently of other rows and columns on the worksheet. After a range is converted to a table, Excel views the individual cells in the table as a single object with functionality that a typical data range doesn’t have.

For instance, Excel tables offer the following features:

They’re automatically enabled with Filter drop-down headers so that you can filter and sort easily.
They come with the ability to quickly add a Total row with various aggregate functions.
You can apply special formatting to Excel tables independent of the rest of the spreadsheet.
Most important for data modeling purposes, they automatically expand to allow for new data.

The Table feature exists in Excel 2003 under a different name: the List feature (found on Excel’s Data menu). The benefit of this fact is that Excel tables are fully compatible with Excel 2003 Lists.

Converting a range to an Excel table

To convert a range of data to an Excel table, follow these steps:

Highlight the range of cells that contain the data you want included in your Excel table.
On the Insert tab of the Ribbon, click the Table button.

This step opens the Create Table dialog box, as shown in Figure 2-19.
In the Create Table dialog box, verify the range for the table and specify whether the first row of the selected range is a header row.
Click OK to apply the changes.

Figure 2-19: Converting a range of data to an Excel table.

After the conversion takes place, notice a few small changes. Excel has put autofilter drop-downs on the header rows, the rows in the table now have alternate shading, and any header that didn’t have a value has been named by Excel.

You can use Excel tables as the source for charts, pivot tables, list boxes, or anything else for which you’d typically use a data range. In Figure 2-20, a chart has been linked to the Excel table.

Figure 2-20: Excel tables can be used as the source for charts, pivot tables, named ranges, and so on.

Here’s the impressive bit. When data is added to the table, Excel automatically expands the range of the table and incorporates the new range into any linked object. That’s just a fancy way of saying that any chart or pivot table tied to an Excel table automatically captures new data without manual intervention.

For example, if I add July and August data to the end of the Excel table, the chart automatically updates to capture the new data. In Figure 2-21, I added July with no data and August with data to show you that the chart captures any new records and automatically plots the data given.

Figure 2-21: Excel tables automatically expand when new data is added.

Take a moment to think about what Excel tables mean to a data model. They mean pivot tables that never have to be reconfigured, charts that automatically capture new data, and ranges that automatically keep up with changes.

Converting an Excel table back to a range

If you want to convert an Excel table back to a range, you can follow these steps:

Place the cursor in any cell inside the Excel table and select the Table Tools’ Design subtabs on the Ribbon.
Click the Convert to Range button, as shown in Figure 2-22.
When asked if you’re sure (via a message box), click the Yes button.

Figure 2-22: To remove Excel table functionality, convert the table back to a range.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 2: Building a Super Model

Create new playlist

Sign In

Sign Up