Tableau licensing affects how a person can create and maintain data models in two ways. First, Tableau has role-based pricing for individual users of the platform. Second, Tableau has different tiers for how organizations license the platform, and the tier determines which data modeling capabilities are available.
Tableau data models are used by analysts and developers to create Tableau workbooks. Workbooks are a series of sheets, dashboards, and stories. Every workbook must be connected to a minimum of one data model. These data models can be embedded in the workbook, meaning they are only available in the workbook. Alternatively, they can be published separately and made available in many workbooks; these are called published data sources. We will create both types in this chapter.
Data models can connect live to data sources. This means the queries from the model are serviced by the underlying data server. Alternatively, data can be extracted into the Tableau data engine, called Hyper. We will learn about both types of connections in this chapter.
Tableau also provides the capability to put a layer between the data model and the underlying data server through a virtual connection. We will also explore virtual connections in this chapter.
This chapter will give you the foundational knowledge of how data modeling fits into the Tableau platform. This knowledge is a crucial foundation for the exercises we will be doing in future chapters. The main topics we will be discussing are how Tableau connects to and publishes data and the licensing considerations for which users can perform these tasks.
In this chapter, we’re going to cover the following main topics:
For the complete list of requirements to run the practical examples in this chapter, please see the Technical requirements section in Chapter 1. The files used in the exercises in this chapter can be found at https://github.com/PacktPublishing/Data-Modeling-with-Tableau/.
The method Tableau uses for product licensing is important to understand, as it determines which data modeling capabilities are available to a user of the Tableau platform.
For Tableau Cloud and most Tableau Server deployments, Tableau’s first and primary licensing method is by user role. The three Tableau roles are Viewer, Explorer, and Creator.
Before leaving this section, we should acknowledge that two other roles exist and can determine some of the capabilities available to Viewers, Explorers, and Creators. These two roles are Server Administrator and Site Administrator. Neither is a licensed role; rather, they are additional roles layered on top of the licensed roles. The Server Administrator role exists specifically to administer an on-premises Tableau Server deployment. The Site Administrator role is intended for managing content and users within a Tableau site. Tableau Server has both roles, whereas Tableau Cloud has only the Site Administrator role, because the Server Administrator responsibilities of installation, upgrade, and management are handled as part of the Tableau Cloud service.
To have the role of Site Administrator, you must first be licensed either at the Explorer or Creator level.
Although we do not cover server and site administration functions in this chapter, it is important to know of the existence of these roles because both Site and Server Administrators have access to settings that could affect your ability to perform some of the tasks we do in this book. If you are working on these exercises and find you don’t have access, reach out to a Site Administrator for help.
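The relationship between licensed roles and the Site Administrator role can be sketched in a few lines of Python. This is only an illustrative model of the rule stated above, not anything Tableau exposes; the names `LicensedRole` and `can_be_site_administrator` are hypothetical.

```python
from enum import Enum


class LicensedRole(Enum):
    """The three licensed user roles named in this section."""
    VIEWER = "Viewer"
    EXPLORER = "Explorer"
    CREATOR = "Creator"


def can_be_site_administrator(role: LicensedRole) -> bool:
    """Site Administrator is layered on top of an Explorer or Creator license."""
    return role in (LicensedRole.EXPLORER, LicensedRole.CREATOR)


for role in LicensedRole:
    eligible = can_be_site_administrator(role)
    print(f"{role.value}: eligible for Site Administrator = {eligible}")
```

The point of the sketch is that administration is an add-on to a license rather than a license of its own: a Viewer would first need to be licensed as an Explorer or Creator.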
Starting with the 2019.3 release, Tableau enhanced its Data Management capabilities with the introduction of Tableau Prep Conductor and Tableau Catalog. Since the initial release of Data Management, Tableau has continued to add and enhance capabilities. The virtual connections feature we mentioned earlier in this chapter is an example. These Data Management features require additional licensing costs.
It is important to understand this product add-on as it enables some of the Data Management functions that layer onto the modeling capabilities we will learn about in this book. As it is a set of bundled features that are licensed on top of user roles, you might not have these capabilities available to you if your organization has not purchased Data Management licenses. To get access to these features, you can sign up for a 14-day free trial of Tableau Cloud with Data Management enabled, as described in the Technical requirements section in Chapter 1.
Table 2.1 outlines which capabilities are available with and without Data Management licensing. For all capabilities listed, the user role is Creator:
Base Tableau Creator license | Capabilities requiring Data Management
Table 2.1 – Additional capabilities with Data Management
In addition to the data modeling capabilities that are only available with this add-on, there are also some key data governance capabilities that are included. Data lineage, data catalog, and data warnings are all included with Data Management. We will look at these features in Chapter 13.
Prior to the Tableau 2021.4 release, data model creation started with a direct connection to the underlying data source. This meant that data modelers would most often connect directly to enterprise database tables in their organization. This approach works well when the data modeler understands databases, but it falls short in several important ways, especially when the organization wants to delegate the role of data modeling to less technical users in the business.
If the organization is going to delegate data model creation to business users, the information technology, data engineering, and security teams will often want to ensure that the data modeler:
Tableau virtual connections can only be created in the web client of either Tableau Server or Tableau Cloud. Virtual connections are intended for large organizations. For this reason, they require Data Management and support only data servers and cloud data drives.
We will now go through the starting process for creating a virtual connection:
Figure 2.1 – New options on Tableau Server and Tableau Cloud
We will stop at this point, for now, and look at virtual connections in more detail in subsequent chapters. The most important takeaway from virtual connections is that they are not data models per se but a layer that sits between underlying databases and Tableau data models. Virtual connections are also not always needed in the data modeling pipeline, meaning you can create Tableau models directly on top of data sources without first creating a virtual connection.
Published data sources are the primary method of sharing data models between analytics users. A published data source contains all the information a person will need to start creating visualizations. Specifically, a published data source can have the following:
Published data sources can be created in three different places in Tableau: Tableau Desktop, Tableau Prep Builder, and from the home page in the web client. In this chapter, we will look at the creation of published data sources from Tableau Desktop and the home page in the web client. We will create a published data source from Tableau Prep Builder in Chapter 6, Data Output.
Let’s open Tableau Desktop and connect to the same Superstore sales 2022.csv file we used in the previous chapter:
Figure 2.2 – Go to Worksheet prompt upon connecting to data
Figure 2.3 – The Tableau Desktop data pane
Figure 2.4 – Publish Data Source
Figure 2.5 – Publish Data Source dialog box
Figure 2.6 – Adding security to a data model
The last option is a checkbox that says Update workbook to use published data source? If you plan to create visual analyses in this workbook, you should check the box. This ensures that the analysis will be kept up to date when the published data source is updated. In this case, we are only using the workbook to create a published data source, so leave the box unchecked.
Before we hit the Publish button, you might notice two warnings. The first is specific to Tableau Cloud. If you use Tableau Cloud with data sources that are housed within your organization’s network or on your individual computer, you will need Tableau Bridge to create a connection between the data source and Tableau Cloud. We will explore Tableau Bridge in detail in Chapter 14, Scheduling Extract Refreshes. The other message might be Requires creating an extract on publish. We will discuss extracts later in this chapter.
Tableau will now open the page to your published data source in a browser. Please keep Tableau Desktop open in the background as we will begin the next section where we left off here.
In your browser, you might see the dialogue box in Figure 2.7. For now, please disregard this dialogue box by clicking on the cross in the top-right corner.
Figure 2.7 – Publishing Complete dialogue
Now that we have seen how to create and publish a Tableau published data source from Tableau Desktop, let’s look at published data sources on Tableau Server and Tableau Cloud.
We will now look at working with published data sources from the Tableau web user interface, that is, working with published data sources without needing to use Tableau Desktop:
Figure 2.8 – The New button from the published data source in the browser
Figure 2.9 – Create Extract Refresh dialogue
Figure 2.10 – Data Lineage
Figure 2.11 – Connecting to a data source in the browser
Figure 2.12 – Web client version UI to create a published data source
That covers the basics of understanding and creating published data sources, the primary method of creating shareable data models. In future chapters, we will look at creating more complex data models and publishing and maintaining them. Next, we will look at embedded data sources.
The other main data source type in Tableau is an embedded data source. An embedded data source has the data model embedded within the Tableau workbook. What does this mean?
When we publish an embedded data source, we don’t publish the data source, but rather we publish a workbook that is not connected to a previously published data source. This is what makes the data source embedded. It is embedded in the workbook that has been published.
To see how this works, let’s go back to where we left off in Tableau Desktop in the previous section, namely, step 10:
Figure 2.13 – Sales by region
Figure 2.14 – Renaming a sheet
Figure 2.15 – Publish Workbook... menu item
Figure 2.16 – Publish Workbook to Tableau Online dialog box
If the Data Sources option says 1 published separately, first edit this option to Embedded in workbook as per Figure 2.17:
Figure 2.17 – Changing the data source type
Embedded data sources have their place in the Tableau infrastructure. For individual analysts creating workbooks where the data model is not likely to be used by others, embedded data sources make sense to avoid the overhead of managing the workbook and its data source separately.
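The difference between the two data source types can be pictured as a small data structure. The following is a deliberately simplified sketch, not Tableau's internal representation: the class names `DataSource` and `Workbook` and the sample workbook names are hypothetical, and it assumes one data source per workbook even though a workbook can connect to more than one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSource:
    name: str


@dataclass
class Workbook:
    name: str
    # In this simplified model, exactly one of the two fields is set.
    embedded_source: Optional[DataSource] = None   # lives only inside this workbook
    published_source: Optional[DataSource] = None  # shared object published separately


# One published data source reused by two workbooks.
superstore = DataSource("Superstore sales")
wb_a = Workbook("Regional dashboard", published_source=superstore)
wb_b = Workbook("Executive summary", published_source=superstore)

# An embedded data source belongs to a single workbook.
wb_c = Workbook("Ad hoc analysis", embedded_source=DataSource("Superstore sales (embedded)"))

# Both workbooks point at the very same published object.
print(wb_a.published_source is wb_b.published_source)
```

A change to the shared `superstore` object is visible to every workbook that references it, whereas the embedded copy in `wb_c` must be maintained inside that workbook alone.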
As the main goal of this chapter is Tableau data modeling, we will spend the rest of our time focusing on creating data models that will be available for your entire organization and perhaps beyond your organization. For this reason, we are going to focus exclusively on published data sources, but it is important for you to understand embedded data sources and when they make logical sense.
Tableau broadly gives two options for connection types for the data behind your data models. These are Live and Extract.
If you choose a live connection, Tableau will query your data source whenever a user interacts with a visualization in a way that requires data that isn’t already in the view. If you choose to extract the data, Tableau will copy the data from its source into a high-performance analytical store.
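To make the distinction concrete, here is a minimal sketch that imitates the two behaviors with Python's built-in `sqlite3` module. It is an analogy only: Tableau does not use SQLite for extracts, and the `sales` table, its rows, and the helper names `total_sales` and `make_extract` are invented for illustration.

```python
import sqlite3

# Hypothetical "operational" source database with a tiny sales table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("East", 100.0), ("West", 250.0)])
source.commit()


def total_sales(conn: sqlite3.Connection) -> float:
    """Run the same aggregate query against whichever store is passed in."""
    return conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


def make_extract(src: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the source rows into a separate store (a snapshot in time)."""
    extract = sqlite3.connect(":memory:")
    extract.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = src.execute("SELECT region, amount FROM sales").fetchall()
    extract.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    extract.commit()
    return extract


extract = make_extract(source)

# A new row arrives in the source after the extract was taken.
source.execute("INSERT INTO sales VALUES ('North', 50.0)")
source.commit()

live_total = total_sales(source)      # live: every query hits the source
extract_total = total_sales(extract)  # extract: reflects the snapshot until refreshed
print(live_total, extract_total)
```

The live query sees the new row immediately, while the extract keeps returning the older total until it is refreshed. In exchange, queries against the extract never touch the operational source.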
The most basic use case for live connections is when the analysis must be performed on up-to-the-minute data. When slightly delayed data is acceptable, for example, data as of the close of business on the previous day, an extract will often make the most sense as it allows for faster query times and less impact on operational databases. These use cases simplify the many nuances that determine the best option between live connections and extracts. We will explore each of these considerations as they come up in the use cases we cover throughout the rest of this book.
Another use case for live connections is when your data is stored in a highly performant database that is already optimized for analytical workloads. Traditionally, this included cube technologies and in-memory database appliances. More recently, all the major cloud vendors, as well as companies such as Snowflake and Databricks, offer analytical databases as a service.
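The two use cases above can be condensed into a rough rule of thumb. The function below is a hedged sketch of that reasoning only; the name `suggest_connection` and its parameters are hypothetical, and real decisions involve more considerations than these two flags.

```python
def suggest_connection(needs_up_to_the_minute_data: bool,
                       source_optimized_for_analytics: bool) -> str:
    """Return "Live" or "Extract" based on the two use cases described above."""
    if needs_up_to_the_minute_data or source_optimized_for_analytics:
        # Either the data must be current, or the source can already
        # answer analytical queries quickly on its own.
        return "Live"
    # Otherwise, favor faster queries and less load on operational systems.
    return "Extract"


print(suggest_connection(needs_up_to_the_minute_data=False,
                         source_optimized_for_analytics=False))
```

Treat the result as a starting point for discussion rather than a verdict.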
To create an extract in Tableau Desktop, navigate to the Data Source tab in the bottom left-hand corner of the screen. Near the top right-hand corner of the data source screen, you should see the option for a Live or Extract connection as per Figure 2.18:
Figure 2.18 – Changing the connection type
It is as simple as clicking the appropriate radio button and Tableau will handle the rest for you. Next up, we will discuss Tableau Hyper, the database engine that powers Tableau extracts.
Tableau extracts are stored in a proprietary database engine called Hyper. The Hyper engine is included with Tableau Desktop, Tableau Prep Builder, Tableau Server, and Tableau Cloud and therefore does not have any additional licensing requirements. In a technical sense, the term extract refers to data that is copied from its original source for the purposes of analysis. The term Hyper refers to the Tableau technology that houses and manages the extracted data. However, the terms are often used interchangeably or even together as Hyper extract. On disk, the extracted data is stored in a file with a .hyper extension.
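Because extracts are ordinary files with a .hyper extension, you can locate them with standard file system tools. The following sketch uses only Python's standard library; the function name `find_hyper_files` is hypothetical, and the folder to search is an assumption you should replace with wherever your extracts are saved.

```python
from pathlib import Path
from typing import List


def find_hyper_files(folder: Path) -> List[Path]:
    """Return all files ending in .hyper under the folder, sorted by path."""
    return sorted(p for p in folder.rglob("*.hyper") if p.is_file())


if __name__ == "__main__":
    # Hypothetical starting folder: the current working directory.
    for path in find_hyper_files(Path(".")):
        print(path)
```

This only finds the files; it does not open or read them, which is consistent with the fact that we never interact with Hyper directly in this book.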
We mention Hyper in this chapter because it is a key piece of the data modeling stack in Tableau. For the purposes of this book, we never interact with Hyper directly, but it is often at work behind the scenes when our data models use extracted data. We have now explored published and embedded data sources and live and extract connections. We are now ready to tackle Tableau Prep Builder in Chapter 3.
In this chapter, we explored the different Tableau licensing options and how they impact us as data modelers.
We then looked at how Tableau uses data models in workbooks. We looked at embedded data sources, which are data models that are linked to one workbook and cannot be used in others. We explored published data sources as a way to share our data models to be used by many analysts and developers in the creation of their workbooks. We also looked at live and extract connections and when to use each of them.
In the next chapter, we will build on what we learned by working with Tableau Prep Builder and understanding the role it plays in creating data models.