© Dmitry Anoshin, Dmitry Shirokov, Donna Strok 2020
D. Anoshin et al., Jumpstart Snowflake, https://doi.org/10.1007/978-1-4842-5328-1_11

11. Designing a Modern Analytics Solution with Snowflake

Dmitry Anoshin (1), Dmitry Shirokov (2), and Donna Strok (3)
(1) British Columbia, Canada
(2) Burnaby, BC, Canada
(3) Seattle, WA, USA
 

You are now familiar with the Snowflake data warehouse (DW) and its advantages over other DW solutions. However, a typical organization won’t be using Snowflake alone. Snowflake is part of an analytics solution that consists of multiple components, including business intelligence and data integration tools.

In this chapter, you will learn about a modern solution architecture and the role of Snowflake in it. We will cover the following topics:
  • Modern analytics solution architecture

  • Snowflake partner ecosystem

  • Integration with Matillion ETL and Tableau

This chapter will help you to learn how to build an end-to-end solution using leading cloud tools for business intelligence and data integration. You will launch Matillion ETL and load data into the Snowflake DW. In addition, you will connect to Tableau Desktop and build dashboards.

Modern Analytics Solution Architecture

Nowadays, every organization wants to be data-driven to generate more value for customers and stakeholders. Management understands the value of data, treats it as an asset, and is ready to invest in modern cloud solutions like Snowflake that are scalable and secure. However, Snowflake is just one part of the analytical ecosystem. It is the core storage for all of the organization's data, and it provides robust access to that data.

You need more elements in order to build the right solution. These elements include data integration tools, business intelligence, and data modeling tools. Figure 11-1 highlights the key elements of a modern analytics solution.
Figure 11-1. Modern analytics solution architecture

Figure 11-1 gives you an idea of how a typical analytics solution can look. We've already added Matillion ETL and Tableau to the diagram because we are going to use them in this chapter; however, you can choose other products as well.

Table 11-1 provides additional information about each element of the architecture.
Table 11-1. Key Elements of the Architecture Diagram

  • Source layer: Includes all the data sources available at your organization. This could include transactional databases, files, NoSQL databases, business applications, external APIs, sensors, and IoT devices.

  • Storage layer: The core of the solution. You may hear about data platforms, data lakes, and data warehouses; this is the place for all of them. You ingest data into the storage layer from the source layer and store it for further analysis, data discovery, and decision making.

  • Access layer: Serves nontechnical users. Its main goal is to provide access for business users and let them interact with the data through BI tools and SQL.

  • Stream: Streaming is a method of ingesting data in (near) real time. For example, you might collect data from sensors under a strict SLA for analyzing the data and making decisions.

  • Batch: Batch processing is a method of ingesting data on a schedule. For example, DWs often load data once per day; sometimes data must be loaded more frequently.

  • Snowflake: A cloud data warehouse that can also serve as a data lake. It can collect data from both batch and streaming pipelines.

  • Matillion ETL: A cloud-native tool responsible for the extract, load, and transform (ELT) process. It was built for the cloud and provides a visual interface for building data pipelines. The ELT tool handles all data movement and data transformation.

  • SqlDBM: A cloud data modeling tool and the first cloud modeling tool built for Snowflake. Without a proper data model, you can't deliver a quality DW. Moreover, a model helps you communicate with business stakeholders and collaborate with the team.

  • Tableau: A visual analytics platform that connects to Snowflake, provides access for business users, and helps them slice and dice data and deliver insights. In other words, it is a business intelligence tool.

  • Data science tools: Provide advanced analytics capabilities. These could be open source products, programming languages (R/Python), or enterprise solutions such as Databricks (Spark).
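To make the Stream, Batch, and Snowflake rows more concrete, here is a minimal Snowflake SQL sketch of the two ingestion styles. The stage, pipe, and table names are hypothetical, and the target table is assumed to already exist with a single VARIANT column.

    -- Batch ingestion: load files from an external stage on a schedule.
    COPY INTO raw_events
      FROM @my_s3_stage/events/
      FILE_FORMAT = (TYPE = 'JSON');

    -- Near-real-time ingestion: a Snowpipe runs the same COPY automatically
    -- as new files arrive in the stage.
    CREATE PIPE events_pipe
      AUTO_INGEST = TRUE
      AS
      COPY INTO raw_events
        FROM @my_s3_stage/events/
        FILE_FORMAT = (TYPE = 'JSON');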

In this chapter, we will show how to build a simple solution using Matillion ETL, Snowflake, and Tableau. We won't spend much time on setting up a real source system; instead, we will use sample data sets that we load into Snowflake with Matillion and then visualize with Tableau. Moreover, we won't build a streaming solution or discuss lambda architecture. Based on our experience, in 80 percent of use cases a data warehouse, business intelligence, and ELT are sufficient for a typical organization.

Snowflake Partner Ecosystem

Snowflake has many technology partners, and it provides good integration with them. In addition, it has a convenient feature called Partner Connect that allows you to launch a solution via the Snowflake web interface, as shown in Figure 11-2.
Figure 11-2. Snowflake Partner Connect page

Moreover, Snowflake provides native drivers and connectors (JDBC, ODBC, and others) for connecting third-party tools such as Tableau, SqlDBM, and Spark. Figure 11-3 shows the list of available drivers; you can reach this menu by clicking Help ➤ Download.
Figure 11-3. Snowflake drivers

For our solution, we need to choose a data integration tool and a BI tool. Based on our experience with DW, BI, and data integration, our favorite tools for working with Snowflake are Matillion ETL and Tableau. They are leaders in their areas and allow us to build a modern analytics solution that meets business requirements and SLAs.

Building Analytics Solutions

Let’s get started.

Getting Started with Matillion ETL

Matillion ETL is a cloud data integration tool. It is available for Snowflake, Redshift, and BigQuery. It increases development speed, secures data, provides rich data transformation functionality, and offers many prebuilt data connectors for Salesforce, Mailchimp, Facebook, and other sources. One of the biggest advantages of the tool is that it looks and feels like a traditional ETL tool, with a friendly user interface where developers can drag and drop components to build their data pipelines.

To start with Matillion ETL, click the Matillion tile on the Partner Connect page (Figure 11-2). This will open a new window and ask permission to create objects within your Snowflake account. You can see the list of objects in Table 11-2.
Table 11-2. List of Matillion Objects

  • Database: PC_MATILLION_DB
  • Warehouse: PC_MATILLION_WH (X-Small)
  • Role: PC_MATILLION_ROLE
  • Username: Snowflake-snowflake
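If you want to confirm that Partner Connect created these objects, you can run commands like the following in a Snowflake worksheet (a quick sketch; adjust the name patterns if your objects differ):

    SHOW DATABASES  LIKE 'PC_MATILLION%';
    SHOW WAREHOUSES LIKE 'PC_MATILLION%';
    SHOW ROLES      LIKE 'PC_MATILLION%';
    SHOW USERS;  -- look for the user created by Partner Connect (requires sufficient privileges)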

After activation, the tool immediately transfers you to the Matillion ETL web interface. It is already connected to your Snowflake cluster, and you can start working immediately. This reduces your time to market.

Let’s load some initial data into Snowflake using Matillion.

Note

Our Snowflake cluster is hosted on AWS. When we launched a Matillion ETL instance from the Partner Connect page, an EC2 instance running Matillion ETL was created in a different AWS account. Alternatively, we can launch Matillion ETL in our own AWS account by finding it in the AWS Marketplace. In that case, we get full control over the Matillion ETL instance and can connect via SSH, use an application load balancer, adjust security groups, and so on.

Running Our First Job with Matillion ETL

We will use a demo Matillion ETL job and sample airport data in order to create our first ELT job and then load and transform data for our Snowflake DW. Let’s get started.
  1. Log into Matillion ETL. You can use the URL, username, and password that you received in the Matillion activation e-mail.

  2. Navigate to Partner Connect Group ➤ Snowflake Project. You will find two demo jobs, called dim_airport_setup and dim_airports.

  3. Open the dim_airport_setup job by double-clicking it. Figure 11-4 shows the key elements of the Matillion web interface.
Figure 11-4. Key elements of the Matillion ETL web interface

When you are working with Matillion, you are working mostly from a browser. The same is true for Snowflake. Table 11-3 describes the key elements of the Matillion ETL web interface shown in Figure 11-4.
Table 11-3. Key Elements of the Matillion ETL Web Interface (numbers refer to Figure 11-4)

  • 1. Job list pane: Includes all the jobs that you build for this project; you can organize jobs into folders. There are two types of jobs: a transformation job (green) is responsible for data transformation inside Snowflake, and an orchestration job (blue) is responsible for extracting and loading data from/to external sources.

  • 2. Components pane: Includes all the components available for this job type (blue or green). You can easily drag and drop components and build pipelines.

  • 3. "Shared job" pane: Lists the shared jobs. A shared job is a kind of aggregated job; for example, you can build a new job and reuse it as a single component.

  • 4. Environments pane: Lists the environments. By default, we have one environment that is connected to our Snowflake cluster. If we want to load data into a different Snowflake cluster or from another AWS account, we create a new environment and specify the credentials.

  • 5. Canvas: The development area where we drag and drop components from the Components pane or "Shared job" pane and organize them into a data pipeline. Moreover, we can add notes with tips and documentation.

  • 6. Project menu: The main menu for Matillion ETL, where you can manage existing projects, switch to others, and manage variables, schedules, API profiles, and more.

  • 7. Admin menu: Available to the Matillion administrator. From this menu you can manage Matillion users, manage backups, and download logs.

  • 8. Help menu: Allows you to get support information, manage active sessions, and manage license keys.

  • 9. Component options: The configuration settings for the component currently selected on the canvas; you set these before the job can run.

  • 10. Status menu: Provides information about currently running tasks, shows the command log, and displays notices about available updates.

You have learned about the key elements of the Matillion web interface, so you can now run a job. Right-click the canvas and choose "Run job (snowflake)." Matillion will run the current job using the environment named snowflake. This job consists of multiple steps:

  a. Create tables using the Create Table component.

  b. Load data from S3 into the staging tables using the S3 Load component.

  c. Execute the transformation job dim_airports, which transforms the raw semistructured data into a tabular format and loads it into a dimension table. (A rough SQL sketch of steps a and b appears after Figure 11-5.)

     Note During this exercise, we loaded the Matillion sample data set, which is stored in a public Amazon S3 bucket owned by Matillion and available to everyone. If you have Snowflake on Azure, you will load data from Blob Storage instead.

  4. After the job is finished, we can go back to the Snowflake web UI and check the new objects that were created by Matillion. Figure 11-5 shows the list of Snowflake tables created by the Matillion orchestration job.
Figure 11-5. Snowflake tables created by Matillion ETL
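Under the hood, the Matillion components issue ordinary Snowflake SQL. The following is a rough, hypothetical sketch of what steps a and b boil down to; the actual table definitions, bucket path, and file format used by the demo job differ.

    -- Step a: create a staging table for the raw, semistructured airport data
    -- (hypothetical single-VARIANT-column layout).
    CREATE TABLE IF NOT EXISTS stg_airports_raw (v VARIANT);

    -- Step b: load the sample files from a public S3 bucket into the staging table.
    -- The bucket path below is a placeholder, not the real Matillion bucket.
    COPY INTO stg_airports_raw
      FROM 's3://example-sample-data/airports/'
      FILE_FORMAT = (TYPE = 'JSON');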

We launched Matillion ETL and loaded sample data into the Snowflake DW. In a real-world scenario, we would create many more jobs and collect data from external sources. For example, for marketing analytics use cases, we need to load data from social media platforms such as Facebook, Twitter, YouTube, and so on. Matillion ETL provides prebuilt connectors that will save time for data engineers or ETL developers.

Moreover, for a quality solution, we should design a data model for querying our data. We might choose a technique such as Data Vault or dimensional modeling. Our data modeling tool of choice for Snowflake is SqlDBM.
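As a small, hypothetical illustration of dimensional modeling in Snowflake (this is not the schema produced by the Matillion demo job), an airport dimension could be declared like this:

    CREATE TABLE IF NOT EXISTS dim_airport (
        airport_key   NUMBER AUTOINCREMENT,   -- surrogate key
        iata_code     STRING,
        airport_name  STRING,
        city          STRING,
        state_name    STRING,
        load_date     TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
        PRIMARY KEY (airport_key)             -- informational only; Snowflake does not enforce it
    );

A tool such as SqlDBM lets you design tables like this visually and generate the DDL for Snowflake.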

The final step is to connect a BI tool. We need a BI tool to simplify access for nontechnical users. With Tableau, business users can do data discovery using drag-and-drop methods and powerful analytics and visualization capabilities. For our sample solution, we will install Tableau Desktop and connect to the matillion_dim_airports table in order to visualize the data.

Getting Started with Tableau

Tableau is a leading visual analytics platform. There are many tools available on the market, but Tableau stands out among them. We have worked with many different tools from leading vendors and found that Tableau is the most powerful tool for business intelligence and self-service. Moreover, it has a large and friendly community. If you have never worked with Tableau, now is a good time to try it. Connecting Tableau to Snowflake allows us to use best-of-breed technologies working together. Tableau is available in Server and Desktop versions. Moreover, it has a mobile application. Let’s get Tableau and connect to the Snowflake cluster.

Building Our First Visualization with Tableau and Snowflake

During this exercise, we will install Tableau Desktop and connect it to the Snowflake DW. Then we will visualize the data in matillion_dim_airports.

  1. Download and install Tableau Desktop. Go to https://www.tableau.com/products/desktop/download and download a recent version; it is available for macOS and Windows.
  2. Open Tableau Desktop and connect to Snowflake, as shown in Figure 11-6.

     Figure 11-6. Tableau Desktop connection to Snowflake

     Note To connect to the Snowflake DW, you need the ODBC driver from the Snowflake web UI. Select Help ➤ Download ➤ ODBC Driver, then download and install it.
  3. Enter your credentials to connect to Snowflake from Tableau. You can use the Matillion credentials (including the user role) that were created during the Matillion ETL initialization, or your Snowflake admin (master) credentials; in this exercise, we use the admin credentials. Figure 11-7 shows an example of the connection options.

     Figure 11-7. Snowflake connection window
  4. Click Sign In and then enter the following:

     a. Warehouse: PC_MATILLION_WH

     b. Database: PC_MATILLION_DB

     c. Schema: PUBLIC

     Then drag and drop the matillion_dim_airports table onto the connection canvas.
  5. Click Sheet 1, and you will jump into the development area. You just created your first Tableau live data source.

     Note A Tableau data source supports live and extract options. An extract queries all the data from the data source and caches it in Tableau's internal columnar data store, called Hyper. A live connection queries the data source on demand, which is the right strategy for large data volumes: Snowflake does the heavy lifting, and Tableau renders the result. This is the secret to doing big data analytics.
  6. Let's create a quick visualization using the available data. Say we want to know the number of airports in each state, ordered in descending order. Figure 11-8 shows the Tableau Desktop interface and a simple report.

     Figure 11-8. Number of airports in Alaska
Did you know that Alaska has the most airports of all the states? This is a good insight. We built this report by dragging the dimension (blue) State Name onto the Rows shelf and the calculated field #Airports onto the Columns shelf. To create a calculated field, right-click in the Measures pane and choose Create Calculated Field. Then use the following syntax:
COUNTD([Airport])
This counts the distinct (unique) number of airport names. We also place the measure on the Label shelf to provide a label for each bar.
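For comparison, a similar aggregation written directly in Snowflake SQL might look like the following. Here we count distinct IATA codes (the iata column appears in the Tableau-generated query shown in the next step) as a stand-in for the [Airport] field used above.

    SELECT "state",
           COUNT(DISTINCT "iata") AS airport_count
    FROM "PUBLIC"."matillion_dim_airports"
    GROUP BY "state"
    ORDER BY airport_count DESC;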
  7. It is interesting to look at Snowflake to see what was happening while we built the report. On the History tab of the Snowflake web UI, we can see the SQL query generated by Tableau, shown here:
    SELECT "matillion_dim_airports"."iata" AS "iata",
      "matillion_dim_airports"."state" AS "state"
    FROM "PUBLIC"."matillion_dim_airports" "matillion_dim_airports"
    GROUP BY 1,  2

    Moreover, we are able to look at the execution plan. This is helpful when we are working with large data sets and multiple tables.
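Besides the History tab, you can inspect recent queries with SQL by calling the QUERY_HISTORY table function in INFORMATION_SCHEMA; a minimal sketch:

    SELECT query_text,
           warehouse_name,
           execution_status,
           total_elapsed_time
    FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
    ORDER BY start_time DESC
    LIMIT 10;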

     

We have connected the Snowflake DW with Tableau Desktop. The next logical step is to publish the report to the Tableau server and share it with stakeholders.

Note

With Tableau, you can leverage the unique features of Snowflake such as querying and visualizing semistructured data, working with the Time Travel feature, sharing data, implementing role-based security, and using custom aggregations. Moreover, we can integrate Tableau and Matillion. You can find more information about this at https://rockyourdata.cloud/best-practices-matillion-etl-and-tableau/.
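As a small taste of the features mentioned in this note, the sketch below shows Time Travel and semistructured querying in Snowflake SQL. The raw_json table and its VARIANT column v are hypothetical, and the Time Travel query assumes the offset is within the table's retention period.

    -- Time Travel: query the dimension table as it looked one hour ago.
    SELECT COUNT(*)
    FROM "PUBLIC"."matillion_dim_airports" AT (OFFSET => -60*60);

    -- Semistructured data: extract a field from a VARIANT column.
    SELECT v:state::string AS state
    FROM raw_json
    LIMIT 10;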

Summary

In this chapter, we covered the Snowflake partner ecosystem, and you learned about a modern analytics architecture and its key elements. Moreover, we connected to the best cloud ELT tool for Snowflake, Matillion ETL, and ran our first job. Then, we built a report with the best visual analytics tool, Tableau. By the end of the chapter, we had created an analytics solution that can scale and is ready for production use. Using this example, you can build your own analytics solution and get immediate value.

In the next chapter, we will talk about some data use cases for Snowflake. You will learn how Snowflake can handle a large volume of data.
