© Dmitry Anoshin, Dmitry Shirokov, Donna Strok 2020
D. Anoshin et al.Jumpstart Snowflakehttps://doi.org/10.1007/978-1-4842-5328-1_3

3. Building a Virtual Warehouse

Dmitry Anoshin1 , Dmitry Shirokov2 and Donna Strok3
(1)
British Columbia, Canada
(2)
Burnaby, BC, Canada
(3)
Seattle, WA, USA
 
A Snowflake virtual warehouse is a cluster of compute resources for your Snowflake database. These compute resources include CPU, memory allocation, and temporary storage. Virtual warehouse concepts are important to understand because a virtual warehouse is the foundation of what you will build inside your Snowflake account. In this chapter, we will cover the following:
  • Overview of Snowflake virtual data warehouses

  • Understanding warehouse use cases

  • Virtual data warehouse considerations

  • Building your first Snowflake virtual data warehouse

Overview of Snowflake Virtual Warehouses

In this section, we will cover the different warehouse types and strategies for keeping costs down. The following are the topics that will be covered:
  • Warehouse sizing and features

  • Multicluster warehouses

Warehouse Sizes and Features

Snowflake virtual warehouse sizes (as mentioned in Chapter 2) are T-shirt sizes such as X-Small, Small, Medium, Large, and so on. The credit charges for a virtual warehouse start at 1 for X-Small and double each size you go up. This makes it an easy pricing model to remember.

Choosing the Right Size

Even with the simple pricing model, it still might seem like a daunting task to come up with the approximate size needed for your virtual warehouse. Data loading and query processing are the two considerations that need to be mapped out when determining the size you need. As queries get more complex, the time it takes for the server to execute can increase. Likewise, as more data gets loaded into Snowflake, the loading performance might be affected.

Note

Larger virtual warehouses may not result in better performance for data loading or query processing.

Concurrency

The number of queries that a virtual warehouse can concurrently process is determined by the size and complexity of each query. As queries are submitted, the virtual warehouse calculates and reserves the compute resources needed to process each query. If the virtual warehouse does not have enough remaining resources to process a query, the query is queued, pending resources that become available as other running queries complete.

Snowflake provides some object-level parameters that can be set to help control query processing and concurrency.
  • STATEMENT_QUEUED_TIMEOUT_IN_SECONDS

  • STATEMENT_TIMEOUT_IN_SECONDS

Note

If queries are queuing more than desired, another virtual warehouse can be created, and queries can be manually redirected to the new virtual warehouse. In addition, resizing a virtual warehouse can enable limited scaling for query concurrency and queuing; however, virtual warehouse resizing is primarily for improving query performance.

To enable fully automated scaling for concurrency, Snowflake recommends multicluster virtual warehouses, which provide essentially the same benefits as creating additional virtual warehouses and redirecting queries but without requiring manual intervention. Multicluster virtual warehouses are discussed later in this chapter.

Default Virtual Warehouses in Sessions

When a session is initiated in Snowflake, the session does not, by default, have a virtual warehouse associated with it. Until a session has a virtual warehouse associated with it, queries cannot be submitted within the session. Snowflake allows sessions to specify which virtual warehouse they will default to. Sessions can be initiated by users or tools, and sessions may also change to another virtual warehouse by using the USE WAREHOUSE command .

To facilitate querying immediately after a session is initiated, Snowflake supports specifying a default virtual warehouse for each individual user. The default warehouse for a user is used as the warehouse for all sessions initiated by that user. A default warehouse can be specified when creating or modifying the user, either by using the web interface or by using CREATE USER/ALTER USER.

Snowflake clients (SnowSQL, JDBC driver, ODBC driver, Python connector, etc.) can specify a default warehouse through their connections or configuration files, as appropriate. Chapters 4, 5, and 6 discuss these Snowflake clients and provide more information on how to set up the default warehouse.

Multicluster Virtual Warehouses

Some organizations may replicate data into separate data marts. They may also shift some data workloads outside of normal business hours or queue usage to boost performance. Snowflake offers users the ability to automatically scale out their virtual warehouse by distributing replicated data, in memory, across separate compute clusters.

Resizing your virtual warehouse can provide performance benefits for slow-running queries and data loading. Multicluster warehouses will automatically increase or decrease the number of queries.

Multicluster virtual warehouses are an Enterprise Edition feature. If multicluster virtual warehouses are enabled for your account, you can also set the maximum and minimum number of clusters for the warehouse. Multicluster warehouses will use multiple clusters of servers to handle cases where fluctuating numbers of concurrent queries occur, such as during peak and off-peak hours. As the load increases, the virtual warehouse automatically starts more clusters to prevent queries from queuing. When the additional clusters are no longer needed, it shuts them down.

Note

Multicluster virtual warehouses are a Snowflake Enterprise Edition feature.

Overview of Multicluster Virtual Warehouses

Multicluster virtual warehouses improve concurrency and are designed specifically for handling queuing and performance issues related to large numbers of concurrent queries by many users. In addition, multicluster virtual warehouses can be automated to scale if your number of users/queries tends to fluctuate.

When deciding whether to use multicluster virtual warehouses and the number of clusters to use per virtual warehouse, consider the following:
  • All your virtual warehouses should be configured as multicluster virtual warehouses.

  • Multicluster virtual warehouses should be configured to run in autoscaling mode, which enables Snowflake to automatically start and stop clusters as needed.

When choosing the minimum and maximum number of clusters for a multicluster warehouse, consider the following:
  • Minimum: Keep the default value of 1; this ensures that additional clusters are started only as needed. However, if high availability of the virtual warehouse is a concern, set the value higher than 1. This helps ensure virtual warehouse availability and continuity in the unlikely event that a cluster fails.

  • Maximum: Set this value as large as possible, while being mindful of the virtual warehouse size and corresponding credit costs. For example, an X-Large virtual warehouse (16 servers) with maximum clusters (10) will consume 160 credits in an hour if all 10 clusters run continuously for the hour.

Multicluster Credits and Usage

To see your consumption in the web interface, click the name of a warehouse to display the average load on the warehouse for all queries processed and queued over the last two weeks. The page displays the query load in intervals of five minutes or one hour depending on the length of the viewing window. Each time a virtual warehouse is resumed or increases in size, your account is billed for one minute of usage; after the first minute, billing is calculated per second. Figure 3-1 shows what the usage screen looks like.
../images/482416_1_En_3_Chapter/482416_1_En_3_Fig1_HTML.jpg
Figure 3-1

A newly created virtual warehouse’s load over time

Credits are charged based on the following:
  • The number of servers per cluster, which is determined by the virtual warehouse size

  • The number of clusters, if using multicluster virtual warehouses

  • The length of time each server in each cluster runs

Virtual Warehouse Considerations

While reading this book, you may be looking for a solution for your use case. Typical virtual warehouse design considerations include the following:
  • Query design

  • Query caching, reuse of frequently run queries

  • Scaling

Query Design

The number of servers required to process a query depends on the size and complexity of the query. Queries will scale linearly as a virtual warehouse size increases, particularly for larger, more complex queries. Therefore, consider the following in your query design:
  • Table size is more significant than the number of rows.

  • Filters using predicates impact processing.

  • A higher number of joins will also impact processing.

Tip

To achieve the best results, try to execute relatively homogeneous queries (size, complexity, data sets, etc.) on the same warehouse; executing queries of widely varying size and/or complexity on the same warehouse makes it more difficult to analyze warehouse load, which can make it more difficult to select the best size to match the size, composition, and number of queries in your workload.

Caching Impacts

When a virtual warehouse is in a running state, it maintains a cache of table data that has been accessed from previously completed queries. The size of the cache is determined by the number of servers in the virtual warehouse. The size of the cache increases as the virtual warehouse size increases.

The cache is dropped when the virtual warehouse is suspended, which may result in slower initial performance for some queries as the virtual warehouse is resumed. The cache is rebuilt when a warehouse is resumed, and queries are then able to take advantage of the improved performance that the cache provides.

Consider the trade-off between saving credits by suspending a virtual warehouse versus maintaining the cache of data from previous queries to help with performance.

Scaling

Virtual warehouse resizing is supported by Snowflake at any time. You can even resize a warehouse while the virtual warehouse is running. When you have a slow query and have other similarly sized queries queued up, it might make sense to resize the warehouse while it’s running.

But be aware that larger is not necessarily faster; for smaller, basic queries that are already executing quickly, you may not see any significant improvement after resizing. If a query is already running, then it will not be impacted by the resized virtual warehouse. The additional virtual warehouses will be used only by the queued or new queries that will be processed.

Building a Snowflake Virtual Warehouse

There are two ways you can build a Snowflake virtual warehouse. It can be done either through the web interface or through SQL commands. We will demonstrate how to do it in both ways. We will be doing this through an already created Snowflake account. If you haven’t yet created your Snowflake account, please review Chapter 2 to get your Snowflake account set up. Let’s get started!

Creating a Virtual Warehouse

Snowflake Web Interface

  1. 1.

    Log in to your Snowflake web interface.

     
  2. 2.

    Click Warehouses + Create.

     
  3. 3.

    Give your virtual warehouse a name, choose a size, and set the maximum idle time before the virtual warehouse automatically suspends. Click Finish. Figure 3-2 shows this information entered into the web interface.

     
../images/482416_1_En_3_Chapter/482416_1_En_3_Fig2_HTML.jpg
Figure 3-2

Create Warehouse dialog in web interface after selecting Create Warehouse

Tip

After entering the details, click Show SQL at the bottom of the Create Warehouse dialog to see the SQL that would perform this same task. Figure 3-3 shows what this would look like based on the setup in Figure 3-2.
../images/482416_1_En_3_Chapter/482416_1_En_3_Fig3_HTML.jpg
Figure 3-3

Show SQL will display the SQL code that can be used in a script. In this example, AUTO_SUSPEND automatically translates the minutes entered into the Create Warehouse dialog into seconds

SQL Commands

  1. 1.

    The following CREATE WAREHOUSE command will give the same results as what was done in the web interface:

    CREATE WAREHOUSE TestWarehouse
    WITH WAREHOUSE_SIZE = 'MEDIUM'
    WAREHOUSE_TYPE = 'STANDARD'
    AUTO_SUSPEND = 600
    AUTO_RESUME = TRUE;
     
  2. 2.
    Figure 3-4 shows that our warehouse, TESTWAREHOUSE, is created after provisioning was completed.
    ../images/482416_1_En_3_Chapter/482416_1_En_3_Fig4_HTML.jpg
    Figure 3-4

    TESTWAREHOUSE has completed provisioning and is now appearing in our list of warehouses

    Important

    Always ensure that auto suspend and auto resume are set in your warehouse. By default, these settings are set for you when a virtual warehouse is provisioned. Auto suspend stops a warehouse if it sits idle for a specified period of time, while auto resume starts a suspended virtual warehouse when queries are submitted to it. This is important because a running warehouse will consume Snowflake credits only when compute resources are being utilized. Shutting down your warehouse, when they are not in use, will help conserve credits and control costs.

     

Starting, Resuming, Suspending, and Resizing

You can also manually start/resume, suspend, and resize a virtual warehouse. These are especially helpful for one-offs or if you don’t want to wait for the automatic start or suspension of your virtual warehouse. A virtual warehouse can be started at any time, even when it’s first created.

Web Interface

Here are the steps for common tasks:
  • Start/Resume: Click Warehouses and select your warehouse. Click Resume.

  • Suspend: Click Warehouses and select your warehouse. Click Suspend.

  • Resize: Click Warehouses and select your warehouse. Click Configure. Select the new size from the Size drop-down. Click Finish.

SQL Command

Here are commands for common tasks:
  • Start/Resume: Use the ALTER WAREHOUSE command with RESUME.

    ALTER WAREHOUSE TestWarehouse
    RESUME IF SUSPENDED
  • Suspend: Use the ALTER WAREHOUSE command with SUSPEND.

    ALTER WAREHOUSE TestWarehouse
    SUSPEND
  • Resize: Use the ALTER WAREHOUSE command with SET WAREHOUSE_SIZE.

    ALTER WAREHOUSE TestWarehouse
    SET WAREHOUSE_SIZE = XSMALL

Using a Warehouse

The USE WAREHOUSE command is used only as a SQL command. This simply allows the SQL statements to connect to the correct virtual warehouse during execution. When using the Worksheets page in the web interface, the USE WAREHOUSE command is implied by selecting the virtual warehouse at the top right of the screen, as shown in Figure 3-5.
../images/482416_1_En_3_Chapter/482416_1_En_3_Fig5_HTML.jpg
Figure 3-5

This shows how to view or change the virtual warehouse within the worksheet

Setting Up Load Monitoring

The load monitoring dashboard is used for the performance monitoring of your virtual warehouses within your account. There is also a feature of the dashboard that lets you monitor your credits over a period of time. Log into your Snowflake web interface to begin setting up load monitoring.
  1. 1.

    Click Warehouses and highlight your chosen warehouse. A page will appear on the right allowing you to grant privileges.

     
  2. 2.

    Click Grant Privileges.

     
  3. 3.

    For “Privileges to grant,” select MONITOR.

     
  4. 4.

    For “Grant privileges to,” select ACCOUNTADMIN.

     
  5. 5.

    Click Grant, as shown in Figure 3-6.

     
../images/482416_1_En_3_Chapter/482416_1_En_3_Fig6_HTML.jpg
Figure 3-6

The information entered into the dialog

To view load monitoring, do the following:
  1. 1.

    Click Warehouses.

     
  2. 2.

    Click your choice of warehouse name (which should be hyperlinked).

     
Figure 3-7 shows how the loading monitoring chart looks on a newly created warehouse. There is one bar at the end of the Warehouse Load Over Time chart, because we ran one query. At the bottom there is a date slider that allows you to increase or decrease the date range you want to see in the chart. There is no history in this chart; therefore, no more bars appear.
../images/482416_1_En_3_Chapter/482416_1_En_3_Fig7_HTML.jpg
Figure 3-7

Warehouse Load Over Time chart on the TestWarehouse warehouse

Note

The date slider has a minimum/maximum range of 8 hours to 14 days. Load monitoring data is not available previous to 14 days.

Understanding Load Monitoring

The Warehouse Load Over Time bar chart contains bars that represent time intervals. Each bar, or interval, represents the query loads that are based on query statuses for the queries executing or queuing at that time. There are four types of query statuses: Running, Queued, Queued Provisioning, and Queued Repairing.
  • Running: Queries that were actively running during the interval. These queries may have started before or during the interval.

  • Queued: Queries that are in wait status. The wait could be because of the warehouse load being maxed out and therefore would need to wait for running queries to complete processing.

  • Queued Provisioning: Queries that are in wait status because the warehouse is provisioning, usually after a warehouse resumes.

  • Queued Repairing: Queries that are in wait status while the warehouse is repaired. While rare, this occurs only for a few minutes.

Query load is calculated by taking the execution time for each query and dividing it by the interval time in seconds. This produces a percent that is then additive for all the queries in that interval. The percent is divided up by query status, which is displayed in the bar chart.

Best Practices for Load Monitoring
These are some best practices for load monitoring:
  • A high query load and queuing are indicative of needing to start up a separate warehouse. Move the queued queries to the new warehouse.

  • A low query load could mean your running queries need more compute resources. Resizing the warehouse can take care of this.

  • Use the load monitor to study how the daily workloads look over the course of two weeks. Notice the trends in spikes and consider creating a warehouse specifically for peak workloads. Or consider switching to multicluster so that autoscaling takes care of the peak traffic.

  • Use the load monitor to look at trends in Snowflake credit usages. See whether the minimum cluster count or warehouse size needs to be decreased to save money on credits.

Summary

This chapter covered the basics of Snowflake virtual warehouses. You should now know the different virtual warehouse sizes and features and what a multicluster virtual warehouse is. We also reviewed some important things to consider with your virtual warehouse. Last, we showed how to create your first virtual warehouse and highlighted the load monitoring tool.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.227.10.45