
7. Snowflake Administration

Dmitry Anoshin (British Columbia, Canada), Dmitry Shirokov (Burnaby, BC, Canada), and Donna Strok (Seattle, WA, USA)

Snowflake is a database, and as such it comes with administration features similar to those of any other database. It was also the first data warehouse as a service (DWaaS), meaning that end users need to perform only a minimum of administration and maintenance.

This chapter provides an overview of options for managing your Snowflake account, geared primarily to Snowflake administrators. However, it is also useful for end users to understand the key concepts of Snowflake administration and management.

There are several main tasks required of administrators:
  • Administering roles and users

  • Administering account parameters

  • Administering resource consumption

  • Administering databases and warehouses

  • Administering data shares

  • Administering database objects

  • Administering clustered tables

We will cover all these topics and show how they work using our Snowflake demo account.

Administering Roles and Users

Snowflake uses roles for managing access and operations. In other words, you can create custom roles with a set of privileges to control the granularity of the access granted. For instance, say we want to create a role for our marketing team, which will grant the team members access to the data and allow them to run SQL queries using a virtual warehouse. According to the Snowflake model, access to securable objects is managed by privileges assigned to roles. Moreover, roles can be assigned to other roles and users.

Snowflake leverages the following access control models:
  • Discretionary access control (DAC): Each object has an owner, and this owner can manage the access of the object.

  • Role-based access control (RBAC): Roles are created and assigned privileges, and then the roles are assigned to users.

    Note A securable object is a Snowflake entity to which access can be granted (e.g., a database, table, or warehouse). A privilege is a level of access to an object.

Figure 7-1 shows an example of the Marketing role that grants the privileges USAGE, MODIFY, and OPERATE to the securable objects DATABASE and WAREHOUSE for marketing users.
Figure 7-1

Example of Marketing role that is granted specific privileges for marketing users

When we launched our example Snowflake account, it had a number of predefined default roles.
  • ACCOUNTADMIN: This account administrator role is the top-level role for a Snowflake account.

  • SYSADMIN: This system administrator role is for creating and managing databases and warehouses.

  • PUBLIC: This is a pseudo-role that is automatically granted to every user and role in the account; objects owned by PUBLIC are available to all account users.

  • SECURITYADMIN: This security administrator role is for creating and managing roles and users.

You can create custom roles with the SECURITYADMIN role, or you can grant the CREATE ROLE privilege to another role. For any custom role, you should think about the role hierarchy so that you can assign your new custom role to one of the high-level administration roles. Figure 7-2 shows an example of this hierarchy. It shows the Marketing role, which has privileges on the marketing database, schema, and warehouse and is itself granted to the SYSADMIN role.
Figure 7-2

This is an example of a custom role hierarchy
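For example, once the MARKETING_TEAM role is created (as we do later in this chapter), a single GRANT statement attaches it to the SYSADMIN hierarchy:
GRANT ROLE MARKETING_TEAM TO ROLE SYSADMIN;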

Enforcement Model

When you connect to your Snowflake account through the web interface or an ODBC/JDBC driver, a session is initiated with a current role, which determines the permissions available to the user. You can change the role using the USE ROLE command or switch roles using the menu in the top-right corner, as shown in Figure 7-3.
Figure 7-3

Role switching
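For example, you can switch roles in SQL and verify the current role of the session:
USE ROLE SECURITYADMIN;
SELECT CURRENT_ROLE();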

When a user wants to perform any action in Snowflake, Snowflake will compare the user’s role privileges against the required privileges.

Note

You may be familiar with the concept of a super-user or super-role from other database vendors, but you will not find this functionality in Snowflake. All access requires the appropriate access privileges.

Working with Roles and Users

Snowflake allows you to control access to your data warehouse at a granular level using roles. To create a role, you can execute DDL commands or use the web interface. The following commands are available:
  • CREATE ROLE: Creates a new role

  • ALTER ROLE: Changes an existing role

  • DROP ROLE: Drops an existing role

  • SHOW ROLES: Shows a list of available roles

  • USE ROLE: Switches a role for the session

Let’s create a new role. Log into your Snowflake account and make sure that you choose the SECURITYADMIN role.
CREATE ROLE MARKETING_TEAM;
This command will create a role. Next, grant permissions for this role and attach users. The following commands are available for user management:
  • CREATE USER: Creates a new user

  • ALTER USER: Changes an existing user

  • DESCRIBE USER: Describes an existing user

  • SHOW PARAMETERS: Shows the parameters of the user

In addition, we can specify the following options for users:
  • userProperties: Properties such as password, display name, and so on.

  • sessionParams: Session options such as default warehouse and namespace

Let’s run a command that will create a new user and assign it to the MARKETING_TEAM role.
CREATE USER marketing_analyst
  PASSWORD = 'RockYourData'
  COMMENT = 'Marketing Analyst'
  LOGIN_NAME = 'marketing_user1'
  DISPLAY_NAME = 'Marketing_Analyst'
  DEFAULT_ROLE = "MARKETING_TEAM"
  DEFAULT_WAREHOUSE = 'SF_TUTS_WH'
  MUST_CHANGE_PASSWORD = TRUE;
GRANT ROLE "MARKETING_TEAM" TO USER marketing_analyst;

You can achieve the same result using the web interface.

The last part of the code grants privileges to the MARKETING_TEAM role so new users can run SQL queries. We should grant the OPERATE and USAGE privileges on the virtual warehouse to our new role, like this:
GRANT USAGE ON WAREHOUSE SF_TUTS_WH TO ROLE MARKETING_TEAM;
GRANT OPERATE ON WAREHOUSE SF_TUTS_WH TO ROLE MARKETING_TEAM;

Note

SF_TUTS_WH is a small virtual warehouse that was created previously, but you can use your own warehouse. For demo purposes, it is always good to use the smallest computing instance.

Again, we can use the web interface to perform the same actions. Then we can log in as the new user, using the login marketing_user1, and run this sample query:
SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."REGION"
If you want to achieve the same result using the web interface, you should navigate to the Account menu, as shown in Figure 7-4.
Figure 7-4

Account menu

By default, the Account menu is available only to the ACCOUNTADMIN role. This menu is usually used by Snowflake administrators. It allows them to manage users and roles, control credit usage, and so on.

Administering Resource Consumption

The next important topic for Snowflake administrators is resource consumption. Keeping track of storage and compute resources is critical for Snowflake customers. Snowflake provides administrative capabilities for monitoring credit and storage usage as well as provides resource monitors that can send alerts on usage spikes and automatically suspend the virtual warehouse.
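For instance, here is a minimal sketch of a resource monitor; the monitor name MARKETING_RM and the 100-credit quota are illustrative assumptions:
CREATE RESOURCE MONITOR MARKETING_RM WITH CREDIT_QUOTA = 100
  TRIGGERS ON 80 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;
ALTER WAREHOUSE SF_TUTS_WH SET RESOURCE_MONITOR = MARKETING_RM;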

By default, only the ACCOUNTADMIN role has access to the billing information. However, this access can be granted to other users and roles via the MONITOR USAGE privilege.
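For example, a minimal sketch (assuming the MARKETING_TEAM role created earlier in this chapter) that lets the role see account-level usage data:
GRANT MONITOR USAGE ON ACCOUNT TO ROLE MARKETING_TEAM;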

As you know, Snowflake has a unique architecture that splits compute resources (virtual warehouses) and data storage. The cost of Snowflake consists of these two elements and is based on credits. In our case, when we created the Snowflake demo account for this book, we were granted 400 credits, and we are tracking consumption.

Virtual Warehouse Usage

You are already familiar with virtual warehouses (VWs) and their T-shirt sizes. Snowflake charges credits for using a VW. In other words, the price depends on the number of VWs, their size, and how long they run queries.

Note

Credits are billed per second, with a 60-second minimum.

You can use the table function WAREHOUSE_METERING_HISTORY, which shows hourly credit usage, or you can use the web interface and click Account ➤ Billing & Usage. Let’s run this code to see the usage for the last seven days:
select * from table(information_schema.warehouse_metering_history(dateadd('days',-7,current_date())));
In addition, we can specify the VW name as a parameter, as shown next. Figure 7-5 shows an example of sample usage.
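For example, here is the same function scoped to a single warehouse, using the SF_TUTS_WH warehouse from earlier in this chapter:
select * from table(information_schema.warehouse_metering_history(dateadd('days',-7,current_date()), current_date(), 'SF_TUTS_WH'));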
Figure 7-5

Sample usage of credits for virtual warehouse for the XS, XL, and 3XL instance sizes

Data Storage Usage

Another aspect of the price is storage. Snowflake calculates the price of storage monthly, based on the average daily storage space used. This includes files stored in the Snowflake stage, data stored in databases, and historical data maintained for Fail-safe. Moreover, Time Travel and cloned objects also consume storage. The price is based on a flat rate per terabyte (TB).

Note

The TB price depends on the type of account (capacity or on-demand), region, and cloud provider.

We can review the usage data using the web interface, as shown in Figure 7-6.
Figure 7-6

Snowflake usage report

Also, you can leverage table functions and a Snowflake view, as shown here:
--Database storage for the last 7 days
select * from table(information_schema.database_storage_usage_history(dateadd('days',-7,current_date()),current_date()));
--Stage storage for the last 7 days
select * from table(information_schema.stage_storage_usage_history(dateadd('days',-7,current_date()),current_date()));
--Table storage utilization
select * from information_schema.table_storage_metrics;

Note

Make sure that data is in a compressed format in the Snowflake staging area. Another consideration is to use external storage options like Amazon S3 where you can set the data lifecycle policy and archive cold data.

Data Transfer Usage

Snowflake is available in multiple regions on AWS, Azure, and Google Cloud Platform. You should take into consideration one more possible cost: if you are using an external stage (AWS S3 or Azure Blob Storage), you may be charged for data transfers between regions.

Snowflake charges a fee for unloading data into S3 or Blob Storage, whether within the same region or across regions.

Note

Snowflake won’t charge you for loading data from external storage.

There is a Snowflake table function that will help us track this cost, as shown here:
--Data transfer cost for the last 7 days
select * from table(information_schema.data_transfer_history(date_range_start=>dateadd('day',-7,current_date()),date_range_end=>current_date()));

Administering Databases and Warehouses

There are a number of actions we can do with databases and warehouses. As usual, you have a choice to use the web interface or execute SQL commands.

We covered VWs in Chapter 2. In this section, we will review actions that we can do with VWs and databases.

Managing Warehouses

As an administrator, you can use the following commands with warehouses:
  • CREATE WAREHOUSE

  • DROP WAREHOUSE

  • ALTER WAREHOUSE

  • USE WAREHOUSE

When you create a new warehouse, you specify parameters such as size and type. Let’s create a new warehouse by executing this command:
CREATE WAREHOUSE RYD WITH
  WAREHOUSE_SIZE = 'XSMALL'
  WAREHOUSE_TYPE = 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE
  COMMENT = 'Rock Your Data Virtual Warehouse';
We chose the smallest possible warehouse size, XSMALL. We also set two additional parameters.
  • AUTO_SUSPEND: This stops the warehouse if it is idle for more than 300 seconds.

  • AUTO_RESUME: This starts a suspended warehouse when it is needed again.

You also have the option to resize the warehouse using the ALTER WAREHOUSE command. Finally, you can use the USE WAREHOUSE command to specify which warehouse to use for the current session.
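For example, here is a short sketch of these commands using the RYD warehouse we just created:
ALTER WAREHOUSE RYD SET WAREHOUSE_SIZE = 'SMALL';
ALTER WAREHOUSE RYD SUSPEND;
ALTER WAREHOUSE RYD RESUME;
USE WAREHOUSE RYD;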

Note

ALTER WAREHOUSE is a unique feature. It exists only in Snowflake. This command suspends or resumes a virtual warehouse or aborts all queries (and other SQL statements) for a warehouse. It can also be used to rename or set/unset the properties for a warehouse. There are more details available at https://docs.snowflake.net/manuals/sql-reference/sql/alter-warehouse.html.

Managing Databases

All data in Snowflake is stored in database tables, structured as collections of columns and rows. For each database, we can define one or many schemas. Inside each schema, we create database objects such as tables and views.

Note

Snowflake doesn’t have a hard limit on the number of databases, schemas, or database objects.

These are the commands available for database management:
  • CREATE DATABASE

  • CREATE DATABASE CLONE

  • ALTER DATABASE

  • DROP DATABASE

  • UNDROP DATABASE

  • USE DATABASE

  • SHOW DATABASES

These commands can be executed via the web interface or SQL. Let’s create a database.
CREATE DATABASE MARKETING_SANDBOX;

In addition, we can grant privileges such as CREATE SCHEMA, MODIFY, MONITOR, and USAGE on a database to a specific role.
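For example, here is a minimal sketch granting these privileges on our new database to the MARKETING_TEAM role created earlier:
GRANT USAGE, MONITOR ON DATABASE MARKETING_SANDBOX TO ROLE MARKETING_TEAM;
GRANT CREATE SCHEMA ON DATABASE MARKETING_SANDBOX TO ROLE MARKETING_TEAM;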

Overall, the operations look similar to traditional databases. However, there are a couple of unique features that are worth mentioning.

The first is UNDROP DATABASE. Let’s imagine that you accidentally drop the production database. Restoring it from a backup could take at least a day. Not so with Snowflake, where you can instantly restore the most recent version of a dropped database as long as you are within the defined retention window for that database.
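For example, assuming we are still within the retention window, recovering the database we created earlier takes a single command:
DROP DATABASE MARKETING_SANDBOX;
UNDROP DATABASE MARKETING_SANDBOX;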

Zero-Copy Cloning

Another unique feature is zero-copy cloning, which creates a snapshot of a database. This snapshot is writable and independent. These types of features are like a “dream come true” for data warehouse DBAs.

There are many situations where people need to copy their database to test or experiment with their data to avoid altering their sensitive production database. However, copying data can be painful and time-consuming because all the data needs to be physically moved from the production database to the database copy. This is extremely expensive because both copies of the data need to be paid for. When a production database gets updates, the database copy becomes stale and requires an update.

Snowflake takes a different approach. It enables us to test and experiment with our data more freely. It allows us to copy databases in seconds. Snowflake doesn’t physically copy data. It continues to reference the original data and will store new records only when you update or change the data; therefore, you will pay for each unique record only once. Finally, we can use zero-copy cloning with the Time Travel feature.

Figure 7-7 shows an option for cloning a database using the web interface.
Figure 7-7

Web interface for cloning a database

As usual, we have the option to execute a command. Here are examples of commands with definitions:
--Clone a database and all objects within the database at its current state:
create database mytestdb_clone clone mytestdb;
--Clone a schema and all objects within the schema at its current state:
create schema mytestschema_clone clone testschema;
--Clone a table at its current state:
create table orders_clone clone orders;
--Clone a schema as it existed before the date and time in the specified timestamp:
create schema mytestschema_clone_restore clone testschema before (timestamp => to_timestamp(40*365*86400));
--Clone a table as it existed exactly at the date and time of the specified timestamp:
create table orders_clone_restore clone orders at (timestamp => to_timestamp_tz('04/05/2013 01:02:03', 'mm/dd/yyyy hh24:mi:ss'));

Administering Account Parameters

Parameters control the behavior of our Snowflake account, individual user sessions, and objects. All parameters are available on the Snowflake documentation page.

We can split all the parameters into three types:
  • Account parameters: These are set at the account level.

  • Session parameters (the majority): These can be set at the session, user, and account levels.

  • Object parameters: These can be set at the account and object levels.

To override the default parameters, you can use the following commands:
  • ALTER ACCOUNT

  • ALTER SESSION

  • CREATE <object> or ALTER <object>

To see the available parameters and their options, run the following:
show parameters;

Moreover, we can look at the parameters for a specific database or warehouse.
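For example, using the objects created earlier in this chapter:
show parameters in database MARKETING_SANDBOX;
show parameters in warehouse RYD;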

These are some examples of parameters:
  • STATEMENT_TIMEOUT_IN_SECONDS: Specifies the amount of time after which a running SQL statement is canceled by the system. This parameter helps control end-user workloads and prevent bad, heavy queries (see the example after this list).

  • MAX_CONCURRENCY_LEVEL: Specifies the maximum number of SQL statements a warehouse cluster can execute concurrently.

  • TIMEZONE: Specifies the time zone setting for the session.
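Here is a sketch of overriding STATEMENT_TIMEOUT_IN_SECONDS at the session and warehouse levels; the timeout values are illustrative assumptions:
--Cancel any statement in the current session running longer than 10 minutes
alter session set STATEMENT_TIMEOUT_IN_SECONDS = 600;
--Set a one-hour limit at the warehouse (object) level
alter warehouse RYD set STATEMENT_TIMEOUT_IN_SECONDS = 3600;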

Administering Database Objects

One of the most common administration tasks within Snowflake is to manage database objects such as tables, views, schemas, stages, file formats, and so on.

All database objects are created under the schema. Traditional databases objects such as table, view, materialized view, and sequence have similar options:
  • CREATE

  • ALTER

  • DROP

  • SHOW

  • DESCRIBE

Moreover, a Snowflake administrator may leverage unique Snowflake capabilities such as UNDROP and zero-copy cloning.

Another set of schema-level objects used in Snowflake includes the following:
  • Stage: Used for storing data files; can be internal or external

  • File format: File format options (CSV, Parquet, etc.) and formatting options for each file type

  • Pipe: A single COPY statement for loading a set of data files

  • UDF: User-defined function; a custom function written in SQL or JavaScript

As a Snowflake administrator, you may need to manage these objects.
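For example, here is a minimal sketch that creates a CSV file format and an internal stage that uses it; the object names are illustrative assumptions:
create file format csv_format type = 'CSV' field_delimiter = ',' skip_header = 1;
create stage marketing_stage file_format = csv_format;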

Administering Data Shares

Secure data shares are another unique feature of Snowflake and will be covered in Chapter 10. This feature allows you to become a data provider by creating a data share using the CREATE SHARE command. By default, this is available only to the ACCOUNTADMIN role.

These are the available commands:
  • CREATE SHARE

  • ALTER SHARE

  • DROP SHARE

  • DESCRIBE SHARE

  • SHOW SHARES

Note

As a share creator, you are responsible for data security. Before you create a share, you should spend some time to learn more about data and use cases to prevent the sharing of sensitive data. Secure views and UDFs are handy to use when creating shares.

After share creation, an admin can view, grant, or revoke access to database objects using the following commands:
  • GRANT <privilege> TO SHARE: Grants access on an object to a share

  • REVOKE <privilege> FROM SHARE: Revokes access on an object from a share

  • SHOW GRANTS TO SHARE: Shows all object privileges that have been granted to a share

  • SHOW GRANTS OF SHARE: Shows all accounts for the share and the accounts that are using the share
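For example, here is a sketch of sharing a single table from our demo database; the CAMPAIGNS table is a hypothetical example:
CREATE SHARE MARKETING_SHARE;
GRANT USAGE ON DATABASE MARKETING_SANDBOX TO SHARE MARKETING_SHARE;
GRANT USAGE ON SCHEMA MARKETING_SANDBOX.PUBLIC TO SHARE MARKETING_SHARE;
GRANT SELECT ON TABLE MARKETING_SANDBOX.PUBLIC.CAMPAIGNS TO SHARE MARKETING_SHARE;
SHOW GRANTS TO SHARE MARKETING_SHARE;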

If you no longer need a share and want to drop it, you should first consider the downstream impact for all consumers. As an option, you may revoke grants on some objects and observe the result.

Administering Clustered Tables

As you know, Snowflake is a data warehouse as a service. The idea here is simple: you just use the data warehouse, and you don’t need to think about data distribution, sorting, and table statistics.

One aspect of Snowflake performance is micro-partitioning. When we load data into Snowflake, it is automatically divided into micro-partitions of 50 MB to 500 MB of uncompressed data. These micro-partitions are organized in a columnar fashion. In addition, Snowflake collects and stores metadata about micro-partitions. This helps optimize query plans and improve query performance by avoiding unnecessary scanning of micro-partitions through an operation known as partition pruning.

Snowflake also stores data in tables sorted along natural dimensions such as date and/or geographic region. This is called data clustering, and it is a key factor in query performance, especially for large tables. By default, Snowflake uses automatic clustering. However, in some cases we may define a clustering key within the CREATE TABLE statement to change the default behavior, as shown in the sketch after this paragraph. This should be the exception rather than the rule: in most cases, admins will not need to cluster. Best practice is to avoid explicit clustering unless there is a specific query pattern that does not meet the SLA, and in general a table should be at least 1 TB before you consider it.
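Here is a minimal sketch of defining a clustering key; the EVENTS table and its columns are hypothetical:
--Define a clustering key at table creation time
create or replace table events (
  event_date date,
  region string,
  payload variant
) cluster by (event_date, region);
--Change the clustering key on an existing table
alter table events cluster by (event_date);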

As a Snowflake administrator, you may need to review table clustering and run reclustering processes to identify all the problem tables and provide the best possible performance.

There are two system functions that allow us to monitor clustering information for tables:
  • SYSTEM$CLUSTERING_INFORMATION: Returns clustering information for a table, including the clustering depth, based on one or more columns

  • SYSTEM$CLUSTERING_DEPTH: Computes the average depth of a table according to the specified columns
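For example, assuming the hypothetical EVENTS table from the earlier sketch:
select system$clustering_information('events', '(event_date)');
select system$clustering_depth('events', '(event_date)');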

If you need to improve the clustering of data, you can create a new table with a new clustering key and insert the data into the new table, or you can use materialized views (MVs) to create a version of the table with the new clustering key. Snowflake will then automatically keep the MV's data in sync with new data added to the base table.

Note

A table with clustering keys defined is considered to be clustered. Clustering keys aren’t important for all tables. Whether to use clustering depends on the size of a table and the query performance, and it is most suitable for multiterabyte tables.

Snowflake Materialized Views

When working with Teradata, we worked with materialized views (MVs) a lot. Basically, we had complex SQL queries that produced metrics and dimensions for a business intelligence solution. Because of the complex SQL logic, joins, and derived columns, we used MVs to improve query performance. However, traditional MVs had their downsides. For example, it was important to keep the data in the MV up-to-date, refreshing it daily with an ETL process. In addition, we experienced slowdowns in performance while updating the MVs using Data Manipulation Language commands.

Snowflake engineers didn’t abandon the MV concept and added this functionality to Enterprise Edition. According to Snowflake, a materialized view is a precomputed data set derived from a query specification (the SELECT in the view definition) and stored for later use. Because the data is precomputed, querying a materialized view is faster than executing the original query. This performance difference can be significant when a query is run frequently or is sufficiently complex.

Note

Materialized views are designed to improve query performance for workloads composed of common, repeated query patterns. However, materializing intermediate results incurs additional costs. As such, before creating any materialized views, you should consider whether the costs are offset by the savings from reusing these results frequently.

There are a couple of use cases where we can benefit from using MVs:
  • The query results contain a small number of rows and/or columns relative to the base table (the table on which the view is defined).

  • The query results require significant processing, including the following:
    • Analysis of semistructured data

    • Aggregates that take a long time to calculate

The main benefit of Snowflake MVs is that they solve the issues of traditional MVs. MVs are views that are automatically maintained by Snowflake. There is a background service that updates the materialized view after changes are made to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.
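For example, here is a minimal sketch of a materialized view (available in Enterprise Edition, as noted earlier); the ORDERS table is a hypothetical local copy of the TPC-H sample table, since Snowflake MVs are defined on a single table without joins:
create materialized view mv_orders_by_date as
  select o_orderdate, count(*) as order_cnt, sum(o_totalprice) as total_revenue
  from orders
  group by o_orderdate;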

Table 7-1 shows the key similarities and differences between tables, regular views, cached query results, and materialized views.
Table 7-1

Key Similarities and Differences

|                     | Performance Benefits | Security Benefits | Simplifies Query Logic | Supports Clustering | Uses Storage | Uses Credits for Maintenance |
|---------------------|----------------------|-------------------|------------------------|---------------------|--------------|------------------------------|
| Regular table       |                      |                   |                        | Yes                 | Yes          |                              |
| Regular view        |                      | Yes               | Yes                    |                     |              |                              |
| Cached query result | Yes                  |                   |                        |                     |              |                              |
| Materialized view   | Yes                  | Yes               | Yes                    | Yes                 | Yes          | Yes                          |

Summary

In this chapter, we covered the main Snowflake administrative duties (e.g., user and role administration), and you learned about key Snowflake objects (e.g., warehouses and schema-level objects). In addition, we reviewed billing and usage information. Finally, we covered data shares and data clustering concepts as well as materialized views.

In the next chapter, you will learn about one of the key elements of cloud analytics: security.
