Snowflake is a database, and as such it comes with administration features similar to those of any other database. It was also the first data warehouse as a service (DWaaS), meaning that end users need to do only a minimum of administration and maintenance.
This chapter provides an overview of options for managing your Snowflake account, geared primarily to Snowflake administrators. However, it is also useful for end users to understand the key concepts of Snowflake administration and management.
Administering roles and users
Administering account parameters
Administering resource consumption
Administering databases and warehouses
Administering data shares
Administering database objects
Administering clustered tables
We will cover all these topics and show how they work using our Snowflake demo.
Administering Roles and Users
Snowflake uses roles for managing access and operations. In other words, you can create custom roles with a set of privileges to control the granularity of the access granted. For instance, say we want to create a role for our marketing team, which will grant the team members access to the data and allow them to run SQL queries using a virtual warehouse. According to the Snowflake model, access to securable objects is managed by privileges assigned to roles. Moreover, roles can be assigned to other roles and users.
Discretionary access control (DAC): Each object has an owner, and this owner can manage the access of the object.
Role-based access control (RBAC): Roles are created and assigned privileges, and then the roles are assigned to users.
Note A securable object is a Snowflake entity to which access can be granted (e.g., a database, table, and so on). A privilege is a level of access to an object.
ACCOUNTADMIN: This account administrator role is the top-level role for a Snowflake account.
SYSADMIN: This system administrator role is for creating and managing databases and warehouses.
PUBLIC: This pseudo-role is automatically granted to every user and every role in the account. Objects owned by PUBLIC are available to all account users.
SECURITYADMIN: This security administrator role is for creating and managing roles and users.
Enforcement Model
When a user wants to perform any action in Snowflake, Snowflake will compare the user’s role privileges against the required privileges.
Note
You may be familiar with the concept of a super-user or super-role with other database vendors, but you will not find this functionality in Snowflake. All access requires the appropriate access privileges.
Working with Roles and Users
CREATE ROLE: Creates a new role
ALTER ROLE: Changes an existing role
DROP ROLE: Drops an existing role
SHOW ROLES: Shows a list of available roles
USE ROLE: Switches a role for the session
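Putting these commands together, the marketing-team scenario described earlier can be sketched as follows. The database, schema, and warehouse names are illustrative:

```sql
-- Run as a role with the CREATE ROLE privilege, e.g., SECURITYADMIN
USE ROLE SECURITYADMIN;
CREATE ROLE marketing;

-- Allow the role to use a warehouse and read data
-- (demo_db and sf_tuts_wh are example names)
GRANT USAGE ON WAREHOUSE sf_tuts_wh TO ROLE marketing;
GRANT USAGE ON DATABASE demo_db TO ROLE marketing;
GRANT USAGE ON SCHEMA demo_db.public TO ROLE marketing;
GRANT SELECT ON ALL TABLES IN SCHEMA demo_db.public TO ROLE marketing;

-- Best practice: roll custom roles up to SYSADMIN
GRANT ROLE marketing TO ROLE SYSADMIN;
```

Granting the new role to SYSADMIN keeps the role hierarchy connected so that system administrators retain visibility into objects the custom role creates.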
CREATE USER: Creates a new user
ALTER USER: Changes an existing user
DESCRIBE USER: Describes an existing user
SHOW PARAMETERS: Shows the parameters of the user
userProperties: Properties such as password, display name, and so on.
sessionParams: Session options such as default warehouse and namespace
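As a sketch, a hypothetical user can be created with both user properties and session defaults in a single statement (all names and values here are illustrative):

```sql
-- userProperties: password, display name, and so on
-- sessionParams: default warehouse, namespace, and role
CREATE USER jane_doe
  PASSWORD = 'ChangeMe123!'
  DISPLAY_NAME = 'Jane Doe'
  MUST_CHANGE_PASSWORD = TRUE
  DEFAULT_WAREHOUSE = 'sf_tuts_wh'
  DEFAULT_NAMESPACE = 'demo_db.public'
  DEFAULT_ROLE = 'marketing';

-- A default role still has to be granted explicitly
GRANT ROLE marketing TO USER jane_doe;

DESCRIBE USER jane_doe;
```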
You can achieve the same result using the web interface.
Note
SF_TUTS_WH is a small virtual warehouse that was created previously, but you can use your own warehouse. For demo purposes, it is always good to use the smallest computing instance.
By default, the Account menu is available for the ACCOUNTADMIN role. This menu is usually accessible to Snowflake administrators, allowing them to manage users and roles, control credit usage, and so on.
Administering Resource Consumption
The next important topic for Snowflake administrators is resource consumption. Keeping track of storage and compute resources is critical for Snowflake customers. Snowflake provides administrative capabilities for monitoring credit and storage usage as well as provides resource monitors that can send alerts on usage spikes and automatically suspend the virtual warehouse.
By default, only the ACCOUNTADMIN role has access to billing information, but this access can be granted to other users and roles via the MONITOR USAGE privilege.
As you know, Snowflake has a unique architecture that splits compute resources (virtual warehouses) and data storage. The cost of Snowflake consists of these two elements and is based on credits. In our case, when we created the Snowflake demo account for this book, we were granted 400 credits, and we are tracking consumption.
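For example, a resource monitor can cap monthly credit consumption, send alerts on usage spikes, and suspend a warehouse automatically. This sketch assumes the ACCOUNTADMIN role and an example warehouse name:

```sql
USE ROLE ACCOUNTADMIN;

CREATE RESOURCE MONITOR monthly_limit
  WITH CREDIT_QUOTA = 100          -- credits per month
  FREQUENCY = MONTHLY
  START_TIMESTAMP = IMMEDIATELY
  TRIGGERS
    ON 80 PERCENT DO NOTIFY        -- alert on a usage spike
    ON 100 PERCENT DO SUSPEND;     -- suspend the warehouse at the quota

-- Attach the monitor to a warehouse
ALTER WAREHOUSE sf_tuts_wh SET RESOURCE_MONITOR = monthly_limit;
```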
Virtual Warehouse Usage
You are already familiar with virtual warehouses (VWs) and their T-shirt sizes. Snowflake charges credits for using a VW. In other words, the price depends on the number of VWs, their size, and how long they run queries.
Note
Credits are billed per second, with a 60-second minimum.
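Per-warehouse credit consumption can be reviewed through the ACCOUNT_USAGE schema. The following query, as a sketch, sums credits by warehouse for the last month:

```sql
SELECT warehouse_name,
       SUM(credits_used) AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(month, -1, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```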
Data Storage Usage
Another aspect of the price is storage. Snowflake calculates the price of storage monthly, based on the average daily storage space used. This includes files stored in Snowflake stages, data stored in databases, and historical data maintained for Fail-safe. Moreover, Time Travel and cloned objects consume storage. The price is a flat rate per terabyte (TB).
Note
The TB price depends on the type of account (capacity or on-demand), region, and cloud provider.
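Average daily storage per database, including Fail-safe bytes, can be inspected with a query like this sketch against the ACCOUNT_USAGE schema:

```sql
SELECT usage_date,
       database_name,
       average_database_bytes / POWER(1024, 4) AS avg_db_tb,
       average_failsafe_bytes / POWER(1024, 4) AS avg_failsafe_tb
FROM snowflake.account_usage.database_storage_usage_history
WHERE usage_date >= DATEADD(day, -30, CURRENT_DATE())
ORDER BY usage_date DESC;
```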
Note
Make sure that data is in a compressed format in the Snowflake staging area. Another consideration is to use external storage options like Amazon S3, where you can set a data lifecycle policy and archive cold data.
Data Transfer Usage
Snowflake is available in multiple regions on AWS, Azure, and Google Cloud Platform. You should take one more aspect of possible cost into consideration: if you are using an external stage (such as AWS S3 or Azure Blob Storage), you may be charged for data transfers between regions.
Snowflake charges a fee for unloading data into S3 or Blob Storage, whether within the same region or across regions.
Note
Snowflake won’t charge you for loading data from external storage.
Administering Databases and Warehouses
There are a number of actions we can do with databases and warehouses. As usual, you have a choice to use the web interface or execute SQL commands.
We covered VWs in Chapter 2. In this section, we will review actions that we can do with VWs and databases.
Managing Warehouses
CREATE WAREHOUSE
DROP WAREHOUSE
ALTER WAREHOUSE
USE WAREHOUSE
AUTO_SUSPEND: This suspends the warehouse if it is idle for more than 300 seconds.
AUTO_RESUME: This restarts a suspended warehouse when it is needed.
You also have the option to resize a warehouse using the ALTER WAREHOUSE command. Finally, you can use the USE WAREHOUSE command to specify which warehouse to use for the current session.
Note
ALTER WAREHOUSE is a feature unique to Snowflake. This command suspends or resumes a virtual warehouse or aborts all queries (and other SQL statements) running on a warehouse. It can also be used to rename a warehouse or set/unset its properties. More details are available at https://docs.snowflake.net/manuals/sql-reference/sql/alter-warehouse.html.
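Putting these commands together, a small warehouse with auto-suspend and auto-resume might be created and managed like this sketch (the name and sizes are illustrative):

```sql
CREATE WAREHOUSE demo_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 300          -- suspend after 300 seconds of inactivity
  AUTO_RESUME = TRUE          -- wake up automatically when queried
  INITIALLY_SUSPENDED = TRUE;

-- Resize the warehouse
ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Suspend it explicitly, then pick it for the current session
ALTER WAREHOUSE demo_wh SUSPEND;
USE WAREHOUSE demo_wh;
```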
Managing Databases
All data in Snowflake is stored in database tables, structured as collections of columns and rows. For each database, we can define one or many schemas. Inside each schema, we create database objects such as tables and views.
Note
Snowflake doesn’t have a hard limit on the number of databases, schemas, or database objects.
CREATE DATABASE
CREATE DATABASE CLONE
ALTER DATABASE
DROP DATABASE
UNDROP DATABASE
USE DATABASE
SHOW DATABASES
In addition, we can grant privileges such as CREATE SCHEMA, MODIFY, MONITOR, and USAGE for a specific role.
Overall, the operations look similar to traditional databases. However, there are a couple of unique features that are worth mentioning.
First is UNDROP DATABASE. Let's imagine that you accidentally drop the production database. Restoring it from backup could take at least a day. Not with Snowflake, where you can instantly restore the most recent version of a dropped database as long as you are within the defined retention window for that database.
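As a sketch with an illustrative database name:

```sql
-- Oops: the database is dropped
DROP DATABASE production_db;

-- List dropped databases still within the retention window
SHOW DATABASES HISTORY;

-- Instantly restore the most recent version
UNDROP DATABASE production_db;
```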
Zero-Copy Cloning
Another unique feature is zero-copy cloning, which creates a snapshot of a database. This snapshot is writable and independent. These types of features are like a “dream come true” for data warehouse DBAs.
There are many situations where people need to copy their database to test or experiment with their data to avoid altering their sensitive production database. However, copying data can be painful and time-consuming because all the data needs to be physically moved from the production database to the database copy. This is extremely expensive because both copies of the data need to be paid for. When a production database gets updates, the database copy becomes stale and requires an update.
Snowflake takes a different approach. It enables us to test and experiment with our data more freely. It allows us to copy databases in seconds. Snowflake doesn’t physically copy data. It continues to reference the original data and will store new records only when you update or change the data; therefore, you will pay for each unique record only once. Finally, we can use zero-copy cloning with the Time Travel feature.
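A clone of an entire database can be created in one statement, and combined with Time Travel to clone the database as it existed at an earlier point. The names and the offset below are illustrative:

```sql
-- Zero-copy clone of the current state of the database
CREATE DATABASE dev_db CLONE production_db;

-- Clone the database as it was 24 hours ago (86,400 seconds),
-- using the Time Travel feature
CREATE DATABASE dev_db_yesterday CLONE production_db
  AT (OFFSET => -86400);
```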
Administering Account Parameters
Parameters control the behavior of our Snowflake account, individual user sessions, and objects. All parameters are available on the Snowflake documentation page.
Account parameters: These are set at the account level.
Session parameters (the majority): These are set for the session, user, and account.
Object parameters: These are set for the account and object.
ALTER ACCOUNT
ALTER SESSION
CREATE <object> or ALTER <object>
Moreover, we can look at the parameters for a specific database or warehouse.
STATEMENT_TIMEOUT_IN_SECONDS: Specifies the amount of time after which a running SQL statement is canceled by the system. This parameter helps control end-user workloads by preventing runaway, resource-heavy queries.
MAX_CONCURRENCY_LEVEL: Specifies the maximum number of SQL statements a warehouse cluster can execute concurrently.
TIMEZONE: Specifies the time zone setting for the session.
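These parameters can be set at the appropriate level with ALTER ACCOUNT or ALTER SESSION, and inspected per database or warehouse with SHOW PARAMETERS. The values and object names below are illustrative:

```sql
-- Account level: cancel any statement running longer than an hour
ALTER ACCOUNT SET STATEMENT_TIMEOUT_IN_SECONDS = 3600;

-- Session level: change the time zone
ALTER SESSION SET TIMEZONE = 'America/New_York';

-- Inspect parameters for a specific warehouse or database
SHOW PARAMETERS IN WAREHOUSE sf_tuts_wh;
SHOW PARAMETERS IN DATABASE demo_db;
```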
Administering Database Objects
One of the most common administration tasks within Snowflake is to manage database objects such as tables, views, schemas, stages, file formats, and so on.
CREATE
ALTER
DROP
SHOW
DESCRIBE
Moreover, Snowflake administrators may leverage Snowflake's unique capabilities such as UNDROP and zero-copy cloning.
Stage: Used for storing data files; can be internal or external
File format: File format options (CSV, Parquet, etc.) and formatting options for each file type
Pipe: Single copy statement for loading a set of data files
UDF: User-defined function; a custom function that can be written in SQL or JavaScript
As a Snowflake administrator, you may need to manage these objects.
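As a brief sketch, the following statements create a file format, an internal stage that uses it, and a pipe that loads staged files into an example table (my_table and the other names are hypothetical):

```sql
-- File format for comma-separated files with a header row
CREATE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

-- Internal stage for storing data files
CREATE STAGE my_stage
  FILE_FORMAT = my_csv_format;

-- Pipe: a single COPY statement for loading staged files
CREATE PIPE my_pipe AS
  COPY INTO my_table FROM @my_stage;
```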
Administering Data Shares
Secure data shares are another unique feature of Snowflake and will be covered in Chapter 10. This feature allows you to become a data provider by creating a data share using the CREATE SHARE command. By default, this is available only to the ACCOUNTADMIN role.
CREATE SHARE
ALTER SHARE
DROP SHARE
DESCRIBE SHARE
SHOW SHARES
Note
As a share creator, you are responsible for data security. Before you create a share, you should spend some time to learn more about data and use cases to prevent the sharing of sensitive data. Secure views and UDFs are handy to use when creating shares.
GRANT <privilege> TO SHARE: Grants privileges on objects to a share
REVOKE <privilege> FROM SHARE: Revokes privileges on objects from a share
SHOW GRANTS TO SHARE: Shows all object privileges that have been granted to a share
SHOW GRANTS OF SHARE: Shows all accounts to which the share has been granted and that are using the share
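End to end, creating a share, granting it access to objects, and adding a consumer account looks like this sketch (the object names and the account identifier are illustrative):

```sql
USE ROLE ACCOUNTADMIN;
CREATE SHARE marketing_share;

-- Grant access to the objects being shared
GRANT USAGE ON DATABASE demo_db TO SHARE marketing_share;
GRANT USAGE ON SCHEMA demo_db.public TO SHARE marketing_share;
GRANT SELECT ON TABLE demo_db.public.campaigns TO SHARE marketing_share;

-- Add a consumer account to the share
ALTER SHARE marketing_share ADD ACCOUNTS = xy12345;

-- Review what was granted and who consumes it
SHOW GRANTS TO SHARE marketing_share;
SHOW GRANTS OF SHARE marketing_share;
```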
If you no longer need a share and want to drop it, you should first consider the downstream impact on all consumers. As an option, you may revoke grants on some objects and observe the result.
Administering Clustered Tables
As you know, Snowflake is a data warehouse as a service. The idea here is simple: you just use the data warehouse, and you don’t need to think about data distribution, sorting, and table statistics.
One aspect of Snowflake performance is micro-partitioning. When we load data into Snowflake, it is automatically divided into micro-partitions of 50 MB to 500 MB of compressed data. These micro-partitions are organized in a columnar fashion. In addition, Snowflake collects and stores metadata about micro-partitions. This helps optimize query plans and improve query performance by avoiding unnecessary scanning of micro-partitions, an operation known as partition pruning.
Snowflake also stores data in tables and tries to sort it along natural dimensions such as date and/or geographic region. This is called data clustering, and it is a key factor in query performance, especially for large tables. By default, Snowflake uses automatic clustering. However, in some cases we may define a clustering key within the CREATE TABLE statement to change the default behavior. This should be the exception rather than the rule: in most cases, administrators will not need to cluster. Best practice is to avoid explicit clustering unless a specific query pattern does not meet its SLA, and in general a table should be at least 1 TB before clustering is worth considering.
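When an explicit clustering key is warranted, it can be declared in the CREATE TABLE statement. The table and columns here are illustrative:

```sql
CREATE TABLE sales (
  sale_date DATE,
  region    STRING,
  amount    NUMBER(12,2)
)
CLUSTER BY (sale_date, region);
```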
As a Snowflake administrator, you may need to review table clustering and run reclustering processes to identify all the problem tables and provide the best possible performance.
SYSTEM$CLUSTERING_DEPTH: This calculates the average clustering depth of a table according to the specified columns.
SYSTEM$CLUSTERING_INFORMATION: This calculates clustering details, including clustering depth, for a specific table.
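Both functions are called via SELECT. Here is a sketch against the illustrative sales table clustered by sale_date and region:

```sql
-- Average clustering depth for specific columns
SELECT SYSTEM$CLUSTERING_DEPTH('sales', '(sale_date)');

-- Full clustering details, including depth, for the table
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sale_date, region)');
```

A smaller average depth generally indicates better-clustered data; rising depth over time is a signal that the table's natural load order no longer matches the query pattern.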
If you need to improve the clustering of data, you can create a new table with a new clustering key and insert the data into it, or you can use a materialized view (MV) to create a version of the table with the new clustering key. Snowflake will then automatically keep the MV data in sync with new data added to the base table.
Note
A table with clustering keys defined is considered to be clustered. Clustering keys aren’t important for all tables. Whether to use clustering depends on the size of a table and the query performance, and it is most suitable for multiterabyte tables.
Snowflake Materialized Views
When working with Teradata, we worked with materialized views (MVs) a lot. Basically, we had complex SQL queries that produced metrics and dimensions for a business intelligence solution. Because of the complex SQL logic, joins, and derived columns, we used MVs to improve query performance. However, traditional MVs had their downsides. For example, it was important to keep the data in the MV up-to-date, refreshing it daily with an ETL process. In addition, we experienced slowdowns in performance while updating the MVs using Data Manipulation Language commands.
Snowflake engineers didn’t abandon the MV concept and added this functionality to Enterprise Edition. According to Snowflake, a materialized view is a precomputed data set derived from a query specification (the SELECT in the view definition) and stored for later use. Because the data is precomputed, querying a materialized view is faster than executing the original query. This performance difference can be significant when a query is run frequently or is sufficiently complex.
Note
Materialized views are designed to improve query performance for workloads composed of common, repeated query patterns. However, materializing intermediate results incurs additional costs. As such, before creating any materialized views, you should consider whether the costs are offset by the savings from reusing these results frequently.
Materialized views are particularly useful when the following are true:
- The query results contain a small number of rows and/or columns relative to the base table (the table on which the view is defined).
- The query results require significant processing, including the following:
  - Analysis of semistructured data
  - Aggregates that take a long time to calculate
The main benefit of Snowflake MVs is that they solve the issues of traditional MVs. MVs are views that are automatically maintained by Snowflake. There is a background service that updates the materialized view after changes are made to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.
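For example, a materialized view that precomputes a slow aggregate over a large base table might look like this sketch (Snowflake MVs are defined over a single table; the names are illustrative):

```sql
CREATE MATERIALIZED VIEW daily_sales AS
SELECT sale_date,
       SUM(amount) AS total_amount,
       COUNT(*)   AS order_count
FROM sales
GROUP BY sale_date;
```

Once created, Snowflake's background service keeps daily_sales in sync as new rows land in the sales table, with no ETL refresh process required.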
Key Similarities and Differences
| | Performance Benefits | Security Benefits | Simplifies Query Logic | Supports Clustering | Uses Storage | Uses Credits for Maintenance |
|---|---|---|---|---|---|---|
| Regular table | | | | | | |
| Regular view | | | | | | |
| Cached query result | | | | | | |
| Materialized view | | | | | | |
Summary
In this chapter, we covered the main Snowflake administrative duties (e.g., user and role administration), and you learned about key Snowflake objects (e.g., warehouses and schema-level objects). In addition, we reviewed billing and usage information. Finally, we covered data shares and data clustering concepts as well as materialized views.
In the next chapter, you will learn about one of the key elements of cloud analytics: security.