© Dmitry Anoshin, Dmitry Shirokov, Donna Strok 2020
D. Anoshin et al., Jumpstart Snowflake, https://doi.org/10.1007/978-1-4842-5328-1_8

8. Snowflake Security Overview

Dmitry Anoshin (1), Dmitry Shirokov (2), and Donna Strok (3)
(1) British Columbia, Canada
(2) Burnaby, BC, Canada
(3) Seattle, WA, USA

Providing security is challenging for many organizations today, especially in the cloud, given the number of threats and attacks that occur daily. Safeguarding data is paramount for Snowflake, and the Snowflake services platform was built with security in mind from the beginning. The company has implemented a security framework that we believe addresses many of its customers’ compliance challenges.

Developers have to secure their data and prevent unauthorized access to it, which is why Snowflake automatically encrypts all data, both at rest and in transit. In addition, Snowflake provides multifactor authentication and supports federated authentication.

One of the challenges with on-premises solutions is that data can reside at many different locations, so controlling the data flow and who’s accessing it is challenging. With the cloud, you can build the right security controls to safeguard your data, but security doesn’t stop there. There are many more aspects that are related to monitoring and ensuring the system is constantly protected.

The Snowflake platform is a cloud-native solution whose security is managed for you. Snowflake provides end-to-end security to its customers: from the moment the data leaves a customer’s premises, through the untrusted Internet, to the point when it arrives at Snowflake storage, the data is protected. Moreover, Snowflake hardens all the virtual machines that data resides on. Snowflake encrypts data, performs audits, monitors the system, sends alerts, and installs patches on a continuous basis. All of this simplifies the security efforts of customers, who do not necessarily have to incur all the procedural and compliance costs associated with security.

In this chapter, you will learn about the main Snowflake security features:
  • Snowflake security reference architecture

  • Network and site access

  • Account and user authentication

  • Object security

  • Data security

  • Security validations

Snowflake Security Reference Architecture

As you might know from previous chapters, Snowflake has a multicluster shared data architecture. It separates the process of working with data and information into three distinct layers.
  • Storage layer, where all the data is stored in a columnar compressed format and is always encrypted.

  • Compute layer, composed of virtual warehouses, which are the compute nodes that perform all of the data processing. Multiple virtual warehouses can work on the same data at the same time.

  • Services layer, also known as the “brains” of Snowflake. This is where all security information and metadata are stored and where queries are parsed and optimized before the compute layer executes them. The services layer also includes transaction management, which coordinates across all of the virtual warehouses, allowing for a consistent set of operations against the same data at the same time.

This unique architecture allows Snowflake to ensure a high standard of security for its customers. Figure 8-1 shows Snowflake’s security reference architecture. It describes the components that make up Snowflake’s secure data warehouse. We will cover the key elements of this diagram in this chapter.

Note

This chapter will cover the security features that are available to date. Snowflake is constantly working on adding new features.

Figure 8-1. Snowflake security reference architecture

Virtual Private Cloud

First is the concept of a virtual private cloud (VPC). Snowflake is implemented as a VPC within the cloud provider’s infrastructure. If a customer requires complete isolation from other Snowflake customers because of strict security requirements, as a financial institution might, the Virtual Private Snowflake (VPS) edition must be used. VPS is a Snowflake deployment that runs entirely within its own VPC in the cloud provider’s infrastructure.

Physical Security

Each cloud provider, including Amazon Web Services, Microsoft Azure, and Google Cloud Platform, provides its own infrastructure and physical security to guard all of its cloud data. Physical security includes 24-hour armed guards and video surveillance to ensure that no unauthorized person gains access to the data center. Neither Snowflake personnel nor Snowflake customers have access to these data centers. Data redundancy is also a standard practice implemented by the cloud provider for data recovery.

You can learn more about physical security from each cloud vendor by visiting their documentation.

Network and Site Access

All customer access to the Snowflake service over the Internet uses the secure HTTPS protocol. Moreover, all Internet communications between users and the Snowflake service are secured and encrypted using TLS 1.2 or higher.

All communication between connection methods and Snowflake is secure, regardless of the method used to connect, whether via the web user interface or ODBC or JDBC connectors. Authentication is required to gain access to Snowflake. These connections are encrypted and communicate solely over HTTPS.

Access to Snowflake is subject to network policies. These policies provide options for managing network configurations to the Snowflake service, such as restricting access to an account based on a user IP address. Currently, Snowflake customers can implement a network policy to create an IP whitelist, which is a list of allowed IP addresses, as well as an IP blacklist, which lists those IP addresses that are forbidden access.

Figure 8-2 shows the Snowflake web UI for managing access policies.

Moreover, you can manage policies using SQL commands. Usually, we will specify the IP address of our organization and will give access to Snowflake only to our employees. We don’t want to have a publicly available Snowflake endpoint.

Figure 8-2. Managing access policies
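
As a sketch of the SQL route mentioned above, the following statements create a network policy and activate it for the whole account. The policy name and the IP addresses (taken from a documentation-only address range) are placeholders, and the statements assume a role with sufficient privileges, such as SECURITYADMIN or higher.

```sql
-- Hypothetical policy name and placeholder addresses; substitute your own.
-- Allow the organization's address range, but block one host inside it.
CREATE NETWORK POLICY corp_office_policy
  ALLOWED_IP_LIST = ('203.0.113.0/24')
  BLOCKED_IP_LIST = ('203.0.113.99')
  COMMENT = 'Office range only, one host excluded';

-- Review the policy before activating it.
SHOW NETWORK POLICIES;
DESCRIBE NETWORK POLICY corp_office_policy;

-- Activate the policy for the entire account.
ALTER ACCOUNT SET NETWORK_POLICY = corp_office_policy;
```

Before activating a policy, it is worth confirming that your own address falls within the allowed list so that you do not lock yourself out.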

For increased network connectivity security, private and direct communication between Snowflake and other VPCs can be set up via AWS PrivateLink (in the case of an AWS deployment). This feature, which effectively creates a private tunnel of communication between Snowflake and the VPC, is currently available only to customers on the Business Critical Edition, formerly known as Enterprise for Sensitive Data (ESD), or on VPS.

Account and User Authentication

Multifactor authentication (MFA) can be implemented to increase the security of account access by users. MFA support is provided as an integrated Snowflake feature powered by the Duo security service and managed completely by Snowflake. The only additional task after enabling MFA is to install the Duo Mobile application, which is supported on multiple smartphone platforms including iOS, Android, and Windows.

Currently, each user must enroll in MFA individually. As a security best practice, all users with the ACCOUNTADMIN role should enroll in MFA.

Single sign-on (SSO) is a user authentication method that, once enabled, allows users to authenticate through an external SAML 2.0–compliant identity provider (IdP).

When authenticated, users can securely initiate one or more sessions in Snowflake for the duration of their IdP session. These sessions can be initiated from within the interface provided by the IdP or directly from within Snowflake. This feature is available for customers on Enterprise Edition and up.

Object Security

Access to specific objects within Snowflake, such as warehouses, databases, schemas, and tables, is controlled by a hybrid model of discretionary access control (DAC) and role-based access control (RBAC).

Note

Discretionary access control (DAC) is when each object has an owner, who can in turn grant access to that object. Role-based access control (RBAC) is when access privileges are assigned to roles, which are in turn assigned to users.

Discretionary access control means that each object created has an owner and that owner has control over the object. Role-based access control, as shown in Figure 8-3, makes use of roles that can be granted access to objects. These roles, in turn, can be granted to other roles or directly to users. The SECURITYADMIN system role in Snowflake is responsible for managing these privileges.

Figure 8-3. Role-based access control
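
The role-based model described above can be sketched with a few GRANT statements. All role, user, warehouse, database, and schema names below are hypothetical, and the example assumes the objects already exist and that the session may use the SECURITYADMIN role.

```sql
-- All object, role, and user names are hypothetical.
USE ROLE SECURITYADMIN;

-- 1. Create a role that will carry the privileges.
CREATE ROLE IF NOT EXISTS analyst;

-- 2. Grant privileges on objects to the role, not to individual users.
GRANT USAGE ON WAREHOUSE reporting_wh TO ROLE analyst;
GRANT USAGE ON DATABASE sales_db TO ROLE analyst;
GRANT USAGE ON SCHEMA sales_db.public TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales_db.public TO ROLE analyst;

-- 3. Grant the role to a user ...
GRANT ROLE analyst TO USER user1;

-- 4. ... and to another role, so that role inherits the same privileges.
GRANT ROLE analyst TO ROLE SYSADMIN;
```

Because privileges attach to the role, revoking a user's access later is a single REVOKE ROLE statement rather than a change to every object.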

Data Security

Encryption is enabled by default in Snowflake. All customer data is encrypted at rest. This includes not only the database data but also the virtual warehouse cache and query results cache, which are both used for performance optimization within Snowflake. All communication is encrypted in transit over public networks and even within the Snowflake virtual private cloud for customers who use the Business Critical Edition.

Note

Advanced Encryption Standard (AES) is a symmetric encryption algorithm. The algorithm was developed by two Belgian cryptographers, Joan Daemen and Vincent Rijmen. AES was designed to be efficient in both hardware and software and supports a block length of 128 bits and key lengths of 128, 192, and 256 bits.

All files that are stored in Snowflake internal stage objects are automatically encrypted using either AES-128 or AES-256 strong encryption. Specific editions of Snowflake also provide periodic rekeying of encrypted data and support for customer-managed encryption keys.

The Business Critical Edition of Snowflake allows us to use the Tri-Secret Secure feature. This encryption is achieved using key wrapping, which means using one key to lock up another. For example, when a user attempts to access encrypted data within Snowflake, the data must first be decrypted. Decrypting it requires the data key, but the data key is itself encrypted, or wrapped, and requires another key, the table key. The table key is in turn locked and requires yet another key, the account key, to unlock it. The account key is also locked and can be accessed only with the root key, which is stored in a hardware security module (HSM); in the case of an AWS implementation, this is Amazon CloudHSM within the cloud provider.

Amazon CloudHSM is a piece of hardware that is specialized for encryption. The account key would need to be passed into CloudHSM and unlocked by the root key. Then the hierarchy of table and data keys can be subsequently unlocked, and the unencrypted data can be returned to the user.

Encryption keys are rotated automatically for accounts running on certain editions of Snowflake. The entire process of rotating encryption keys is completed behind the scenes and is transparent to the end user. With key rotation, a new version of a key is created and used to encrypt data, while the previous version of the key is retired and used only to decrypt data. In other words, with key rotation, new data gets fresh keys.
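
Periodic rekeying, mentioned earlier as an edition-specific feature, goes one step further than rotation by re-encrypting data that is still protected by old, retired keys. As a minimal sketch, assuming an account on an edition that supports the feature and a session using the ACCOUNTADMIN role, it is switched on through an account parameter:

```sql
-- Assumes an edition that supports periodic rekeying and the ACCOUNTADMIN role.
ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = TRUE;

-- Confirm the current setting.
SHOW PARAMETERS LIKE 'PERIODIC_DATA_REKEYING' IN ACCOUNT;
```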

Snowflake takes security seriously, which is why the end-to-end encryption of data is a default feature of the service. Whether data is in flight between the customer and internal stage or at rest and stored in a Snowflake database table, the data is always in an encrypted state.

To protect data against loss, Snowflake leverages data redundancy implemented by the cloud infrastructure provider. Each cloud provider region consists of several data centers dispersed across several miles. The cloud infrastructure within each region provides automatic synchronous replication of data to three different zones for redundancy. Should one zone have a failure, the data is available from one of the other two zones in the region.

Security Validation

Snowflake supports multiple compliance standards, as described in Table 8-1. This makes Snowflake an attractive platform for the financial, government, and health industries, where compliance standards are high.

Table 8-1. Snowflake Security Validations

Type | Description
SOC 2 Type II | Designed for service providers storing customer data in the cloud. It requires companies to establish and follow strict information security policies and procedures encompassing the security, availability, processing integrity, and confidentiality of customer data.
HIPAA | Stands for Health Insurance Portability and Accountability Act. Passed in 1996, HIPAA is a federal law that sets a national standard to protect medical records and other personal health information.
PCI DSS | Stands for Payment Card Industry Data Security Standard. This standard sets the requirements for organizations and sellers to safely and securely accept, store, process, and transmit cardholder data during credit card transactions to prevent fraud and data breaches.
CAIQ | Stands for the Consensus Assessments Initiative Questionnaire. This is a survey provided by the Cloud Security Alliance (CSA) for cloud consumers and auditors to assess the security capabilities of a cloud service provider.

Snowflake Audit and Logging

Application audit logs are also available for tracking activity within Snowflake. All activity against the Snowflake service is logged within the services layer. To access this activity log, go to the History tab in the Snowflake web user interface. From this page, users can view each command that was attempted along with the user who attempted the command, when the action occurred, and whether it was successful. Figure 8-4 shows the History tab.

Figure 8-4. Query History tab

If you click the SQL text, a dialog will pop up with a success or failure message, as well as with actions to take to resolve any errors.

Another field in the activity log is Query ID. This ID can be used by Snowflake Support to look up a specific query instance for troubleshooting. Again, Snowflake personnel do not have access to customer data but can access metadata such as the query statement and query plan.

Clicking the Query ID field in the activity log will jump to the Query Profiler, allowing the user to view how the query optimizer worked and if there are any bottlenecks to resolve.

Query Profiler

When we work with data warehouses and business intelligence, we often have to deal with performance issues. To understand why a query or a report is slow, we should understand the mechanics of querying. Query Profiler helps us spot typical mistakes in SQL query expressions and identify potential performance bottlenecks and improvement opportunities.

To access it, go to the History tab or Worksheets tab. If we navigate to the History tab and choose any query ID and then navigate to Profile, we will see the visual plan for query execution, as per Figure 8-5.

Figure 8-5. Query Profiler

Table 8-2 describes the key elements of the Query Profiler interface.

Table 8-2. Key Elements of Query Profiler

Element | Description
Steps | If the query was processed in multiple steps, you can toggle between each step.
Operator tree | The middle pane displays a graphical representation of all the operator nodes for the selected step, including the relationships between each operator node.
Node list | The middle pane includes a collapsible list of operator nodes by execution time.
Overview | The right pane displays an overview of the query profile. The display changes to operator details when an operator node is selected.

You can find more information about Query Profiler in the Snowflake documentation at https://docs.snowflake.net/manuals/user-guide/ui-query-profile.html.

Login History Audit Logs

Snowflake provides table functions for extracting audit log history from the metadata. The login history family of table functions can be used to look up user login history with various filters such as time range or specific user.

Additional SQL predicates can be used to further filter the results. This data remains available within the Snowflake metadata for seven days from the login event. For more detailed audit history tracking, it can therefore be extracted and loaded into a Snowflake schema or an external system such as a security information and event management system. Table 8-3 describes the available functions and their purpose.

Table 8-3. Login History Audit Functions

Function | Description
LOGIN_HISTORY | Returns login events within a specified time range
LOGIN_HISTORY_BY_USER | Returns login events of a specified user within a specified time range

Here is some example code for querying the login history audit logs:
--Retrieve up to the last 100 login events of the current user:
select *
from table(information_schema.login_history_by_user())
order by event_timestamp;
--Retrieve up to the last 1000 login events of the specified user:
select *
from table(information_schema.login_history_by_user('USER1', result_limit=>1000))
order by event_timestamp;
--Retrieve up to 100 login events of every user your current role is allowed to monitor in the last hour:
select *
from table(information_schema.login_history(dateadd('hours',-1,current_timestamp()),current_timestamp()))
order by event_timestamp;
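
Building on the examples above, the following sketch shows an additional predicate on the function output and one possible way to keep login history beyond the seven-day window. The archive table name is hypothetical, the statements assume a current database and schema are set for the session, and the incremental load is a simple illustration rather than a complete retention solution.

```sql
-- Failed login attempts during the past 24 hours.
-- IS_SUCCESS is reported as the string 'YES' or 'NO'.
SELECT event_timestamp, user_name, client_ip, error_message
FROM TABLE(information_schema.login_history(
       time_range_start => DATEADD('hours', -24, CURRENT_TIMESTAMP()),
       result_limit     => 1000))
WHERE is_success = 'NO'
ORDER BY event_timestamp;

-- Hypothetical archive table, created once from the history currently available.
CREATE TABLE IF NOT EXISTS login_history_archive AS
SELECT *
FROM TABLE(information_schema.login_history(result_limit => 10000));

-- Run periodically, well within the seven-day window:
-- append only events newer than the latest one already archived.
INSERT INTO login_history_archive
SELECT h.*
FROM TABLE(information_schema.login_history(result_limit => 10000)) AS h
WHERE h.event_timestamp >
      (SELECT COALESCE(MAX(event_timestamp), '1970-01-01'::TIMESTAMP_LTZ)
       FROM login_history_archive);
```

On a busy account, the load would need to run often enough that no more events accumulate between runs than a single call can return.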

Query History Audit Logs

The query logs in Snowflake can also be queried and extracted, just like the login history logs. The information in the query history family of functions is similar to the output of the History tab in the web user interface. The query history can be filtered by time range, by session, by user, or by warehouse. Query history is also available only for seven days, so for extended query history tracking, it is recommended that you export the data to an external system or a Snowflake table. Table 8-4 describes the available functions and their purpose.

Table 8-4. Query History Audit Log Functions

Function | Description
QUERY_HISTORY | Returns queries within a specified time range
QUERY_HISTORY_BY_SESSION | Returns queries within a specified session and time range
QUERY_HISTORY_BY_USER | Returns queries submitted by a specified user within a specified time range
QUERY_HISTORY_BY_WAREHOUSE | Returns queries executed by a specified warehouse within a specified time range

Here is some example code for query history audit logs:
--Retrieve up to the last 100 queries run in the current session:
select *
from table(information_schema.query_history_by_session())
order by start_time;
--Retrieve up to the last 100 queries run by the current user (or run by any user on any warehouse on which the current user has the MONITOR privilege):
select *
from table(information_schema.query_history())
order by start_time;
--Retrieve up to the last 100 queries run in the past hour by the current user (or run by any user on any warehouse on which the current user has the MONITOR privilege):
select *
from table(information_schema.query_history(dateadd('hours',-1,current_timestamp()),current_timestamp()))
order by start_time;
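
The per-user and per-warehouse variants follow the same pattern; in Snowflake the per-warehouse function is named QUERY_HISTORY_BY_WAREHOUSE. The sketch below uses a hypothetical user name and warehouse name and assumes the current role is allowed to see the corresponding queries.

```sql
-- Hypothetical user name: up to the 100 most recent queries submitted by USER1.
SELECT query_id, query_text, warehouse_name, execution_status, start_time
FROM TABLE(information_schema.query_history_by_user(
       user_name    => 'USER1',
       result_limit => 100))
ORDER BY start_time;

-- Hypothetical warehouse name: queries on REPORTING_WH that finished
-- in the past hour and reported an error.
SELECT query_id, user_name, error_code, error_message, start_time
FROM TABLE(information_schema.query_history_by_warehouse(
       warehouse_name       => 'REPORTING_WH',
       end_time_range_start => DATEADD('hours', -1, CURRENT_TIMESTAMP()),
       result_limit         => 1000))
WHERE error_code IS NOT NULL
ORDER BY start_time;
```

The QUERY_ID values returned here are the same identifiers shown in the History tab, so a row of interest can be opened in Query Profiler or passed to Snowflake Support.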

Penetration Testing

Penetration tests are an integral part of Snowflake’s ongoing testing of security controls and procedures. Seven to ten tests are performed each year to ensure no new holes or flaws arise in security. If a vulnerability is found, the security team will log and track it to closure. The results of these penetration tests are available to customers under NDA with Snowflake.

You can find more information about penetration testing in the article “Snowflake: Serious about security” by Susan Walsh at https://www.snowflake.com/blog/snowflake-seriously-serious-security/.

Summary

In this chapter, we briefly covered the key Snowflake security features in the following areas:
  • Network/site access

  • Account/user authentication

  • Object security

  • Data security

  • Security validation

  • Audit and logging

For each category, Snowflake provides extensive online documentation.

In the next chapter, you will learn about Snowflake’s unique capabilities of working with semistructured data formats like JSON, XML, and AVRO.
