In Chapter 4, you worked through some examples of account controls and learned how to implement Snowflake features to keep bad actors out and protect your accounts. This chapter builds on Chapter 4 by implementing the corresponding monitoring and alerting.
Let’s take a step back and ask what other operational functions we can facilitate by building out monitoring, and see whether the Account Usage store can fulfill our needs. The functional requirements typically include the following.
Audit: Both internal and external auditors will want to access the Account Usage store and run ad hoc queries. We cannot know the full suite of future requirements, but we can make an educated guess at the likely use cases an audit function would want to satisfy.
Consumption charging: Every finance department wants to know how many credits have been consumed and by whom and looks at consumption growth trends over time. The procurement team will want to know consumption numbers to project future trends and inform contract renewal negotiations.
Metadata cataloging: Enterprise data model mapping, object and attribute capture, and data lineage are valid use cases.
Event monitoring: This chapter’s primary purpose is building tooling that frequently monitors activity and raises alerts.
Please ensure your account is Business Critical Edition, as the examples in this chapter depend upon Business Critical Edition features. Otherwise, please register a new trial account using a different email address, selecting the Business Critical option at https://signup.snowflake.com.
Snowflake continually enhances the Account Usage views and capabilities. Wherever possible, we want to future-proof our usage and enable new monitoring requirements with minimal rework. I suggest the operational support team conduct monthly reviews of Account Usage views to identify changes proactively. It is better to be on top of change and present our cybersecurity colleagues with opportunities to tighten controls and add value.
Automating each control using streams and tasks
brings its own challenges and the obligation to record the status and logging of each task run. For simplicity and to reduce the page count, I have not implemented a status setting or log in each procedure. I leave this for you to investigate but note that you may encounter concurrency issues with multiple high-frequency processes updating the same status and logging table. Likewise, the code presented is a skeleton. You may wish to implement additional attributes such as URLs to confluence pages containing control information or an active_flag for each parameter. Just note the need to extend all dependent objects such as views and MERGE statements.
You should also add comments, omitted here due to page constraints; hopefully, the code is sufficiently self-explanatory.
This chapter could—and probably should—be a book in its own right. The more I dig into the Account Usage store, the more I find of interest. After writing the monitoring suite over a year ago, it is a delight revisiting this subject, and I hope this chapter inspires you to investigate further. Please also see https://docs.snowflake.com/en/sql-reference/account-usage.html.
Let’s now discuss Snowflake Account Usage: what it is, how to use it, and practical hands-on suggestions to make your job easier when building monitoring tooling.
What Is the Account Usage Store?
All RDBMSs have a metadata catalog. Oracle has data dictionary views, dynamic performance views (v$), and a plethora of k$ and x$ tables, complex and hard to navigate. SQL Server and MySQL have information_schema, and Snowflake has followed the same pattern by implementing an information schema for each created database.
In addition, as a single, account-level, holistic source of information, Snowflake supplies an imported database confusingly called SNOWFLAKE, maintained by Snowflake, where the Account Usage store resides; it also contains an information schema. Please note some features to be aware of. Account Usage views have a latency of between 45 minutes and 3 hours, so transactions can take time to appear in the views, which typically store information for up to one year. There is, however, a way to access data in near real time by using information_schema, noting its data is only stored for 14 days. I demonstrate both approaches. The distinction is important. The Account Usage store is an aggregated view of Snowflake’s centrally maintained data presented as a shared database, whereas the information schema is driven from the global services layer, accessing the FoundationDB key-value store. We use both throughout this chapter.
Depending upon the role in use, the imported SNOWFLAKE database may not be visible, in which case change the role. ACCOUNTADMIN is the only role enabled by default to access Account Usage store content.
USE ROLE accountadmin;
Then refresh as shown in Figure 6-1, after which the Snowflake database is viewable. SNOWFLAKE and SNOWFLAKE_SAMPLE_DATA are both imported databases. An imported database is one supplied by Snowflake as part of the initial account delivery. Like any other object, they can be dropped.
Note
Do not drop the SNOWFLAKE database!
You may drop SNOWFLAKE_SAMPLE_DATA, but note UNDROP is not supported for an imported database; it must be re-created, along with re-granting imported privileges to roles.
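Should you need to re-create SNOWFLAKE_SAMPLE_DATA after dropping it, the usual approach is to import it from the Snowflake-provided share and re-grant access. The following is a sketch; the share name sfc_samples.sample_data is the commonly documented source but may differ in some accounts.

```sql
USE ROLE accountadmin;

-- Re-create the sample database from the Snowflake-provided share
CREATE DATABASE snowflake_sample_data FROM SHARE sfc_samples.sample_data;

-- Re-grant imported privileges so other roles can query the sample data
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake_sample_data TO ROLE public;
```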
By default, only the ACCOUNTADMIN role can access the Account Usage views. Other roles can see the Snowflake database but cannot access its contents. Immutable history is restricted to one year, which is insufficient to support audit requests spanning more than one year. We therefore have a further requirement to periodically capture and retain all history.
Let’s dive into the Snowflake Account Usage store, along with an investigation of the global services layer. In line with the hands-on approach, I offer a script and walk-through with a strong recommendation to cut and paste each section as a practical learning experience. The code looks similar to that in Chapter 5, but the implementation is quite different.
Best Practices
Snowflake best practice suggests using the one-year immutable history to check events against, but I suggest this does not fulfill all requirements. There are scenarios where account settings have aged out of the history and are not available for comparison or reinstatement to their original values. I show you how to mitigate this risk in the history retention section. In other words, always expect the unexpected, don’t trust anything, prove assertions, and verify the recommendations match your requirements. Those words of wisdom have served me well in nearly 30 years of development.
Accessing the Account Usage Store
As discussed
in Chapter 5, the only way to grant entitlement is via a role. The first step is to design our approach, as shown in Figure 6-2.
Our approach is to create a new database (MONITOR) containing two schemas: one for static reference data and the second to wrap the source Account Usage views to enrich, contextualize, summarize, and pre-filter the source data to make reporting easier.
While we must explicitly enable access to the Account Usage store, each database information schema is immediately accessible.
Preamble
Our script starts with some declarations
enabling later automation.
SET monitor_database = 'MONITOR';
SET monitor_reference_schema = 'MONITOR.reference_owner';
SET monitor_owner_schema = 'MONITOR.monitor_owner';
SET monitor_warehouse = 'monitor_wh';
SET monitor_reference_role = 'monitor_reference_role';
SET monitor_owner_role = 'monitor_owner_role';
SET monitor_reader_role = 'monitor_reader_role';
The SET command allows us to declare a variable whose actual value is resolved when the script runs. You see how the labels are interpreted shortly; they provide a single point where we can make declarations for later automation.
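To confirm a label resolves as expected, a session variable can be queried directly and returns its literal value:

```sql
SET monitor_database = 'MONITOR';

-- Returns the literal string MONITOR
SELECT $monitor_database;

-- List all variables set in the current session
SHOW VARIABLES;
```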
Database, Warehouse, and Schema
The following command set changes role to SYSADMIN, then creates a database with 90-day time travel, a single extra-small warehouse, and two schemas within the database. Refresh your browser to see the new database and schemas.
USE ROLE sysadmin;
CREATE OR REPLACE DATABASE IDENTIFIER ( $monitor_database ) DATA_RETENTION_TIME_IN_DAYS = 90;
CREATE OR REPLACE WAREHOUSE IDENTIFIER ( $monitor_warehouse ) WITH
WAREHOUSE_SIZE = 'X-SMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
MIN_CLUSTER_COUNT = 1
MAX_CLUSTER_COUNT = 4
SCALING_POLICY = 'STANDARD'
INITIALLY_SUSPENDED = TRUE;
CREATE OR REPLACE SCHEMA IDENTIFIER ( $monitor_reference_schema );
CREATE OR REPLACE SCHEMA IDENTIFIER ( $monitor_owner_schema );
In the preceding example, the labels are expanded by using the IDENTIFIER keyword. Using the SYSADMIN role ensures all created objects remain owned by SYSADMIN, a safeguard against unexpected manipulation.
Your worksheet should reflect Figure 6-3 when refreshed.
Roles
Let’s switch to the SECURITYADMIN role to create custom roles for later use, again resolving labels via IDENTIFIER.
USE ROLE securityadmin;
CREATE OR REPLACE ROLE IDENTIFIER ( $monitor_reference_role ) COMMENT = 'MONITOR.monitor_reference Role';
CREATE OR REPLACE ROLE IDENTIFIER ( $monitor_owner_role ) COMMENT = 'MONITOR.monitor_owner Role';
CREATE OR REPLACE ROLE IDENTIFIER ( $monitor_reader_role ) COMMENT = 'MONITOR.monitor_reader Role';
In line with Snowflake’s best practices, let’s assign our newly created roles to SECURITYADMIN. (Note that we may choose to have a custom
local admin role as an intermediary role—some homework for you!)
GRANT ROLE IDENTIFIER ( $monitor_reference_role ) TO ROLE securityadmin;
GRANT ROLE IDENTIFIER ( $monitor_owner_role ) TO ROLE securityadmin;
GRANT ROLE IDENTIFIER ( $monitor_reader_role ) TO ROLE securityadmin;
Role Grants
With the SECURITYADMIN role in use, let’s enable each role to use the database and warehouse.
GRANT USAGE ON DATABASE IDENTIFIER ( $monitor_database ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT USAGE ON WAREHOUSE IDENTIFIER ( $monitor_warehouse ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT OPERATE ON WAREHOUSE IDENTIFIER ( $monitor_warehouse ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT USAGE ON SCHEMA IDENTIFIER ( $monitor_reference_schema ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT USAGE ON DATABASE IDENTIFIER ( $monitor_database ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT USAGE ON WAREHOUSE IDENTIFIER ( $monitor_warehouse ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT OPERATE ON WAREHOUSE IDENTIFIER ( $monitor_warehouse ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT USAGE ON SCHEMA IDENTIFIER ( $monitor_reference_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT USAGE ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT USAGE ON DATABASE IDENTIFIER ( $monitor_database ) TO ROLE IDENTIFIER ( $monitor_reader_role );
GRANT USAGE ON WAREHOUSE IDENTIFIER ( $monitor_warehouse ) TO ROLE IDENTIFIER ( $monitor_reader_role );
GRANT OPERATE ON WAREHOUSE IDENTIFIER ( $monitor_warehouse ) TO ROLE IDENTIFIER ( $monitor_reader_role );
GRANT USAGE ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_reader_role );
Now let’s grant specific object entitlements for each of the two schemas.
GRANT CREATE TABLE ON SCHEMA IDENTIFIER ( $monitor_reference_schema ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT CREATE VIEW ON SCHEMA IDENTIFIER ( $monitor_reference_schema ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT CREATE SEQUENCE ON SCHEMA IDENTIFIER ( $monitor_reference_schema ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT CREATE STREAM ON SCHEMA IDENTIFIER ( $monitor_reference_schema ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT CREATE MATERIALIZED VIEW ON SCHEMA IDENTIFIER ( $monitor_reference_schema ) TO ROLE IDENTIFIER ( $monitor_reference_role );
GRANT CREATE TABLE ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT CREATE VIEW ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT CREATE SEQUENCE ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT CREATE FUNCTION ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT CREATE PROCEDURE ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT CREATE STREAM ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT CREATE MATERIALIZED VIEW ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
GRANT CREATE FILE FORMAT ON SCHEMA IDENTIFIER ( $monitor_owner_schema ) TO ROLE IDENTIFIER ( $monitor_owner_role );
Assigning Roles to Users
When all three roles have been created and all entitlement granted, we must enable one
or more users to interact with our empty schemas. We do this by granting each role to nominated users.
GRANT ROLE IDENTIFIER ( $monitor_reference_role ) TO USER <your_name>;
GRANT ROLE IDENTIFIER ( $monitor_owner_role ) TO USER <your_name>;
GRANT ROLE IDENTIFIER ( $monitor_reader_role ) TO USER <your_name>;
Enabling the Account Usage Store
The following command entitles monitor_owner_role to access the Account Usage store.
GRANT IMPORTED PRIVILEGES ON DATABASE snowflake TO ROLE IDENTIFIER ( $monitor_owner_role );
Switch to monitor_owner_role as this is the only custom role that can view Account Usage store objects and then run a sample query.
USE ROLE IDENTIFIER ( $monitor_owner_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_owner_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
We can now query the Account Usage views, in this example, selecting metadata for all active databases
in our account.
SELECT * FROM SNOWFLAKE.account_usage.databases WHERE deleted IS NULL;
Building Reference Data Sets
In the previous sections, we walked through setting up our environment. Now let’s focus on the practical implementation of monitoring objects. To do this, we must understand what we are monitoring and why.
Starting with a simple use case, our first objective is to examine the sample controls developed in Chapter 4 and determine whether a pattern-based approach can be derived, enabling future reuse. A design pattern is a code template that can easily be replicated, extended, or, better still, be data driven.
From experience, there are six different patterns to be developed that cover all known scenarios. Let’s begin our investigation into these patterns, starting with the simplest and adding notes to build up to the most complex. Note space does not permit a full explanation of each pattern, and this chapter builds upon principles outlined in Chapter 5, albeit in a different format.
With a fully historized reference data set, we achieve two objectives. First, we establish a temporal history for audit purposes; because settings age out of the history after a year, this is a weakness we must correct. Second, in the unlikely event a parameter is incorrectly set, we have a record of what the current value should be and of all historical values.
We must identify a way of ensuring parameter values are recorded for comparison, hence the need for reference data created in the reference_owner schema. We also introduce a way of historizing our reference data to ensure the data lineage is preserved, and here a brief digression is in order. Data warehouses are often required to re-create a report as of any point in time for which the data warehouse holds information. But if we only hold the most recent values for a record, we cannot reconstruct the history. Enter Slowly Changing Dimension (SCD) modeling. A full explanation is beyond the scope of this chapter, but more information is at https://en.wikipedia.org/wiki/Slowly_changing_dimension. For our purposes, we need to know account parameter values at any point in history. SCD type 2 (SCD2) provides the answer; hence all historization for reference data is SCD2 based.
Finally, to ease building out each control pattern, we need to identify a way of grouping like controls in our reference data. Please take time to understand the companion script, Chapter_6_Account_Usage_Demo.sql, and all will become clear.
Referencing Schema Objects
Our reference schema contains
three core tables along with their historized counterparts and a single secure view that denormalizes the data model for ease of use. A full explanation of denormalization is at https://en.wikipedia.org/wiki/Denormalization. Figure 6-4 shows the three tables and secure view we will create, along with their relationships.
The general pattern for creating a single-table SCD2 implementation is shown in Figure 6-5. Please refer to the accompanying script, Chapter_6_Account_Usage_Demo.sql, for the complete implementation of all three tables and the secure view. Note Chapter_6_Account_Usage_Demo.sql takes a few minutes to run as there are many SQL statements.
Figure 6-5 shows the full suite of SQL statements and data flow required to implement our example.
Let’s walk through Figure 6-5. We create sequence monitor_group_id_seq to generate a surrogate key. Then we create tables monitor_group and hist_monitor_group, plus stream strm_monitor_group, which captures records as they are inserted into the monitor_group table. We define view v_monitor_group against both the stream and the history table; it does the hard work of identifying delta records. With all objects created, we insert data into monitor_group and finally run the MERGE statement, which loads the SCD2 records into hist_monitor_group. Don’t worry, all will become clear; just know there are dependencies on object creation order, which I now explain.
Change role to monitor_reference_role and set context.
USE ROLE IDENTIFIER ( $monitor_reference_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_reference_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
Create a sequence.
CREATE OR REPLACE SEQUENCE monitor_group_id_seq START WITH 10000;
Create tables.
CREATE OR REPLACE TABLE monitor_group
(
monitor_group_id NUMBER(10) PRIMARY KEY NOT NULL,
monitor_group_name VARCHAR(255) NOT NULL,
last_updated TIMESTAMP_NTZ DEFAULT current_timestamp()::TIMESTAMP_NTZ NOT NULL
);
Create a stream to capture changes to monitor_group.
CREATE OR REPLACE STREAM strm_monitor_group ON TABLE monitor_group;
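The history table hist_monitor_group is created in the companion script, Chapter_6_Account_Usage_Demo.sql. Its minimal definition can be inferred from the view and MERGE statement that follow; a sketch:

```sql
CREATE OR REPLACE TABLE hist_monitor_group
(
monitor_group_id     NUMBER(10)    NOT NULL,
monitor_group_name   VARCHAR(255)  NOT NULL,
valid_from           TIMESTAMP_NTZ NOT NULL,
valid_to             TIMESTAMP_NTZ NOT NULL,
current_flag         VARCHAR(1)    NOT NULL
);
```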
Create a history view. Note that UNION ALL joins three sections: the first code block identifies records for INSERTs, the second for UPDATEs, and the third for DELETEs. An UPDATE is two actions: a logical DELETE and an INSERT. Hence, in the v_monitor_group declaration, dml_type is never set to ‘U’.
CREATE OR REPLACE VIEW v_monitor_group
AS
SELECT monitor_group_id,
monitor_group_name,
valid_from,
valid_to,
current_flag,
'I' AS dml_type
FROM (
SELECT monitor_group_id,
monitor_group_name,
last_updated AS valid_from,
LAG ( last_updated ) OVER ( PARTITION BY monitor_group_id ORDER BY last_updated DESC ) AS valid_to_raw,
CASE
WHEN valid_to_raw IS NULL
THEN '9999-12-31'::TIMESTAMP_NTZ
ELSE valid_to_raw
END AS valid_to,
CASE
WHEN valid_to_raw IS NULL
THEN 'Y'
ELSE 'N'
END AS current_flag,
'I' AS dml_type
FROM (
SELECT strm.monitor_group_id,
strm.monitor_group_name,
strm.last_updated
FROM strm_monitor_group strm
WHERE strm.metadata$action = 'INSERT'
AND strm.metadata$isupdate = 'FALSE'
)
)
UNION ALL
SELECT monitor_group_id,
monitor_group_name,
valid_from,
valid_to,
current_flag,
dml_type
FROM (
SELECT monitor_group_id,
monitor_group_name,
valid_from,
LAG ( valid_from ) OVER ( PARTITION BY monitor_group_id ORDER BY valid_from DESC ) AS valid_to_raw,
CASE
WHEN valid_to_raw IS NULL
THEN '9999-12-31'::TIMESTAMP_NTZ
ELSE valid_to_raw
END AS valid_to,
CASE
WHEN valid_to_raw IS NULL
THEN 'Y'
ELSE 'N'
END AS current_flag,
dml_type
FROM (
SELECT strm.monitor_group_id,
strm.monitor_group_name,
strm.last_updated AS valid_from,
'I' AS dml_type
FROM strm_monitor_group strm
WHERE strm.metadata$action = 'INSERT'
AND strm.metadata$isupdate = 'TRUE'
UNION ALL
SELECT tgt.monitor_group_id,
tgt.monitor_group_name,
tgt.valid_from,
'D' AS dml_type
FROM hist_monitor_group tgt
WHERE tgt.monitor_group_id IN
(
SELECT DISTINCT strm.monitor_group_id
FROM strm_monitor_group strm
WHERE strm.metadata$action = 'INSERT'
AND strm.metadata$isupdate = 'TRUE'
)
AND tgt.current_flag = 'Y'
)
)
UNION ALL
SELECT strm.monitor_group_id,
strm.monitor_group_name,
tgt.valid_from,
current_timestamp()::TIMESTAMP_NTZ AS valid_to,
NULL,
'D' AS dml_type
FROM hist_monitor_group tgt
INNER JOIN strm_monitor_group strm
ON tgt.monitor_group_id = strm.monitor_group_id
WHERE strm.metadata$action = 'DELETE'
AND strm.metadata$isupdate = 'FALSE'
AND tgt.current_flag = 'Y';
Determine the monitor groups. For example, we have account parameters, network policies, and time travel; there are others to add later as we encounter them, but for now, these three suffice. Create INSERT statements for each.
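As a sketch (the group names shown are illustrative; Chapter_6_Account_Usage_Demo.sql defines the definitive set), the INSERT statements might look like this, relying upon the last_updated column default:

```sql
INSERT INTO monitor_group ( monitor_group_id, monitor_group_name )
SELECT monitor_group_id_seq.NEXTVAL, 'Account Parameters';

INSERT INTO monitor_group ( monitor_group_id, monitor_group_name )
SELECT monitor_group_id_seq.NEXTVAL, 'Network Policies';

INSERT INTO monitor_group ( monitor_group_id, monitor_group_name )
SELECT monitor_group_id_seq.NEXTVAL, 'Time Travel';
```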
Run the MERGE statement to historize the data. As dml_type is never set to ‘U’, we only need to consider INSERT and (logical) DELETE operations. Note the number of rows inserted and the number of rows updated in the returned result set.
MERGE INTO hist_monitor_group tgt
USING v_monitor_group strm
ON tgt.monitor_group_id = strm.monitor_group_id
AND tgt.valid_from = strm.valid_from
AND tgt.monitor_group_name = strm.monitor_group_name
WHEN MATCHED AND strm.dml_type = 'D' THEN
UPDATE
SET tgt.valid_to = strm.valid_to,
tgt.current_flag = 'N'
WHEN NOT MATCHED AND strm.dml_type = 'I' THEN
INSERT
(
tgt.monitor_group_id,
tgt.monitor_group_name,
tgt.valid_from,
tgt.valid_to,
tgt.current_flag
)
VALUES
(
strm.monitor_group_id,
strm.monitor_group_name,
strm.valid_from,
strm.valid_to,
strm.current_flag
);
Now query the history table to ensure the MERGE created the three records, noting the valid_to date is set to ‘9999-12-31’ and current_flag = ‘Y’.
SELECT * FROM hist_monitor_group;
Testing
Please refer to the corresponding script, Chapter_6_Account_Usage_Demo.sql, where additional SQL commands are found to INSERT, UPDATE, and DELETE test parameters, always remembering to run the MERGE statement that historizes the data. An additional step for automation (possibly overkill, as maintenance is a very low-volume, occasional activity) would be to create a task; see Chapter 5 for an example.
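For example, a quick test of the UPDATE path might look like the following (the new group name is illustrative). After re-running the MERGE, the original row should show current_flag = ‘N’ with its valid_to closed off, alongside a new current row:

```sql
UPDATE monitor_group
SET    monitor_group_name = 'Account Parameters Renamed',
       last_updated       = current_timestamp()::TIMESTAMP_NTZ
WHERE  monitor_group_name = 'Account Parameters';

-- Re-run the MERGE statement, then inspect the history
SELECT * FROM hist_monitor_group ORDER BY monitor_group_id, valid_from;
```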
Completing the Reference Schema Data Flow
Due to space constraints, I have only provided a single-table walk-through. Two more tables conforming to the same pattern, including foreign key references, are required; they are built in the corresponding script, Chapter_6_Account_Usage_Demo.sql.
Note
If you add attributes to the sample code, ensure the MERGE statement is extended to capture the new attributes.
You should begin to see that we build our code according to patterns. Adopting this approach leads us to consider writing code to generate the patterns, ensuring repeatability and a consistent, rule-based approach to development. We explore this concept later in this book, as the implications for building the Snowflake Data Cloud are profound.
When complete, we can build the secure view on top to abstract our data model into a single usable object.
Creating a Secure View
Assuming Chapter_6_Account_Usage_Demo.sql has been run and all objects created, let’s now build a secure view to denormalize our reference data, delivering a single object for lookups. We use a secure view to prevent the underlying SQL from being visible (a security feature). Views also provide abstraction, meaning less code to write when referencing the underlying model.
USE ROLE IDENTIFIER ( $monitor_reference_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_reference_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
CREATE OR REPLACE SECURE VIEW v_monitor_data COPY GRANTS
AS
SELECT mg.monitor_group_id,
mg.monitor_group_name,
mc.monitor_control_id,
mc.monitor_control_name,
mp.monitor_parameter_id,
mp.monitor_parameter_name,
mp.monitor_parameter_value
FROM monitor_group mg,
monitor_control mc,
monitor_parameter mp
WHERE mg.monitor_group_id = mc.monitor_group_id
AND mc.monitor_control_id = mp.monitor_control_id;
We should check v_monitor_data returns expected values.
SELECT * FROM v_monitor_data;
And finally, grant entitlement to monitor_owner_role.
GRANT SELECT, REFERENCES ON v_monitor_data TO ROLE IDENTIFIER ( $monitor_owner_role );
Storing Monitored Output
This section describes how to store result sets, maintain a full audit history of monitoring activity, and allow extension when our monitoring requirements change. Figure 6-6 illustrates the components required to identify changes to our baseline security configuration held in reference data, compared to the actual values set on our account, with any differences reported for consumption by monitoring teams.
We begin (as usual) by setting our context.
USE ROLE IDENTIFIER ( $monitor_owner_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_owner_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
We must determine what is to be logged: the reference data, actual values, and context, plus anything useful in assisting future investigation, such as the record creation timestamp. From these requirements, our surrogate key sequence and primary logging table look like the following.
CREATE OR REPLACE SEQUENCE monitor_log_id_seq START WITH 10000;
CREATE OR REPLACE TABLE monitor_log
(
monitor_log_id NUMBER PRIMARY KEY NOT NULL,
event_description STRING NOT NULL,
event_result STRING NOT NULL,
monitor_control_name STRING NOT NULL,
monitor_parameter_name STRING NOT NULL,
monitor_parameter_value STRING NOT NULL,
last_updated TIMESTAMP_NTZ DEFAULT current_timestamp()::TIMESTAMP_NTZ NOT NULL
);
There is a second logging table for one complex use case described later.
Asynchronous Monitoring
This pattern runs asynchronously and should be called periodically from a task. The logged output should then be queried via a second task for event propagation. We adopt this approach because it is impossible to embed the output of the SHOW, LIST, and DESCRIBE commands in a view or other abstraction mechanism, such as a UDF.
Publication could be via external procedure wrapping email functionality or writing to an external stage. Monitoring screens can also be developed against the monitor_log table for use by your operational support team.
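A sketch of such a task follows, assuming the sp_check_parameters procedure developed later in this chapter; the task name and schedule are illustrative, and you may need to fully qualify the procedure name.

```sql
USE ROLE accountadmin;

CREATE OR REPLACE TASK task_check_parameters
WAREHOUSE = monitor_wh
SCHEDULE  = '60 MINUTE'
AS
CALL sp_check_parameters();

-- Tasks are created suspended; resume to start the schedule
ALTER TASK task_check_parameters RESUME;
```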
Account Parameters
The following pattern is for simple account parameter monitoring: a TRUE/FALSE flag setting for a named parameter.
You may recall in Chapter 4 that we developed controls to prevent data from being unloaded to user-specified Internet locations and restrict data unloads to specified, system-mapped Internet locations.
USE ROLE accountadmin;
ALTER ACCOUNT SET prevent_unload_to_inline_url = TRUE;
ALTER ACCOUNT SET require_storage_integration_for_stage_creation = TRUE;
ALTER ACCOUNT SET require_storage_integration_for_stage_operation = TRUE;
These are two separate controls with a common implementation pattern: we are setting account-level parameters.
Note our sample script Chapter_6_Account_Usage_Demo.sql initially creates only three monitor parameters in support of account parameter monitoring, providing our first set of controls to monitor.
USE ROLE IDENTIFIER ( $monitor_owner_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_owner_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
SELECT *
FROM MONITOR.reference_owner.v_monitor_data
WHERE monitor_parameter_value = 'TRUE';
Note the fully qualified object reference. We should see three rows returned, as per Figure 6-7.
But how do we compare the reference data to the actual values?
First, we must run the SHOW command to access the Snowflake global services layer, based on FoundationDB, the key-value store that underpins all Snowflake metadata operations. We return to this theme later in this chapter.
SHOW PARAMETERS LIKE 'prevent_unload_to_inline_url' IN ACCOUNT;
However, there is a problem: SHOW is not a true SQL command, since we cannot use it within a SQL statement. For example, we cannot create a view based on it; CREATE OR REPLACE VIEW v_show_parameters AS SHOW PARAMETERS does not work. We can, however, use RESULT_SCAN to convert SHOW output to a result set, but we must understand the interaction: when RESULT_SCAN is used with last_query_id(), we cannot run other commands in between, as the query ID changes.
SELECT "value" FROM TABLE ( RESULT_SCAN ( last_query_id()));
Instead, we must use a stored procedure to programmatically extract the value for a named parameter, as the two statements run consecutively within it. We extend this code pattern later for network parameter monitoring.
You may be wondering whether the stored procedure can be converted into a function and embedded in a SQL statement. In short, at the time of writing, using a function is not possible (but it would have been so much simpler).
Having explained and implemented all dependent objects on which account parameter monitoring relies, Figure 6-8 illustrates how the components relate, with sp_check_parameters being the next deliverable for development.
Figure 6-9 provides a flowchart explaining the logic contained in sp_check_parameters.
Let’s implement the sp_check_parameters stored procedure using JavaScript.
CREATE OR REPLACE PROCEDURE sp_check_parameters() RETURNS STRING
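The procedure body is elided here; please see Chapter_6_Account_Usage_Demo.sql for the definitive implementation. The following is a minimal sketch of the logic, simplified in that it assumes every v_monitor_data row holds an account parameter name and its expected value (the script's version also filters by control group):

```sql
CREATE OR REPLACE PROCEDURE sp_check_parameters() RETURNS STRING
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS
$$
   // Iterate over the expected parameters held in reference data
   var ref_rs = snowflake.createStatement({ sqlText:
      "SELECT monitor_control_name, monitor_parameter_name, monitor_parameter_value " +
      "FROM   MONITOR.reference_owner.v_monitor_data" }).execute();

   while ( ref_rs.next() )
   {
      var control   = ref_rs.getColumnValue(1);
      var parameter = ref_rs.getColumnValue(2);
      var expected  = ref_rs.getColumnValue(3);

      // SHOW and RESULT_SCAN must run consecutively in the same session
      snowflake.createStatement({ sqlText:
         "SHOW PARAMETERS LIKE '" + parameter + "' IN ACCOUNT" }).execute();
      var actual_rs = snowflake.createStatement({ sqlText:
         'SELECT "value" FROM TABLE ( RESULT_SCAN ( last_query_id()))' }).execute();

      // Log any mismatch between the expected and actual values
      if ( actual_rs.next() && actual_rs.getColumnValue(1) != expected )
      {
         snowflake.createStatement({ sqlText:
            "INSERT INTO monitor_log ( monitor_log_id, event_description, event_result, " +
            "monitor_control_name, monitor_parameter_name, monitor_parameter_value ) " +
            "SELECT monitor_log_id_seq.NEXTVAL, 'Parameter mismatch', '" +
            actual_rs.getColumnValue(1) + "', '" + control + "', '" +
            parameter + "', '" + expected + "'" }).execute();
      }
   }
   return 'Success';
$$;
```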
Call the stored procedure to test. This should not insert any records into monitor_log.
CALL sp_check_parameters();
SELECT * FROM monitor_log;
Finally, mis-set a parameter then retest.
USE ROLE accountadmin;
ALTER ACCOUNT SET prevent_unload_to_inline_url = FALSE;
USE ROLE IDENTIFIER ( $monitor_owner_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_owner_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
CALL sp_check_parameters();
SELECT * FROM monitor_log;
You should see a single record, as shown in Figure 6-10.
Network Parameters
Network policy monitoring is a variation on account parameter monitoring with two extra steps. Just as sp_get_parameter_value uses the SHOW command followed by a consecutive RESULT_SCAN query, interrogating network policies requires the same constructs, followed by a DESCRIBE and another RESULT_SCAN query.
Add reference data, changing the IP range to suit your local environment and noting your Snowflake account network policy should also be set.
USE ROLE IDENTIFIER ( $monitor_reference_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_reference_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
INSERT INTO monitor_parameter ( monitor_parameter_id, monitor_control_id, monitor_parameter_name, monitor_parameter_value ) SELECT monitor_parameter_id_seq.NEXTVAL, monitor_control_id, 'MY_NETWORK_POLICY', '192.168.0.0' FROM monitor_control WHERE monitor_control_name = 'SNOWFLAKE_NIST_NP1';
The queries required (in order) are as follows.
USE ROLE accountadmin;
CREATE OR REPLACE NETWORK POLICY my_network_policy
ALLOWED_IP_LIST=( '192.168.0.0' );
SHOW NETWORK POLICIES IN ACCOUNT;
SELECT "name" FROM TABLE ( RESULT_SCAN ( last_query_id()));
DESCRIBE NETWORK POLICY my_network_policy;
SELECT "name", "value" FROM TABLE ( RESULT_SCAN ( last_query_id()));
Build the dynamic INSERT INTO monitor_log using the same pattern as sp_check_parameters, then call and test your stored procedure.
System for Cross-Domain Identity Management (SCIM) Events
Monitoring SCIM events is distinctly different from all other monitoring insofar as accessing rest_event_history is only allowed using the ACCOUNTADMIN role. We can achieve the same outcome with some lateral thinking, a task (which can run as ACCOUNTADMIN), and a separate logging table. Please refer to https://docs.snowflake.com/en/sql-reference/functions/rest_event_history.html for more details.
For this example, we use SNOWFLAKE_NIST_RE1 as the control reference.
Create a sequence and table to capture rest_event_history output.
USE ROLE IDENTIFIER ( $monitor_owner_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_owner_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
-- Create Sequence
CREATE OR REPLACE SEQUENCE hist_rest_event_history_id_seq START WITH 10000;
CREATE OR REPLACE TABLE hist_rest_event_history
(
hist_rest_event_history_id NUMBER,
event_timestamp TIMESTAMP_LTZ,
event_id NUMBER,
event_type TEXT,
endpoint TEXT,
method TEXT,
status TEXT,
error_code TEXT,
details TEXT,
client_ip TEXT,
actor_name TEXT,
actor_domain TEXT,
resource_name TEXT,
resource_domain TEXT
);
Recognizing the need to report events in the past hour, using the ACCOUNTADMIN role, create a view over SNOWFLAKE.information_schema.rest_event_history. Note the use of the TABLE function.
USE ROLE accountadmin;
CREATE OR REPLACE SECURE VIEW v_rest_event_history COPY GRANTS
AS
SELECT reh.*
FROM TABLE ( snowflake.information_schema.rest_event_history (
'scim',
DATEADD ( 'minutes', -60, current_timestamp()),
current_timestamp(),
1000 )
) reh;
An alternative way to derive the time since the last run is to use your local monitor status table, if implemented. The INSERT statement should reference the view from a stored procedure invoked periodically by a TASK.
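A sketch of the periodic load follows, assuming the hist_rest_event_history table and sequence created previously; the task name, warehouse name, and schedule are illustrative, and you may prefer to wrap the INSERT in a stored procedure as suggested.

```sql
USE ROLE accountadmin;

CREATE OR REPLACE TASK task_load_rest_event_history  -- illustrative name
WAREHOUSE = monitor_wh                               -- substitute your warehouse
SCHEDULE  = '60 MINUTE'
AS
INSERT INTO monitor.monitor_owner.hist_rest_event_history
SELECT monitor.monitor_owner.hist_rest_event_history_id_seq.NEXTVAL,
       event_timestamp,
       event_id,
       event_type,
       endpoint,
       method,
       status,
       error_code,
       details,
       client_ip,
       actor_name,
       actor_domain,
       resource_name,
       resource_domain
FROM v_rest_event_history;

-- Tasks are created suspended; resume to begin the schedule
ALTER TASK task_load_rest_event_history RESUME;
```

Because the task is created by ACCOUNTADMIN, it runs with that role's privileges, satisfying the rest_event_history access restriction.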
Synchronous Monitoring
The remainder of our monitoring configuration is much more straightforward and relies upon accessing the Account Usage views. We still rely upon reference data to determine our baseline criteria, as reflected in the following sections.
Time Travel
Database time-travel monitoring does not require SHOW/DESCRIBE processing. Instead, it is a straightforward join between v_monitor_data and the Account Usage store (assuming the corresponding reference data has been created first and the monitoring databases have 90-day time travel enabled).
USE ROLE IDENTIFIER ( $monitor_reference_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_reference_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
INSERT INTO monitor_parameter ( monitor_parameter_id, monitor_control_id, monitor_parameter_name, monitor_parameter_value )
SELECT monitor_parameter_id_seq.NEXTVAL,
       monitor_control_id,
       'MONITOR',
       '90'
FROM   monitor_control
WHERE  monitor_control_name = 'SNOWFLAKE_NIST_TT1';

INSERT INTO monitor_parameter ( monitor_parameter_id, monitor_control_id, monitor_parameter_name, monitor_parameter_value )
SELECT monitor_parameter_id_seq.NEXTVAL,
       monitor_control_id,
       'TEST',
       '90'
FROM   monitor_control
WHERE  monitor_control_name = 'SNOWFLAKE_NIST_TT1';
Use the following query to test, noting the Account Usage views can take up to 3 hours to refresh.
USE ROLE IDENTIFIER ( $monitor_owner_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_owner_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
INSERT INTO monitor_log
SELECT monitor_log_id_seq.NEXTVAL,
d.database_name,
d.retention_time,
v.monitor_control_name,
v.monitor_parameter_name,
v.monitor_parameter_value,
current_timestamp()::TIMESTAMP_NTZ
FROM MONITOR.reference_owner.v_monitor_data v,
snowflake.account_usage.databases d
WHERE v.monitor_parameter_name = d.database_name
AND v.monitor_parameter_value != d.retention_time
AND v.monitor_group_name = 'TIME TRAVEL'
AND d.deleted IS NULL;
Testing can also be accomplished by changing != to = in this query.
Events
Many monitoring requirements can be achieved by directly querying the Account Usage store or information_schema and joining to the reference data. Examples of both approaches are offered next. Remember the key differences: Account Usage views have a latency of between 45 minutes and 3 hours and typically retain information for up to one year. In contrast, the information schema has no latency, but data is only retained for 14 days.
Reference Chapter 4 and our first control, SNOWFLAKE_NIST_AC1.
USE ROLE IDENTIFIER ( $monitor_owner_role );
USE DATABASE IDENTIFIER ( $monitor_database );
USE SCHEMA IDENTIFIER ( $monitor_owner_schema );
USE WAREHOUSE IDENTIFIER ( $monitor_warehouse );
Create secure view v_snowflake_nist_ac1.
CREATE OR REPLACE SECURE VIEW v_snowflake_nist_ac1 COPY GRANTS
COMMENT = 'Reference: Chapter 4 Account Security'
AS
SELECT 'SNOWFLAKE_NIST_AC1' AS control_name,
'MONITOR.monitor_owner.v_snowflake_nist_ac1' AS control_object,
start_time,
role_name,
database_name,
schema_name,
user_name,
query_text,
query_id
FROM snowflake.account_usage.query_history
WHERE role_name = 'ACCOUNTADMIN';
Note the absence of timeband filters. We should not constrain these lowest-level Account Usage views, though we may later choose to apply timeband filters. This is a building block for later use.
Check view v_snowflake_nist_ac1 has content:
SELECT * FROM v_snowflake_nist_ac1;
The equivalent real-time information schema view looks like this.
CREATE OR REPLACE SECURE VIEW v_rt_snowflake_nist_ac1 COPY GRANTS
COMMENT = 'Reference: Chapter 4 Account Security and our first control'
AS
SELECT 'SNOWFLAKE_NIST_AC1' AS control_name,
'MONITOR.monitor_owner.v_rt_snowflake_nist_ac1' AS control_object,
start_time,
role_name,
database_name,
schema_name,
user_name,
query_text,
query_id
FROM TABLE ( information_schema.query_history())
WHERE role_name = 'ACCOUNTADMIN';
Naturally, we would develop one wrapper view for each control and consolidate them into a single master view joined to our reference data, against which we may wish to apply filters.
CREATE OR REPLACE SECURE VIEW v_rt_snowflake_controls COPY GRANTS
AS
SELECT ac1.control_name,
ac1.control_object,
ac1.start_time,
ac1.role_name,
ac1.database_name,
ac1.schema_name,
ac1.user_name,
ac1.query_text,
ac1.query_id,
rd.monitor_group_name,
rd.monitor_control_name,
rd.monitor_parameter_name,
rd.monitor_parameter_value
FROM MONITOR.monitor_owner.v_rt_snowflake_nist_ac1 ac1,
MONITOR.reference_owner.v_monitor_data rd
WHERE ac1.control_name = rd.monitor_control_name
AND rd.monitor_control_name = 'SNOWFLAKE_NIST_AC1'
AND rd.monitor_parameter_name = 'ACCOUNTADMIN'
AND ac1.start_time >= DATEADD ( 'minutes', rd.monitor_parameter_value, current_timestamp()::TIMESTAMP_NTZ );
Only one source view is shown; all others should be added via UNION ALL. Also note the filter on start_time, restricting results to the past 60 minutes as set by reference data.
Check view v_rt_snowflake_controls has content:
SELECT * FROM v_rt_snowflake_controls;
External Monitoring Tools
Some third-party tools require direct access to Snowflake Account Usage views to function. We should not allow direct access to any Snowflake objects but instead create wrapper views and custom roles limiting data access to the minimal objects required to achieve each objective. While this approach may be uncomfortable for vendors, we must not compromise our account security and must insist on vendors conforming to our security policies. Some vendors make provision for customization through a configuration file, as with Datadog.
An example view for specific monitoring purposes would include all source view attributes and some custom attributes, which may look like the following.
CREATE OR REPLACE SECURE VIEW login_history COPY GRANTS
AS
SELECT lh.*,
current_account()||'.'||user_name AS object_name,
'Login History Wrapper View' AS source_name,
'snowflake.account_usage.login_history' AS source_path
FROM snowflake.account_usage.login_history lh;
Check view login_history has content:
SELECT * FROM login_history;
Note the removal of the v_ prefix because we want to present our view to the consuming user in (almost) native format to make integration easy.
We must grant entitlement on our views to a role.
GRANT SELECT ON login_history TO ROLE IDENTIFIER ( $monitor_reader_role );
Finally, grant the role to a user, which I leave you to complete.
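For completeness, the grant might look like the following, where the user name is illustrative:

```sql
USE ROLE securityadmin;

GRANT ROLE IDENTIFIER ( $monitor_reader_role ) TO USER my_monitoring_user;  -- illustrative user
```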
Cleanup
Finally, you will want to remove all objects created in this chapter.
USE ROLE sysadmin;
DROP DATABASE IDENTIFIER ( $monitor_database );
DROP WAREHOUSE IDENTIFIER ( $monitor_warehouse );
USE ROLE securityadmin;
DROP ROLE IDENTIFIER ( $monitor_reference_role );
DROP ROLE IDENTIFIER ( $monitor_reader_role );
DROP ROLE IDENTIFIER ( $monitor_owner_role );
History Retention
We may need to hold the full history of every action in our account from inception to the current date. While Snowflake retains a one-year immutable history, we may need to periodically copy data to local tables before information ages out.
The following list of snowflake.account_usage views should be reviewed for completeness in scope and attribute changes before invoking the data copy. I suggest data be copied in March and September to avoid year-end change-freeze blackouts.
task_history
access_history
automatic_clustering_history
copy_history
database_storage_usage_history
data_transfer_history
load_history
login_history
materialized_view_refresh_history
metering_daily_history
metering_history
pipe_usage_history
query_history
replication_usage_history
search_optimization_history
serverless_task_history
sessions
stage_storage_usage_history
storage_usage
warehouse_events_history
warehouse_load_history
warehouse_metering_history
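Using login_history as an example, an incremental copy might follow this pattern; the local table name and high-water-mark predicate are illustrative.

```sql
-- One-time: create an empty local copy matching the view's structure
CREATE TABLE IF NOT EXISTS hist_login_history AS
SELECT * FROM snowflake.account_usage.login_history WHERE 1 = 0;

-- Periodic: append only rows newer than the last copied event
INSERT INTO hist_login_history
SELECT lh.*
FROM snowflake.account_usage.login_history lh
WHERE lh.event_timestamp > ( SELECT COALESCE ( MAX ( event_timestamp ),
                                               '1900-01-01'::TIMESTAMP_LTZ )
                             FROM hist_login_history );
```

Each view has a different timestamp attribute, so review the column list before extending this pattern across the views above.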
Summary
This chapter introduced the Account Usage store and built a new database and associated schemas for later use. You then enabled access to the Account Usage store.
Next, you developed a reference schema with metadata storing our account level settings and denormalizing it for later use, demonstrating a sample but extensible data model.
A full monitoring solution is not possible in the limited space available; even so, I have significantly exceeded the page budget to make things clear. It is well worth the effort. I hope you agree and have found plenty to pique your interest. I have included additional code in the accompanying file to assist your understanding.
Let’s turn our attention to ingesting data, where you learn more techniques on your journey to building the Snowflake Data Cloud.