This chapter discusses implementing two paradigms: Snowflake Data Exchange and Snowflake Marketplace. Both are underpinned by Snowflake Secure Data Sharing, which we delve into first. You know from Chapter 1 that Snowflake Data Exchange and Snowflake Marketplace are similar in concept but serve different target audiences, and Chapter 3 explained how data shares are implemented using micro-partitions. Let’s investigate use cases via hands-on practical examples.
For Secure Direct Data Share, we continue our investigation using command-line scripts; but for Snowflake Data Exchange and Snowflake Marketplace, there is no approved command-line approach for configuration. Instead, in line with the Snowflake-approved approach, we use Snowsight. Furthermore, to participate in Snowflake Marketplace, there is a formal engagement process to undergo, with a commitment to producing specific data sets, not to be undertaken lightly. More information is at https://other-docs.snowflake.com/en/marketplace/becoming-a-provider.html#step-1-submit-a-request-to-join-the-snowflake-data-marketplace .
Two separate Snowflake accounts are required for this chapter. For ease of testing, please ensure both are in the same cloud service provider and region. The data share examples will not work otherwise! Using a single email address, multiple trial Snowflake accounts may be provisioned at https://signup.snowflake.com .
Snowflake is constantly evolving, and screenshots may be enhanced or reorganized between writing and publication; however, the general content and navigation should remain relevant, though some details may change. Due to space constraints, I have not explored every option available for every feature but instead focused on the core capabilities that deliver end-to-end data provision and consumption.
Terminology and product positioning have changed since inception. Please refer to the documentation for the latest information.
For an overview of the three product offerings discussed in this chapter, please also refer to the documentation at https://docs.snowflake.com/en/user-guide/data-sharing-product-offerings.html#overview-of-the-product-offerings-for-secure-data-sharing .
This chapter is what I would have liked to have at the outset of my Snowflake journey, a blueprint to get started, investigate further, and build out into a comprehensive, secure, business-focused solution.
By the end of this chapter, you will be equipped to implement pragmatic solutions immediately useful to your organization’s internal and external facing needs.
A bold claim. Let’s get started!
Overview
Earlier chapters explained how to ingest data into Snowflake using a common pattern: data ingestion on the left and data consumption on the right. We have not addressed data science and data applications; both are better served by other authors and media, not least due to space constraints in this book.
The latest Snowflake location information is at https://docs.snowflake.com/en/user-guide/intro-regions.html .
We must be aware of Snowflake’s presence for various reasons, not the least of which are costs when moving data between cloud service providers (CSPs) and regions, see documentation at https://docs.snowflake.com/en/user-guide/billing-data-transfer.html#data-transfer-billing-use-cases .
Secure Direct Data Share
Why Data Share?
We mostly work in environments where data silos predominate, and legacy thinking pervades our conversations, stifling innovation and preventing true data democratization. Consequently, we may only have access to curated data sets via limited access paths or, worse still, partial data sets from which to derive meaningful answers. Without wishing to repeat Chapter 2, we are largely stuck with the legacy paradigm born in the 1980s of FTP/SFTP, API, ETL/ELT, and most recently, cloud storage. Yes, it is surprising to see the latest emergent technology considered a legacy. But the stark reality is that many organizations have wrapped up cloud storage in legacy constructs because these are both familiar and well understood.
I recognize the irony. Most of this book has been concerned with ingesting data via legacy techniques. But the reality on the ground (at the time of writing) is that most organizations are not yet willing to move away from their comfort zone and embrace the new data-sharing paradigm.
I hope you are among those willing to consider a trial and then adopt what is considered the future of data interchange. Snowflake has envisioned a new paradigm, Secure Direct Data Share, where service on demand is seamless and transparent, and data access is available out of the box. Secure Direct Data Share is a built-in, highly secure, core Snowflake capability reliant on cloud storage to share data with internal business colleagues, partners, and Snowflake customers.
The aim is to streamline data access by implementing frictionless interfaces while retaining data security and system integrity regardless of location.
Suitable for all applications, Secure Direct Data Share explicitly controls customers’ access to shared data. It is a great way to securely and safely monetize your data while always retaining full control.
Data sharing removes barriers, regardless of cloud, region, workload, or organizational domain, and gives instant access through a single copy of data. Industry estimates put the ratio of data copies to original data at roughly 9:1, set to rise to 10:1. Reusing a single copy of data should cause everyone to sit up and pay attention. Data sharing is a true game-changer.
Data sharing enables business continuity by eliminating disruptions and delivering a seamless customer experience, with no more processing chains or single points of failure from source to destination. Utilizing Snowflake’s global data mesh guarantees consistent, reliable delivery with full security embedded.
Incorporating object tagging lets our organization’s perspective on data categorization be applied consistently across all uses of the subject data. Cross-domain governance is more easily managed centrally and consistently than when distributed or fragmented across geographically disparate teams: one consistent implementation pattern, with multiple aspects, satisfies all use cases.
How to Data Share
Shares are a first-order object whose declaration and management are typically performed using the ACCOUNTADMIN role, the use of which should be tightly restricted. Snowflake provides instructions on devolving entitlement to lower privileged roles at https://docs.snowflake.com/en/user-guide/security-access-privileges-shares.html#enabling-non-accountadmin-roles-to-perform-data-sharing-tasks . In my view, devolving entitlement should not be done lightly, as the law of unintended consequences quickly follows ( https://en.wikipedia.org/wiki/Unintended_consequences ). I recommend that share entitlement is not devolved.
To identify existing shares, run SHOW SHARES.
When adding consumer accounts to a share, replace <your_consumer_account> with a consumer account in the same CSP and region.
To identify your Snowflake account, run SELECT current_account();.
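These pieces can be drawn together into a provider-side sketch. The share name test_share and the TEST database with its PUBLIC schema are my assumptions for illustration; csv_test_2 is the table referenced in this chapter, and <your_consumer_account> remains a placeholder for your second account:

```sql
-- Provider side: create a share, entitle objects, and add a consumer (a sketch)
USE ROLE ACCOUNTADMIN;

CREATE SHARE IF NOT EXISTS test_share;

-- Entitle the database, schema, and table to the share
GRANT USAGE  ON DATABASE test               TO SHARE test_share;
GRANT USAGE  ON SCHEMA   test.public        TO SHARE test_share;
GRANT SELECT ON TABLE    test.public.csv_test_2 TO SHARE test_share;

-- The consumer must be in the same CSP and region
ALTER SHARE test_share ADD ACCOUNTS = <your_consumer_account>;

-- Confirm the share and our own account locator
SHOW SHARES;
SELECT current_account();
```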
We should see the new table csv_test_2 in our imported share. Naturally, all other share consumers can also see the new table.
Further information on consuming from shares is at https://docs.snowflake.com/en/user-guide/data-share-consumers.html#data-consumers .
Data Share Summary
Consumers may create a single database per share. An inbound database must be created before the share can be queried, and created databases can be renamed and dropped. As stated, roles and entitlement are not currently carried forward with the share. They must be rebuilt in the recipient account. However, row-level security, column-level security, external tokenization, and data masking policies are honored.
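On the consumer side, the steps above can be sketched as follows; the imported database name and provider account are assumptions for illustration:

```sql
-- Consumer side: import the share into a database (one database per share)
USE ROLE ACCOUNTADMIN;

SHOW SHARES;  -- inbound shares appear with kind = INBOUND

-- An inbound database must be created before the share can be queried
CREATE DATABASE test_share_db FROM SHARE <provider_account>.test_share;

-- Imported databases can be renamed and dropped
ALTER DATABASE test_share_db RENAME TO shared_data;
```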
Managing the share is trivial. Adding and removing objects from the share does not require much effort, and all share consumers also receive the same shared objects.
Most importantly, no data is copied. Metadata alone enables the secure sharing of data in underlying storage. Since no data is copied from the provider account, the Snowflake consumer account is not charged for storage.
Only the most recent micro-partitions are available in the share. The Time Travel feature is not supported for objects imported via a share, although it is supported for the original objects. Not all schema object types can be shared, and for those shareable object types, their secure counterparts must be used to preserve strict control of access to data. Further information is at https://docs.snowflake.com/en/user-guide/data-sharing-provider.html#working-with-shares .
Reader Accounts
Reader accounts are provisioned for organizations that are not yet Snowflake customers, enabling them to query shared data but not to upload, update, or delete it. A key advantage of reader accounts is reduced data integration cost, with direct access to live, ready-to-query data.
For Snowflake customers, the compute cost is paid for by the consuming account. For non-Snowflake consumers using a reader account, consumption is paid for by the provider.
We do not provide an in-depth study of reader accounts but instead refer to Snowflake documentation at https://docs.snowflake.com/en/user-guide/data-sharing-reader-create.html#managing-reader-accounts .
Tracking Usage
Before investigating data access, we must be mindful of latency when accessing Snowflake’s Account Usage views. For query_history, latency can be up to 45 minutes, and for access_history, up to 3 hours. For usage tracking, this latency is rarely a major consideration, as the information is not time-critical. More information is at https://docs.snowflake.com/en/sql-reference/account-usage.html#account-usage-views .
Access History
At the time of writing, the access_history view is a public preview and subject to change; therefore, the following code sample is indicative only. Documentation is at https://docs.snowflake.com/en/user-guide/access-history.html#access-history .
It is not currently possible to track consumer activity through the Snowflake Account Usage store. Each account tracks its own usage, for which these SQL statements may prove useful.
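As one such statement, the following sketch queries the account’s own access_history to list objects read over the last seven days. Since the view was in public preview at the time of writing, treat the structure as indicative only:

```sql
-- Objects directly accessed in this account over the last 7 days (indicative)
SELECT ah.user_name,
       ah.query_start_time,
       obj.value:objectName::string AS object_name
FROM   snowflake.account_usage.access_history ah,
       LATERAL FLATTEN(input => ah.direct_objects_accessed) obj
WHERE  ah.query_start_time > DATEADD(day, -7, current_timestamp())
ORDER BY ah.query_start_time DESC;
```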
Data Sharing provider accounts: The queries and logs on the shared objects executed in the provider account are not visible to Data Sharing consumer accounts.
Data Sharing consumer accounts: The queries on the data share executed in the consumer account are logged and only visible to the consumer account, not the Data Sharing provider account. The base tables accessed by the data share are not logged.
Data Sharing Usage
Snowflake is developing a suite of monitoring views where we can track shared data usage. Current monitoring capability is quite limited and does not allow identifying the rows accessed by any specific query. The latency may be up to two days.
Please also refer to Snowflake documentation at https://docs.snowflake.com/en/sql-reference/data-sharing-usage.html .
Centralized Cost Monitoring
This section explores the potential for implementing a common cost monitoring pattern using data share and local tables.
It is assumed that all code in this chapter has been run to this point.
Conceptual Model
Figure 14-7 shows two provider accounts (A and B) with a common suite of baselined reporting components deployed in a monitor schema. Each view overlays the provider Account Usage store or information schema, with one tabular SQL user-defined function (UDTF) per view. Each provider account shares its UDTFs, and the consumer account imports each share into a separate database, one per provider.
Taking advantage of a UDTF’s capability to return a result set, a scheduled task in the consumer account calls each UDTF, inserting data into its own table, one table per UDTF.
In the reporting database, result sets from each provider account reporting component are joined using views for common reporting.
Cost Monitoring Setup
Taking advantage of our share configuration , let’s extend functionality by implementing components to demonstrate how centralized cost monitoring can be delivered.
Let’s assume we want to report warehouse credit consumption by the warehouse for the previous day. In practice, we may want a more sophisticated view, but for our example, this will suffice.
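A minimal sketch of such a view might look as follows, assuming the TEST database and PUBLIC schema used elsewhere in this chapter:

```sql
-- v_warehouse_spend: credits consumed per warehouse for the previous day
CREATE OR REPLACE VIEW test.public.v_warehouse_spend AS
SELECT warehouse_name,
       SUM(credits_used)   AS credits_used,
       TO_DATE(start_time) AS spend_date
FROM   snowflake.account_usage.warehouse_metering_history
WHERE  TO_DATE(start_time) = DATEADD(day, -1, current_date())
GROUP BY warehouse_name, TO_DATE(start_time);

SELECT * FROM test.public.v_warehouse_spend;
```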
We should see results from our Snowflake activity. We may also see unexpected rows from serverless compute.
We cannot add v_warehouse_spend directly to our share; otherwise, we receive this error: “SQL compilation error: A view or function being shared cannot reference objects from other databases.” The error is caused by referencing the Snowflake database but using the TEST database for sharing. Instead, we must use a task to extract data into a table in the TEST database and then share the table.
Failure to suspend the task results in duplicate data.
When testing is complete, reset the task schedule to run once daily, and resume the task.
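Pulling the preceding points together, a sketch of the extract table and its daily task might look like this; the warehouse name and the 01:00 UTC schedule are my assumptions:

```sql
-- Persist the view output into a local TEST table that can be shared
CREATE TABLE IF NOT EXISTS test.public.warehouse_spend_hist
AS SELECT * FROM test.public.v_warehouse_spend WHERE 1 = 0;

-- Daily extract task; running it more than once a day duplicates data
CREATE OR REPLACE TASK test.public.task_warehouse_spend
  WAREHOUSE = compute_wh                      -- assumed warehouse name
  SCHEDULE  = 'USING CRON 0 1 * * * UTC'      -- once daily at 01:00 UTC
AS
INSERT INTO test.public.warehouse_spend_hist
SELECT * FROM test.public.v_warehouse_spend;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK test.public.task_warehouse_spend RESUME;
```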
We could add warehouse_spend_hist to our share, but instead, to expand our knowledge, I prefer to create a fn_get_warehouse_spend UDTF. We must create the UDTF as a SECURE function. For more information, see the documentation at https://docs.snowflake.com/en/sql-reference/udf-secure.html#secure-udfs .
There is no COPY GRANTS clause for functions. Entitlement must be redone if a function is redefined.
We should see the result set from our earlier SELECT from warehouse_spend_hist, noting the credits_used attribute has been coerced to NUMBER(18,6).
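A sketch of the secure UDTF and its grant to the share follows; the database, schema, and share names are assumptions carried over from earlier examples:

```sql
-- Secure UDTF over the extract table; credits_used coerced to NUMBER(18,6)
CREATE OR REPLACE SECURE FUNCTION test.public.fn_get_warehouse_spend()
RETURNS TABLE (warehouse_name VARCHAR, credits_used NUMBER(18,6), spend_date DATE)
AS
$$
   SELECT warehouse_name,
          credits_used::NUMBER(18,6),
          spend_date
   FROM   test.public.warehouse_spend_hist
$$;

-- Secure functions can be entitled to a share; note there is no COPY GRANTS,
-- so this grant must be redone if the function is redefined
GRANT USAGE ON FUNCTION test.public.fn_get_warehouse_spend() TO SHARE test_share;
```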
Reporting Database Setup
I leave it for you to create a task to schedule data ingestion from the source. Note the dependency upon data extracted into the originating table.
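As a starting point, a consumer-side sketch might look like this; all object names, the warehouse, and the schedule are assumptions, and the schedule should run after the provider’s extract completes:

```sql
-- Local table to materialize one provider's UDTF output
CREATE TABLE IF NOT EXISTS reporting.public.warehouse_spend_a
  (warehouse_name VARCHAR, credits_used NUMBER(18,6), spend_date DATE);

-- Scheduled ingestion from the imported share, one task per provider
CREATE OR REPLACE TASK reporting.public.task_ingest_provider_a
  WAREHOUSE = compute_wh                      -- assumed warehouse name
  SCHEDULE  = 'USING CRON 0 3 * * * UTC'      -- after the provider extract
AS
INSERT INTO reporting.public.warehouse_spend_a
SELECT * FROM TABLE(provider_a_db.public.fn_get_warehouse_spend());

ALTER TASK reporting.public.task_ingest_provider_a RESUME;
```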
Replication
Replication has been discussed throughout the book. Now, let’s work through setting up replication.
We do not discuss replication in the context of full disaster recovery (DR) implementation; we simply use DR as a target account.
While we can configure replication in the Snowflake user interface, I contend a deeper understanding is gained from working through command-line invocation, recognizing that any automation we develop works through scripts, not the interface. In other words, our replication strategy will likely depend on more than a single user-interface-driven approach, probably with complex dependencies including failover, failback, and client redirect.
Not every object type is currently available for replication. Please refer to the documentation at https://docs.snowflake.com/en/user-guide/database-replication-intro.html#replicated-database-objects . Replication has further limitations. For example, RBAC is not replicated; therefore, the receiving account must implement RBAC locally; databases with external tables cannot be replicated; databases created from shares cannot be replicated.
This example illustrates simple database replication in support of data shares and is not a full treatise on failover and failback, which falls beyond the scope of this book. Also, note there are cost implications to replicating data across accounts, CSPs, and regions.
We need two (or more) accounts enabled for replication. Your organization may prefer each account to be enabled individually, in which case a Snowflake support ticket should be raised. Response times are usually very short. Otherwise, Snowflake support can assign accounts the ORGADMIN role, as documented at https://docs.snowflake.com/en/user-guide/organizations-manage-accounts.html#enabling-accounts-for-replication .
You should see <your_database> in the list of available databases for import. Note that is_primary is “true”.
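The replication steps can be sketched as follows; <your_database> and the organization and account names are placeholders:

```sql
-- Primary account: enable replication of the database to a secondary account
ALTER DATABASE <your_database>
  ENABLE REPLICATION TO ACCOUNTS <org_name>.<secondary_account>;

SHOW REPLICATION DATABASES;   -- is_primary should read "true"

-- Secondary account: create the local replica, then pull the data across
CREATE DATABASE <your_database>
  AS REPLICA OF <org_name>.<primary_account>.<your_database>;

ALTER DATABASE <your_database> REFRESH;
```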
Refresh the left-hand pane of our browser, where our new database is now listed.
With our database replicated, we have two remaining tasks: create and apply a security model and then share desired objects, which I leave you to complete.
Data Exchange
Now that we understand data sharing, let’s investigate Data Exchange, the internal nexus for colleagues, partners, and customers to collaborate, where our citizen scientists discover and self-serve available data sets. Please refer to the Snowflake documentation at https://docs.snowflake.com/en/user-guide/data-exchange.html#data-exchange .
Utilizing Snowflake’s built-in capabilities, Data Exchange is an organization’s internal hub where data is published to consumers—the people and organizations inside and outside the publishing organization’s boundary.
Data Exchange is limited to 20 accounts for each listing across all CSPs and regions. Consideration must be given to creating a data share topology that does not replicate data silos but meets business needs and facilitates the delivery of successful business outcomes. Throughout this section, there is repeated reference to working with our business colleagues when developing our data exchanges. Only through collaboration will organizations benefit from the power and flexibility that Data Exchange offers. But equally, if done badly, it results in frustration and disillusionment with Data Exchange as a capability.
Request Data Exchange
Business case, team, and/or participants in the Data Exchange: An example might be, “We are developing content for distribution throughout our organization and wish to run a proof-of-concept using Data Exchange.”
Data Exchange name (without spaces): XYZ_CORP_DX
Public Data Exchange name: XYZ Corporation Data Exchange
Your administrative account URL: for example, https://<account>.eu-west-2.aws.snowflakecomputing.com
Your account locator: Run SELECT current_account();
Enabling Data Exchange can take 48 hours.
Snowsight
This section moves from the classic console to Snowsight, the future-state Snowflake console, because there is no approved command-line approach to implementing Data Exchange features.
If using AWS PrivateLink or Azure Private Link, you may need to create a CNAME record as described in the documentation at https://other-docs.snowflake.com/en/marketplace/intro.html#introduction-to-the-snowflake-data-marketplace .
A new browser window and login dialog appear alongside our existing classic console. Log in, accept the prompt to import tabs, and take a moment to familiarize yourself with the new look and feel.
Accessing Data Exchange
I leave data from direct shares for your own investigation and focus on XYZ Corporation Data Exchange, our fictional data exchange ready for configuration.
To administer the private exchange, use the ACCOUNTADMIN role.
We may have multiple data exchanges for our organization, which appear alongside our initial data exchange, as shown in Figure 14-12.
Tony Robbins said, “Complexity is the enemy of execution.” I strongly recommend adopting a simple approach to data exchange by provisioning the minimum number of data exchanges necessary to meet business objectives while retaining meaningful data isolation boundaries.
Managing Data Exchange
In contrast to data share, accounts may be in any region or CSP. Beware of egress charges, however.
All Data Exchange members may publish and consume the data sets of other members, but only the owner account can administer members.
Selecting an existing data exchange member enables their role to be changed, or clicking the three-dotted box allows removal from the data exchange.
If you selected the data exchange owner, you would see the Admin role checked and grayed out. It is not possible to reassign data exchange ownership. This is a Snowflake support function.
We might also wish to add a provider profile, which is useful for informing consumers of our organization’s contact information.
Add Standard Data Listing
With data exchange members added to our data exchange, let’s consider how to manage data sets in the data exchange. We have already encountered the two roles available, which are self-explanatory: providers and consumers.
Standard: As soon as the data is published, all consumers have immediate access to the data.
Personalized: Consumers must request access to the data for subsequent approval, or data is shared with a subset of consumer accounts.
In our example, we use Test Listing as our title, select Standard, then click Next. The title is presented in preview form requiring further information, as shown in Figure 14-19. Note additional tabs for Consumer Requests and Settings for your further investigation.
Once a listing has been published, changing the listing type is impossible.
I strongly recommend your business colleagues are actively involved in the creation of content to populate the listing details. Each listing is a showcase and shop front for your organization’s internal colleagues and selected external customers; therefore, well-crafted descriptions and documentation representing approved messaging are essential.
The terms of service require input and approval from your organization’s legal team before publication.
Sample SQL queries should be sufficiently representative to help consumers gain insight and pique their interest in using your data.
Limit query runtime and data sets as the provider account pays for consumption costs.
Protect your data model by provisioning access to views in preference to tables.
Although each entry is self-explanatory, there is a lot to consider when provisioning objects into our data exchange. We do not dwell on populating each entry; I leave them for your further investigation.
When complete, click Publish Listing.
Manage Shared Data
Clicking View on Exchanges provides a single page view of listings with summary information which I leave for your further investigation. Navigating to Home ➤ Data ➤ Private Sharing, as shown in Figure 14-12, displays a second listing alongside the original (not shown).
Unpublish a Data Listing
Unpublished listings revert to preview status, as shown in Figure 14-19.
Accessing Data
The View Database screen (not shown) displays the Database Details tab by default. There is a second tab for Schemas. I leave these for your further investigation as the content is self-explanatory. Note the context menu navigates to Home ➤ Data ➤ Databases.
In common with Secure Direct Data Share, RBAC is not imported with shares. You may need to set your browser role to ACCOUNTADMIN or configure a new role with IMPORT SHARE privilege.
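A sketch of such a role follows; the role name is an assumption. Note the role also needs CREATE DATABASE in order to create a database from the imported share:

```sql
-- A lower-privileged role able to import shares (role name assumed)
USE ROLE ACCOUNTADMIN;

CREATE ROLE IF NOT EXISTS share_reader;
GRANT IMPORT SHARE   ON ACCOUNT TO ROLE share_reader;
GRANT CREATE DATABASE ON ACCOUNT TO ROLE share_reader;

GRANT ROLE share_reader TO USER <your_user>;
```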
A new browser tab opens pre-populated with the sample query and comments from the listing Usage Examples. Add warehouse to execute the sample query and execute.
Data Exchange Summary
Shared By My Account lists each created listing, whether published or not.
Requests lists inbound and outbound data requests along with approval status.
Reader Accounts allows providers to share data with consumers who are not already Snowflake customers, without requiring them to sign up for their own Snowflake account. See documentation for further information at https://docs.snowflake.com/en/user-guide/data-sharing-reader-create.html .
Having walked through hands-on examples of configuring and accessing objects provisioned via Data Exchange using a Standard Listing, I leave you to develop data access paths for your user community to interact with provisioned data sets.
Provisioning steps for Personalized listings are broadly the same as for Standard Listings, except the consumer must request access to the data set, and the provider must approve the request. These steps are self-explanatory.
Snowflake Marketplace
Now that you understand Data Exchange, the internal nexus for colleagues, partners, and customers to collaborate, let’s investigate Snowflake Marketplace, the external nexus for individual and business collaboration. It is a worldwide environment where our citizen scientists discover and self-serve available data sets. Please refer to Snowflake documentation at https://other-docs.snowflake.com/en/data-marketplace.html#snowflake-data-marketplace and further summary information at www.snowflake.com/data-marketplace/ .
Data Exchange is focused on closed groups with a maximum of 20 participant accounts. Snowflake Marketplace is global in scope with unlimited participation.
Data Exchange is thematic in approach with either a single or limited topics. Snowflake Marketplace has a general approach where all themes and topics are available ubiquitously.
Both Data Exchange and Snowflake Marketplace are evolving rapidly where the boundaries and scope are converging and overlapping. You see an example later in this chapter.
In the same manner as Data Exchange and utilizing Secure Direct Data Sharing capability, organizations create listings on the Snowflake Marketplace, the single Internet location for seamless data interchange. The same security and real-time features remain though the scope of Snowflake Marketplace is external partners and consumers instead of internally focused, as is the case for Data Exchange.
Further information on joining the Snowflake Marketplace or requesting new data sets is at https://other-docs.snowflake.com/en/marketplace/intro.html#how-do-i-request-new-data-or-data-providers-to-be-added-to-the-data-marketplace .
Snowflake claims over 500 listings from more than 160 vendors representing a 76% increase in usage in the six months leading up to June 2021, along with new usage-based purchase options enabling consumers to transact entirely on the Snowflake Marketplace. For more information, see Snowflake news at www.snowflake.com/news/snowflake-accelerates-data-collaboration-with-more-than-500-listings-in-snowflake-data-marketplace-and-announces-monetization-in-the-data-cloud/ .
Becoming a data provider involves populating an online form to enroll in the Snowflake Partner Network . For organizations, this step will likely involve your procurement, legal, and data governance teams as a minimum, along with a clearly articulated and approved proposal on organizational objectives.
Snowflake Marketplace offers standard, sample/purchase, and personalized data listings. This exploration implements a standard data listing while leaving the personalized data listing for you to investigate further.
When writing this book, Snowflake Marketplace was undergoing significant changes, many of which relate to navigation. Although I have attempted to reflect the latest information, further changes will probably render the navigation outdated, but the core data entry screens remain intact.
Setting up as a Provider
Add Standard Data Listing
As standard practice, Snowflake Inc. requires listing content in Microsoft Word for its operations team to review and approve before the first listing is created.
For all listing types (see Figure 14-31), prospective consumers must register their interest by populating contact/company information.
There is an overlap with Data Exchange insofar as private listings may be made on Snowflake Marketplace, and long term, it is possible Data Exchange will be folded into Snowflake Marketplace.
Snowflake Marketplace listings attract a rebate on credits consumed by usage.
Standard: As soon as the data is published, all consumers have immediate access to the data.
Purchase with free sample: Upgrade to full data set upon payment.
Personalized: Consumers must request access to the data for subsequent approval, or data is shared with a subset of consumer accounts.
Once the listing has been created (not shown, but very similar to creating a Data Exchange listing), there is one further consideration. Consumption analysis may indicate that data set publication is best focused on particular regions; therefore, I recommend analyzing available data before publishing to all regions.
Accessing Snowflake Marketplace
Immediately, we can see the distinct difference between the internal Data Exchange and the external Snowflake Marketplace. The two listing types shown are Standard and Personalized, as outlined.
Using Snowflake Marketplace
A closer reading of the Standard data set listing shown in Figure 14-33 illustrates the marketplace nature of published data. Both are sample data sets.
Note content may also be searched by category and business need.
Content can also be referenced directly from the corresponding URL https://app.snowflake.com/marketplace/listings/Snowflake%20Inc . You see one or more published data sets.
Selecting a data set results in a detailed screen similar to the one seen from our work consuming Data Exchange content, which I consider sufficiently familiar to not require further explanation.
Managing Snowflake Marketplace
Assuming all obligations have been met and our request to become an approved data provider ( https://spn.snowflake.com/s/become-a-partner ) to the Snowflake Marketplace is successful, we may now participate in the Snowflake Marketplace. Our participation may impose restrictions according to the type and nature of the data we provide.
Listings must always comply with relevant laws and contractual obligations. We must have legal and contractual rights to share data.
In contrast to data share, but common with data exchange, be aware of egress charges. And like Data Exchange, there is no approved command line approach to implementing Snowflake Marketplace features.
Accessing Data
The Add Roles dialog allows another existing role with IMPORT SHARE entitlement in your Snowflake account to access the imported database.
Then click Get Data.
If all is well, our data is imported into a new database and made available for use; otherwise, you may see an error.
The error message reads: “Your selected role SYSADMIN cannot get data. Choose a role that has the IMPORT SHARE privilege to get this data or please contact your account administrator. Insufficient privileges to accept data exchange listing terms.”
In Figure 14-38, the SYSADMIN role is set at the browser level, underneath the username.
Clicking Query Data opens a new browser screen showing imported databases with sample queries pre-populated and ready for our use.
Snowflake Marketplace Summary
Snowflake Marketplace represents a pivotal point in Snowflake development and delivery, facilitating citizen scientists to rapidly acquire desired data sets into a single account and providing an opportunity to monetize data sets. Participation is critical for Snowflake’s success, and the ease of data set integration presents an almost frictionless opportunity for our organizations to derive immense value.
Automating integration with Snowflake Marketplace requires tooling outside of the user interface. Allowing programmatic interaction, along with improved consumption metrics, would encourage adoption, and Snowflake is signaling steps in this direction.
Summary
This chapter began with an overview of its objectives before diving into data sharing, unpacking the advantages of adopting a new approach. We then developed a simple data share example using two separate Snowflake accounts, noting the use of the ACCOUNTADMIN role and caution over delegating entitlement to lower privileged roles. You began to see the power of sharing, where objects can be added to and removed from a share with immediate visibility in the consumer account. Many accounts may consume a single share.
Using your newfound knowledge, we implemented a simple but readily extensible cost monitoring solution sharing daily Warehouse consumption costs from one account to another while introducing UDTFs and CRON scheduling for tasks. And for those who require replication before sharing, we walked through replicating a database between accounts.
Next, we focused on Snowflake Data Exchange, noting the use of the Snowsight user interface for configuration, and walked through creating and managing a standard data listing, noting that once published, a listing type cannot be changed, and that unpublishing a listing does not remove existing consumer access.
Lastly, we investigated Snowflake Marketplace identifying how to access listings and publish our organizations’ listings. The process is largely similar to Snowflake Data Exchange, with additional governance.