Gateway architectures

For large-scale deployments of Power BI in which multiple types of datasets and workloads will be supported (import refreshes, as well as DirectQuery and Live connection queries), BI teams can consider multiple gateway clusters. In this approach, each gateway cluster is tailored to meet the specific resource needs (RAM and CPU) of the different workloads, such as large nightly refreshes or high volumes of concurrent queries in the early mornings.

For example, one gateway cluster could be composed of two gateway instances with a relatively high amount of available RAM on each gateway server. This cluster would have resources available during the most intensive scheduled refresh operations (for example, 4 A.M. to 6 A.M.) and would be exclusively used by import mode Power BI datasets and any Azure Analysis Services models that also regularly import data from on-premises sources. A separate gateway cluster would be created based on two gateway instances with a relatively high number of CPU cores on each gateway server. This gateway cluster would be used exclusively by DirectQuery Power BI datasets and any reports based on Live connection to an on-premises SQL Server Analysis Services instance. 

A third gateway cluster, in addition to an import and a DirectQuery/Live connection cluster, could be dedicated to business-led BI projects. For example, as described in the On-premises data gateway versus personal mode section earlier in this chapter, certain data sources maintained by business teams (for example, Excel workbooks) may require the high availability and management benefits of the On-premises data gateway. Generally, this self-service cluster would be oriented toward scheduled refresh operations, but organizations may also want to empower business users to create DirectQuery datasets or reports based on Live connections to SSAS instances (on-premises).

In the following example from the Manage gateways portal in the Power BI service, two gateway clusters have been configured:

Manage gateways in Power BI service

As shown in the preceding image, the two Gateway Clusters (Frontline Gateway and Frontline Gateway 2) have been configured to support different data sources. As noted in the previous section, the individual gateway instances installed for each cluster are not currently accessible from the gateway portal but can be accessed via PowerShell scripts. Each cluster represents a single logical unit of gateway resources for its given data source(s).

If gateway clusters are created for specific workloads (for example, import versus DirectQuery), it can be helpful to note this both in the Gateway Cluster Name and in its Description. It's not recommended to allow a single point of failure but if only one gateway server is used in a cluster then the name of this server can be included in the cluster name and description.

The following diagram depicts a gateway cluster being used to support a scheduled data refresh of a Power BI dataset:

Scheduled data refresh via gateway cluster

With the data source(s) configured in the Manage Gateways portal in the Power BI Service, a scheduled data refresh for an import mode dataset can be configured to use the Gateway Cluster. The Gateway Cluster receives the query request at the scheduled time and is responsible for connecting to the data source(s) and executing the queries that load/refresh the tables of the Power BI dataset. Once the dataset in the Power BI Service is refreshed, dashboard tiles based on the dataset will also be refreshed and reports built against the dataset will issue queries against the dataset.

Given that the report queries are local to the refreshed dataset within the same Power BI Service tenant, and given the performance optimizations of the engine running within import mode Power BI datasets (that is, columnar compression, in-memory), query performance is usually very good with this deployment.

Currently, the entire Power BI dataset must be fully refreshed in each scheduled refresh operation. This is very inefficient, and in some scenarios infeasible, for large datasets as resources are needed to load both historical (unchanged) data and new data. However, incremental data refresh is a top feature identified on the Power BI Premium roadmap and, along with other roadmap features, will help support larger Power BI datasets. Additional details on Power BI Premium are included in Chapter 13, Scaling with Premium and Analysis Services.

The following diagram depicts two gateway clusters being used to support both the scheduled refresh of an import mode dataset and a Live connection to a SSAS tabular instance:

Multiple gateway clusters

Gateway Cluster A in the preceding diagram functions just like the previous diagram in supporting scheduled refreshes of import mode datasets. Gateway Cluster B has been created to exclusively support queries requested via Live connections to an on-premises SSAS database—an SSAS Tabular model in this scenario. Given the high volume of query requests generated by users interacting with Power BI reports based on the SSAS model, the servers used in Gateway Cluster B can be provisioned with additional CPU cores and actively monitored via performance counters for changes in utilization.

In addition to the interactive query requests from Live connection reports, owners of datasets can configure a scheduled refresh for the cache supporting dashboard tiles based on Live connection reports. Guidance on configuring this feature is included in the Dashboard cache refresh section at the end of this chapter. 

The description of Gateway Cluster B is also generally applicable to DirectQuery datasets based on supported sources, such as SQL Server, Oracle, and Teradata. Just like Live connections to SSAS, reports built against these datasets will also generate high volumes of queries that must go through the gateway cluster and be returned to the Power BI service tenant.

Given the additional latency created by the requests for queries and the transfer of query results back to the Power BI service, it's especially important to develop and provision efficient data sources for DirectQuery and Live connection reports. Two examples of this include using the clustered columnstore index for SQL Server and optimizing the DAX expressions used for measures of an SSAS model.

Additionally, organizations can consider Azure ExpressRoute to create a fast, private connection between on-premises infastructure and Azure. The following URL provides documentation on this service: http://bit.ly/2tCCwEv.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.164.141