© Ásgeir Gunnarsson and Michael Johnson 2020
Á. Gunnarsson, M. JohnsonPro Microsoft Power BI Administrationhttps://doi.org/10.1007/978-1-4842-6567-3_14

14. On-Premises Data Gateway

Ásgeir Gunnarsson1   and Michael Johnson2
(1)
Hafnarfjordur, Iceland
(2)
St. Andrews, South Africa
 
Read this chapter if you would like to find out more information about
  • What the On-premises data gateway is

  • When a gateway is needed

  • How to set up the gateway

  • Managing the gateway

  • Monitoring the gateway

  • Troubleshooting the gateway

When Power BI report developers build reports, they do so using the Power BI desktop application. This application is usually run within an organization’s network and has access to resources such as databases and file systems. When these reports are published to the Power BI Service, the service may not have the same access to local resources as the developer had. When this happens, a mechanism to provide the same level of access to these local resources to the Power BI Service is required. This mechanism is the On-premises data gateway that we will refer to simply as the gateway. The gateway acts as an agent that runs securely within the organizational network or on a standalone machine and provides the Power BI Service access to the required resources.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig1_HTML.jpg
Figure 14-1

Role of the gateway

The gateway is not only used for Power BI but also supports a growing number of services running in Azure. Some of these applications include
  • Power Apps

  • Power Automate

  • Azure Logic Apps

  • Azure Analysis Services

Many principals that apply to Power BI also apply to these other services. In this chapter, before embarking on a process of setting up the gateway, it first needs to be established if the gateway is even required.

When is a gateway needed?

Gateways are required whenever the source of data, required by the Power BI Service, cannot be accessed directly over the public Internet. This can generally be for one of two reasons.
  • The data source resides on a private network

  • The data source is not supported directly by the service

The data source resides on its own private network

The Power BI Service is cloud-based, meaning that it was built to run only in the cloud. The service also assumes that other services are available over the Internet. While this is true of many Platform-as-a-Service (PAAS) and Software-as-a-Service (SAAS) applications, it is not always the case with on-premises and Infrastructure-as-a-Service (IAAS) applications.

Example of data sources that require a gateway:
  • On-premise databases such as SQL Server, MySQL, and Oracle

  • Files and folders stored in a local file system or network share

  • ODBC

  • SQL Server installed on an Azure Virtual Machine

The data source is not supported in the service

The gateway is also required when Power BI uses a data source that is not supported directly by the Power BI Service. An example of this is the R and Python data sources. Each of these languages supports a variety of versions and libraries that are not compatible with each other, and are thus not supported in a shared infrastructure environment like the Power BI Service and require the gateway to be set up and configured on servers for this use.

Examples of data sources not supported by the service
  • Python scripts

  • R scripts

  • Web data source using the Web.Page() function

Note

R and Python scripts are not supported on the standard gateway. Therefore they need to be set up using the personal gateway.

A full list of data sources and their need for a gateway can be found at https://docs.microsoft.com/en-us/power-bi/connect-data/power-bi-data-sources.

Gateway architecture

When we speak of the gateway, we usually think of the application that is installed in the organizational network. The gateway is actually a collection of applications and services that work together to facilitate the movement data. Each data movement follows a simple pattern that we discuss after we look at the components that make up the system.

Components of the gateway

The gateway consists of three core systems that create a secure and reliable communication channel between the Power BI datasets and their data sources. Each of these services contains multiple components, but we will not go to that level of detail.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig2_HTML.jpg
Figure 14-2

Power BI gateway architecture

Figure 14-2 shows the basic layout of the gateway; the three components are
  • Gateway Cloud Service : is responsible for handling all requests for data from the Power BI Service; such requests occur during on-demand or schedule refreshes, reports using DirectQuery and live connections, and when using Power Query online such as when designing a dataflow. The gateway cloud service is responsible for scheduling data requests as well as securing the credentials used to perform the refresh.

  • Azure Service Bus : Provides secure and reliable messaging between the Gateway cloud service and the On-premises data gateway. This simply means that the On-premises data gateway and the Gateway cloud service never communicate directly with one another, but rather pull and push messages off the Service Bus. These messages can be requests for data as well as the results of these data requests.

  • On-premises data gateway : Receives requests for data over the Azure Service Bus. These requests are processed and then returned to the Power BI Service.

With a secure and reliable channel created through which data requests get submitted, we look in more detail how those requests are processed.

Typical steps for a data request

All requests for data follow a similar pattern, whether they are from scheduled data imports or reports using DirectQuery or live connections. These steps are
  1. 1.

    Gateway Cloud Service receives a request for data from the Power BI Service.

     
  2. 2.

    Gateway Cloud Service packages the data request together with credentials required for the query.

     
  3. 3.

    The package gets placed on the Azure Service Bus.

     
  4. 4.

    An available Gateway member pulls data requests securely from the Azure Service Bus.

     
  5. 5.

    On-premises data gateway decrypts query and credentials.

     
  6. 6.

    On-premises data gateway executes the query; this can be a DirectQuery or a Live connection query in which case it is sent directly to the data source or an import query, in which case it will invoke the mashup engine to perform the data import.

     
  7. 7.

    On-premises data gateway returns the results of the request to the Gateway Cloud Service via Azure Service Bus.

     

This process is the same for both the standard or personal gateways.

Power BI Gateway workloads

Power BI supports three storage modes; each storage mode stores and queries data differently and, therefore, has different data access requirements. The first storage mode is called import mode and is a cached model, meaning that it imports and stores data. When a report uses import mode, Power BI imports all the data it needs into its own internal storage called Vertipaq engine using a process known as the mashup engine which executes the Power Query commands. When users interact with reports, the reports generate queries against the xVelocity engine. The other two modes are called direct and live storage modes. We refer to these two modes collectively as the non-cached modes. In the non-cached modes, Power BI generates queries for each visual that needs to render at the time of the report execution, meaning that there is no need for periodic refreshing of the datastore.

The gateway enables the data movement requirements for all workload types in which the data is not available in the public cloud. Each workload type consumes CPU, Memory, and network differently. Therefore a gateway that is optimal for one workload type may not be ideal for another.

A full list of data sources, their support for the gateway, and support for live connections and DirectQuery is found at https://docs.microsoft.com/en-us/power-bi/connect-data/power-bi-data-sources.

Cached workloads

Cashed datasets are the most common workload types in Power BI and include all reports built using an imported data source as well as all dataflows. In a cached dataset, all data, defined by the source queries (written in Power Query) are imported and stored in the model first. This transformation, performed by the mashup engine, is the same process used by Power BI desktop. The dataset needs to be loaded first before it can be queried and needs to be periodically refreshed for the report data to remain current. Because cached datasets store data in their internal data store, they tend to be much larger than non-cached datasets that store only metadata. When a cached dataset is refreshed, the required processing commands are sent from the Power BI Service to the gateway. Each request is then processed in three distinct steps.

Connect

Before the gateway can transform data, it first needs to connect to and download the data sources. Before connecting the gateway retrieves the authentication credentials required to connect to the data source; these credentials are securely stored in Azure Key Vault, which can be used to securely store keys and other secrets.

Note

You can read more about Azure Key Vault at https://azure.microsoft.com/en-us/services/key-vault/.

Integration with Azure Key Vault is seamlessly built into the Power BI Service and does not need to be managed by the organization or administrator. With the authentication details, the gateway can then authenticate against the data sources using one of many authentication protocols, including OAuth, basic or Windows authentication. Once the data has been securely copied to the gateway, it can then be transformed.

Transform

Once the data has been loaded into the gateway, the Power Query command is executed by the mashup engine. Such commands include the creation of new columns, rows, and even tables. These operations can be very resource heavy, including the use of a lot of memory. Ensuring that data is filtered as close to the source as possible reduces the amount of data that is processed by the mashup engine. Only planning and monitoring can correctly determine the correct amount of resources required for this step. During the transform phase, data may be spooled to disk; in the event of large datasets being used, it is recommended that high throughput storage be used, such as solid-state storage.

Transfer

Once all transformations have been completed by the gateway, the result sets need to be returned to the Power BI Service either for further processing or to be loaded into either Power BI data models or dataflows. This transfer is made securely over an encrypted channel. Before transmission, the result set is compressed to reduce the amount of data that needs to be moved over the Internet. Both the compression and encryption of this data require CPU and a good network to ensure data is moved quickly to the service.

Non-cached workloads

Power BI supports two non-cached storage modes, Live connections and DirectQuery. In both these storage modes, no data is sourced from the data source before it is queried. Instead, queries are generated and executed at runtime.
  • Live connections: Connections made against an Analysis Services model; this can be a tabular or multidimensional model. Internally, Power BI also uses an analysis service tabular model, so there is no need for the service to translate the underlying DAX queries into another query language.

  • Direct Queries: Made against data sources that have a compatible query language and are supported in Power BI; such data sources include relational databases, Hadoop, Spark, and others. The primary reason report designers use DirectQuery is because of large data sizes or the need for near real-time results.

Both storage modes use the gateway as an agent to relay queries from the Power BI Service to their on-premises data sources. Similar to the cached dataset, the gateway fist needs to access the credentials required by the data source before opening a connection. There is no transformation step involved, so the mashup engine is not invoked. The result set for the query is compressed and encrypted. Non-cached workloads, therefore, do not have the same requirement for large amounts of memory, but a good CPU and network connectivity are vital as these directly affect report performance.

On-premises data gateway modes

The gateway is available in two modes; each of these modes has a different target audience. The gateway installed in standard mode is the recommended gateway mode; this mode was initially known as the enterprise data gateway as it made more of the features that enterprises are looking for. The second mode is called personal mode and was designed for users who need to easily set up gateway but are not very concerned about the availability of the gateway.

Note

While it is possible to install both a standard and personal gateway on a single computer, this is not generally recommended as both gateways will consume the same resources which may result in poor performance.

Personal gateway

The personal mode gateway exists primarily as an easy to configure and run version of the gateway. There are instances where it is required over the standard gateway. In personal mode, the gateway is installed as a user application. As a user application, local administer rights are not required, and the application runs under the identity of the user who executed it. By running as the user, the gateway mostly has access to everything that the user has access to, making setup and configuration easy. This is useful, as many organizations do not grant local administrator rights to users on their computers; configuring additional service accounts and granting those accounts access to the data source are also something that users are not able to do without the help of IT. This lower barrier to entry makes the personal gateway ideal for developers and other power users.

The personal gateway can only be used by the owner of that gateway. The use of that gateway is also dependent on that gateway being available. If it has been installed on a laptop or similar type device, then it is unavailable while the machine is in sleep mode or has been disconnected from the network, including any required VPN connections.

Advantages of the personal gateway
  • Easier to set up

  • Supports sources such as R and Python that are not supported in the standard gateway

  • Easy to use permission as the application runs under users account

Disadvantages of the personal gateway
  • No support for High Availability

  • Can only be used by a single user

  • Does not support other Power Platform applications such as Power Apps and Power Automate

  • Does not support Dataflows and Paginated reports

Step-by-step

For instructions on how to install the personal mode On-premises data gateway, go to https://docs.microsoft.com/en-us/power-bi/connect-data/service-gateway-personal-mode.

When reports are promoted to a production-like environment, ensuring access to the gateway becomes very important, if not critical. In these instances, the personal gateway is not sufficient, and a reliable version of the gateway is required; this is where the standard gateway comes into play.

Standard gateway

The standard gateway was built to provide a reliable gateway for use throughout the entire organization. The Standard gateway is the recommended gateway mode for use for anything that is deemed “in Production.”

Tip

Don’t set up a single gateway to be shared by development, testing, and production. Instead, set up three separate gateways. This will avoid any impact that test and development may have on production and allows you to test new versions of the gateway before upgrading your production environment.

Unlike the personal gateway, the standard gateway is installed as a service, preferably on dedicated machines running windows server. The standard gateway supports clustering and load balancing, helping organizations ensure reliable and consistent performance, making it the preferred choice for production workloads.

As a service, the standard gateway will run under its own local account; by default, this is NT SERVICEPBIEgwService. This service account can be changed to another local or domain service account.

Advantages of the standard gateway
  • Supports gateway members running in a cluster for high availability.

  • Report load can be distributed across nodes in the cluster.

  • Support for DirectQuery and Live Connections.

  • Supports dataflows.

  • Supports Paginated reports.

  • Can be used by multiple users.

Disadvantages of the standard gateway
  • Often requires members of the IT department to install.

  • Service account required for windows authentication SSO through Kerberos to be used.

  • Does not support R and Python; therefore there is no high availability option for these data sources.

Step-by-step

For instructions on how to install the standard On-premises data gateway, go to https://docs.microsoft.com/en-us/data-integration/gateway/service-gateway-install.

Regardless of which gateway mode is installed, the underlying architecture of the gateway remains the same.

High availability and scaling workloads

When Power BI is being used to support business-critical reporting, it is essential that the service be reliable and available. High availability in the Power BI Service is managed by Microsoft, but ensuring the availability of the gateway falls on the organization and the gateway administrators.

The standard gateway enables high availability using gateway clusters. Gateway clusters are collections of standard gateways that form a single logical gateway in the Power BI Service. If for some reason, a cluster member were to become unavailable, for example, during a software update or hardware failure, then other members in the cluster would seamlessly take on the activities of the unavailable node, ensuring uninterrupted service.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig3_HTML.jpg
Figure 14-3

Gateway cluster

The creation of a gateway cluster is straightforward and does not rely on technologies such as Windows Failover Clustering to work. All gateways setup in standard mode are actually created as clusters of one member. We will discuss how to set up the gateway and clusters later in this chapter.

Once the gateway is set up, each member in the cluster polls the Azure Service Bus for new data requests. These data requests are not allocated to specific nodes but instead assigned to the first available node to pull them off the bus. While it is not possible to allocate rules to each node based on a source or workload type, it is possible to manage the load balancing of each node to ensure that a node is not overworked. This is done by setting the CPU and memory thresholds.
  • CPUUtilizationPercentageThreshold: This is the maximum amount of CPU that the gateway can consume while still accepting data requests. A value of 0 indicated that throttling does not take place.

  • MemoryUtilizationPercentageThreshold: This is the maximum amount of memory that the gateway can consume while still accepting data requests. A value of 0 means that there will be no throttling.

  • ResourceUtilizationAggregateionPeriodInMinutes: This number sets the time that the memory and CPU are aggregated by. This is important, especially in the case of CPU, where short bursts of 100% CPU usage are common on an otherwise idle processor . The default is set to 5 minutes; this effectively means that if one of the threshold values is breached, then the gateway will not accept new requests for at least 5 minutes.

    See the section on configuration on how to adjust these settings.

When these limits are breached, the gateway will not accept any requests until this value falls below the threshold. If none of the gateway nodes in the cluster are able to process the data request, then the request fails.

Advantages of gateway clusters
  • Removes a single point of failure.

  • Requests can be distributed among all gateway nodes in the cluster, ensuring balanced workloads.

  • By updating only a single node at a time, the administrator can ensure that the clusters remain online.

With a little theory about the gateway architecture complete, next , we can look at how to set up the gateway.

Setting up the On-premises data gateway

Unlike most of the infrastructure used in Power BI, gateway administrators or users are fully responsible for setting up and maintaining the gateway. It is essential that administrators set up the gateway correctly to ensure that reliable and timely data requests are performed.

Users will often install the personal gateway on their own personal devices. While this is unlikely that these setups conform to best practices, this may be all right for these occasional ad-hoc users. Plus ensuring that standard gateways are correctly configured and maintained is one of the primary responsibilities of the gateway administrators. We begin by looking at some of the requirements for the gateway.

Requirements

At the time of writing, the Power BI gateway requires a compatible windows 64-bit operation system, which can be at least Windows 8 or Windows Server R2 with .Net 4.6 installed. Newer versions of the gateway require .Net 4.7.2. It is recommended to check on the minimum versions of both the .net framework and OS before installing the gateway. This does mean that unsupported OSs such as Linux will not work, and it is not uncommon to find a lone Windows machine in a predominantly Linux environment set up to act as the gateway.

The gateway has the following requirements:
  • Windows x64 (Windows 8 or Windows Server 2012 R2).

  • .Net 4.6.

  • Cannot be installed on Windows Core.

  • It can be run in a virtual environment.

  • The machine cannot be a domain controller.

Sizing

Apart from the OS and .Net requirements, there isn’t much documentation on the minimum hardware specs required to run the gateway. However, the gateway does come with the following recommended requirements:
  • 8-core CPU.

  • 8 GB of memory.

  • Solid-state drive.

  • Where possible, install the gateway on a dedicated computer.

  • Although it is possible to install a gateway in both personal and standard mode on a single computer, this is not recommended.

These are only recommendations by Microsoft, and computers that do not meet these requirements may still work, but should be tested first. These recommendations should be used as a starting point, and the computer can be upgraded if necessary. It is also important to consider the workload type. For example, in environments that use only DirectQuery or Live Connections, the gateway computers may not need as much memory or a solid-state drive, as these workload types do not consume as many resources. When import mode is used, however, ensuring that there is enough memory available helps avoid writing the content of the data transformations to disk while the mashup engine is transforming it. It should also be considered that multiple datasets may be refreshing at the same time, meaning that two, three, or more times the amount of memory may be required compared to using Power BI Desktop to load data.

Tip

Installing the gateway in a virtual machine gives you the flexibility to resize the computer when necessary.

Placement

In addition to the computer that the gateway is installed on, how the computer is connected to the network plays a big role in the overall performance. In both cached and non-cached scenarios, the latency between gateway, the data source, and the Power BI Service affects report refresh and report response times. This is especially relevant on gateways setup in personal mode, that are usually installed on desktops, or even laptops that could be using a slow LAN or even a wireless network.

The recommendation for the placement of the gateway is to set them up as close to the data source as possible, and on a network with the best outgoing Internet connection possible. Administrators may be tempted to install the gateway on the physical host of the data source; this is generally not recommended, especially in the case of services such as SQL Server that may have a high per-core cost that would be consumed by the gateway.

Downloading and installing the gateway

The install file for the gateway can is downloaded in several ways. The easiest way is to navigate to the downloads section under the settings pane and select the gateway. This directs you to a page where you can choose between the standard and personal mode gateway. Some installations will provide a single installer for both the standard and personal gateway.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig4_HTML.jpg
Figure 14-4

Download Gateway installer

Tip

It is possible to download and install the gateway using PowerShell. This is a good approach in large environments where several gateways need to be installed. See the session on automation later in this chapter for more on how to use the gateway cmdlets to install power BI.

../images/496939_1_En_14_Chapter/496939_1_En_14_Fig5_HTML.jpg
Figure 14-5

Accept terms and install folder

Once the gateway is installed, the setup wizard guides you automatically into the setup process. If for some reason, you are not ready to proceed with setting up the gateway, you are able to cancel. This does not uninstall the gateway but leaves it in an unconfigured state. This is the same state that that the gateway is left in when installed via PowerShell.

Configuring the gateway

Once the gateway is installed, the next step is to configure it. If you installed the gateway manually using the installer, then you are directed straight through to the configuration screen. If you installed the gateway using PowerShell or if you canceled the setup after the install, then you can enter the configuration panel by searching for the On-Premises gateway in the Start Menu; this is the same for when you want to change the configuration of the gateway.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig6_HTML.jpg
Figure 14-6

Open the On-premises data gateway configuration screen

The first step is to provide the credentials of the user who will administer the gateway. The credentials are used to both authenticate the gateway to the service and make the user an administrator of the gateway. It is good practice to ensure that the Power BI administrators set up the gateway rather than have users or report developers perform this task to ensure a separation of concerns.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig7_HTML.jpg
Figure 14-7

Login with Gateway administrators account

After successfully authenticating the administrators’ account, the next step is to choose the configuration type:
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig8_HTML.jpg
Figure 14-8

Main setup options

  • Register a new gateway on this computer: You choose this option if you want to create a new gateway cluster or if you want to add a new member to the cluster.

  • Migrate, restore , or takeover an existing gateway: You choose this option if you want to
    • Move a gateway to a new computer: If you need to move the gateway from one computer to another, then you can use this to do so. This is particularly useful when the gateway is offline and cannot be recovered. An alternate approach to this is to add a new member to the cluster, make the new member the primary member, and then remove the old member. To do this, the primary member must be online.

    • Recover a damaged gateway: If the gateway has been damaged, then this option can be used to get the gateway online again.

    • Take ownership of a gateway: If the gateway was installed using an account that should not be a gateway administrator, then this can be used to change the account associated with this gateway.

Register a new gateway on this computer

Creating a new gateway cluster is as simple as providing a name for the gateway and choosing a recovery key. The name of the gateway should conform to the Power BI gateway naming standard that your organization has in place; it is recommended that this name include the fact that it is a gateway as well as which environment this gateway is to be used in. Although it is possible to use the same gateway for multiple environments, this should be discouraged as development or testing workloads can have negative consequences on the production environment.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig9_HTML.jpg
Figure 14-9

Configuring a new Gateway cluster

The recovery key is required for additional nodes to be added to the cluster, repair the cluster member, and/or to upgrade the gateway. Ensuring that this key is recorded and kept secure is important.

Tip

if your organization does not have a secure means of storing such credentials, then you should consider using Azure Key Vault. https://azure.microsoft.com/en-us/services/key-vault/

Adding a new member to an existing cluster

Adding another gateway to an existing cluster is also a relatively straightforward task. Again you are asked to provide a name for the gateway. You will then enable the “Add to an existing gateway” option that returns a list of existing gateway clusters that the user is an administrator for.

Steps
  1. 1.

    Capture the new name of the cluster, again applying the organization naming convention.

     
  2. 2.

    Enable “Add to an existing gateway.”

     
  3. 3.

    Select the gateway cluster to add the new node to.

     
  4. 4.

    Provide the recovery key used for the gateway cluster.

     
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig10_HTML.jpg
Figure 14-10

Adding a gateway to a cluster

Network connectivity

The Power BI gateway requires connectivity to the Internet. By using the gateway, Power BI removes the need to allow incoming connections; these are connections that originate somewhere on the Internet and would need to pass through the corporate firewall. For good reasons, most organizations prohibit such connections. Instead, the gateway makes an outgoing connection to the Power BI Service via the Azure Gateway Service and uses Azure Service Bus as a messaging system.

Depending on how the setup of your network is configured, you may need to request that the firewall administrator open up certain ports and URLs to allow outgoing connections. This is common in secure environments such as banks and other regulated environments. All traffic passes through port 433, and a list of URLs can be found at

https://docs.microsoft.com/en-us/power-bi/admin/power-bi-whitelist-urls

../images/496939_1_En_14_Chapter/496939_1_En_14_Fig11_HTML.jpg
Figure 14-11

Testing network access for the gateway

There is an option to run a network diagnostic assessment that will let you know of any problems.

If your organization uses a proxy to access the Internet, you can read how to configure that here https://docs.microsoft.com/en-us/data-integration/gateway/service-gateway-proxy.

Updating the gateway

New versions of the gateway are often released, and the administrator must ensure that these are kept up-to-date. New releases of the gateway not only provide performance and security enhancements but also enable new functionality by updating the gateway to use the latest mashup engine that may be required by datasets built using the latest version of the Power BI desktop application.

It should be noted that the gateway becomes unavailable for at least the duration of the upgrade and possibly longer if issues are experienced, so upgrading the gateway should only be done during approved windows for business-critical workloads. If the gateway is installed as part of a gateway cluster, then this upgrade procedure is simpler as each node in the cluster can be upgraded independently of one another, reducing the risk of an outage. It is recommended that you first upgrade one of the non-active nodes in the cluster first.

Warning

As the personal gateway does not support clustering, all updates result in some downtime.

Data sources

After the gateway has been created, data sources can be added to the gateway. Data sources are preconfigured connections to on-premise data sources such as files, local intranet, or databases that can be reused by datasets written against those data sources. Data sources give the gateway administrators control over the data sources used via the gateway. To create a data source on a gateway, the user needs to be a gateway administrator. Adding a data source can be done before a report is deployed or while setting up the dataset connection settings.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig12_HTML.jpg
Figure 14-12

Add or remove a data source from the gateway

Tip

The option to add a data source or remove the gateway can be hard to find as ellipses (…) only show when hovered over at the end of the gateway pane.

../images/496939_1_En_14_Chapter/496939_1_En_14_Fig13_HTML.jpg
Figure 14-13

Configuring a new data source

Report writers are then able to use those data sources by selecting the gateway that they want to use, then mapping the connection to a data source using the “Maps to” dropdown that lists available sources.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig14_HTML.jpg
Figure 14-14

Mapping data sources in the gateway

If the user who is mapping the data source is not an administrator of the gateway, then users can be granted permission to use the gateway on the datasource settings page. This needs to be done for each datasource and does not grant the user access to all data sources on the gateway.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig15_HTML.jpg
Figure 14-15

Adding a user to a data source

These users can configure reports that they manage to use the data source without requiring authentication details, providing an additional separation of responsibilities, and allowing the gateway administrator to change authentication details without requiring the report owners to take any actions.

Custom connectors

Custom connectors allow organizations to create or obtain connectors to data sources that may not be supported through the default Power BI data sources. Custom Data Connectors are usually created by Independent Software Vendors (ISVs) to allow their clients to connect to their source data. When such custom connectors are used in Power BI desktop, the use of such connectors first needs to be enabled in the desktop tool.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig16_HTML.jpg
Figure 14-16

Configure folder for custom connectors

During report development, the report designer will download or sometimes create their own custom connector. The custom connector uses a .Mez file extension and is placed in a particular folder, usually in the user’s personal folders. Something like c:usersUserNameDocumentsPower BI DesktopCustom Connectors. Unlike custom visuals, the custom connector is not embedded into the.PBIX file itself. For the custom connector to work during data refresh, it needs to be available to each of the gateways. This means creating a folder on each host and ensuring that a copy of the correct .Mez file is available.

Warning

Custom connectors can be used to execute malicious code or can result in data being compromised. Custom connectors should only be used from a trusted source.

Monitoring and troubleshooting

Monitoring the performance of the gateways is an essential administrative task. An effective monitoring strategy ensures that you not only understand how the gateways are performing but also when action should be taken for a poorly performing gateway. There are two levels of monitoring that need to be performed:
  • General health: Monitoring the general health of the gateway involves monitoring not only the use of CPU, memory, disk, and network but also the number of queries being processed by the gateway. Such monitoring activities are often performed in near real-time for early detection of potential issues.

  • Query performance and troubleshooting: Monitoring the general time taken for specific queries, groups of queries, or data sources to return as well as the frequency of failed queries and timeouts. Such monitoring is often not performed in real-time and may only be evaluated periodically or in the event of errors.

General health

Monitoring the health of the gateway is similar to monitoring other applications in the Microsoft ecosystem. Basic monitoring focuses on four primary resources:
  • CPU

  • Memory

  • Disk

  • Network

No application can function at its optimal level if any of these resources is in short supply. In addition to the four primary resources, a good general health monitor will watch out for other signs that the system is under strain or if there are other problems. In the case of the gateway, this could include query timeouts or failures. For a quick understanding of the four primary resources, many users and administrators use Task Manager.

Task Manager

Task Manager is an easy to use application that can quickly give the user an overview of the health of the system. In addition to providing system-level counters, users can drill down to specific applications and processes.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig17_HTML.jpg
Figure 14-17

Viewing gateway performance using task manager

While Task Manager is simple to use, it is not a suitable candidate for reliable monitoring as it required the administrator to be logged into the terminal and monitoring the system. This is impractical for continued usage; instead, Performance monitor provides a better functionality.

Performance Monitor

Performance Monitor, also known as Perfmon, is a system monitoring tool that can be configured to provide in-depth counters on a number of key matrices. Performance Monitor works by aggregating a number of events over a given timespan. As all counters are simple aggregations, these counters are not very helpful in diagnosing specific problems but are very useful in determining if a problem exists. An example of this is monitoring failed queries; while the counter will not help you identify the failed query, it alerts you to the fact that a query has failed. While Performance Monitor can be used in a similar way to Task Manager, as a real-time display of counter values. A better solution is to create a data collector that can be used to systematically log counters and their values on a schedule. These values can then be stored to various destinations such as text files. These stored results can then be collected and analyzed on a schedule using a tool such as Power BI; however, for a real-time monitoring solution, you can log these counters directly into a SQL Database where a new real-time dashboard and alerts can be created.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig18_HTML.jpg
Figure 14-18

Setting up a performance monitor collector set

One of the core reasons for collecting these counter values is to build a baseline for the cluster. A baseline is simply an understanding of what the computer’s resource usage looks like when it is stable. This becomes important during times of crisis when the administrator needs to identify the source of a problem. If you understand what the expected or baseline values are, then it becomes easier to determine if there is a problem. An example of this is finding out that the gateway is consuming 80% of the CPU; this simple measure by itself does not tell you if something is wrong, but if this measure is usually at 40%, then that is an indication of a problem and it should be investigated. It is also good to be able to identify failing queries before users start to complain.

Both the system counters and the gateway counters can be collected into a single data collector set allowing you to build reports and alerts; next we will look at some of the important system and gateway-specific counters.

Important system counters

There are thousands of possible counters that can be configured when monitoring, but it is recommended that you only monitor important counters as monitoring can result in strain on the computer and begin to impact the system. It should be noted, however, that Performance Monitor counters are relatively lightweight and should not have an adverse impact on performance. Some of the counters that you will want to monitor are
  • Processor(_Total)\% Processor Time: This counter tells you how busy the server’s CPU is. Like all counters, there are no golden numbers that will indicate a problem, and the CPU spiking to 100% for short durations is perfectly acceptable. When the average CPU usage rises above 80%, then there is likely a cause for concern.

  • SystemProcessor Queue Length: This counter tells you how many threads are waiting for the processor; any number greater than 0 is an indication that the CPU cannot keep up with the current workload.

  • MemoryAvailable Mbytes: This counter measures the amount of free memory on the server. Memory is a valuable resource, and it is essential to ensure that there is enough free memory available to handle requests. When there is not enough memory available, the server pages some of that memory to disk, which is much slower than a memory. Ideally, this number remains greater than 0.

  • Paging File(_Total)\% Usage: When the operating system runs out of memory is begins to swap data out to disk. Moving data out to disk is several orders of magnitudes slower than those same operations happening in memory, so you want to avoid this happening at all if possible. If this is happening, add more memory or add more members to the cluster to distribute the load.

  • PhysicalDisk(_Total)Avg. Disk sec/Read and PhysicalDisk(_Total)Avg. Disk sec/Write: This counter reports the number of read and write operations that are happening on the disk.

  • Network interfaceBytes total/sec: This counter indicates the rate at which the network adaptors are processing data. This number should remain below 50-60% of the sustainable bandwidth of the network.

On-premises data gateway counters

The On-premises data gateway provides several gateway-specific counters that the administrator can use to monitor gateway activity.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig19_HTML.jpg
Figure 14-19

Getting a full list of gateway counters

At the time of writing, 27 gateway-specific counters can be monitored. Each counter monitors a different workload type and it is not unreasonable to monitor all 27. If, however, you know that some counters are not relevant to your environment, then you may exclude them. Some of the counters that you will want to pay attention to are
  • # of all queries executed/sec: This counter will give you a count of the number of queries executed per second. If your organization uses only imported datasets, then this number will likely be small. For reports that use DirectQuery and live connections against on-premises data sources, this number can be a good indication of heavy use.

  • # of queries failed/sec: Any number greater than 0 should be investigated as it may be cause for concern and indicate problems with the source system. Timeouts against the production OLTP system can be an indication report beginning to affect the operations of the system.

  • # of items in the Service Bus pool: The gateway uses Azure Service Bus to send data requests to the gateway reliably; this counter lets you know if the gateway cluster can satisfy all incoming requests. There is no golden value here, and each environment will differ, which is why baselining this value is important. If this number begins to climb, it is an indication that more queries are being sent to the cluster than the cluster can resolve. To fix this problem, the cluster can be scaled up or out by adding more resources to the gateway computers or adding more members to the cluster. Additionally, the developer can be asked to look for more efficient ways to query the data. (Dataflows could be an appropriate solution.)

Note

Adding additional nodes to the cluster may be meaningless if the “Distribute requests across active gateways in the cluster” is not enabled, as queries will not scale beyond a single node.

Troubleshooting

Monitoring of a solution such as a gateway helps confirm that the environment is running as expected and that there are no errors. When errors are detected, it often falls to the administrator to troubleshoot the problem. Such problems can be reported in a few ways.
  • Power BI alerts: Data sources should be configured to notify administrators when errors occur.

  • Monitoring: Administrators may detect errors or degraded performance from their general health monitoring.

  • Users: Report users may also contact the administrators directly or via a contact center to notify them about problems in the system.

Identifying the cause of problems may be challenging. Some of the questions you may consider when troubleshooting the gateway are
  • Does the problem occur consistently?

  • Is there a problem with a particular data source, source system, or gateway?

  • Are all the gateways running the latest version of the On-premises data gateway software?

  • How does the current gateway throughput compare to known benchmarks?

Here we look at some of the tools available to help the administrator troubleshoot some of these problems.

Refresh history

When a refresh fails in the service, details of the error are logged to the service. These errors can be viewed by the administrators and other users with access to the dataset. You can view these details by viewing the refresh history; if any of these jobs have failed, then details of the failure can usually be logged to the service.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig20_HTML.jpg
Figure 14-20

Dataset refresh error details

Event logs

The windows event logs can also be used to diagnose problems with the gateway. Such events are usually at the application level, so if there are problems with the gateway starting or if the gateway process suddenly terminates, then the details of the error can usually be found in the event log.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig21_HTML.jpg
Figure 14-21

View application events in the event log

Gateway logs

The gateway itself also records more detailed logs. These logs are usually the best place to start. However, these logs are only available on the standard gateway. The logs can be accessed by
  1. 1.

    Launching the On-premises data gateway app.

     
  2. 2.

    Login (you must be logged in to view the logs).

     
  3. 3.

    Navigate to the Diagnostics.

     
  4. 4.

    Select export logs.

     
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig22_HTML.jpg
Figure 14-22

Exporting gateway logs

The application generates a .zip file on the desktop of the user who exported the log. Inside the .zip file, you find multiple files; many of these files are created daily with a date timestamp to make finding the relevant log entries easier. These files can also be accessed directly depending on the operating system the gateway is installed on. You can the log files at:
  • UsersPBIEgwServiceAppDataLocalMicrosoftOn-premises data gatewayReport

  • WindowsServiceProfilesPBIEgwServiceAppDataLocalMicrosoftOn-premises data gatewayReport

By default, these logs only capture error information. However, it is also possible to capture detailed query execution details and performance counters. The detailed execution report has a wealth of information on which queries are failing or how long they take to complete. The Performance counters are the same as the Performance Monitor counters. To enable the additional logging, you need to manually update the configurations file

Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config setting the QueryExecutionReportOn and SystemCounterReportOn values to True.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig23_HTML.jpg
Figure 14-23

Update Config file

After making the changes, you need to restart the gateway which can be done from within the On-premises data gateway app.
  1. 1.

    Navigate to Service settings.

     
  2. 2.

    Select Restart now.

     

To make analyzing these logs easier, Microsoft has created a Power BI report template that you can use to connect to these logs. At the time of witting, this report was still in preview, but as a template, you are able to modify it to meet your needs.

Additional logging

The level of logging provided by the gateway logs is often enough to diagnose most problems; however, it does not include all the information available. The gateway offers a verbose logging option than can be enabled via the On-premises data gateway app.
../images/496939_1_En_14_Chapter/496939_1_En_14_Fig24_HTML.jpg
Figure 14-24

Enable additional logging

Like the performance logging options, the gateway needs to be restarted, so you need to consider this before enabling it during a critical time. This level of logging is also likely to cause additional load on the gateway and computer that it runs on, so leaving the gateway in this mode is not recommended.

Automation

In large organizations or those that embrace an infrastructure-as-code philosophy, the gateways can be installed a nd managed using the DataGateway PowerShell cmdlets. These cmdlets are not installed as part of the Power BI cmdlets and need to be installed separately. The cmdlets can be installed using the following command:
Install-Module -Name DataGateway

The DataGateway Cmdlets also require PowerShell core 6.2.2 or greater, which is a different implementation of PowerShell than the one used by the Power BI Cmdlets.

Many of the gateway admin tasks can be automated using the DataGateway Cmdlets, such as
  • Getting a list of gateways

  • Returning the cluster status

  • Removing nodes from the cluster

  • Adding new data sources to the cluster

  • Deleting the entire cluster

The cmdlets can be broken down into four categories.

Cmdlets to manage policies

These cmdlets aid the administrator in managing the gateway policies; such policies include the ability to control who can install gateways and which mode they may install. This can be particularly helpful if your organization would like to restrict or eliminate the use of personal gateways. Examples of such cmdlets are
  • Get-DataGatewayInstaller: Lists users authorized to install and register gateways within the tenant

  • Get-DataGatewayTenantPolicy: Returns the gateway installation and registration policy for the tenant

  • Set-DataGatewayTenantPolicy: Updates the installation policy used in the tenant

Cmdlets to manage gateway clusters and members

This set of cmdlets can be used to manage the gateway cluster itself. This can include getting a list of gateways, their status, and even removing members from the cluster. Note that there is no add-member cmdlet as members can only be added manually at this point.
  • Get-DataGatewayCluster: Lists all gateway clusters for the current user or a specific one based on the passed parameters

  • Get-DataGatewayClusterStatus: Returns the cluster status along with additional properties

  • Remove-DataGatewayClusterMember: Removes a gateway member from the corresponding gateway cluster

  • Set-DataGatewayCluster: Sets properties for an existing gateway cluster.

Cmdlets to manage users

This set of cmdlets can be used to manage users of the gateway clusters
  • Add-DataGatewayClusterUser: Add a user to a gateway cluster.

  • Remove-DataGatewayClusterUser: Remove a user from a gateway cluster.

Managing data sources

In addition to managing gateways, you can also automate to manage the data sources for that gateway.
  • Add-DataGatewayClusterDatasourceUser: Adds a user with required permissions for a Power BI data source

  • Get-DataGatewayClusterDatasource: Lists all or selected Power BI data sources on a gateway cluster

  • Get-DataGatewayClusterDatasourceStatus: Tests the connectivity of a data source from the gateway cluster.

  • Get-DataGatewayClusterDatasourceUser: Lists users for the data source on a gateway cluster

  • Remove-DataGatewayClusterDatasource: Removes a Power BI data source from a gateway cluster

  • Remove-DataGatewayClusterDatasourceUser: Removes a user from a Power BI data source

  • Set-DataGatewayClusterDatasource: Sets properties of an existing Power BI data source

Summary

The gateway serves as an important bridge between the power BI server that runs in the cloud and the data sources that reside on private networks. Ensuring that the gateways remain available and perform well is crucial.

Call to action

  • Ensure gateway is installed in a cluster with at least two nodes.

  • Ensure servers with gateways installed have adequate resources.

  • Monitor resource usage and query performance.

  • Install the gateway cluster as close to data sources as possible.

  • Ensure distribute requests across active gateways in the cluster is enabled; this needs to be done in the gateway admin portal in the service.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.191.174.168