Import versus DirectQuery

A subsequent but closely related step to dataset planning is choosing between the default Import mode or DirectQuery mode. In some projects, this is a simple decision as only one option is feasible or realistic given the known requirements while other projects entail significant analysis of the pros and cons of either design. For example, if a data source is considered slow or ill-equipped to handle a high volume of analytical queries then an import mode dataset is very likely the preferred option. Additionally, if multiple data sources are required for a dataset and they cannot be consolidated into a single DirectQuery data source then an import mode dataset is the only option. Likewise, if near real-time visibility to a data source is an essential business requirement then DirectQuery is the only option.

When DirectQuery is a feasible option or can be made a feasible option via minimal modifications, organizations may be attracted to the prospect of leveraging investments in high-performance database and data warehouse systems. However, the overhead costs and version control concerns of import mode can be reduced via Power BI features, such as the dataset refresh APIs discussed in Chapter 8Managing Application Workspaces and Content, and the expected incremental data refresh feature for Power BI Premium capacities. 

The following list of questions can help guide an import versus DirectQuery decision:

  1. Is there a single data source for our dataset which Power BI supports as a DirectQuery source?
    • For example, each fact and dimension table needed by the dataset is stored in a single data warehouse database, such as Oracle, Teradata, SQL Server, or Azure SQL Database.
    • The following URL identifies the data sources supported for DirectQuery with Power BI, including sources which are currently only in beta: http://bit.ly/2AcMp25.
  2. If DirectQuery is an option per question 1, is this source capable of supporting the analytical query workload of Power BI?
    • For example, although Azure SQL Data Warehouse technically supports DirectQuery, it's not recommended to use Azure SQL Data Warehouse as a DirectQuery data source, given the limitations on the volume of concurrent queries supported and a lack of query plan caching. 
    • In many other scenarios, the data source may not be optimized for analytical queries, such as with star schema designs and indexes which target common BI/reporting queries. Additionally, if the database is utilized for online transaction processing (OLTP) workloads and/or other BI/analytical tools, then it's necessary to evaluate any potential impact to these applications and the availability of resources.
  3. Is an import mode dataset feasible, given the size of the dataset and any requirements for near real-time visibility to the data source?
    • Currently Power BI Premium supports import mode datasets up to 10 GB in size and incremental data refresh is not available. Therefore, massive datasets must either use a DirectQuery data source or a Live connection to an Analysis Services model.
    • Additionally, Power BI Premium currently supports a maximum of 48 refreshes per day for import mode datasets. Therefore, if there's a need to view data source data for the last several minutes or seconds, an import mode dataset is not feasible.
  4. If the DirectQuery source is capable of supporting a Power BI workload as per question 2, is the DirectQuery connection more valuable than the additional performance and flexibility provided via the import mode?
    • In other words, if an import mode dataset is feasible, as per question 3, then an organization should evaluate the trade-offs of the two modes. For example, since an import mode dataset will be hosted in the Power BI service and in a compressed and columnar in-memory data store, it will likely provide a performance advantage. This is particularly the case if the DirectQuery source is hosted on-premises and thus queries from the Power BI cloud service must pass through the On-premises data gateway reviewed in Chapter 9Managing the On-Premises Data Gateway.
    • Additionally, any future data sources and most future data transformations will need to be integrated into the DirectQuery source. With an import mode dataset, the scheduled import process can include many data transformations and potentially other data sources without negatively impacting query performance.

For organizations that have invested in powerful data source systems for BI workloads, there's a strong motivation to leverage this system via DirectQuery. A business intelligence team or architect will be adverse to copying data into another data store and thus creating both another data movement and a source of reporting that must be supported. As a result of this scenario, Microsoft is actively investing in improvements to DirectQuery datasets for Power BI and Analysis Services models in DirectQuery mode. These investments are expected to reduce the gap in query performance between DirectQuery and the import mode. Additionally, a hybrid dataset mode may be released that allows teams to isolate tables or even partitions of tables between DirectQuery and Import storage options. 

However, Microsoft is also in the process of expanding support for large import mode Power BI datasets hosted in Power BI Premium capacity. For example, in the near future a dataset much larger than 10 GB could be incrementally refreshed to only update or load the most recent data. Additional details on the capabilities provided by Power BI Premium, potential future enhancements, and the implications for Power BI deployments are included in Chapter 13, Scaling with Premium and Analysis Services.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.33.235