This is a blog post by Lamdatek. We are a startup based in San Diego, CA. We provide data consulting services and business intelligence analytics for companies in different industries.
Power BI Report and on-premises Data Source - The Moving Pieces
On-Premises Data Source: the on-premises database remains within an organization's network and is not stored in the Power BI Cloud. This database is usually located behind a firewall and inaccessible from the internet.
Gateway Connection: The Power BI On-premises Data Gateway acts as a bridge between the on-premises data source and the Power BI Cloud. The gateway securely communicates with the on-premises database to retrieve the data.
Data Retrieval and Processing: when create a Power BI report using the Power BI Desktop, it connects to the on-premises database through the On-premises Data gateway. The Power BI Desktop fetches the necessary data and performs any required data transformations, modeling, and calculations.
Report Publishing: Once the Power BI report is created in Power BI Desktop, one can publish it to the Power BI Cloud. However, the published report does not include the actual data from the on-premises database.
Metadata and Query Definition: The published report in the Power BI Cloud stores metadata, including report layouts, visuals, relationships, measures, and calculations. It also stores the query definitions that specify how to retrieve data from the on-premises database using the gateway.
Data Refresh: When the report is published, one can configure scheduled data refresh in the Power BI cloud. During data refresh, the Power BI cloud sends a request to the on-premises data source through the gateway, retrieves the updated data, and updates the report with the refreshed data.
Power BI on-premises Data Gateway
Before users of Power BI Desktop or Power BI cloud can connect to on-premises data source (databases or files), an on-premises data gateway needs to be installed and configured. On-premises data gateway is designed for organizations that want to securely access and refresh on-premises data in a self-service manner.
When creating a report in Power BI Desktop and connecting to on-premises data source, user needs to use this gateway to establish a secure connection.
When publishing a report from Power BI Desktop to the Power BI Cloud, data sources used in the report require a gateway to connect. Power BI cloud will automatically detect that the report already used a gateway in Power BI Desktop. As a result, the existing gateway association will be maintained. There is no need to configure a new gateway for the same data sources (but need to ensure that dataset and data sources in the Power BI cloud are correctly associated with the gateway configured in Power BI Desktop).
On-premises Data Gateway Architecture
Power BI Gateway facilitates quick and secure communication flows from the Power BI cloud to on-premises data source and then back to the Power BI cloud. This is the basic flow:
The Power BI cloud creates a query and encrypted credentials for the on-premises data source.
The query and credentials are sent to the Power BI on-premises gateway queue for processing.
The gateway gets the query, decrypts the credentials, and connects to the data source(s) with those credentials.
The gateway sends the query to the data source to be run.
The results are sent from the data source back to the gateway and then to the Power BI cloud.
Note: Power BI cloud’s data refreshes can return large amounts of data. For such queries, data is temporarily stored on the gateway machine. This data storage continues until all data is received from the data source. The data is then sent back to the Power BI cloud.
Authentication to on-premises Data Source from Power BI cloud
Power BI cloud is built on Azure cloud. When publishing a report from Power BI Desktop to the Power BI cloud, one can select a desired connection type (connecting to a data source) based on requirements and data size. The choice can be to import a copy of the data into Power BI cloud or to connect directly to the data source.
Import a Copy of the Data
Once data is imported into Power BI cloud, viewing the data in reports and dashboards doesn't access the underlying data source anymore. The imported data will be refreshed and updated according to the schedule in gateway configuration.
In this case, a user establishes a connection based on the user’s sign-in and accesses the data with the credential. After the report is published to the Power BI cloud, Power BI always uses this user's credential to import/refresh the data.
Connect directly to Data Source
The data remains in the on-premises database, and Power BI cloud queries data in real-time when user interacts with the Power BI report.
If the data source is connected directly using preconfigured credentials, these preconfigured credentials are used to connect to the data source when ANY user views the data associated with the report. If using single sign-on, the current user's credentials are used to connect to the data source.
Power BI cloud allows various data source connection modes that determine whether the data is persisted in the Power BI cloud or not.