Cataloging Google BigQuery Data in Microsoft Purview
Table of Contents
- Cataloging Google BigQuery Data in Microsoft Purview
- Cataloging Google BigQuery Data in Microsoft Purview: Your Thorough Guide
- What is Microsoft Purview?
- How does Microsoft Purview work with Google BigQuery?
- Key Questions and Answers
- What metadata is extracted from Google BigQuery by Microsoft Purview?
- What are the limitations of cataloging Google BigQuery Data in Microsoft Purview?
- How do I register a Google BigQuery project in Microsoft Purview?
- How do I configure a Data Map scan for Google BigQuery?
- How is the connection for Data Quality scans configured?
- What is the role of Data Quality in Microsoft Purview for Google BigQuery?
- Example of configuring a data quality scan:
- Can I delete the scan after it has completed?
Supported Features
When scanning a Google BigQuery source, Microsoft Purview supports:
- Extraction of technical metadata, including:
  - Projects and datasets
  - Tables, including their columns
  - Views, including their columns
- Fetching static lineage on asset relationships among tables and views.
When configuring a scan, you can scan an entire Google BigQuery project, or limit the scope to the datasets whose names you specify.
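To illustrate the scoping behavior: the scan configuration described later in this guide takes the dataset scope as a semicolon-separated list such as `dataset1; dataset2`, and an empty list means all datasets. The following is a minimal sketch of how such a value could be interpreted. The function name `parse_dataset_scope` is hypothetical and is not part of Microsoft Purview.

```python
from typing import Optional


def parse_dataset_scope(raw: str) -> Optional[list[str]]:
    """Split a semicolon-separated dataset list into clean names.

    Returns None when no names are given, which stands for "all datasets".
    Illustrative helper only; this is not a Purview API.
    """
    # Trim whitespace around each entry and drop empty entries.
    names = [part.strip() for part in raw.split(";")]
    names = [name for name in names if name]
    return names or None
```

For example, `parse_dataset_scope("dataset1; dataset2")` yields the two dataset names, while an empty string yields `None`.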
Limitations
- Currently, Microsoft Purview only supports scanning Google BigQuery datasets in the US multi-region location. If the specified dataset is in another location, such as us-east1 or EU, the scan completes, but no assets are displayed in Microsoft Purview.
- When an object is deleted from the data source, the corresponding asset in Microsoft Purview is not automatically removed by the next scan.
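Because a scan of a dataset outside the US multi-region completes without surfacing any assets, it can help to check dataset locations before scanning. The sketch below assumes you have already collected each dataset's location (for example, from the BigQuery console) into a dictionary. Both function names are hypothetical.

```python
def is_us_multiregion(location: str) -> bool:
    """Return True only for the US multi-region location.

    BigQuery reports the US multi-region as "US"; regional locations
    such as "us-east1" and the "EU" multi-region do not match.
    """
    return location.strip().upper() == "US"


def datasets_outside_us(locations: dict[str, str]) -> list[str]:
    """List dataset names whose location is not the US multi-region."""
    return sorted(
        name for name, location in locations.items()
        if not is_us_multiregion(location)
    )
```

Any dataset returned by `datasets_outside_us` would, per the limitation above, be scanned without its assets appearing in the catalog.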
Set Up Data Map Scanning
To catalog Google BigQuery data in Microsoft Purview, follow these steps:
Register a Google BigQuery Project
- Open Microsoft Purview and select “Data Map” in the left pane.
- Select “Register.”
- In “Register sources,” select “Google BigQuery,” and then select “Continue.”
- Enter the name under which the data source will be listed in the catalog.
- Enter the complete project ID (e.g., Mydomain.com:Myproject).
- Select a collection from the list.
- Select “Register.”
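The example project ID above has the form `domain:project`. As a rough sketch, the helper below separates the two parts so that a malformed value can be caught before registration. It assumes a colon appears only as the separator after a domain prefix. `split_project_id` is a hypothetical name, not a Purview or Google API.

```python
from typing import Optional, Tuple


def split_project_id(project_id: str) -> Tuple[Optional[str], str]:
    """Split a project ID into (domain, project).

    The domain is None when the ID has no "domain:" prefix.
    Illustrative only; the accepted ID formats are an assumption.
    """
    project_id = project_id.strip()
    if not project_id:
        raise ValueError("project ID must not be empty")
    domain, separator, name = project_id.rpartition(":")
    # A colon with nothing before or after it is treated as malformed.
    if separator and (not domain or not name):
        raise ValueError(f"malformed project ID: {project_id!r}")
    return (domain or None, name)
```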
Configure a Data Map Scan
- Ensure a self-hosted integration runtime is configured.
- Go to “Sources.”
- Select the registered BigQuery project.
- Select “+ New scan.”
- Specify the following details:
  - Name: the name of the scan.
  - Connect through integration runtime: select the configured self-hosted integration runtime.
  - Credentials: when configuring the BigQuery credentials:
    - Select “Basic authentication” as the authentication method.
    - Specify the email ID of the service account in the user name field (e.g., xyz@developer.gserviceaccount.com).
    - Generate a private key, then copy the entire JSON key file and store it as the value of a Key Vault secret. To create a new private key from the Google Cloud Platform:
      - In the navigation menu, select “IAM & Admin” > “Service Accounts,” and then select a project.
      - Select the email address of the service account for which you want to create a key.
      - Select the “Keys” tab.
      - Select the “Add key” drop-down menu, and then select “Create new key.”
      - Choose the JSON format.
  - Specify the JDBC (Java Database Connectivity) driver path on the machine where the self-hosted integration runtime is running (e.g., D:\Drivers\Googlebigquery).
  - Specify a list of BigQuery datasets to import (e.g., dataset1; dataset2). When the list is empty, all available datasets are imported.
  - Specify the maximum memory (in GB) on the virtual machine that the scan may use. This depends on the size of the Google BigQuery project being scanned.
- Select “Test connection.”
- Select “Continue.”
- Choose the scan trigger: set up a schedule or run the scan once.
- Review the scan and select “Save and Run.”
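The credential step above uses two pieces of the service account's JSON key file: the service account email as the user name, and the entire file content as the Key Vault secret value. The sketch below shows one way to pull both from the file text and to sanity-check it first. The field names `type`, `client_email`, and `private_key` are those found in Google service account key files. The function name `credential_fields` is hypothetical.

```python
import json


def credential_fields(key_file_text: str) -> tuple[str, str]:
    """Return (user_name, secret_value) from a service account key file.

    user_name is the service account email; secret_value is the whole
    JSON document, unchanged, to be stored as the Key Vault secret.
    """
    key = json.loads(key_file_text)
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    for field in ("client_email", "private_key"):
        if not key.get(field):
            raise ValueError(f"key file is missing {field!r}")
    return key["client_email"], key_file_text
```

Returning the original text rather than a re-serialized copy keeps the secret value identical to the downloaded file.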
After the scan completes, the data assets in the Google BigQuery project are available in Unified Catalog search. For more details on connecting to and managing Google BigQuery in Microsoft Purview, consult the Microsoft Purview documentation.
Configure the Connection for Data Quality Scans
The scanned assets are now ready for cataloging and governance. Associate them with a data product in a governance domain to configure a data quality scan.
- Select Data Quality > Governance domain > Manage tab to create the connection.
- Configure the connection:
  - Add the name and description of the connection.
  - Select the source type Google BigQuery.
  - Add the project ID, the dataset name, and the table name.
  - Select “Private Key of the service account,” and then provide:
    - The Azure subscription.
    - The Key Vault connection.
    - The name of the secret.
    - The version of the secret.
- Test the connection to ensure it is correctly configured.
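The connection settings listed above can be collected and checked for gaps before they are entered in the portal. The following is only a sketch: the class and field names are hypothetical, and treating the secret version and description as optional is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataQualityConnection:
    """Settings for a Google BigQuery data quality connection (illustrative)."""
    name: str
    project_id: str
    dataset: str
    table: str
    secret_name: str
    secret_version: str = ""   # assumption: may be left empty
    description: str = ""      # assumption: optional

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are blank."""
        required = ("name", "project_id", "dataset", "table", "secret_name")
        return [field for field in required if not getattr(self, field).strip()]
```

An empty result from `missing_fields()` means every field this sketch treats as required has a value.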
Profiling and Data Quality Scanning
After completing the connection configuration, you can profile the data, create and apply rules, and run data quality scans on Google BigQuery. For detailed steps, follow the guidelines in the Microsoft Purview documentation.
Cataloging Google BigQuery Data in Microsoft Purview: Your Thorough Guide
Are you looking to manage and govern your Google BigQuery data effectively? Microsoft Purview can catalog and scan your BigQuery assets. This guide walks through how to set up and use that integration.
What is Microsoft Purview?
Microsoft Purview is a unified data governance service that helps you manage and govern your on-premises, multi-cloud, and software-as-a-service (SaaS) data. It provides capabilities for data discovery, data lineage, data cataloging, and data quality.
How does Microsoft Purview work with Google BigQuery?
Microsoft Purview allows you to connect to your Google BigQuery projects, scan and catalog their metadata, and manage the resulting data assets. This integration enables you to discover, understand, and govern your BigQuery data alongside your other data sources.
Key Questions and Answers
What metadata is extracted from Google BigQuery by Microsoft Purview?
When scanning your Google BigQuery source, Microsoft Purview extracts the following technical metadata:
- Projects and datasets
- Tables, including their columns
- Views, including their columns
- Static lineage on the relationships between tables and views
What are the limitations of cataloging Google BigQuery Data in Microsoft Purview?
- Currently, Microsoft Purview only supports scanning Google BigQuery datasets in the US multi-region location. If your dataset is in another location (e.g., us-east1 or EU), the scan completes, but the assets are not displayed in Microsoft Purview.
- When an object is deleted from the data source, the corresponding asset in Microsoft Purview is not automatically removed by the next scan.
How do I register a Google BigQuery project in Microsoft Purview?
Follow these steps to register your Google BigQuery project:
- Open Microsoft Purview: Select “Data Map” in the left pane.
- Select “Register.”
- Choose Google BigQuery: In “Register sources,” select “Google BigQuery” and then “Continue.”
- Enter a name for the data source.
- Enter the complete project ID (e.g., Mydomain.com:Myproject).
- Select a collection from the list.
- Select “Register.”
How do I configure a Data Map scan for Google BigQuery?
Configuring a Data Map scan involves the following steps:
- Ensure a self-hosted integration runtime is configured.
- Go to “Sources.”
- Select the registered BigQuery project.
- Select “+ New scan.”
- Specify the scan details:
  - Name: Provide a name for the scan.
  - Connect through integration runtime: Select the configured self-hosted integration runtime.
  - Credentials:
    - Select “Basic authentication.”
    - Enter the service account email ID.
    - Store the entire JSON key file, which contains the private key, as the value of a Key Vault secret. To create a new private key from the Google Cloud Platform:
      - Go to “IAM & Admin” > “Service Accounts” and select a project.
      - Select the email address of the service account.
      - Select the “Keys” tab, select “Add key,” and then “Create new key.” Choose the JSON format.
  - JDBC driver path: Specify the path on the machine where the self-hosted integration runtime is running (e.g., D:\Drivers\Googlebigquery).
  - Datasets: Specify a list of BigQuery datasets to import.
  - Maximum memory: Set the maximum memory available (in GB).
- Select “Test connection.”
- Select “Continue.”
- Choose the scan trigger: run on a schedule or once.
- Save and run the scan.
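One item in the steps above that is easy to get wrong is the JDBC driver path, since it must exist on the machine that hosts the integration runtime. As a hedged sketch, the check below confirms that a folder exists and lists the JAR files in it. It assumes the driver is distributed as `.jar` files, which is typical for JDBC drivers. `find_driver_jars` is a hypothetical helper name.

```python
from pathlib import Path


def find_driver_jars(driver_dir: str) -> list[str]:
    """Return the sorted names of .jar files directly inside driver_dir.

    Raises FileNotFoundError when the folder does not exist, so a
    mistyped path is caught before the scan is configured.
    """
    folder = Path(driver_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"driver folder not found: {driver_dir}")
    return sorted(
        entry.name for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".jar"
    )
```

An empty list suggests the folder exists but does not yet contain the driver files.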
How is the connection for Data Quality scans configured?
- Select Data Quality > Governance domain > Manage tab to create the connection.
- Configure the connection:
  - Add the name and description of the connection.
  - Select the source type Google BigQuery.
  - Add the project ID, the dataset name, and the table name.
  - Select “Private Key of the service account,” and then provide:
    - The Azure subscription.
    - The Key Vault connection.
    - The name of the secret.
    - The version of the secret.
- Test the connection to ensure it is correctly configured.
Note: Data Quality administrators must have read-only access to Google BigQuery. Virtual networks and private endpoints are not yet supported for Google BigQuery databases in the data quality scanning service.
What is the role of Data Quality in Microsoft Purview for Google BigQuery?
After configuring the connection, you can profile the data, create and apply rules, and run data quality scans from Microsoft Purview directly on your Google BigQuery data.
Example of configuring a data quality scan:
| Feature | Details |
| --- | --- |
| Connection type | Google BigQuery |
| Authentication | Private Key of the service account |
| Project ID | The Google BigQuery project. |
| Dataset name | The name of the Google BigQuery dataset. |
| Table name | The name of the Google BigQuery table. |
Can I delete the scan after it has completed?
Yes. Deleting a scan does not delete the catalog assets created by its previous runs. Delete the scan if you want to prevent further runs from querying the source.
