Data Quality in Google BigQuery’s Unified Catalog

Cataloging Google BigQuery Data in Microsoft Purview

Supported Features

During a scan of the Google BigQuery source, Microsoft Purview supports:

  • Extraction of technical metadata, including:
    • Projects and datasets
    • Tables, including their columns
    • Views, including their columns
  • Retrieval of static lineage for the relationships between tables and views.

When configuring a scan, you can choose to scan an entire Google BigQuery project. You can also limit the scope of the scan to a subset of datasets that match the specified names.

Limitations

  • Currently, Microsoft Purview only supports scanning Google BigQuery datasets in the US multi-region location. If the specified dataset is in another location, such as us-east1 or EU, the scan completes, but no assets are displayed in Microsoft Purview.
  • When an object is removed from the data source, the corresponding asset in Microsoft Purview is not automatically removed during the next scan.
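
Because a scan of datasets outside the US multi-region completes without showing any assets, it can help to check dataset locations before configuring a scan. The following is a minimal Python sketch and not part of Microsoft Purview: the function name and the sample inventory are hypothetical, and the dataset-to-location mapping is assumed to have been collected separately (for example, from the BigQuery console). It also assumes that BigQuery reports the US multi-region as `US`.

```python
# Assumption: BigQuery reports the US multi-region location as "US".
SUPPORTED_LOCATION = "US"


def datasets_outside_us_multiregion(dataset_locations):
    """Return the sorted names of datasets that are not in the US multi-region."""
    return sorted(
        name
        for name, location in dataset_locations.items()
        if location.strip().upper() != SUPPORTED_LOCATION
    )


# Hypothetical inventory: dataset name -> location.
locations = {"sales": "US", "events": "us-east1", "crm": "EU"}
print(datasets_outside_us_multiregion(locations))  # ['crm', 'events']
```

Datasets returned by this check would be scanned without error but would not appear in the catalog.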

Configure Data Map Scanning

To catalog Google BigQuery data in Microsoft Purview, follow these steps:

Register a Google BigQuery Project

  • Open Microsoft Purview and select “Data Map” in the left pane.
  • Select “Register.”
  • In “Register sources,” select “Google BigQuery.” Select “Continue.”
    • Enter a name for the data source as it will be listed within the catalog.
    • Enter the complete Project ID (e.g., Mydomain.com:Myproject).
    • Select a collection from the list.
    • Select “Register.”

Configure a Data Map Scan

  • Ensure a self-hosted integration runtime is configured.
  • Switch to “Sources.”
  • Select the registered BigQuery project.
  • Select “+ New scan.”
  • Specify the following details:
    • Name: The name of the scan.
    • Connect through integration runtime: Select the configured self-hosted integration runtime.
    • Credentials: When configuring the BigQuery credentials:
      • Select “Basic authentication” as the authentication method.
      • Specify the email ID of the service account in the username field (e.g., xyz@developer.gserviceaccount.com).
      • For the private key, copy the entire JSON key file and store it as the value of a Key Vault secret. To create a new private key from the Google Cloud platform:
        • In the navigation menu, select “IAM & Admin,” then “Service Accounts,” and select a project.
        • Select the email address of the service account for which you want to create a key.
        • Select the “Keys” tab.
        • Select the “Add key” drop-down menu and then select “Create new key.”
        • Choose the JSON format.
    • Specify the JDBC (Java Database Connectivity) driver path on the machine where the self-hosted integration runtime is running (e.g., D:\Drivers\Googlebigquery).
    • Specify a list of BigQuery datasets to import (e.g., dataset1; dataset2). When the list is empty, all available datasets are imported.
    • Specify the maximum memory (in GB) available on the virtual machine for the scan to use. This depends on the size of the Google BigQuery project to be scanned.
  • Select “Test connection.”
  • Select “Continue.”
  • Choose the scan trigger: run on a schedule or run once.
  • Review the scan and select “Save and run.”
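
The dataset list rule described above (semicolon-separated names, with an empty list meaning all datasets) can be sketched as follows. This illustrates the rule only and is not Purview's implementation; the function names are hypothetical, and exact-name matching is an assumption.

```python
def parse_dataset_filter(raw):
    """Split a semicolon-separated dataset list, ignoring blanks and extra spaces."""
    return [name.strip() for name in raw.split(";") if name.strip()]


def should_import(dataset, selected):
    """An empty selection means every available dataset is imported."""
    return not selected or dataset in selected


selected = parse_dataset_filter("dataset1; dataset2")
print(selected)                             # ['dataset1', 'dataset2']
print(should_import("dataset3", selected))  # False
print(should_import("dataset3", []))        # True
```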

After the scan, data assets in the Google BigQuery project will be available in Unified Catalog search. For more details on how to connect and manage Google BigQuery in Microsoft Purview, consult the documentation.

Configure a Connection for Data Quality Scanning

The scanned asset is now ready for cataloging and governance. Associate the scanned assets with a data product in a governance domain to configure a data quality scan.

  1. Select the tab Management > Domain > Quality governance > Data to create the connection.
  2. Configure the connection:
    • Add the name and description of the connection.
    • Select the source type Google BigQuery.
    • Add the project ID, the dataset name, and the table name.
    • Select “Private Key of the service account” and provide:
      • The Azure subscription.
      • The Key Vault connection.
      • The name of the secret.
      • The version of the secret.
  3. Test the connection to ensure it is correctly configured.

Profiling and Data Quality Scanning

After the connection is configured correctly, you can profile data, create and apply rules, and run data quality scans on Google BigQuery. Follow the detailed guidelines in the documentation.

Cataloging Google BigQuery Data in Microsoft Purview: Your Thorough Guide

Are you looking to manage and govern your Google BigQuery data effectively? Microsoft Purview offers capabilities for cataloging and scanning your BigQuery assets. This guide walks through how to integrate and use these features.

What is Microsoft Purview?

Microsoft Purview is a unified data governance service that helps you manage and govern your on-premises, multi-cloud, and software-as-a-service (SaaS) data. It provides capabilities for data discovery, data lineage, data cataloging, and data quality.

How does Microsoft Purview work with Google BigQuery?

Microsoft Purview allows you to connect to your Google BigQuery projects, scan and catalog the metadata, and manage the data assets within its ecosystem. This integration enables you to discover, understand, and govern your BigQuery data alongside your other data sources.

Key Questions and Answers

What metadata is extracted from Google BigQuery by Microsoft Purview?

During a scan of your Google BigQuery source, Microsoft Purview extracts the following technical metadata:

  • Projects and datasets
  • Tables, including their columns
  • Views, including their columns

It also retrieves static lineage for the relationships between tables and views.

What are the limitations of cataloging Google BigQuery data in Microsoft Purview?

  • Currently, Microsoft Purview only supports scanning Google BigQuery datasets in the US multi-region. If your dataset is in another region (e.g., us-east1 or EU), the scan will complete, but the assets will not be displayed in Microsoft Purview.
  • When an object is removed from the data source, the corresponding asset in Microsoft Purview is not automatically removed during the next scan.
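
Because removed objects are not cleaned up automatically, stale catalog assets have to be found by comparing the catalog against the source. The following is a minimal Python sketch of that comparison, assuming both lists of fully qualified table names have already been exported. The function name and the sample names are hypothetical, and how the lists are obtained is out of scope here.

```python
def find_stale_assets(catalog_assets, source_objects):
    """Return catalog assets that no longer exist in the source, sorted by name."""
    return sorted(set(catalog_assets) - set(source_objects))


# Hypothetical exports of fully qualified table names.
catalog = ["proj.sales.orders", "proj.sales.refunds", "proj.crm.contacts"]
source = ["proj.sales.orders", "proj.crm.contacts"]
print(find_stale_assets(catalog, source))  # ['proj.sales.refunds']
```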

How do I register a Google BigQuery project in Microsoft Purview?

Follow these steps to register your Google BigQuery project:

  1. Open Microsoft Purview and select “Data Map” in the left pane.
  2. Select “Register.”
  3. In “Register sources,” select “Google BigQuery” and then “Continue.”
    • Enter a name for the data source.
    • Enter the complete Project ID (e.g., Mydomain.com:Myproject).
    • Select a collection from the list.
    • Select “Register.”

How do I configure a Data Map scan for Google BigQuery?

Configuring a Data Map scan involves the following steps:

  1. Ensure a self-hosted integration runtime is configured.
  2. Navigate to “Sources.”
  3. Select the registered BigQuery project.
  4. Select “+ New scan.”
  5. Specify the scan details:
    • Name: Provide a name for the scan.
    • Connect through integration runtime: Select the configured self-hosted integration runtime.
    • Credentials:
      • Select “Basic authentication.”
      • Enter the service account email ID.
      • Store the private key from the JSON key file in a Key Vault secret. To create a new private key from the Google Cloud platform:
        • Go to “IAM & Admin > Service Accounts” and select a project.
        • Select the email address of the service account.
        • Select the “Keys” tab, add a key, and create a new key. Choose the JSON format.
    • JDBC Driver Path: Specify the path on the machine where the self-hosted integration runtime is running (e.g., D:\Drivers\Googlebigquery).
    • Datasets: Specify a list of BigQuery datasets to import.
    • Maximum memory: Set the maximum memory available (in GB).
  6. Test the connection.
  7. Select “Continue.”
  8. Choose the scan trigger: run on a schedule or once.
  9. Save and run the scan.
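
Before storing the JSON key file in a Key Vault secret, it can be useful to confirm that the file is a complete service account key and to read off the email ID to enter as the username. The following is a minimal Python sketch, assuming the standard field names of a Google service account key file (`type`, `project_id`, `private_key`, `client_email`). The function name and the sample values are hypothetical placeholders.

```python
import json

# Fields assumed to be present in a Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def service_account_email(key_json):
    """Validate a service account key and return the email ID used as the username."""
    key = json.loads(key_json)
    if not isinstance(key, dict):
        raise ValueError("key file must contain a JSON object")
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError("key file is missing: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("not a service account key: type=" + repr(key["type"]))
    return key["client_email"]


# Placeholder key for illustration only; a real key file holds a PEM private key.
sample_key = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "PLACEHOLDER",
    "client_email": "xyz@developer.gserviceaccount.com",
})
print(service_account_email(sample_key))
```

The check only inspects the structure of the file. It does not verify that the key is still valid in Google Cloud.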

How is the connection for a data quality scan configured?

  1. Select the tab Management > Domain > Quality governance > Data to create the connection.
  2. Configure the connection:
    • Add the name and description of the connection.
    • Select the source type Google BigQuery.
    • Add the project ID, the dataset name, and the table name.
    • Select “Private Key of the service account” and provide:
      • The Azure subscription.
      • The Key Vault connection.
      • The name of the secret.
      • The version of the secret.
  3. Test the connection to ensure it is correctly configured.

Note: Data Quality administrators must have read-only access to Google BigQuery. Virtual networks and private endpoints are not yet supported for Google BigQuery data sources in the data quality service.

What is the role of data quality in Microsoft Purview for Google BigQuery?

After configuring the connection, you can profile data, create and apply rules, and run data quality scans within Microsoft Purview directly on your Google BigQuery data.

Example of configuring a data quality scan:

| Feature         | Details                                  |
| --------------- | ---------------------------------------- |
| Connection type | Google BigQuery                          |
| Authentication  | Private Key of the service account       |
| Project ID      | The Google BigQuery project.             |
| Dataset Name    | The name of the Google BigQuery dataset. |
| Table Name      | The name of the Google BigQuery table.   |

Can I delete the scan after it has completed?

Deleting a scan does not delete the catalog assets created by its previous runs. To prevent further scans from querying the source, however, delete the scan.
