Reference data from your data warehouse

Reference data from your data warehouse

Reference data from your data warehouse in your Y42 space schema.

Before you can use an existing data in a Y42 space, you must reference it in your space schema. To do so, you can use the Reference source or Airbyte BigQuery/ Snowflake source.

Y42 source assetChoose this method when...Limitations
Reference (BigQuery)The schema is in the same GCP project as the space schema OR is a shared schema.The schema must be in the same location/region as the space schema.
Reference (Snowflake)The schema is accessible by the Snowflake user.NA
Airbyte BigQuery/ SnowflakeYou want to physically replicate the data into your space schema.Ingestion rows usage limits apply.

Reference source

Import a schema from the space's parent GCP project / Snowflake account.

Import a schema from the space's parent GCP project / Snowflake account.

The Reference source represents dbt’s Sources and can be defined using dbt Source properties (opens in a new tab). If you use a Reference source, Y42 will access the schema you want to import using a GCP or Snowflake service account. Y42 then creates Logical Views (opens in a new tab) in BigQuery or Non-Materilaized Views (opens in a new tab) in Snowflake of your selected tables in the space schema. After which, you can reference these Views just as you would with other assets.

Limitations

BigQuery

  • The schema you want to reference must be located in the same hosting location or region (opens in a new tab) as the space schema.

  • Imports will not generate any logs. Please consult the FAQ for common issues.

  • Reference views do not show any row count.

  • Views do not behave the same as materialized tables. Please refer to BigQuery's documentation (opens in a new tab) for any limitations on schema changes, data freshness, and access. Depending on your data warehouse, data may be automatically updated without triggering an import, schemas may be out of date unless an import is triggered, etc.:

    • BigQuery Logical views automatically reflect the base table’s data regardless of when they have been created.

Snowflake

  • Views do not automatically propagate parent table changes. To view updated data you have to run an import via a DAG or an ad-hoc import run.

  • Quoted column names - By default, Snowflake uses UPPER-CASED characters for table and column names, unless specified otherwise. As a result, if you're dealing with tables or columns that have case-sensitive or lower-cased names, it's necessary to use quotes around both the table and column names when making queries.

Reference tables from a schema in the same project or database

Y42 will utilize the space's service account credentials to access the tables that you would like reference. As such, you do not need to configure a new service account.

Add a Reference source

Reference tables from a shared schema (BigQuery only)

Y42 requires additional permissions if the schema you want to reference does not belong to the same GCP project as the Y42 space.

In the GCP project of the schema that you would like to load, navigate to IAM and follow this guide (opens in a new tab). You will have to grant at least BigQuery Data Owner permissions to the project and/ or schema.

Replicate data using Airbyte

We recommend using Reference sources as the default method to reference tables from BigQuery and Snowflake. However, there may be scenarios in which you might prefer to use the Airbyte BigQuery source or CData Snowflake source to replicate a schema into the space schema.

FAQ

I've encountered a 'Not found' or 'Access Denied' error message. What should I do?

This error usually occurs when table's schema does not exist in the same location or region as your space's schema. It may also occur if the space's service account does not have access to the schema you want to reference.

Please check for any typos in the schema or the location of your Space, and whether the space's service account has access.

What should I do if I can't find my schema/table using a Reference source?

BigQuery: Please ensure that the schema / table resides in the same location as the Y42 generated schema (region-specific). Snowflake: Please ensure that the space's service account has access to reading schemas and tables.

For example, single-region schemas are not queryable in multi-region schemas and neither are cross-region schemas. If you have shared access to a schema in another project, you will need to filter for the project and related schema in the Settings.

Why can't I commit my table selection?

If you cannot commit your table selection, you might have manually modified the YAML file in a way that created duplicates. Please make sure that the names are unique and all required properties exist: name, database, schema, identifier.

How do I prevent Reference sources from becoming stale?

To view fresh data, add the Reference import job to a schedule or manually trigger an import job.