Create a space
As discussed in the previous section, an organization contains one or more spaces. These spaces are where the actual data engineering happens. In this section, we'll set up a space and connect it to a data warehouse.
About spaces
The space is the environment where Y42 keeps all our pipelines and assets. Spaces are isolated working environments in an organization. We might create multiple spaces to separate data pipelines by functional teams or by varying levels of access restrictions to sensitive data.
In the background, each space is connected to a data warehouse (BigQuery or Snowflake), a cloud storage bucket, and a Git repository. Y42 needs access to the data warehouse to read, process, and store our data. It uses the cloud storage bucket as an intermediate layer to cache information. Lastly, the Git repository is where Y42 stores pipeline configurations, schedulers, tests, and metadata as code.
Component | Purpose |
---|---|
Data warehouse | Y42 writes the data it processes to a data warehouse and reads it again for further transformations. |
Cloud storage | Y42 reads and writes data to a cloud storage bucket as an intermediate layer for caching information. |
Git repository | Y42 writes the configuration of pipelines, schedulers, data tests, and metadata to a Git repository. |
Sandbox mode
You can opt to create a space in sandbox mode. Doing so lets you try out the Y42 platform without setting up a connection to a data warehouse. Use sandbox mode for demo purposes only: the space is automatically deleted after 14 days and is not meant for real-life projects.
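To make the 14-day limit concrete, here is a minimal Python sketch that computes when a sandbox space would be removed. The function name `sandbox_deletion_date` is hypothetical, and the sketch assumes the 14 days are counted in whole calendar days from the creation date; Y42 may count the period differently.

```python
from datetime import date, timedelta

# Lifetime of a sandbox space as stated above.
SANDBOX_LIFETIME = timedelta(days=14)


def sandbox_deletion_date(created_on: date) -> date:
    """Return the date a sandbox space would be deleted.

    Assumption: the lifetime is counted in calendar days from creation.
    """
    return created_on + SANDBOX_LIFETIME


# A sandbox created on 1 March 2024 would be removed on 15 March 2024.
print(sandbox_deletion_date(date(2024, 3, 1)))  # 2024-03-15
```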
Create a space
Every space in an organization has its own name. You'll be prompted to provide a display name when you create a new space. Based on this display name, Y42 will generate a slug. You can modify the display name after creation, but doing so does not change the slug.
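The relationship between a display name and its slug can be sketched in Python. The exact rules Y42 applies are not described here, so the `slugify` function below is a hypothetical illustration that follows a common convention (lowercase ASCII words joined by hyphens), not Y42's actual implementation.

```python
import re
import unicodedata


def slugify(display_name: str) -> str:
    """Hypothetical slug rule: lowercase ASCII words joined by hyphens."""
    # Decompose accented characters, then drop anything outside ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", display_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    return slug.strip("-")


# The slug is derived once, when the space is created.
space = {
    "display_name": "Marketing Analytics",
    "slug": slugify("Marketing Analytics"),
}

# Renaming the space later changes only the display name.
space["display_name"] = "Growth Analytics"
print(space["slug"])  # marketing-analytics
```

Because the slug stays fixed after creation, anything that refers to a space by its slug keeps working when the display name changes.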
Connect to data warehouse and cloud storage
Y42 supports the following configurations:
Data warehouse provider | Cloud storage service | Reference |
---|---|---|
Google BigQuery | Google Cloud Storage | Guide |
Snowflake | Amazon Web Services S3 | Guide |
Snowflake | Microsoft Azure Blob Storage | Guide |
Please refer to the relevant page for your configuration.
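The supported pairings can also be expressed as a simple lookup, which is handy for checking a planned setup before starting the connection flow. This is only an illustrative sketch: the names `SUPPORTED_CONFIGURATIONS` and `is_supported` are hypothetical and not part of any Y42 API, and the pairs are copied from the table above.

```python
# Warehouse and cloud storage pairs taken from the table above.
SUPPORTED_CONFIGURATIONS = {
    ("Google BigQuery", "Google Cloud Storage"),
    ("Snowflake", "Amazon Web Services S3"),
    ("Snowflake", "Microsoft Azure Blob Storage"),
}


def is_supported(warehouse: str, storage: str) -> bool:
    """Return True if the warehouse and storage pair appears in the table."""
    return (warehouse, storage) in SUPPORTED_CONFIGURATIONS


print(is_supported("Snowflake", "Amazon Web Services S3"))        # True
print(is_supported("Google BigQuery", "Amazon Web Services S3"))  # False
```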
Connect to Git repository
After connecting to the data warehouse, we can choose the Git repository Y42 uses in the background. For this guide, let's stick with the default Y42-hosted GitLab repository. Please refer to the documentation if you want to use your self-hosted GitLab or GitHub repository.
Welcome to your space
After the configuration, Y42 will show us our newly created space. We'll explore the space in depth throughout the upcoming sections, but let's briefly discuss the different panels in our space.
Asset Editor
This is where the active development of our pipelines happens. We can use three distinct views: List, Lineage, and Code. Y42 keeps all three of these in sync, so edits in the lineage view will automatically show up in the code view. When developing, we can pick the view that best suits the current task and our preference.
Asset Monitor
Here we can zoom out and get a high-level overview of our asset health. Y42 displays relevant metadata and each asset's build history. If something goes wrong with our pipelines, the Asset Monitor will show us. From the high-level overview, we can drill down into the details of specific assets and runs.
Asset Catalog
The Asset Catalog documents the assets in our space and displays all their relevant metadata. The lineage views here are especially useful, showing us how data flows through our pipelines at the column level.
Settings
As the name suggests, this is where we can view and adjust the settings for our space. This includes integrations, secrets, and access control. Note that Y42 has settings panels at both the space and organization level.
Ready for lift-off! 🚀
We now have an account, organization, and space all set up and ready for action! In the next sections, we'll explore all of the core features that Y42 offers on its turnkey data orchestration platform. We'll begin by setting up a pipeline that ingests and transforms data.