Learn how to use the user interface to stream data from your Snowflake database to Adobe Experience Platform.
This tutorial requires a working understanding of Experience Platform components such as Experience Data Model (XDM) schemas, datasets, and Real-Time Customer Profile, all of which are referenced in the steps below.
Read the guide on prerequisite setup for Snowflake streaming data for information on the steps that you need to complete before you can ingest streaming data from Snowflake to Experience Platform.
In the Platform UI, select Sources from the left navigation to access the Sources workspace. You can select the appropriate category from the catalog on the left-hand side of your screen. Alternatively, you can find the specific source you wish to work with using the search option.
Under the Databases category, select Snowflake Streaming, and then select Add data.
Sources that do not have an authenticated account in the sources catalog display the Set up option. Once an authenticated account exists, this option changes to Add data.
The Connect Snowflake Streaming account page appears. On this page, you can use either new or existing credentials.
To create a new account, select New account and provide a name, an optional description, and your credentials.
When finished, select Connect to source and then allow some time for the new connection to establish.
Credential | Description |
---|---|
Account | The name of your Snowflake account. For conventions on account names, read the Snowflake Streaming authentication guide. |
Warehouse | The name of your Snowflake warehouse. Warehouses manage the execution of queries in Snowflake. Snowflake warehouses are independent of one another and must be accessed individually to bring data to Experience Platform. |
Database | The name of your Snowflake database. The database contains the data that you want to bring to Experience Platform. |
Schema | (Optional) The database schema associated with your Snowflake account. |
Username | The username of your Snowflake account. |
Password | The password to your Snowflake account. |
Role | (Optional) A custom-defined role that can be provided to a user for a given connection. If unprovided, this value defaults to public. |
For more information on account creation, read the section on configuring role settings in the Snowflake Streaming overview.
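Before creating the account, you can optionally sanity-check these credentials outside of Experience Platform. The sketch below uses the Snowflake Python connector (snowflake-connector-python) with placeholder values (MY_ACCOUNT, MY_USER, and so on); it is only a local verification aid under those assumptions, not part of the Experience Platform workflow.

```python
# Minimal sketch: verify Snowflake credentials locally with the Snowflake
# Python connector before entering them in the Experience Platform UI.
# All values below (MY_ACCOUNT, MY_USER, and so on) are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="MY_ACCOUNT",      # Snowflake account name
    user="MY_USER",
    password="MY_PASSWORD",
    warehouse="MY_WAREHOUSE",  # warehouse that executes queries
    database="MY_DATABASE",    # database that contains the source table
    schema="PUBLIC",           # optional schema
    role="PUBLIC",             # optional role; defaults to public when omitted
)

try:
    cur = conn.cursor()
    # Confirm that the connection context resolves as expected.
    cur.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_ROLE()")
    print(cur.fetchone())
finally:
    conn.close()
```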
To use an existing account, select Existing account and then select the desired account from the existing account catalog.
Select Next to proceed.
A timestamp column must exist in your source table in order for a streaming dataflow to be created. Experience Platform requires the timestamp to determine when data is ingested and which records to stream incrementally. For an existing connection, you can retroactively add a timestamp column and then create a new dataflow.
Ensure that the case of the data fields in your sample source data file is in accordance with Snowflake’s guidance on case resolution for identifiers. Read the Snowflake document on identifier casing for more information.
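As a brief illustration of that behavior, unquoted identifiers in Snowflake resolve to uppercase, while double-quoted identifiers preserve their exact case. The sketch below uses hypothetical table and column names and the same placeholder credentials as the earlier example; the point is that the field names in your sample file need to match the casing that Snowflake actually stores.

```python
# Sketch of Snowflake identifier case resolution. The table and column names
# are hypothetical; credentials are placeholders as in the example above.
import snowflake.connector

conn = snowflake.connector.connect(
    account="MY_ACCOUNT", user="MY_USER", password="MY_PASSWORD",
    warehouse="MY_WAREHOUSE", database="MY_DATABASE", schema="PUBLIC",
)
cur = conn.cursor()

# Unquoted identifiers are stored and resolved as uppercase;
# double-quoted identifiers keep their exact case.
cur.execute("""
    CREATE OR REPLACE TABLE demo_events (
        event_id    NUMBER,     -- stored as EVENT_ID
        "userEmail" VARCHAR     -- stored as userEmail (case preserved)
    )
""")

# DESCRIBE TABLE returns the column names as Snowflake resolved them.
cur.execute("DESCRIBE TABLE demo_events")
for row in cur.fetchall():
    print(row[0])  # prints EVENT_ID and userEmail

conn.close()
```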
The Select data step appears. In this step, you must select the data you want to import into Experience Platform, configure timestamps and timezones, and provide a sample source data file for the ingestion of raw data.
Use the database directory on the left of your screen and select the table that you want to import to Experience Platform.
Next, select the timestamp column type of your table. You can select between two types of timestamp columns: TIMESTAMP_NTZ or TIMESTAMP_LTZ. If you select a column type of TIMESTAMP_NTZ, then you must also provide a timezone. Your columns should have a not null constraint. For more information, read the section on limitations and frequently asked questions in the Snowflake Streaming overview.
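If an existing table does not yet have a suitable timestamp column, one possible approach is sketched below: add a nullable TIMESTAMP_NTZ column, backfill it, and then apply the not null constraint. The table and column names (demo_events, updated_at) are hypothetical, and your own load process must keep populating the column so that incremental records can be identified.

```python
# Sketch: retroactively add a TIMESTAMP_NTZ column with a NOT NULL constraint
# to an existing Snowflake table. Table and column names are hypothetical;
# credentials are placeholders as in the earlier examples.
import snowflake.connector

conn = snowflake.connector.connect(
    account="MY_ACCOUNT", user="MY_USER", password="MY_PASSWORD",
    warehouse="MY_WAREHOUSE", database="MY_DATABASE", schema="PUBLIC",
)
cur = conn.cursor()

# 1. Add the column as nullable first, since existing rows have no value yet.
cur.execute("ALTER TABLE demo_events ADD COLUMN updated_at TIMESTAMP_NTZ")

# 2. Backfill existing rows with a timestamp value.
cur.execute("UPDATE demo_events SET updated_at = CURRENT_TIMESTAMP()::TIMESTAMP_NTZ")

# 3. Enforce the NOT NULL constraint expected by the streaming dataflow.
cur.execute("ALTER TABLE demo_events MODIFY COLUMN updated_at SET NOT NULL")

conn.close()
```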
You can also configure backfill settings during this step. Backfill determines what data is initially ingested. If backfill is enabled, all existing data in the selected table is ingested during the first scheduled ingestion. If backfill is disabled, only data loaded between the start time and the first ingestion run is ingested; data loaded prior to the start time is not ingested.
Select the Backfill toggle to enable backfill.
Finally, select Choose file to upload a sample source data file to help create the mapping set, which will be used in a later step to map your original data to Experience Data Model (XDM).
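If you do not already have a sample file on hand, one possible way to produce one is to export a few rows from the source table, as sketched below. JSON is assumed here as the upload format, and the table name and output path are placeholders.

```python
# Sketch: export a few rows from the source table as a sample file for the
# mapping step. The table name and output path are placeholders; JSON output
# is an assumption about the format you want to upload.
import json
import snowflake.connector

conn = snowflake.connector.connect(
    account="MY_ACCOUNT", user="MY_USER", password="MY_PASSWORD",
    warehouse="MY_WAREHOUSE", database="MY_DATABASE", schema="PUBLIC",
)
try:
    cur = conn.cursor(snowflake.connector.DictCursor)
    cur.execute("SELECT * FROM demo_events LIMIT 5")
    rows = cur.fetchall()
    with open("sample_source_data.json", "w") as f:
        # default=str converts timestamps and other non-JSON types to strings.
        json.dump(rows, f, indent=2, default=str)
finally:
    conn.close()
```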
When finished, select Next to proceed.
Next, you must provide information on your dataset and your dataflow.
A dataset is a storage and management construct for a collection of data, typically a table, that contains a schema (columns) and fields (rows). Data that is successfully ingested into Experience Platform is persisted within the data lake as datasets. During this step, you can create a new dataset or use an existing dataset.
To use a new dataset, select New dataset, then provide a name and an optional description for your dataset. You must also select an Experience Data Model (XDM) schema that your dataset adheres to.
New dataset details | Description |
---|---|
Output dataset name | The name of your new dataset. |
Description | (Optional) A brief overview of the new dataset. |
Schema | A dropdown list of schemas that exist in your organization. You can also create your own schema prior to the source configuration process. For more information, read the guide on creating an XDM schema in the UI. |
If you already have an existing dataset, select Existing dataset and then use the Advanced search option to open a window that lists all datasets in your organization, including their respective details, such as whether they are enabled for ingestion into Real-Time Customer Profile.
If your dataset is enabled for Real-Time Customer Profile, then during this step, you can toggle Profile dataset to enable your data for Profile ingestion. You can also use this step to enable Error diagnostics and Partial ingestion.
Once your dataset is configured, you must then provide details on your dataflow, including a name, an optional description, and alert configurations.
Dataflow configurations | Description |
---|---|
Dataflow name | The name of the dataflow. By default, this will use the name of the file that is being imported. |
Description | (Optional) A brief description of your dataflow. |
Alerts | Experience Platform can produce event-based alerts that users can subscribe to. These options require a running dataflow to trigger them. For more information, read the alerts overview. |
When finished, select Next to proceed.
The Mapping step appears. Use the mapping interface to map your source data to the appropriate schema fields before ingesting that data into Experience Platform, then select Next. For an extensive guide on how to use the mapping interface, read the Data Prep UI guide.
The final step in the dataflow creation process is to review your dataflow before executing it. Use the Review step to review the details of your new dataflow, including your connection information and the dataset and field mappings that your dataflow uses, before it runs.
Once you have reviewed your dataflow, select Finish and allow some time for the dataflow to be created.
By following this tutorial, you have successfully created a streaming dataflow for Snowflake data. For additional resources, read the documentation below.
Once your dataflow has been created, you can monitor the data that is being ingested through it to view information on ingestion rates, success, and errors. For more information on how to monitor streaming dataflows, visit the tutorial on monitoring streaming dataflows in the UI.
To update configurations for your dataflow's scheduling, mapping, and general information, visit the tutorial on updating sources dataflows in the UI.
You can delete dataflows that are no longer necessary or were incorrectly created using the Delete function available in the Dataflows workspace. For more information on how to delete dataflows, visit the tutorial on deleting dataflows in the UI.