The Merkury Enterprise Identity Resolution source is in beta. Please read the sources overview for more information on using beta-labeled sources.
This tutorial provides steps to create a Merkury Enterprise Identity Resolution source connection and dataflow using the Adobe Experience Platform user interface.
This tutorial requires a working understanding of the following components of Experience Platform:
In order to access your bucket on Experience Platform, you need to provide valid values for the following credentials:
Credential | Description |
---|---|
Access key | The access key ID for your bucket. You can retrieve this value from your Merkury team. |
Secret key | The secret key ID for your bucket. You can retrieve this value from your Merkury team. |
Bucket name | This is your Merkury bucket where files will be shared. You can retrieve this value from your Merkury team. |
For more information on setting up Merkury and other prerequisites, read the Merkury source overview.
In the Platform UI, select Sources from the left navigation bar to access the Sources workspace. The Catalog screen displays a variety of sources with which you can create an account.
You can select the appropriate category from the catalog on the left-hand side of your screen. Alternatively, you can find the specific source you wish to work with using the search option.
Under the Data partners category, select Merkury and then select Set up.
The Connect to Merkury page appears. On this page, you can either use new credentials or existing credentials.
If you are using new credentials, select New account. On the input form that appears, provide a name, an optional description, and your Merkury credentials. When finished, select Connect to source and then allow some time for the new connection to establish.
To use an existing account, select Existing account and then select the Merkury account that you would like to use. Select Next to proceed.
Supported file formats
You can ingest the following file formats with the Merkury source:
bzip2, gzip, deflate, zipDeflate, tarGzip, and tar.
After creating your Merkury account, the Add data step appears, providing an interface for you to explore your Merkury file hierarchy and select the folder or specific file that you want to bring to Experience Platform.
Select the root folder to access your folder hierarchy. From here, you can select a single folder to ingest all files in the folder recursively. When ingesting an entire folder, you must ensure that all files in that folder share the same data format and schema.
Once you have selected a folder, the right interface updates to a preview of the contents and structure of the first file in the selected folder.
During this step, you can configure several settings for your data before proceeding. First, select Data format and then select the appropriate data format for your file in the dropdown panel that appears.
The following table displays the appropriate data formats for the supported file types:
File type | Data format |
---|---|
CSV | Delimited |
JSON | JSON |
Parquet | XDM Parquet |
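The table above can be expressed as a small lookup, which may help when preparing many files before working through the UI. This is a minimal sketch: the file extensions used as keys and the `data_format_for` helper are assumptions made for illustration and are not part of the Merkury source or Platform.

```python
from pathlib import PurePosixPath

# Data format names come from the table above.
# The extension keys are an assumption for this illustration.
DATA_FORMAT_BY_EXTENSION = {
    ".csv": "Delimited",
    ".json": "JSON",
    ".parquet": "XDM Parquet",
}


def data_format_for(path: str) -> str:
    """Return the UI data format to choose for a file, based on its extension."""
    extension = PurePosixPath(path).suffix.lower()
    try:
        return DATA_FORMAT_BY_EXTENSION[extension]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path!r}") from None


print(data_format_for("exports/2024/profiles.csv"))
```

A compressed file such as `profiles.json.gz` is rejected by this sketch because only the final extension is inspected.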
After configuring your data format, you can set a column delimiter when ingesting delimited files. Select the Delimiter option and then select a delimiter from the dropdown menu. The menu displays the most frequently used options for delimiters, including a comma (,), a tab (\t), and a pipe (|).
If you prefer to use a custom delimiter, select Custom and enter a single-character delimiter of your choice in the pop-up input bar.
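Before choosing a delimiter in the UI, it can be useful to confirm locally that a file splits into the columns you expect. The sketch below uses Python's standard `csv` module and enforces the single-character rule described above. The `parse_delimited` helper and the sample data are hypothetical and only illustrate the check.

```python
import csv
import io
from typing import List


def parse_delimited(text: str, delimiter: str = ",") -> List[List[str]]:
    """Split delimited text into rows, requiring a single-character delimiter."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be exactly one character")
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical sample using a semicolon as the custom delimiter.
sample = "id;email\n1;a@example.com\n"
rows = parse_delimited(sample, delimiter=";")
print(rows)
```

If every row has the same number of columns as the header, the delimiter is likely the right one to enter in the Custom field.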
You can also ingest compressed JSON or delimited files by specifying their compression type.
In the Select data step, select a compressed file for ingestion and then select its appropriate file type and whether it’s XDM-compliant or not. Next, select Compression type and then select the appropriate compressed file type for your source data.
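If you are unsure which compression type a file uses, its first bytes often identify it. The following is a rough sketch that recognizes only the well-known gzip and bzip2 signatures. The `guess_compression_type` helper is hypothetical, and the returned names are simply matched to the compression type labels listed earlier in this tutorial.

```python
import bz2
import gzip
from typing import Optional

# Standard file signatures: gzip streams start with 0x1f 0x8b,
# bzip2 streams start with the ASCII characters "BZh".
MAGIC_PREFIXES = {
    b"\x1f\x8b": "gzip",
    b"BZh": "bzip2",
}


def guess_compression_type(first_bytes: bytes) -> Optional[str]:
    """Guess a compression type from the leading bytes of a file."""
    for prefix, name in MAGIC_PREFIXES.items():
        if first_bytes.startswith(prefix):
            return name
    return None  # unknown or uncompressed


print(guess_compression_type(gzip.compress(b'{"id": 1}')))
print(guess_compression_type(bz2.compress(b'{"id": 1}')))
```

Note that a gzip-compressed tar archive also starts with the gzip signature, so this check alone cannot distinguish gzip from tarGzip.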
To bring a specific file to Platform, select a folder, and then select the file that you want to ingest. During this step, you can also preview file contents of other files within a given folder by using the preview icon beside a file name.
When finished, select Next.
The Dataflow detail page allows you to select whether you want to use an existing dataset or a new dataset. During this process, you can also configure your data to be ingested to Profile, and enable settings like Error diagnostics, Partial ingestion, and Alerts.
To ingest data into an existing dataset, select Existing dataset. You can retrieve an existing dataset either by using the Advanced search option or by scrolling through the list of existing datasets in the dropdown menu. Once you have selected a dataset, provide a name and a description for your dataflow.
To ingest into a new dataset, select New dataset and then provide an output dataset name and an optional description. Next, select a schema to map to using the Advanced search option or by scrolling through the list of existing schemas in the dropdown menu. Once you have selected a schema, provide a name and a description for your dataflow.
Next, select the Profile dataset toggle to enable your dataset for Real-Time Customer Profile. This allows you to create a holistic view of an entity’s attributes and behaviors. Data from all Profile-enabled datasets will be included in Profile and changes are applied when you save your dataflow.
Error diagnostics enables detailed error message generation for any erroneous records that occur in your dataflow, while Partial ingestion allows you to ingest data containing errors, up to a certain threshold that you manually define. See the partial batch ingestion overview for more information.
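The idea behind a partial ingestion threshold can be illustrated with a short calculation. This is a conceptual sketch only, not Platform's implementation: it assumes the threshold is expressed as a percentage of records, and that a batch is rejected only when the error rate strictly exceeds that percentage. The `batch_outcome` function and its return values are hypothetical.

```python
def batch_outcome(total_records: int, failed_records: int,
                  error_threshold_pct: float) -> str:
    """Decide whether a batch stays within a user-defined error threshold.

    Assumption: the batch is rejected only when the error percentage is
    strictly greater than the threshold.
    """
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if not 0 <= failed_records <= total_records:
        raise ValueError("failed_records must be between 0 and total_records")
    error_pct = 100 * failed_records / total_records
    return "ingest valid records" if error_pct <= error_threshold_pct else "fail batch"


print(batch_outcome(1000, 40, 5.0))  # 4% of records failed
print(batch_outcome(1000, 60, 5.0))  # 6% of records failed
```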
You can enable alerts to receive notifications on the status of your dataflow. Select an alert from the list to subscribe to it. For more information on alerts, see the guide on subscribing to sources alerts using the UI.
When you are finished providing details for your dataflow, select Next.
The Mapping step appears, providing you with an interface to map the source fields from your source schema to their appropriate target XDM fields in the target schema.
Platform provides intelligent recommendations for auto-mapped fields based on the target schema or dataset that you selected. You can manually adjust mapping rules to suit your use cases. Based on your needs, you can choose to map fields directly, or use data prep functions to transform source data to derive computed or calculated values. For comprehensive steps on using the mapper interface and calculated fields, see the Data Prep UI guide.
Once your source data is successfully mapped, select Next.
The Scheduling step appears, allowing you to configure an ingestion schedule to automatically ingest the selected source data using the configured mappings. By default, scheduling is set to Once. To adjust your ingestion frequency, select Frequency and then select an option from the dropdown menu.
Interval and backfill are not visible during a one-time ingestion.
If you set your ingestion frequency to Minute, Hour, Day, or Week, then you must set an interval to establish a set time frame between every ingestion. For example, an ingestion frequency set to Day and an interval set to 15 means that your dataflow is scheduled to ingest data every 15 days.
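The relationship between frequency and interval can be checked with a quick calculation. The sketch below reproduces the Day and 15 example using Python's `datetime` module. The `scheduled_runs` helper and the start time are assumptions made for illustration, and the code only models the arithmetic rather than how Platform schedules runs internally.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# One unit of time for each recurring frequency named in this tutorial.
FREQUENCY_UNIT = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
    "Week": timedelta(weeks=1),
}


def scheduled_runs(start: datetime, frequency: str, interval: int,
                   count: int) -> List[datetime]:
    """Return the first `count` run times for a recurring schedule."""
    if frequency not in FREQUENCY_UNIT:
        raise ValueError(f"unknown recurring frequency: {frequency!r}")
    if interval < 1:
        raise ValueError("interval cannot be zero or negative")
    step = FREQUENCY_UNIT[frequency] * interval
    return [start + i * step for i in range(count)]


# Hypothetical start time in UTC.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for run in scheduled_runs(start, "Day", 15, 3):
    print(run.date())
```

With a start time of January 1, the first three runs fall on January 1, January 16, and January 31.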
During this step, you can also enable backfill and define a column for the incremental ingestion of data. Backfill is used to ingest historical data, while the column you define for incremental ingestion allows new data to be differentiated from existing data.
See the table below for more information on scheduling configurations.
Scheduling configuration | Description |
---|---|
Frequency | Configure frequency to indicate how often the dataflow should run. You can set your frequency to Once, Minute, Hour, Day, or Week. |
Interval | Once you select a frequency, you can then configure the interval setting to establish the time frame between every ingestion. For example, if you set your frequency to Day and configure the interval to 15, then your dataflow will run every 15 days. You cannot set the interval to zero. Each frequency has its own minimum accepted interval value. |
Start Time | The timestamp for the projected run, presented in UTC time zone. |
Backfill | Backfill determines what data is initially ingested. If backfill is enabled, all current files in the specified path will be ingested during the first scheduled ingestion. If backfill is disabled, only the files that are loaded in between the first run of ingestion and the start time will be ingested. Files loaded prior to the start time will not be ingested. |
For batch ingestion, every ensuing dataflow selects files to be ingested from your source based on their last modified timestamp. This means that batch dataflows select files from the source that are either new or have been modified since the last flow run. Furthermore, you must ensure that there’s a sufficient time span between file upload and a scheduled flow run because files that are not entirely uploaded to your cloud storage account before the scheduled flow run time may not be picked up for ingestion.
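The selection behavior described above can be sketched as a filter on last modified timestamps. This is one interpretation of the text, not Platform's actual logic: on the first run, backfill decides whether files older than the start time are included, and on later runs only files modified since the previous run are picked up. The `SourceFile` type and `select_files` function are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional


class SourceFile(NamedTuple):
    path: str
    last_modified: datetime


def select_files(files: List[SourceFile], start_time: datetime,
                 last_run_time: Optional[datetime],
                 backfill: bool) -> List[SourceFile]:
    """Pick the files a flow run would ingest, under the assumptions above."""
    if last_run_time is None:  # first scheduled run
        if backfill:
            return list(files)  # all current files in the path
        # Without backfill, files loaded before the start time are skipped.
        return [f for f in files if f.last_modified >= start_time]
    # Later runs: only files that are new or modified since the last run.
    return [f for f in files if f.last_modified > last_run_time]


def utc(day: int) -> datetime:
    return datetime(2024, 1, day, tzinfo=timezone.utc)


files = [SourceFile("a.csv", utc(1)), SourceFile("b.csv", utc(5)),
         SourceFile("c.csv", utc(10))]
print([f.path for f in select_files(files, utc(4), None, backfill=True)])
print([f.path for f in select_files(files, utc(4), None, backfill=False)])
print([f.path for f in select_files(files, utc(4), utc(6), backfill=False)])
```

The sketch also shows why upload timing matters: a file whose upload has not completed by the run time would not yet appear in `files`, so that run would not select it.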
When finished configuring your ingestion schedule, select Next.
The Review step appears, allowing you to review your new dataflow before it is created. Details are grouped within the following categories:
Once you have reviewed your dataflow, select Finish and allow some time for the dataflow to be created.
By following this tutorial, you have successfully created a dataflow to bring batch data from your Merkury source to Experience Platform. For additional resources, visit the documentation outlined below.
Once your dataflow has been created, you can monitor the data that is being ingested through it to view information on ingestion rates, success, and errors. For more information on how to monitor dataflows, visit the tutorial on monitoring accounts and dataflows in the UI.
To update configurations for your dataflow's scheduling, mapping, and general information, visit the tutorial on updating sources dataflows in the UI.
You can delete dataflows that are no longer necessary or were incorrectly created using the Delete function available in the Dataflows workspace. For more information on how to delete dataflows, visit the tutorial on deleting dataflows in the UI.