This functionality is available to customers who have purchased the Real-Time CDP Prime or Ultimate package, Adobe Journey Optimizer, or Customer Journey Analytics. Contact your Adobe representative for more information.
Action item: The September 2024 release of Experience Platform introduced the option to set an `endTime` date for export dataset dataflows. Adobe has also introduced a default end date of May 1, 2025 for all dataset export dataflows created prior to the September 2024 release. For any of these dataflows, you must manually update the end date before it passes; otherwise, your exports will stop on that date. Use the Experience Platform UI to view which dataflows will be set to stop on May 1, 2025.
Refer to the scheduling section for information on how to edit the end date of a dataset export dataflow.
This article explains the workflow required to export datasets from Adobe Experience Platform to your preferred cloud storage location (such as Amazon S3, SFTP locations, or Google Cloud Storage) by using the Experience Platform UI.
You can also use the Experience Platform APIs to export datasets. Read the export datasets API tutorial for more information.
The datasets that you can export vary based on the Experience Platform application (Real-Time CDP, Adobe Journey Optimizer), the tier (Prime or Ultimate), and any add-ons that you purchased (for example: Data Distiller).
Use the table below to understand which dataset types you can export depending on your application, product tier, and any add-ons purchased:
Application/Add-on | Tier | Datasets available for export
---|---|---
Real-Time CDP | Prime | Profile and Experience Event datasets created in the Experience Platform UI after ingesting or collecting data through Sources, Web SDK, Mobile SDK, Analytics Data Connector, and Audience Manager.
Real-Time CDP | Ultimate | Profile and Experience Event datasets created in the Experience Platform UI after ingesting or collecting data through Sources, Web SDK, Mobile SDK, Analytics Data Connector, and Audience Manager.
Adobe Journey Optimizer | Prime | Refer to the Adobe Journey Optimizer documentation.
Adobe Journey Optimizer | Ultimate | Refer to the Adobe Journey Optimizer documentation.
Customer Journey Analytics | All | Profile and Experience Event datasets created in the Experience Platform UI after ingesting or collecting data through Sources, Web SDK, Mobile SDK, Analytics Data Connector, and Audience Manager.
Data Distiller | Data Distiller (add-on) | Derived datasets created through Query Service.
Watch the video below for an end-to-end explanation of the workflow described on this page, benefits of using the export dataset functionality, and some suggested use cases.
Hi, this is Michelle. In this video, I’ll show you how to export an Experience Platform data set to a cloud storage destination. Before we get into the demo, let’s review the benefits and use cases for this feature. Cloud storage destinations let you export raw, unstructured data sets using a guided workflow. The core tenets of data governance apply. Usage policies are enforced, as is the case for other types of destination workflows. Because of the interoperability between Experience Platform and public clouds, use cases involving data sets used in external systems are supported. These are some primary use cases for exporting data sets. Use external machine learning and business intelligence tools to support analytical use cases, such as reporting and informing better audience creation. Monitor the health and performance of email marketing campaigns. And store data in external systems for compliance needs. These are the cloud storage destinations that can export data sets. If you don’t see the data set option in the export workflow, check your user permissions. You need view destinations and manage and activate data set destinations permissions. Also, ensure you have the proper data management permissions for data sets, specifically view data sets. Okay, it’s time for a demo, starting with the data set export connection. I’m logged into Experience Platform. Under connections, I’ll choose destinations. In the destinations catalog view, I’ll choose cloud storage under the categories heading. We’ll see the workflow using an Amazon S3 account. While you may have existing Amazon S3 connections, a new connection is needed, specifically for exporting data sets. In the Amazon S3 destination card, click the data set icon at the top. This opens a new panel on the right. Click the configure new destination link. I’m using an existing account, since I have configured accounts already. If you don’t, use the new account workflow. In the modal, choose an account from the list and then click on select in the right corner. Now it’s time to choose the data type. Three different types are supported for S3, but data sets is what I want. Next, fill in the name, description, bucket name, and folder path fields. The last two fields specify the area and path of the S3 account where the data set files will be stored. There are two options for file type, JSON and Parquet. I’ll choose JSON. There’s an option for compression format, which depends on the file type selected. Review and select the alerts you’d like to receive for this export. I’m finished with the configuration details in this step, so I’ll select next up here. Now I need to select the appropriate marketing action. Data export is what I need. I’ll select that and create the connection. Now I’m ready to send data to the new S3 connection. I’ll choose the activate button for Amazon S3. Data sets is the data type. This shows me the connection I just set up. I’ll choose this and go to the next step. Now I can browse and choose the data set I want to export. There are some Journey Optimizer data sets here, but I want this weekend data set. I’ll go to the next step. There are some folder name and scheduling settings available. First, I can modify the folder path using any of the available macros to customize the name. Now we’ll look at schedule settings. You can choose the cadence of exports. Determine if you want full or incremental files. The first file includes all existing data in the data set, functioning as a backfill.
There are frequency setting presets. Choose the one that works for your needs. Last, you can choose the date range for your data set export. I’ll save these settings and advance to the review step. Once everything looks good, choose finish to kick off the export process. That’s it! It’s very straightforward to export data sets. After this, you would connect to the S3 bucket to confirm you see the files there. Thank you for watching this video.
Currently, you can export datasets to the cloud storage destinations highlighted in the screenshot and listed below.
Some file-based destinations in the Experience Platform catalog support both audience activation and dataset export.
This document contains all the information necessary to export datasets. If you want to activate audiences to cloud storage or email marketing destinations, read Activate audience data to batch profile export destinations.
To export datasets to cloud storage destinations, you must have successfully connected to a destination. If you haven’t done so already, go to the destinations catalog, browse the supported destinations, and configure the destination that you want to use.
To export datasets, you need the View Destinations, View Datasets, and Manage and Activate Dataset Destinations access control permissions. Read the access control overview or contact your product administrator to obtain the required permissions.
To ensure that you have the necessary permissions to export datasets and that the destination supports exporting datasets, browse the destinations catalog. If a destination has an Activate or an Export datasets control, then you have the appropriate permissions.
Follow the instructions to select a destination where you can export your datasets:
Go to Connections > Destinations, and select the Catalog tab.
Select Activate or Export datasets on the card corresponding to the destination that you want to export datasets to.
For Data type, select Datasets, select the destination connection that you want to export datasets to, then select Next.
If you want to set up a new destination to export datasets, select Configure new destination to trigger the Connect to destination workflow.
Use the check boxes to the left of the dataset names to select the datasets that you want to export to the destination, then select Next.
Use the Scheduling step to:
Use the Edit schedule control on the page to edit the cadence of exports, as well as to select whether to export full or incremental files.
The Export incremental files option is selected by default. The first export creates one or multiple files representing a full snapshot of the dataset, and subsequent files are incremental additions to the dataset since the previous export. You can also select Export full files; in this case, select the frequency Once for a one-time full export of the dataset.
The first incremental file export includes all existing data in the dataset, functioning as a backfill. The export can contain one or multiple files. See the sketch after this list for an illustration of how incremental exports partition the data.
Use the Frequency selector to select the export frequency.
Use the Time selector to choose the time of day, in UTC format, when the export should take place.
Use the Date selector to choose the interval when the export should take place.
Select Save to save the schedule and proceed to the Review step.
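As a mental model, here is a minimal Python sketch (not Adobe code; the records and export times are hypothetical) illustrating how the first incremental export acts as a full backfill, while each later export contains only the records added since the previous run:

```python
# A minimal sketch (not Adobe code) of how incremental exports partition
# a dataset: the first export is a full backfill, and each subsequent
# export contains only records added since the previous one.
from datetime import datetime, timezone

# Hypothetical records: (ingestion time, payload).
records = [
    (datetime(2024, 9, 1, 8, tzinfo=timezone.utc), "event-1"),
    (datetime(2024, 9, 1, 14, tzinfo=timezone.utc), "event-2"),
    (datetime(2024, 9, 2, 9, tzinfo=timezone.utc), "event-3"),
]

# Hypothetical daily export schedule (UTC).
export_times = [
    datetime(2024, 9, 1, 12, tzinfo=timezone.utc),
    datetime(2024, 9, 2, 12, tzinfo=timezone.utc),
]

previous = None
for export_time in export_times:
    if previous is None:
        # First export: everything ingested so far (the backfill).
        batch = [p for t, p in records if t <= export_time]
    else:
        # Subsequent exports: only records ingested since the last export.
        batch = [p for t, p in records if previous < t <= export_time]
    print(export_time.isoformat(), batch)
    previous = export_time
```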
For dataset exports, the file names have a preset, default format, which cannot be modified. See the section Verify successful dataset export for more information and examples of exported files.
Select Edit folder path to customize the folder structure in your storage location where exported datasets are deposited.
You can use several available macros to customize a desired folder name. Double-click a macro to add it to the folder path, and use `/` between the macros to separate the folders.
After selecting the desired macros, you can see a preview of the folder structure that will be created in your storage location. The first level in the folder structure represents the Folder path that you indicated when you connected to the destination to export datasets.
On the Review page, you can see a summary of your selection. Select Cancel to exit the flow, Back to modify your settings, or Finish to confirm your selection and start exporting datasets to the destination.
When exporting datasets, Experience Platform creates one or multiple `.json` or `.parquet` files in the storage location that you provided. Expect new files to be deposited in your storage location according to the export schedule you provided.
Experience Platform creates a folder structure in the storage location you specified, where it deposits the exported dataset files. The default folder export pattern is shown below, but you can customize the folder structure with your preferred macros.
The first level in this folder structure - `folder-name-you-provided` - represents the Folder path that you indicated when you connected to the destination to export datasets.

`folder-name-you-provided/datasetID/exportTime=YYYYMMDDHHMM`
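As an illustration, the following minimal Python sketch (the folder name and dataset ID below are hypothetical examples) shows how the default folder path resolves:

```python
# A minimal sketch (not Adobe code) of how the default export folder path
# resolves; the folder name and dataset ID below are hypothetical examples.
from datetime import datetime, timezone

folder_path = "acme-exports"              # Folder path set on the destination
dataset_id = "6fa1c13daa12345678901234"   # illustrative dataset ID
export_time = datetime.now(timezone.utc).strftime("%Y%m%d%H%M")

# Default pattern: folder-name-you-provided/datasetID/exportTime=YYYYMMDDHHMM
print(f"{folder_path}/{dataset_id}/exportTime={export_time}")
```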
The default file name is randomly generated and ensures that exported file names are unique.
The presence of these files in your storage location is confirmation of a successful export. To understand how the exported files are structured, you can download a sample .parquet file or .json file.
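For example, for an Amazon S3 destination, a minimal sketch using the boto3 library (the bucket name and prefix are hypothetical) might list the exported files like this:

```python
# A minimal sketch, assuming an Amazon S3 destination and the boto3 library.
# The bucket name and prefix are hypothetical; credentials are resolved by
# boto3's standard configuration chain.
import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
    Bucket="acme-exports-bucket",
    Prefix="acme-exports/6fa1c13daa12345678901234/",
)
for obj in response.get("Contents", []):
    # Expect data files (.json or .parquet) plus a manifest per export run.
    print(obj["Key"], obj["Size"])
```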
In the connect to destination workflow, you can select the exported dataset files to be compressed, as shown below:
Note the difference in file format between the two file types, when compressed:

- When exporting compressed JSON files, the exported file format is `json.gz`. The format of the exported JSON is NDJSON, which is the standard interchange format in the big data ecosystem. Adobe recommends using an NDJSON-compatible client to read the exported files.
- When exporting compressed Parquet files, the exported file format is `gz.parquet`.
Exports to JSON files are supported in a compressed mode only. Exports to Parquet files are supported in a compressed and uncompressed mode.
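For instance, here is a minimal Python sketch for reading a compressed NDJSON export, where each line of the file is one standalone JSON record (the file name is a hypothetical example):

```python
# A minimal sketch for reading a compressed NDJSON export (json.gz), where
# each line is one standalone JSON record. The file name is hypothetical.
import gzip
import json

with gzip.open("part-00000-abc123.json.gz", "rt", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        print(record)
```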
To remove datasets from an existing dataflow, follow the steps below:
Log in to the Experience Platform UI and select Destinations from the left navigation bar. Select Browse from the top header to view your existing destination dataflows.
Select the filter icon on the top left to launch the sort panel. The sort panel provides a list of all your destinations. You can select more than one destination from the list to see a filtered selection of dataflows associated with the selected destination.
From the Activation data column, select the datasets control to view all datasets mapped to this export dataflow.
The Activation data page for the destination appears. Use the checkboxes on the left side of the dataset list to select the datasets which you want to remove, then select Remove datasets in the right rail to trigger the remove dataset confirmation dialog.
In the confirmation dialog, select Remove to immediately remove the dataset from exports to the destination.
Refer to the product description documents to understand how much data you are entitled to export for each Experience Platform application, per year. For example, you can view the Real-Time CDP Product Description here.
Note that the data export entitlements for different applications are not additive. For example, if you purchase Real-Time CDP Ultimate and Adobe Journey Optimizer Ultimate, the profile export entitlement is the larger of the two entitlements, as per the product descriptions. Your volume entitlement is calculated by multiplying your total number of licensed profiles by 500 KB for Real-Time CDP Prime or 700 KB for Real-Time CDP Ultimate.
On the other hand, if you purchased add-ons such as Data Distiller, the data export limit that you are entitled to represents the sum of the product tier and the add-on.
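As a worked example of the volume calculation above, using a hypothetical license count:

```python
# A worked example of the entitlement calculation, using a hypothetical
# license count: licensed profiles multiplied by the per-profile allowance.
licensed_profiles = 10_000_000
kb_per_profile = 500  # 500 KB for Real-Time CDP Prime; 700 KB for Ultimate

total_kb = licensed_profiles * kb_per_profile
print(f"{total_kb / 1_000_000_000:.1f} TB of export volume per year")  # 5.0 TB
```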
You can view and track your profile exports against your contractual limits in the license usage dashboard.
Keep in mind the following limitations for the general availability release of dataset exports:
Can we generate a file without a folder if we just save at `/` as the folder path? Also, if we don’t require a folder path, how will files with duplicate names be generated in a folder or location?
Starting with the September 2024 release, it is possible to customize the folder name and even use `/` to export files for all datasets into the same folder. Adobe does not recommend this for destinations exporting multiple datasets, as system-generated filenames belonging to different datasets will be mixed in the same folder.
Can you route the manifest file to one folder and data files into another folder?
No, there is no capability to copy the manifest file to a different location.
Can we control the sequencing or timing of file delivery?
There are options for scheduling the export. There are no options for delaying or sequencing the copy of the files. They are copied to your storage location as soon as they are generated.
What formats are available for the manifest file?
The manifest file is in .json format.
Is there API availability for the manifest file?
No API is available for the manifest file, but it includes a list of files comprising the export.
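For orientation only, a manifest of roughly this shape can be expected; the field names below are illustrative assumptions, not a documented schema:

```python
# Illustrative shape of an export manifest; field names here are assumptions
# for orientation, not a documented schema. The manifest identifies the flow
# run and lists the files that make up the export.
manifest = {
    "flowRunId": "f6f7a4c1-0000-0000-0000-000000000000",  # hypothetical
    "scheduledTime": "2024-09-01T12:00:00Z",
    "exportResults": [
        {"name": "part-00000-abc123.json.gz", "size": 1048576},
        {"name": "part-00001-def456.json.gz", "size": 734003},
    ],
}
print(len(manifest["exportResults"]), "files in this export")
```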
Can we add additional details to the manifest file (i.e., record count)? If so, how?
There is no possibility to add additional info to the manifest file. The record count is available via the `flowRun` entity (queryable via API). Read more in destinations monitoring.
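As a hedged sketch of that approach, the following Python example retrieves flow runs, including export metrics, from the Flow Service API; the flow ID and credential placeholders are assumptions you must replace with your own values:

```python
# A hedged sketch (placeholders must be replaced with real credentials) for
# retrieving flow runs, including export metrics such as record counts, from
# the Experience Platform Flow Service API.
import requests

FLOW_ID = "d258f2f3-0000-0000-0000-000000000000"  # placeholder dataflow ID

response = requests.get(
    "https://platform.adobe.io/data/foundation/flowservice/runs",
    params={"property": f"flowId=={FLOW_ID}"},
    headers={
        "Authorization": "Bearer <ACCESS_TOKEN>",
        "x-api-key": "<API_KEY>",
        "x-gw-ims-org-id": "<ORG_ID>",
        "x-sandbox-name": "<SANDBOX_NAME>",
    },
)
response.raise_for_status()
for run in response.json().get("items", []):
    print(run.get("id"), run.get("metrics"))
```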
How are data files split? How many records per file?
Data files are split per the default partitioning in the Experience Platform data lake. Larger datasets have a higher number of partitions. The default partitioning is not configurable by the user as it is optimized for reading.
Can we set a threshold (number of records per file)?
No, it is not possible.
How do we resend a data set in the event that the initial send is bad?
Retries are in place automatically for most types of system errors.