Delete datasets and batches

Last update: 2024-06-28
  • Created for: Intermediate, Developer

Learn how to delete datasets and batches in Adobe Experience Platform. If a dataset needs to be removed from the system for any reason, such as cleaning up test datasets in lower environments or datasets that were added in error, you can simply delete that dataset and remove its contents from the data lake, identity graph, and profile store. Individual batches can be deleted from the data lake, but not from the identity graph and profile store.

 Transcript

In this video, we’ll show you how to delete a dataset in Experience Platform. When you ingest data into Platform, that data is stored in a dataset in Platform’s data lake. If the dataset has been enabled for Profile, the data is used to build identity graphs and populate your customer profiles with attributes and events. If a dataset needs to be removed from the system for any reason, such as cleaning up test datasets in lower environments or datasets that were added in error, you can simply delete that dataset and remove its contents from the data lake, identity graph, and profile store. There are a few methods you can use to delete Platform datasets. To delete a dataset manually, you can use the Platform interface or make calls to the Catalog Service API. This is the method we’ll focus on in this video. If your organization has access to Healthcare Shield or Privacy and Security Shield, you can also configure your Platform datasets to automatically expire on a scheduled future date. Check out our video on automated expirations for more details. Keep in mind that while any dataset you create can be deleted using these methods, system datasets that are auto-created by Adobe applications like Analytics, Audience Manager, or Offer Decisioning cannot be deleted. All that being said, why would you want to delete a dataset in the first place? While every organization’s data requirements are different, here are a few common use cases. First, deleting datasets helps enforce data minimization principles, since it naturally reduces the amount of data in the system. If a dataset hasn’t been used for anything for an extended period, it may need to be deleted. Second, deleting datasets should be a common practice when removing old test data from lower-level development sandboxes as engineers develop and migrate new features. This keeps your test environments clean without the need to do a full sandbox reset and reconfigure everything from scratch.
Finally, in rare cases, it may be necessary to make breaking changes to your data model and associated data flows to account for oversights in their initial design. While careful schema planning should make this avoidable in most cases, under extreme circumstances you can delete the offending schema and its associated datasets and attempt to re-ingest the data under a new or modified schema. Now, if your data issues simply stem from a bad batch or a few erroneous records, you won’t need to go as far as deleting the whole dataset. In the case of a bad batch, you can simply delete that batch from the dataset itself, which we’ll show you in this video. Keep in mind that, unlike deleting datasets, deleting batches only removes data from the data lake. It doesn’t remove data from the identity graph or profile store. If there are some individual profiles with incorrect attributes that you want to correct, you can do that directly by using the upsert method through Platform APIs. Refer to our corresponding video guide for more info. Okay. Now that that’s all out of the way, let’s jump into the Platform interface and walk through the process of deleting a dataset and seeing how that can affect a customer profile. Let’s start with our dataset. Start by selecting Datasets in the left navigation. Here you’ll see a list of all datasets available for your organization and which ones are participating in Real-Time Customer Profile. Using the search feature, I can narrow down the list to this loyalty dataset, which captures some loyalty program information about our customers. Our engineers only added this dataset recently, and we’re still troubleshooting how it behaves in Real-Time Customer Profile, so we need to delete this dataset before they add an updated version. We could just go ahead and delete the dataset from here.
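The transcript shows batch deletion only in the interface. Because Platform is API-first, the same action should also be available over HTTP. The following is a minimal Python sketch (standard library only) that assumes Adobe's Batch Ingestion API exposes batch deletion as `POST https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=REVERT` with the usual Platform authentication headers. The endpoint, the `REVERT` action, and the function names `build_delete_batch_request` and `send_request` are assumptions to verify against Adobe's current API reference, and all credential values are placeholders.

```python
"""Sketch: delete (revert) one bad batch from a dataset via the Batch Ingestion API.

ASSUMPTION: endpoint and action are
  POST https://platform.adobe.io/data/foundation/import/batches/{BATCH_ID}?action=REVERT
Per the transcript, deleting a batch removes its data from the data lake only;
the identity graph and profile store are not affected.
"""
import urllib.parse
import urllib.request

# Assumed base URL for the Batch Ingestion API.
IMPORT_BASE_URL = "https://platform.adobe.io/data/foundation/import"


def build_delete_batch_request(batch_id: str, access_token: str, api_key: str,
                               org_id: str, sandbox_name: str) -> urllib.request.Request:
    """Build (but do not send) the request that reverts a single batch."""
    # Percent-encode the batch ID so it is safe to place in the URL path.
    path_id = urllib.parse.quote(batch_id, safe="")
    query = urllib.parse.urlencode({"action": "REVERT"})
    url = f"{IMPORT_BASE_URL}/batches/{path_id}?{query}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        # The sandbox that holds the dataset, e.g. a lower-environment dev sandbox.
        "x-sandbox-name": sandbox_name,
    }
    # No request body is sent; the action is carried in the query string.
    return urllib.request.Request(url, headers=headers, method="POST")


def send_request(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call would look like `send_request(build_delete_batch_request("<BATCH_ID>", token, key, org, "dev"))`. Building the request separately from sending it keeps the URL and headers easy to inspect before anything is actually deleted.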
But before we do that, let’s quickly head over to Profiles in the left nav and look up a sample profile to see how this will affect our data downstream. Since we know the email ID of a particular profile, I’ll use it here to pull it up from the list and click into it to see its details. Here I can see a list of attributes derived from the loyalty dataset from earlier. While we’re just looking at a single profile here, deleting the dataset will affect all profiles that use it as a source. Clicking into the customer’s identity graph, we can see the various identities we have for this profile and how they’re linked to each other. One of these is the customer’s loyalty ID, and when we select it we can see which datasets this identity link was inferred from. In this case, we know that loyalty IDs were not defined correctly in our test schema, and this actually isn’t the right value for this customer. So I’ll click this node, and in the right rail I can see our Luma loyalty dataset listed as the source. This is the data that I want to delete, so I’ll click into it from here. Once we’re in the details view for the dataset, we can see a history of previously ingested batches. If this happened to be an ingestion issue with one of the batches and not the dataset itself, we could click into the batch from here and then use the Delete batch control to remove it from the system cleanly. In this case, though, we want to delete the whole dataset, so we’ll go back to the dataset view, select More in the top right, and then select Delete. We’ll confirm our choice in the dialog, and that’s it. We can check back on the sample profile we looked up earlier and see that the loyalty details it had earlier are gone, since those attributes were sourced from the dataset we just deleted. The effects are immediate, down to the individual profile level.
Clicking into the identity graph for this profile, we can see that deleting the dataset also removed the node for the loyalty ID tied to this customer’s profile. As a result, any future ingested events or records containing that loyalty ID will not be stitched to this profile unless also paired with one or more of the remaining identities in the graph. As you can see, it’s really easy to delete datasets in the interface, but keep in mind that Experience Platform uses an API-first architecture, meaning that any action you can do in the interface can also be done using calls to Platform’s open APIs. When it comes to deleting datasets, you can use the /dataSets endpoint in the Catalog Service API. Simply use the DELETE method and include the dataset’s ID in the path. Now you know how to delete a dataset in Experience Platform. We hope this functionality will help you ensure that you’re not spending resources on storing any information that you no longer need. Thanks for watching.
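The Catalog Service API call described at the end of the transcript (the DELETE method on the dataSets endpoint, with the dataset ID in the path) can be sketched in Python using only the standard library. This is a minimal sketch, not Adobe's reference client. It assumes the base URL `https://platform.adobe.io/data/foundation/catalog` and the standard Platform headers (`Authorization`, `x-api-key`, `x-gw-ims-org-id`, `x-sandbox-name`). The function names `build_delete_dataset_request` and `delete_dataset` are hypothetical, and credential values are placeholders.

```python
"""Sketch: delete an Experience Platform dataset with the Catalog Service API.

ASSUMPTION: DELETE https://platform.adobe.io/data/foundation/catalog/dataSets/{DATASET_ID}
Deleting a dataset removes its contents from the data lake, identity graph, and
profile store. System datasets auto-created by Adobe applications cannot be deleted.
"""
import urllib.parse
import urllib.request

# Assumed base URL for the Catalog Service API.
CATALOG_BASE_URL = "https://platform.adobe.io/data/foundation/catalog"


def build_delete_dataset_request(dataset_id: str, access_token: str, api_key: str,
                                 org_id: str, sandbox_name: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request for a single dataset."""
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{CATALOG_BASE_URL}/dataSets/{urllib.parse.quote(dataset_id, safe='')}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": api_key,
        "x-gw-ims-org-id": org_id,
        # Target sandbox, e.g. a lower-environment dev sandbox holding test data.
        "x-sandbox-name": sandbox_name,
    }
    return urllib.request.Request(url, headers=headers, method="DELETE")


def delete_dataset(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, for example
    when the ID is wrong or the dataset is a protected system dataset.
    """
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call would look like `delete_dataset(build_delete_dataset_request("<DATASET_ID>", token, key, org, "dev"))`. Scoping the request with `x-sandbox-name` matters here: the same dataset name can exist in several sandboxes, and deleting the dataset removes its data from the profile store as well, so it is worth confirming the target sandbox before sending.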
