Experience Data Model (XDM) is the core framework that standardizes customer experience data by providing common structures and definitions for use in downstream Adobe Experience Platform services. By adhering to XDM standards, all customer experience data can be incorporated into a common representation and used to gain valuable insights from customer actions, define customer audiences, and express customer attributes for personalization purposes.
Since XDM is extremely versatile and customizable by design, it is important to follow best practices for data modeling when designing your schemas. This document covers the key decisions and considerations that you must make when mapping your customer experience data to XDM.
Before reading this guide, review the XDM System overview for a high-level introduction to XDM and its role within Experience Platform.
As this guide focuses exclusively on key considerations regarding schema design, it is strongly recommended that you read the basics of schema composition for detailed explanations of the individual schema elements mentioned throughout this guide.
The recommended approach for designing your data model for use in Experience Platform can be summarized as follows:
The steps related to identifying the applicable data sources required to carry out your business use cases vary from organization to organization. While the remaining sections of this document focus on the latter steps of organizing and constructing an ERD after the data sources have been identified, the explanations of the diagram’s various components may inform your decisions as to which of your data sources should be migrated to Platform.
Once you have determined the data sources that you wish to bring into Platform, create a high-level ERD to help guide the process of mapping your data to XDM schemas.
The example below represents a simplified ERD for a company that wants to bring data into Platform. The diagram highlights the essential entities that should be sorted into XDM classes, including customer accounts, hotels, and several common e-commerce events.
Once you have created an ERD to identify the essential entities you would like to bring into Platform, these entities must be sorted into profile, lookup, and event categories:
Category | Description |
---|---|
Profile entities | Profile entities represent attributes relating to an individual person, typically a customer. Entities that fall under this category should be represented by schemas based on the XDM Individual Profile class. |
Lookup entities | Lookup entities represent concepts that can relate to an individual person, but cannot be directly used to identify the individual. Entities that fall under this category should be represented by schemas based on custom classes, and are linked to profiles and events through schema relationships. |
Event entities | Event entities represent concepts related to actions that a customer can take, system events, or any other concepts where you may want to track changes over time. Entities that fall under this category should be represented by schemas based on the XDM ExperienceEvent class. |
The sections below provide further guidance for how to sort your entities into the above categories.
A primary way of sorting between entity categories is whether the data being captured is mutable or not.
Attributes belonging to profiles or lookup entities are typically mutable. For example, a customer’s preferences might change over time, and the parameters of a subscription plan can be updated depending on market trends.
By contrast, event data is typically immutable. Since events are attached to a specific timestamp, the “system snapshot” that an event provides does not change. For example, an event can capture a customer’s preferences when they check out a cart, and that record does not change even if the customer’s preferences change later on. Event data cannot be changed after it has been recorded.
To summarize, profiles and lookup entities contain mutable attributes and represent the most current information about the subjects they capture, while events are immutable records of the system at a specific time.
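The distinction can be illustrated with a minimal Python sketch. The class names, field names, and sample values below are hypothetical and are not part of XDM; the sketch only shows that a profile-like record is updated in place while an event-like record stays fixed once created.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass
class CustomerProfile:
    """Mutable record: holds the most current attributes for a customer."""
    customer_id: str
    preferred_room: str  # hypothetical preference attribute


@dataclass(frozen=True)
class CheckoutEvent:
    """Immutable record: a snapshot of the system at a specific time."""
    customer_id: str
    preferred_room_at_checkout: str
    timestamp: datetime


profile = CustomerProfile(customer_id="1234567", preferred_room="queen")
event = CheckoutEvent(
    customer_id=profile.customer_id,
    preferred_room_at_checkout=profile.preferred_room,
    timestamp=datetime(2023, 10, 1, 10, 32, tzinfo=timezone.utc),  # assumed date
)

# The profile can be updated to reflect the customer's latest preference...
profile.preferred_room = "king"

# ...but the event keeps the preference as it was at checkout time.
try:
    event.preferred_room_at_checkout = "king"
    event_was_modified = True
except FrozenInstanceError:
    event_was_modified = False
```

After this runs, the profile holds the new preference while the event still holds the original one.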
If an entity contains any attributes related to an individual customer, it is most likely a profile entity. Examples of customer attributes include:
If you want to analyze how certain attributes within an entity change over time, it is most likely an event entity. For example, adding product items to a cart can be tracked as add-to-cart events in Platform:
Customer ID | Type | Product ID | Quantity | Timestamp |
---|---|---|---|---|
1234567 | Add | 275098 | 2 | Oct 1, 10:32 AM |
1234567 | Remove | 275098 | 1 | Oct 1, 10:33 AM |
1234567 | Add | 486502 | 1 | Oct 1, 10:41 AM |
1234567 | Add | 910482 | 5 | Oct 3, 2:15 PM |
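
As a rough illustration of why these rows are useful as events, the sketch below replays the rows from the table to derive the current cart contents. The tuple layout and the `current_cart` helper are assumptions made for this example; the rows are assumed to be in timestamp order.

```python
from collections import defaultdict

# Rows from the table above: (customer_id, type, product_id, quantity).
events = [
    ("1234567", "Add", "275098", 2),
    ("1234567", "Remove", "275098", 1),
    ("1234567", "Add", "486502", 1),
    ("1234567", "Add", "910482", 5),
]


def current_cart(cart_events):
    """Replay add/remove events in order and return the net quantity per product."""
    quantities = defaultdict(int)
    for _customer_id, event_type, product_id, quantity in cart_events:
        sign = 1 if event_type == "Add" else -1
        quantities[product_id] += sign * quantity
    # Drop products whose net quantity is zero or less.
    return {pid: qty for pid, qty in quantities.items() if qty > 0}


cart = current_cart(events)
```

Because each change is kept as its own timestamped record, the same rows can also answer questions about how the cart changed over time, which a single "current cart" attribute could not.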
When categorizing your entities, it is important to think about the audiences you may want to build to address your particular business use cases.
For example, a company wants to know all of the “Gold” or “Platinum” members of their loyalty program that have made more than five purchases in the last year. Based on this segmentation logic, the following conclusions can be made regarding how relevant entities should be represented:
In addition to considerations regarding segmentation use cases, you should also review the activation use cases for those audiences to identify additional relevant attributes.
For example, a company has built an audience based on the rule that country = US. Then, when activating that audience to certain downstream targets, the company wants to filter all exported profiles based on home state. Therefore, a state attribute should also be captured in the applicable profile entity.
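A minimal sketch of this scenario, assuming a hypothetical in-memory `Profile` record and an `export_for_state` helper (neither is a Platform API), shows why the state attribute has to exist on the profile before activation:

```python
from dataclasses import dataclass


@dataclass
class Profile:
    customer_id: str
    country: str
    state: str  # needed only for the activation-time filter


profiles = [
    Profile("A1", "US", "CA"),
    Profile("A2", "US", "NY"),
    Profile("A3", "CA", "ON"),
]

# Segmentation rule from the text: country = US.
audience = [p for p in profiles if p.country == "US"]


def export_for_state(members, state):
    """Filter audience members by home state at activation time."""
    return [p.customer_id for p in members if p.state == state]


california_export = export_for_state(audience, "CA")
```

If `state` had not been modeled on the profile, the segmentation rule would still work but the export filter would have nothing to filter on.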
Based on the use case and granularity of your data, you should decide whether certain values need to be pre-aggregated before being included in a profile or event entity.
For example, a company wants to build an audience based on the number of cart purchases. You can choose to incorporate this data at the lowest granularity by including each timestamped purchase event as its own entity. However, this can sometimes increase the number of recorded events significantly. To reduce the number of ingested events, you can choose to create an aggregate value, numberOfPurchases, over a week-long or month-long period. Other aggregate functions like MIN and MAX can also apply to these situations.
Experience Platform does not currently perform automatic value aggregation, although this is planned for future releases. If you choose to use aggregated values, you must perform the calculations externally before sending the data to Platform.
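One way to perform such a calculation externally is sketched below. It counts purchases per customer per ISO week before the data is sent anywhere. The input layout, the output key names other than numberOfPurchases, and the sample dates are assumptions for illustration only.

```python
from collections import Counter
from datetime import date

# Hypothetical raw purchase rows: (customer_id, purchase_date).
purchases = [
    ("1234567", date(2023, 10, 2)),
    ("1234567", date(2023, 10, 4)),
    ("1234567", date(2023, 10, 11)),
    ("7654321", date(2023, 10, 3)),
]


def weekly_purchase_counts(rows):
    """Aggregate purchases into one record per customer per ISO week."""
    counts = Counter()
    for customer_id, purchase_date in rows:
        iso_year, iso_week, _weekday = purchase_date.isocalendar()
        counts[(customer_id, iso_year, iso_week)] += 1
    return [
        {
            "customerId": customer_id,
            "year": year,
            "week": week,
            "numberOfPurchases": total,
        }
        for (customer_id, year, week), total in sorted(counts.items())
    ]


weekly = weekly_purchase_counts(purchases)
```

Here four raw purchase rows collapse into three aggregated records. The trade-off is that the individual purchase timestamps are no longer available downstream.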
The cardinalities established in your ERD can also provide some clues as to how to categorize your entities. If there is a one-to-many relationship between two entities, the entity that represents the “many” is likely to be an event entity. However, there are also cases where the “many” is a set of lookup entities that are provided as an array within a profile entity.
Since there is no universal approach to fit all use cases, it is important to consider the pros and cons of each situation when categorizing entities based on cardinality. See the next section for more information.
The following table outlines some common entity relationships and the categories that can be derived from them:
Relationship | Cardinality | Entity categories |
---|---|---|
Customer and Cart Checkout | One to many | A single customer may have many cart checkouts, which are events that can be tracked over time. Customer would therefore be a profile entity, while Cart Checkout would be an event entity. |
Customer and Loyalty Account | One to one | A single customer can only have one loyalty account, and a loyalty account can only belong to one customer. Since the relationship is one-to-one, both Customer and Loyalty Account represent profile entities. |
Customer and Subscription | One to many | A single customer may have many subscriptions. Since the company is only concerned with a customer’s current subscriptions, Customer is a profile entity, while Subscription is a lookup entity. |
While the previous section provided some general guidelines for deciding how to categorize your entities, it is important to understand that there can often be pros and cons for choosing one entity category over another. The following case study is intended to illustrate how you might consider your options in these situations.
A company tracks active subscriptions for their customers, where one customer can have many subscriptions. The company also wants to include subscriptions for segmentation use cases, such as finding all users with active subscriptions.
In this scenario, the company has two potential options for representing a customer’s subscriptions in their data model:
The first approach would be to include an array of subscriptionID within the profile entity for Customer.
Pros
Cons
The second approach would be to use event schemas to represent a subscription event. This would include the subscription ID alongside a customer ID and a timestamp of when the subscription event occurred.
Pros
Cons
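To make the two approaches above more concrete, the following sketch models both in plain Python and answers the same question ("which customers have an active subscription?") against each. All field names, the `eventType` values, and the sample data are hypothetical; they are not taken from any XDM field group.

```python
from datetime import datetime, timezone

# Approach 1: an array of subscription IDs inside the Customer profile.
customer_profiles = [
    {"customerId": "C1", "subscriptionIDs": ["S10", "S11"]},
    {"customerId": "C2", "subscriptionIDs": []},
]
active_from_profiles = {
    p["customerId"] for p in customer_profiles if p["subscriptionIDs"]
}


# Approach 2: one event per subscription change (assumed "start"/"cancel" types).
def ts(day):
    return datetime(2023, 10, day, tzinfo=timezone.utc)


subscription_events = [
    {"customerId": "C1", "subscriptionID": "S10", "eventType": "start", "timestamp": ts(1)},
    {"customerId": "C1", "subscriptionID": "S11", "eventType": "start", "timestamp": ts(2)},
    {"customerId": "C2", "subscriptionID": "S20", "eventType": "start", "timestamp": ts(3)},
    {"customerId": "C2", "subscriptionID": "S20", "eventType": "cancel", "timestamp": ts(5)},
]


def active_customers(events):
    """Replay events in time order; the latest event per subscription wins."""
    is_active = {}
    for e in sorted(events, key=lambda e: e["timestamp"]):
        is_active[(e["customerId"], e["subscriptionID"])] = e["eventType"] == "start"
    return {customer for (customer, _sub), active in is_active.items() if active}


active_from_events = active_customers(subscription_events)
```

Both approaches yield the same audience here. The profile array gives a direct lookup of the current state, while the event stream requires a replay but keeps the history of each subscription.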
Once you have sorted your entities into profile, lookup, and event categories, you can start converting your data model into XDM schemas. For demonstration purposes, the example data model shown earlier has been sorted into appropriate categories in the following diagram:
The category that an entity has been sorted under should determine the XDM class you base its schema on. To reiterate:
While event entities are almost always represented by separate schemas, entities in the profile or lookup categories may be combined together in a single XDM schema, depending on their cardinality.
For example, since the Customer entity has a one-to-one relationship with the LoyaltyAccount entity, the schema for the Customer entity could also include a LoyaltyAccount object to contain the appropriate loyalty fields for each customer. If the relationship is one to many, however, the entity that represents the “many” could be represented by a separate schema or an array of profile attributes, depending on its complexity.
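As a small sketch of the one-to-one case, the code below folds a loyalty row into its customer row as a nested LoyaltyAccount object. The individual field names and the `embed_loyalty` helper are assumptions made for illustration and do not reflect a specific XDM field group.

```python
# Hypothetical rows from two relational tables with a one-to-one relationship.
customer = {"customerId": "1234567", "firstName": "Ada"}
loyalty_account = {
    "loyaltyId": "L-001",
    "customerId": "1234567",
    "tier": "Gold",
    "points": 1200,
}


def embed_loyalty(customer_row, loyalty_row):
    """Return a single profile record with the loyalty fields nested inside it."""
    # The foreign key is redundant once the loyalty data lives inside the customer.
    nested = {k: v for k, v in loyalty_row.items() if k != "customerId"}
    return {**customer_row, "LoyaltyAccount": nested}


profile_record = embed_loyalty(customer, loyalty_account)
```

The resulting record carries both entities under a single customer identifier, which is what makes a single profile schema sufficient for this relationship.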
The sections below provide general guidance on constructing schemas based on your ERD.
The rules of schema evolution dictate that only non-destructive changes can be made to schemas once they have been implemented. In other words, once you add a field to a schema and data has been ingested against that field, the field can no longer be removed. It is therefore essential to adopt an iterative modeling approach when you are first creating your schemas, starting with a simplified implementation which progressively gains complexity over time.
If you are not sure whether a particular field is necessary to include in a schema, the best practice is to leave it out. If it is later determined that the field is necessary, it can always be added in the next iteration of the schema.
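The additive nature of this approach can be checked with a simple sketch before a new schema iteration is applied. The helpers below are hypothetical, represent fields as dotted path strings, and only detect removed fields; they do not cover other kinds of breaking changes such as altering a field's type.

```python
def removed_fields(old_fields, new_fields):
    """Return fields present in the old version but missing from the new one."""
    return sorted(set(old_fields) - set(new_fields))


def is_non_destructive(old_fields, new_fields):
    """A change is treated as non-destructive here if no existing field is removed."""
    return not removed_fields(old_fields, new_fields)


# Hypothetical field paths for three iterations of a schema.
v1 = ["customerId", "email"]
v2 = ["customerId", "email", "LoyaltyAccount.tier"]  # adds a field
v3 = ["customerId", "LoyaltyAccount.tier"]           # drops "email"

v1_to_v2_ok = is_non_destructive(v1, v2)
v2_to_v3_ok = is_non_destructive(v2, v3)
dropped = removed_fields(v2, v3)
```

In this example the first iteration passes because it only adds a field, while the second fails because it drops one.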
In Experience Platform, XDM fields marked as identities are used to stitch together information about individual customers coming from multiple data sources. Although a schema can have multiple fields marked as identities, a single primary identity must be defined for the schema to be enabled for use in Real-Time Customer Profile. See the section on identity fields in the basics of schema composition for more detailed information on the use case of these fields.
When designing your schemas, any primary keys in your relational database tables are likely candidates for primary identities. Other examples of applicable identity fields are customer email addresses, phone numbers, account IDs, and ECID.
Experience Platform provides several out-of-the-box XDM schema field groups for capturing data related to the following Adobe applications:
For example, you can use the Adobe Analytics ExperienceEvent Template field group to map Analytics-specific fields to your XDM schemas. Depending on the Adobe applications you are working with, you should use these Adobe-provided field groups in your schemas.
Adobe application field groups automatically assign a default primary identity through the use of the identityMap field, which is a system-generated, read-only object that maps standard identity values for an individual customer.
For Adobe Analytics, ECID is the default primary identity. If an ECID value is not provided by a customer, the primary identity instead defaults to AAID.
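The fallback described above can be pictured with the following sketch. Platform performs this assignment itself, so the function is purely illustrative. It assumes that identityMap is shaped as a mapping from a namespace code to a list of objects with an `id` key, and the sample values are made up.

```python
def default_primary_identity(identity_map):
    """Pick ECID if present, otherwise fall back to AAID (illustrative only)."""
    for namespace in ("ECID", "AAID"):
        for entry in identity_map.get(namespace) or []:
            if entry.get("id"):
                return namespace, entry["id"]
    return None


# Assumed shape: namespace code -> list of identity objects.
with_ecid = {"ECID": [{"id": "ecid-123"}], "AAID": [{"id": "aaid-456"}]}
without_ecid = {"AAID": [{"id": "aaid-456"}]}

primary_a = default_primary_identity(with_ecid)
primary_b = default_primary_identity(without_ecid)
```

With both identities present the ECID is chosen; when the ECID is missing the AAID is used instead.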
When using Adobe application field groups, no other fields should be marked as the primary identity. If there are additional properties that need to be marked as identities, these fields must be assigned as secondary identities instead.
When you ingest data into the data lake, data validation is only enforced for constrained fields. To validate a particular field during a batch ingestion, you must mark the field as constrained in the XDM schema. To prevent bad data from being ingested into Platform, it is recommended that you define the criteria for field-level validation when you create your schemas.
Validation does not apply to nested columns. If the field format is located within an array column, the data will not be validated.
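The behavior described above can be approximated in a pre-ingestion check, sketched below. This is not how Platform implements validation; the constraint keywords (`pattern`, `enum`, `maxLength`), the field names, and the `validate_record` helper are assumptions chosen for illustration. As in the text, only constrained top-level fields are checked, and values nested inside arrays are not.

```python
import re

# Hypothetical constraints, keyed by top-level field name.
constraints = {
    "email": {"pattern": r"^[^@\s]+@[^@\s]+$"},
    "loyaltyTier": {"enum": ["Gold", "Platinum"]},
    "postalCode": {"maxLength": 10},
}


def validate_record(record, field_constraints):
    """Return the names of constrained top-level fields that fail validation."""
    failed = []
    for name, rules in field_constraints.items():
        value = record.get(name)
        if value is None:
            continue  # absent fields are not checked in this sketch
        text = str(value)
        if "pattern" in rules and not re.search(rules["pattern"], text):
            failed.append(name)
        elif "enum" in rules and value not in rules["enum"]:
            failed.append(name)
        elif "maxLength" in rules and len(text) > rules["maxLength"]:
            failed.append(name)
    return failed


record = {
    "email": "not-an-email",
    "loyaltyTier": "Silver",
    "postalCode": "94105",
    # Nested inside an array, so this overlong value is never inspected.
    "addresses": [{"postalCode": "x" * 20}],
}
failures = validate_record(record, constraints)
```

The unconstrained and nested values pass through untouched, which is why constraints should be defined on every field where bad data would cause problems.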
To set constraints on a particular field, select the field from the Schema Editor to open the Field properties sidebar. See the documentation on type-specific field properties for exact descriptions of the available fields.
The following is a collection of suggestions to maintain data integrity when you create a schema.
The identityMap field often serves as the primary identity. Avoid designating additional fields as primary identities for that schema.
_id is not used as an identity: The _id field in Experience Event schemas cannot be used as an identity, as it is meant for record uniqueness.
This document covered the general guidelines and best practices for designing your data model for Experience Platform. To summarize:
Once you are ready, see the tutorial on creating a schema in the UI for step-by-step instructions on how to create a schema, assign the appropriate class for the entity, and add fields to map your data to.