Identity graph linking rules are currently in Limited Availability. Contact your Adobe account team for information on how to access the feature in development sandboxes.
As you test and validate identity graph linking rules, you may run into some issues related to data ingestion and graph behavior. Read this document to learn how to troubleshoot some common issues that you might encounter when working with identity graph linking rules.
The following diagram is a simplified representation of how data flows into Adobe Experience Platform and Applications. Use this diagram as a reference to better understand the contents of this page.
It is important to note the following factors:
Ingestion issue type | Does the data get ingested in data lake? | Does the data get ingested in Profile? | Does the data get ingested in Identity Service? |
---|---|---|---|
General ingestion issue | No | No | No |
Graph issue | Yes | Yes | No |
Profile fragment issue | Yes | No | Yes |
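The table can also be read as a lookup from where a record landed to the likely issue type. The following is a minimal Python sketch, assuming you have already checked each destination yourself. The function name `classify_ingestion_issue` and its boolean parameters are hypothetical and are not part of any Experience Platform API.

```python
def classify_ingestion_issue(in_data_lake, in_profile, in_identity_service):
    """Map where a record landed to the issue types listed in the table above."""
    table = {
        (False, False, False): "General ingestion issue",
        (True, True, False): "Graph issue",
        (True, False, True): "Profile fragment issue",
    }
    if in_data_lake and in_profile and in_identity_service:
        # The record reached every destination, so nothing needs triage.
        return "No ingestion issue"
    key = (in_data_lake, in_profile, in_identity_service)
    # Combinations that the table does not list are reported as such.
    return table.get(key, "Not covered by the table")


print(classify_ingestion_issue(True, True, False))   # Graph issue
print(classify_ingestion_issue(True, False, True))   # Profile fragment issue
```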
This section assumes that the data has been successfully ingested into data lake and that there were no syntax or other errors that would prevent the data from being ingested into Experience Platform in the first place.
The examples use ECID as the cookie namespace and CRMID as the person namespace.
There are various reasons why this could happen, including, but not limited to, the following:
Within the context of identity graph linking rules, a record may be rejected from Identity Service because the incoming event has two or more identities with the same unique namespace but different identity values. This scenario usually happens due to implementation errors.
Consider the following event with two assumptions:
The following event will return an error message indicating that ingestion has failed.
{
  "_id": "random_string",
  "eventType": "web browsing event",
  "identityMap": {
    "ECID": [
      {
        "id": "11111111111111111111111111111111111111",
        "primary": false
      }
    ],
    "CRMID": [
      {
        "id": "Alice",
        "primary": true
      }
    ]
  },
  "CRMID": "Bob",
  "timestamp": "2024-08-17T15:22:51+00:00",
  "web": {
    "webPageDetails": {
      "URL": "https://www.adobe.com/acrobat.html",
      "name": "Adobe Acrobat"
    }
  }
}
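To catch this kind of conflict before sending data, you could run a local pre-flight check. The following is a minimal Python sketch and not Identity Service's actual validation. It assumes that CRMID is configured as a unique namespace and that the top-level `CRMID` field is marked as an identity with the CRMID namespace. The function name and the `descriptor_fields` mapping are hypothetical.

```python
from collections import defaultdict


def find_unique_namespace_conflicts(event, unique_namespaces, descriptor_fields):
    """Return {namespace: sorted values} for unique namespaces with 2+ distinct values.

    descriptor_fields maps a top-level field name to the namespace it is
    marked with (a hypothetical stand-in for schema identity descriptors).
    """
    values = defaultdict(set)
    # Identities sent through identityMap.
    for namespace, identities in event.get("identityMap", {}).items():
        for identity in identities:
            values[namespace].add(identity["id"])
    # Identities sent as fields that are marked as identities.
    for field, namespace in descriptor_fields.items():
        if field in event:
            values[namespace].add(event[field])
    return {
        namespace: sorted(found)
        for namespace, found in values.items()
        if namespace in unique_namespaces and len(found) > 1
    }


event = {
    "identityMap": {
        "ECID": [{"id": "11111111111111111111111111111111111111", "primary": False}],
        "CRMID": [{"id": "Alice", "primary": True}],
    },
    "CRMID": "Bob",
}

conflicts = find_unique_namespace_conflicts(
    event, unique_namespaces={"CRMID"}, descriptor_fields={"CRMID": "CRMID"}
)
print(conflicts)  # {'CRMID': ['Alice', 'Bob']}
```

An empty result means the event carries at most one value per unique namespace under these assumptions.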
Troubleshooting steps
To resolve this error, you must first collect the following information:
- The identity value (identity_value) you expected to be ingested in the identity graph.
- The dataset (dataset_name) in which the event was sent.
Next, use Adobe Experience Platform Query Service and run the following query:
Replace dataset_name and identity_value with the information that you collected.
SELECT key, col.id as identityValue, timestamp, _id, identityMap, *
FROM (SELECT key, explode(value), *
FROM (SELECT explode(identityMap), *
FROM dataset_name)) WHERE col.id = 'identity_value'
After running your query, find the event record that you expected to generate a graph, and then validate that the identity values are different in the same row. View the following image for an example:
If the two identities are exactly the same, and if the event is ingested via streaming, then both Identity and Profile will deduplicate the identity.
Namespace priority plays an important role in how event fragments determine primary identity.
- Once your identity settings are saved, Profile uses namespace priority to determine the primary identity and no longer uses the primary=true flag in identityMap.
- Other services on Experience Platform may continue to use the primary=true flag.
In order for authenticated user events to be tied to the person namespace, all authenticated events must contain the person namespace (CRMID). This means that even after a user logs in, the person namespace must still be present on every authenticated event.
You may continue to see the primary=true flag on events when looking up a profile in profile viewer. However, this flag is ignored and will not be used by Profile.
AAIDs are blocked by default. Therefore, if you are using the Adobe Analytics source connector, you must ensure that the ECID is prioritized higher than the AAID so that the unauthenticated events will have a primary identity of ECID.
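The effect of namespace priority on primary identity can be modelled in a few lines. The following is an illustrative Python sketch, not Profile's implementation. It assumes a priority order of CRMID, then ECID, then AAID, and it deliberately ignores any primary flag on the event. The function name and the sample identity values are hypothetical.

```python
def primary_identity(identity_map, namespace_priority):
    """Pick the identity in the highest-priority namespace present on the event.

    namespace_priority lists namespaces from highest to lowest priority.
    Any primary=true flags in identity_map are deliberately ignored.
    """
    for namespace in namespace_priority:
        identities = identity_map.get(namespace, [])
        if identities:
            return namespace, identities[0]["id"]
    return None


priority = ["CRMID", "ECID", "AAID"]  # assumed ordering, for illustration only

authenticated = {
    "ECID": [{"id": "ecid-1", "primary": True}],
    "CRMID": [{"id": "Alice", "primary": False}],
}
unauthenticated = {"ECID": [{"id": "ecid-1"}], "AAID": [{"id": "aaid-1"}]}

print(primary_identity(authenticated, priority))    # ('CRMID', 'Alice')
print(primary_identity(unauthenticated, priority))  # ('ECID', 'ecid-1')
```

The authenticated event resolves to CRMID even though its ECID carries the primary flag, which is why every authenticated event must include the person namespace.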
Troubleshooting steps
There are various reasons why your experience event fragments might not be ingested into Profile, including but not limited to:
- An experience event must contain both an _id and a timestamp.
- The _id must be unique for each event (record).
In the context of namespace priority, Profile will reject any event that contains two or more identities with the highest namespace priority. For example, if GAID is not marked as a unique namespace and an event arrives with two GAID identities that have different identity values, then Profile will not store any of the events.
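The rejection rule above can be expressed as a small check. The following is a hedged Python sketch that inspects identityMap only and does not consider identities sent as separate fields. The priority order, the function name, and the sample values are assumptions made for illustration.

```python
def would_profile_reject(identity_map, namespace_priority):
    """Return True if the highest-priority namespace on the event has 2+ distinct values."""
    for namespace in namespace_priority:
        ids = {identity["id"] for identity in identity_map.get(namespace, [])}
        if ids:
            # Only the highest-priority namespace present on the event is inspected.
            return len(ids) > 1
    return False


priority = ["GAID", "ECID"]  # assumed ordering; GAID is not marked unique here
event = {
    "GAID": [{"id": "gaid-A"}, {"id": "gaid-B"}],
    "ECID": [{"id": "ecid-1"}],
}
print(would_profile_reject(event, priority))  # True
```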
Troubleshooting steps
If your data is sent to data lake, but not Profile, and you believe that this is due to sending two or more identities with the highest namespace priority in a single event, then you may run the following query to validate that there are two different identity values sent against the same namespace:
In the following queries, you must:
- Replace _testimsorg.identification.core.email with the path sending the identity.
- Replace Email with the namespace with the highest priority. This is the same namespace that is not being ingested.
- Replace dataset_name with the dataset that you wish to query.
SELECT identityMap, key, col.id as identityValue, _testimsorg.identification.core.email, _id, timestamp
FROM (SELECT key, explode(value), *
FROM (SELECT explode(identityMap), *
FROM dataset_name)) WHERE col.id != _testimsorg.identification.core.email and key = 'Email'
This query assumes that:
key='Email'
from the WHERE clause.
This section outlines common issues you may encounter regarding how the identity graph behaves.
The identity optimization algorithm will honor the most recently established links and remove the oldest links. Therefore, it is possible that once this feature is enabled, ECIDs could be reassigned (re-linked) from one person to another. To understand the history of how an identity gets linked over time, follow the steps below:
Troubleshooting steps
The following steps will retrieve information under the following assumptions:
A single dataset is in use (this will not query multiple datasets).
The data is not deleted from data lake due to deletion by Advanced Data Lifecycle Management, Privacy Service, or other services conducting deletion.
First, you must collect the following information:
Identity symbols (namespaceCode) are case sensitive. To retrieve all identity symbols for a given dataset in the identityMap, run the following query:
SELECT distinct explode(*) FROM (SELECT map_keys(identityMap) FROM dataset_name)
If you do not know the identity value of your cookie identifier and you would like to search for a cookie ID that would have been linked to multiple person identifiers, then you must run the following query. This query assumes ECID as the cookie namespace and CRMID as the person namespace.
SELECT identityMap['ECID'][0]['id'], count(distinct identityMap['CRMID'][0]['id']) as crmidCount FROM dataset_name GROUP BY identityMap['ECID'][0]['id'] ORDER BY crmidCount desc
SELECT identityMap['ECID'][0]['id'], count(distinct personID) as crmidCount FROM dataset_name group by identityMap['ECID'][0]['id'] ORDER BY crmidCount desc
Note: personID refers to the path of the descriptor. You can find this information under schemas.
Now that you’ve identified the cookie values linked to multiple person IDs, take one from the results and use it in the following query to get a chronological view of when that cookie value was linked to a different person identifier:
SELECT identityMap['CRMID'][0]['id'] as personEntity, *
FROM dataset_name
WHERE identityMap['ECID'][0].id = 'identity_value'
ORDER BY timestamp desc
SELECT _experience.analytics.customDimensions.eVars.eVar10 as personEntity, *
FROM dataset_name
WHERE identityMap['ECID'][0].id = 'identity_value'
ORDER BY timestamp desc
Note: This example assumes that eVar10 is marked as an identity. For your configurations, you must change the eVar based on your own organization’s implementation.
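Once the chronological query returns rows for a single cookie value, the output can be condensed into the points in time at which the person identifier changed. The following is a minimal Python sketch, assuming you have exported the timestamp and personEntity columns as tuples. The function name and the sample rows are hypothetical.

```python
def link_history(rows):
    """Condense (timestamp, person_id) rows for one ECID into re-link events.

    Rows may arrive in any order. ISO 8601 timestamps that share the same
    UTC offset sort correctly as strings. Rows without a person ID are skipped.
    """
    history = []
    for timestamp, person_id in sorted(rows, key=lambda row: row[0]):
        if person_id is None:
            continue  # unauthenticated event, no person link to record
        if not history or history[-1][1] != person_id:
            history.append((timestamp, person_id))
    return history


rows = [
    ("2024-08-17T15:22:51+00:00", "Alice"),
    ("2024-08-18T09:00:00+00:00", "Alice"),
    ("2024-08-19T10:30:00+00:00", "Bob"),
    ("2024-08-18T12:00:00+00:00", None),
]
history = link_history(rows)
print(history)
# [('2024-08-17T15:22:51+00:00', 'Alice'), ('2024-08-19T10:30:00+00:00', 'Bob')]
```

Because the most recently established link is honored, the last entry in the condensed history indicates the person that the cookie would be expected to be linked to.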
Troubleshooting steps
Refer to the documentation on identity optimization algorithm, as well as the types of graph structures that are supported.
You can also use the graph simulation tool in the UI to simulate events and configure your own unique namespace and namespace priority settings. Doing so can help give you a baseline understanding of how the identity optimization algorithm should behave.
If your simulation results match your graph behavior expectations, then check whether your identity settings match the settings that you configured in your simulation.
Identity graphs will adhere to your configured unique namespace and namespace priority after the settings have been saved. Any “collapsed” graphs that exist before you save your new settings will not be affected, until new data is ingested such that the collapsed graph is updated. The primary identity of event fragments on Real-Time Customer Profile will not be updated even after namespace priority changes.
Troubleshooting steps
You can use the identity graph viewer to check whether your graph was ingested before or after you saved your settings. Examine the last updated timestamp under Link properties to see when Identity Service ingested the graph. If the timestamp predates your configuration, then the “collapsed” graph was likely created before you enabled the feature.
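If you record the time at which you saved your identity settings, the comparison can be scripted. The following is a small Python sketch, assuming both values are ISO 8601 strings that include a UTC offset. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime


def updated_before_settings(last_updated, settings_saved):
    """True if the graph's last update predates the saved identity settings."""
    return datetime.fromisoformat(last_updated) < datetime.fromisoformat(settings_saved)


# Hypothetical values: the link was last updated two days before settings were saved.
print(updated_before_settings("2024-08-15T10:00:00+00:00", "2024-08-17T15:22:51+00:00"))  # True
```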
Use the identity dashboard for insights on the state of your identity graph, such as the count of identities and graphs. Refer to the metric, “Graph count with multiple namespaces” for a count of graphs that have collapsed - these are graphs that contain two or more identities with the same namespace. Assuming that the sandbox has no data, and you have configured a namespace (e.g. CRMID) to be unique, the expectation is that there should be zero graphs that have two or more CRMIDs. In the example below, there are two graphs that contain two or more email addresses.
You can find a detailed breakdown in the profile snapshot export dataset in data lake by running the query below:
Replace dataset_name with the actual name of your dataset.
The counts may not exactly match. The identity dashboard is based on the identity graph count and the following query is based on profile count with two or more identities. The data is independently processed and updated by the service.
SELECT key, identityCountInGraph, count(identityCountInGraph) as graphCount
FROM (SELECT key, cardinality(value) as identityCountInGraph
FROM (SELECT explode(identityMap)
FROM dataset_name
WHERE cardinality(identityMap) > 1)) /* by definition, graphs have 2 or more identities */
WHERE key not in ('ecid', 'aaid', 'idfa', 'gaid') /* filter out common device/cookie namespaces */
GROUP BY 1, 2
ORDER BY 1, 2 asc
You can use the following query in profile snapshot export dataset to obtain sample identities from “collapsed” graphs.
SELECT identityMap
FROM dataset_name
WHERE cardinality(identityMap['CRMID'])>1 /* any graphs with 2+ CRMID. Change CRMID namespace if needed */
The two queries listed above will yield expected results if the sandbox is not enabled for the shared device interim approach and will behave differently from identity graph linking rules.
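To make the shape of the first query's output easier to read, the same grouping can be reproduced on a handful of in-memory rows. The following is a Python sketch that mirrors the query's filters. The sample identityMap values and the lowercase namespace keys are made up for illustration and may not match your export.

```python
from collections import Counter

DEVICE_NAMESPACES = {"ecid", "aaid", "idfa", "gaid"}


def graph_counts(identity_maps):
    """Count graphs per (namespace, identities in that namespace)."""
    counts = Counter()
    for identity_map in identity_maps:
        if len(identity_map) <= 1:  # mirrors cardinality(identityMap) > 1
            continue
        for namespace, identities in identity_map.items():
            if namespace not in DEVICE_NAMESPACES:  # mirrors the NOT IN filter
                counts[(namespace, len(identities))] += 1
    return sorted(counts.items())


snapshot = [
    {"crmid": [{"id": "Alice"}], "ecid": [{"id": "e1"}]},
    {"crmid": [{"id": "Bob"}, {"id": "Carol"}], "ecid": [{"id": "e2"}]},
    {"ecid": [{"id": "e3"}]},
]
print(graph_counts(snapshot))  # [(('crmid', 1), 1), (('crmid', 2), 1)]
```

In this sample, the entry with an identity count of 2 for a namespace that is expected to be unique corresponds to a collapsed graph.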
This section outlines a list of answers to frequently asked questions about identity graph linking rules.
Read this section for answers to frequently asked questions about the identity optimization algorithm.
This scenario is unsupported. Therefore, you may see graphs collapse in cases where one user uses their B2C CRMID to log in, and another user uses their B2B CRMID to log in. For more information, read the section on single person namespace requirement in the implementation page.
Existing collapsed graphs will be affected (‘fixed’) by the graph algorithm only if these graphs get updated after you save your new settings.
For more information, read the guide on determining the primary identity for experience events.
The CRMID of the last authenticated user will be linked to the ECID (shared device). ECIDs can be reassigned from one person to another based on user behavior. The impact will depend on how the journey is constructed, so it is important that customers test out the journey in a development sandbox environment to validate the behavior.
The key points to highlight are as follows:
Journeys should look up a profile with a unique namespace because a non-unique namespace may be reassigned to another user.
Read this section for answers to frequently asked questions about namespace priority.
There are two ‘buckets’ of namespaces: person namespaces and device/cookie namespaces. The newly created custom namespace will have the lowest priority in each ‘bucket’ so that this new custom namespace does not impact existing data ingestion.
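The placement rule can be pictured as appending to the end of the relevant bucket. The following is an illustrative Python sketch and not how Identity Service stores priorities. The bucket names, the existing orderings, and the LoyaltyID namespace are hypothetical.

```python
def add_custom_namespace(priorities, namespace, bucket):
    """Append a new namespace at the lowest priority of its bucket.

    priorities maps a bucket name to a list ordered from highest to lowest priority.
    The input is left unchanged and a copy is returned.
    """
    updated = {name: list(order) for name, order in priorities.items()}
    updated[bucket].append(namespace)
    return updated


priorities = {
    "person": ["CRMID", "Email"],
    "device_cookie": ["ECID", "AAID"],
}
updated = add_custom_namespace(priorities, "LoyaltyID", "person")
print(updated["person"])         # ['CRMID', 'Email', 'LoyaltyID']
print(updated["device_cookie"])  # ['ECID', 'AAID']
```

Because the new namespace lands below every existing entry in its bucket, the relative order of the namespaces that were already configured does not change.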
Yes, the ‘primary’ flag on identityMap is used by other services. For more information, read the guide on the implications of namespace priority on other Experience Platform services.
No. Namespace priority will only apply to Experience Event datasets using the XDM ExperienceEvent Class.
The identity optimization algorithm will be applied first to ensure person entity representation. Afterwards, if the graph tries to exceed the identity graph guardrail (50 identities per graph), then this logic will be applied. Namespace priority does not affect the deletion logic of the 50 identity/graph guardrail.
Read this section for answers to frequently asked questions about testing and debugging features in identity graph linking rules.
Generally speaking, testing on a development sandbox should mimic the use cases you intend to execute on your production sandbox. Refer to the following table for some key areas to validate when conducting comprehensive testing:
Test case | Test steps | Expected outcome |
---|---|---|
Accurate person entity representation | | |
Segmentation | Create four segment definitions (NOTE: Each pair of segment definitions should have one evaluated using batch and the other using streaming.) | Regardless of shared device scenarios, John and Jane should always qualify for their respective segments. |
Audience qualification / unitary journeys on Adobe Journey Optimizer | | |
Use the graph simulation tool to validate that the feature is working at an individual graph level.
To validate the feature at a sandbox level, refer to the Graph count with multiple namespaces section in the identity dashboard.