Reference Linking Methods - Part 4
Thursday, September 2nd, 2010
By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)
This is the last in a series of four posts that discuss four methods for linking references. These methods are:
- Direct matching
- Transitive linking
- Linking by association
- Asserted linking
In the direct matching, transitive linking, and association analysis methods discussed in previous posts, the evidence for establishing a link comes from the references themselves, either as attribute values or relationships with other references. A link created in this way is also called an inferred link.
But in almost any ER context, some pairs of equivalent references (i.e., references to the same entity) will lack enough evidence in the references themselves to establish that equivalence, leaving them unlinked as false negatives. For example, in the previous post we discussed how association analysis might reveal that the references to Mary Smith on Oak St and to Mary Smith on Elm St are equivalent. But if the collateral evidence of the shared address association were not available, the link could not have been inferred.
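To make the false-negative problem concrete, here is a minimal Python sketch of a direct-matching rule. The `Reference` class, the `direct_match` function, and the rule itself (exact agreement on normalized name and address) are hypothetical illustrations, not the matching logic of any particular ER system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    ref_id: str
    name: str
    address: str


def normalize(value: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences don't matter.
    return " ".join(value.lower().split())


def direct_match(a: Reference, b: Reference) -> bool:
    # Hypothetical rule: link only if the normalized name AND address agree.
    return (normalize(a.name) == normalize(b.name)
            and normalize(a.address) == normalize(b.address))


r1 = Reference("R1", "Mary Smith", "123 Oak St")
r2 = Reference("R2", "Mary Smith", "456 Elm St")

# The addresses differ, so the rule cannot infer a link. If both references
# really describe the same person, this is a false negative.
print(direct_match(r1, r2))  # False
```

Loosening the rule to compare names only would link R1 and R2, but it would also link every other Mary Smith, which is why the references alone are not enough here.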
A different way to approach this problem is through asserted linking. An asserted link between two references is based on prior knowledge that they are equivalent. For this reason, creating links in this way is also called knowledge-based linking, and ER systems that use this method of resolution are called knowledge-based ER systems.
An asserted link often takes the form of a single record carrying the attribute values of two non-matching references. The assertion about Mary Smith’s change of address might be something like:
The Mary Smith previously residing at 123 Oak is now residing at 456 Elm.
This assertion reflects the knowledge that the references to Mary Smith on Oak Street and Mary Smith on Elm Street are equivalent, independent of any similarity or dissimilarity between their corresponding attribute values.
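An asserted link can be sketched as a record that holds the attribute values of both references. In this minimal Python sketch, the `Assertion` layout, its `source` field, and the `asserted_equivalent` helper are assumptions made for illustration. The point is that equivalence is looked up from prior knowledge rather than computed by comparing attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    name: str
    address: str


@dataclass(frozen=True)
class Assertion:
    # One record carrying the attribute values of two references that are
    # known (not inferred) to be equivalent. Field names are hypothetical.
    previous: Reference
    current: Reference
    source: str  # e.g. "self-reported" or "public record"


def asserted_equivalent(a: Reference, b: Reference,
                        assertions: list[Assertion]) -> bool:
    # No attribute comparison happens here: equivalence is looked up
    # from the stored assertions, in either direction.
    return any({a, b} == {s.previous, s.current} for s in assertions)


oak = Reference("Mary Smith", "123 Oak")
elm = Reference("Mary Smith", "456 Elm")
moved = Assertion(previous=oak, current=elm, source="self-reported")

print(asserted_equivalent(elm, oak, [moved]))  # True
```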
So where do these assertions come from? Not out of thin air. An assertion like this could have been self-reported, acquired from public records, or obtained from a commercial data provider, such as a magazine subscription service. If this knowledge were acquired and provisioned in the ER identity management system before a reference to either Mary Smith on Oak Street or Mary Smith on Elm Street was processed, then both references would be recognized as equivalent and could be linked at the time they were processed, regardless of the order in which they were received. Jeff Jonas calls ER systems that have this property “sequence neutral.”
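The order-independence described above can be illustrated with a small in-memory identity store. This Python sketch assumes a union-find structure and a hypothetical `IdentityStore` class; it is not a description of how any product implements its identity management system. The assertion is provisioned first, and the two references are then resolved in either order.

```python
class IdentityStore:
    """Hypothetical in-memory identity store: union-find over reference keys."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}
        self._entity_id: dict[str, str] = {}  # root key -> entity identifier
        self._next_id = 1

    def _find(self, key: str) -> str:
        self._parent.setdefault(key, key)
        while self._parent[key] != key:
            # Path halving keeps the trees shallow.
            self._parent[key] = self._parent[self._parent[key]]
            key = self._parent[key]
        return key

    def add_assertion(self, key_a: str, key_b: str) -> None:
        # Prior knowledge: merge the two keys into one equivalence class.
        root_a, root_b = self._find(key_a), self._find(key_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def resolve(self, key: str) -> str:
        # Return the entity identifier for a reference, assigning a new one
        # the first time its equivalence class is seen.
        root = self._find(key)
        if root not in self._entity_id:
            self._entity_id[root] = f"E{self._next_id}"
            self._next_id += 1
        return self._entity_id[root]


OAK = "Mary Smith|123 Oak"
ELM = "Mary Smith|456 Elm"


def process(order: list[str]) -> list[str]:
    store = IdentityStore()
    store.add_assertion(OAK, ELM)  # provisioned before any reference arrives
    return [store.resolve(key) for key in order]


print(process([OAK, ELM]))  # ['E1', 'E1']
print(process([ELM, OAK]))  # ['E1', 'E1']
```

Both arrival orders yield the same identifier only because the assertion was loaded first. Merging two classes that already carry different identifiers would need extra handling that this sketch omits.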
Asserted linking is not just theoretical. For example, Acxiom® Corporation has made asserted linking the backbone of its AbiliTec® CDI technology that manages billions of assertions for U.S. consumers alone.
The disadvantage of asserted linking is that acquiring, storing, and managing the assertions is a non-trivial activity. Asserted linking divides the overall ER process into two concurrent processes: a foreground process that resolves equivalence and applies links, and a background process that acquires assertions and integrates them into the identity management system. Of course, timing is critical. If an assertion is not acquired and available before the references it relates are processed, their equivalence will not be recognized and they will not be linked.
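The timing constraint can be simulated by interleaving the two processes as one time-ordered list of events. This Python sketch is hypothetical: the event format, the string keys, and the rule that a reference is linked only at the moment it is processed (earlier decisions are never revisited) are assumptions chosen to mirror the scenario above, and a real system would run the two processes concurrently rather than from a single list.

```python
def run(events: list[tuple[str, ...]]) -> set[frozenset[str]]:
    """Process events in time order and return the links created."""
    known: set[frozenset[str]] = set()  # assertions acquired so far
    seen: list[str] = []                # references already processed
    links: set[frozenset[str]] = set()
    for kind, *payload in events:
        if kind == "assertion":
            # Background process: integrate newly acquired knowledge.
            known.add(frozenset(payload))
        elif kind == "reference":
            # Foreground process: link the new reference to any earlier one
            # covered by an assertion that is already available.
            ref = payload[0]
            for earlier in seen:
                pair = frozenset((ref, earlier))
                if pair in known:
                    links.add(pair)
            seen.append(ref)
    return links


OAK = "Mary Smith|123 Oak"
ELM = "Mary Smith|456 Elm"

on_time = [("assertion", OAK, ELM), ("reference", OAK), ("reference", ELM)]
too_late = [("reference", OAK), ("reference", ELM), ("assertion", OAK, ELM)]

print(len(run(on_time)))   # 1: the two references are linked
print(len(run(too_late)))  # 0: the assertion arrived after both were processed
```

In this sketch the assertion only has to arrive before the second of the two references; once both have been processed without it, the link is missed.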
In the next post, I plan to discuss the role of ER in entity-based information exchange systems, sometimes called “information hubs.”

