Entity Resolution vs. Entity Identification

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In entity resolution, as in any new research area, different authors or practitioners may use the same term but intend different meanings. You always have to be careful to understand exactly what a writer means when he or she uses a particular term. For example, I have found that the terms “entity resolution”, “entity identification”, and “entity disambiguation” are often used with different meanings by different writers.

Over the years, I have developed my own definitions.  I don’t claim that these are standard definitions, but they are the way I use them in my own work.

First of all, entity resolution is the most general term and encompasses the other two. Entity resolution (ER) is a process that covers everything from extracting or collecting entity references from sources, to linking references to the same entity, to exploring networks of entity associations. Having said that, I find there are generally two uses of the term entity resolution. Just as we see the term information technology used in the sense of “Big IT” (anything to do with computers) and “Little IT” (a specific curriculum of computer studies), the same can be said for entity resolution. Big ER is when entity resolution is used to describe the entire process from end to end (as in my definition above). Little ER, on the other hand, is when the same term is used to describe just the middle step, the logic of determining which references are to the same entities, i.e. “resolving” the references.

Whereas entity resolution is the process of resolving whether references are to the same entity or to different entities, entity identification describes the special case of entity resolution in which the references are linked to “known” entities, i.e. matched to a set of previously established identities. (A better term for this than entity identification would probably be “entity recognition”, as in “customer recognition”.) Thus, entity resolution and entity identification (or recognition) mean different things, because it is possible to resolve two references without actually knowing the identity of the entities to which they refer.
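
To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the record fields, the exact-match rule on name and date of birth, the function names, and the identifier "CUST-001". Real matching logic is far more involved; the point is only that resolving two references and identifying them are separate questions.

```python
from typing import Optional

def match_key(ref: dict) -> tuple:
    """Build a simple match key from a reference (illustrative rule only)."""
    return (ref["name"].strip().lower(), ref["dob"])

def resolve(ref_a: dict, ref_b: dict) -> bool:
    """Entity resolution: are the two references to the same entity?"""
    return match_key(ref_a) == match_key(ref_b)

def identify(ref: dict, known: dict) -> Optional[str]:
    """Entity identification: link a reference to a known identity, if one exists."""
    return known.get(match_key(ref))

# Hypothetical set of previously established identities.
known_identities = {("mary jones", "1980-04-12"): "CUST-001"}

r1 = {"name": "Mary Jones", "dob": "1975-01-30"}
r2 = {"name": " mary jones ", "dob": "1975-01-30"}
r3 = {"name": "Mary Jones", "dob": "1980-04-12"}

same = resolve(r1, r2)                    # resolved as the same entity
id_r1 = identify(r1, known_identities)    # but not a known identity
id_r3 = identify(r3, known_identities)    # a known identity
```

Here r1 and r2 resolve to the same entity even though neither can be identified, while r3 is both resolvable and identifiable.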

A good analogy is in criminal investigation. If two sets of fingerprints are found at a crime scene, it is possible to determine from their characteristics that they belong to two different suspects. However, the identification of the suspects to whom the fingerprints belong depends upon the completeness of the fingerprint files (known identities). This is also an example of what is meant by the third term, entity disambiguation, i.e. resolving that two references are to different entities. In this example, we can resolve that the two references are to different entities without knowing their identities. Another example might be two records with the name “John Smith” but with different dates of birth. Without other information we may not know exactly which John Smiths they are, but we could conclude that they are different John Smiths.
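
The John Smith example can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that date of birth is an attribute that never legitimately differs between two references to the same person. The field names and the function name are hypothetical.

```python
# Attributes assumed (for this sketch) to never differ for the same entity.
DISTINGUISHING_FIELDS = ("dob",)

def disambiguate(ref_a: dict, ref_b: dict) -> bool:
    """Return True when the references can be shown to be to different entities."""
    for field in DISTINGUISHING_FIELDS:
        a, b = ref_a.get(field), ref_b.get(field)
        # A conflict counts only when both values are present and they differ.
        if a is not None and b is not None and a != b:
            return True
    return False

smith_1 = {"name": "John Smith", "dob": "1961-03-02"}
smith_2 = {"name": "John Smith", "dob": "1974-11-19"}
smith_3 = {"name": "John Smith"}  # date of birth unknown

different = disambiguate(smith_1, smith_2)   # conflicting dates of birth
undecided = disambiguate(smith_1, smith_3)   # nothing to contradict
```

Note that a False result does not mean the two references are to the same John Smith, only that this rule found no evidence that they are different.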

Similarly, the same sets of fingerprints could be found at two different crime scenes, but again without the prints being on file.  This would be another case of entity resolution without entity identification, i.e. we know they belong to the same person, but just don’t know whose they are.  When this process is done intentionally, we call it anonymous entity resolution.  As an example, for privacy reasons we may give school records anonymous identifiers that allow us to collect and analyze all of the grades for the same student without revealing the identity of the student.
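
One possible way to produce such anonymous identifiers is a keyed hash, sketched below in Python with the standard hmac and hashlib modules. The secret key, the student numbers, and the record layout are all made up for illustration, and this is not a complete privacy solution; it only shows that records can be linked to the same student without carrying the student's identity.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the party that assigns identifiers.
SECRET_KEY = b"hypothetical-secret-key"

def anonymous_id(student_id: str) -> str:
    """Derive a stable token that does not reveal the student number."""
    digest = hmac.new(SECRET_KEY, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

# Made-up records: (student number, course, grade).
records = [
    ("S1001", "Math", 88),
    ("S1002", "Math", 75),
    ("S1001", "History", 92),
]

# Group grades by anonymous token; the student number is never stored.
grades_by_token = defaultdict(list)
for student_id, course, grade in records:
    grades_by_token[anonymous_id(student_id)].append((course, grade))
```

The analyst sees two tokens, one with two grades and one with a single grade, and can study each student's grades together without knowing who the students are.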

Entity extraction is another term that I see used to describe entirely different processes, but let’s save that discussion for next time.
