Archive for the ‘Identity Management’ Category

Making Systems Smarter

Thursday, September 16th, 2010

By Mike Betron, Infoglide Director of Marketing

Several years ago, identity resolution was almost exclusively tied to detecting fraud. Over time, the “identity” of identity resolution has continued to evolve and broaden. Many areas of commerce are discovering that efficiency can be improved dramatically when you have a clear picture of the individuals you’re dealing with and their social network.

Of course, identity resolution is not the only way to gain that efficiency. Another method, called preference profiling, self-learns by monitoring the actions of an individual or client. Google News is one example. Google captures and saves a profile of my preferences based on my behavior. It then uses that profile to improve my online experience and make better use of my time, employing search techniques that “know” which topics interest me based on my past behavior. In my case, for example, Google News is more likely to serve up sports stories than fashion articles when I ask for the latest news.

For the most part, preference profiling and other techniques besides identity resolution operate on the premise that individuals will continue to act as they have acted in the past. If this weren’t true, companies like Google wouldn’t be in business, but we’re discovering the limitations of working only with historical data. Sometimes people change and sometimes they aren’t forthcoming about their associations for various reasons.

Commercial systems, including those employed in financial services industries, are becoming “smarter.” By incorporating identity resolution technology, they enhance existing historical data systems with information drawn from a wide variety of dynamic data sources (e.g., social media). Providing a real-time “360 view” of an individual and his/her associations is improving daily business decisions at many leading companies.

Reference Linking Methods - Part 4

Thursday, September 2nd, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

This is the last in a series of four posts that discuss four methods for linking references.  These methods are:

  1. Direct matching
  2. Transitive linking
  3. Linking by association
  4. Asserted linking

In the direct matching, transitive linking, and association analysis methods discussed in previous posts, the evidence for establishing a link comes from the references themselves, either as attribute values or relationships with other references.  A link created in this way is also called an inferred link.
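As a rough illustration of how inferred links come from the references themselves, the Python sketch below applies a direct-matching rule to every pair of references and then uses a union-find structure to supply the transitive links. The match rule (same name plus a shared date of birth or phone number) and the sample records are hypothetical, chosen only for this example.

```python
from itertools import combinations


def direct_match(a, b):
    """Hypothetical rule: same name plus a shared, non-missing DOB or phone number."""
    if a["name"].lower() != b["name"].lower():
        return False
    return any(a.get(k) and a.get(k) == b.get(k) for k in ("dob", "phone"))


def link(refs):
    """Return groups of reference IDs linked directly or transitively."""
    parent = {r["id"]: r["id"] for r in refs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(refs, 2):
        if direct_match(a, b):                      # direct matching
            parent[find(a["id"])] = find(b["id"])   # union; transitivity follows

    groups = {}
    for r in refs:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(g) for g in groups.values())


refs = [
    {"id": "r1", "name": "Mary Smith", "dob": "1970-03-01", "phone": None},
    {"id": "r2", "name": "Mary Smith", "dob": "1970-03-01", "phone": "555-0100"},
    {"id": "r3", "name": "Mary Smith", "dob": None, "phone": "555-0100"},
]
groups = link(refs)
```

Here r1 and r3 never match directly, but each matches r2, so transitive linking places all three references in a single group.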

But in almost any ER context, some pairs of equivalent references (i.e. references to the same entity) will not carry enough evidence in the references themselves to establish that equivalence, leaving them unlinked as false negatives.  For example, in the previous post we discussed how association analysis might reveal that the references to Mary Smith on Oak St and Mary Smith on Elm St are equivalent.  But if the collateral evidence of the shared address association were not available, then the link could not have been inferred.

A different way to approach this problem is through asserted linking.  An asserted link between two references is based on prior knowledge that they are equivalent.  For this reason, creating links in this way is also called knowledge-based linking, and ER systems that use this method of resolution are called knowledge-based ER systems.

An asserted link often takes the form of a single record carrying the attribute values of two non-matching references.  The assertion about Mary Smith’s change of address might be something like:

The Mary Smith previously residing at 123 Oak is now residing at 456 Elm.

It reflects the knowledge that references to Mary Smith on Oak Street and Mary Smith on Elm Street are equivalent independent of any similarity or dissimilarity between their corresponding attribute values.

So where do these assertions come from?  Not out of thin air.  An assertion like this could have been self-reported, acquired from public records, or obtained from a commercial data provider, such as a magazine subscription service.  If this knowledge were acquired and provisioned in the ER identity management system before either reference (Mary Smith on Oak Street or Mary Smith on Elm Street) was processed, then both references would be recognized as equivalent and could be linked at the time they were processed, regardless of the order in which they were received.  Jeff Jonas calls ER systems that have this property “sequence neutral.”
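The following Python sketch shows why asserted linking can be sequence neutral. It assumes a hypothetical `AssertionStore` keyed on a normalized (name, address) pair; production systems use far richer keys and provisioning. Once the Oak/Elm assertion is loaded, both references resolve to the same identifier regardless of which arrives first, while a reference that no assertion covers stays unlinked.

```python
def normalize(name, address):
    """Hypothetical reference key: lowercased name and address, whitespace collapsed."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))


class AssertionStore:
    """Minimal, hypothetical store of asserted (knowledge-based) equivalences."""

    def __init__(self):
        self._entity_of = {}  # reference key -> entity identifier
        self._counter = 0

    def add_assertion(self, ref_a, ref_b):
        """Record prior knowledge that two (name, address) references are the same entity."""
        key_a, key_b = normalize(*ref_a), normalize(*ref_b)
        known = {self._entity_of[k] for k in (key_a, key_b) if k in self._entity_of}
        if known:
            target = min(known)  # reuse an existing identifier, chosen deterministically
        else:
            self._counter += 1
            target = f"E{self._counter}"
        # If the assertion bridges two existing entities, relabel both to one identifier.
        for key, entity in list(self._entity_of.items()):
            if entity in known:
                self._entity_of[key] = target
        self._entity_of[key_a] = target
        self._entity_of[key_b] = target

    def resolve(self, name, address):
        """Return the asserted entity identifier, or None if no assertion covers the reference."""
        return self._entity_of.get(normalize(name, address))


store = AssertionStore()
store.add_assertion(("Mary Smith", "123 Oak St"), ("Mary Smith", "456 Elm St"))

oak_first = [store.resolve("Mary Smith", "123 Oak St"), store.resolve("Mary Smith", "456 Elm St")]
elm_first = [store.resolve("Mary Smith", "456 Elm St"), store.resolve("Mary Smith", "123 Oak St")]
```

The `None` result for an uncovered reference is also the timing risk of asserted linking in miniature: a reference resolved before its assertion is loaded simply stays unlinked.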

Asserted linking is not just theoretical.  For example, Acxiom® Corporation has made asserted linking the backbone of its AbiliTec® CDI technology that manages billions of assertions for U.S. consumers alone.

The disadvantage of asserted linking is that it is a non-trivial activity to acquire, store, and manage the assertions.  Asserted linking divides the overall ER process into two concurrent processes.  One is a foreground process for resolving equivalence and applying links.  The other is a background process that acquires and integrates assertions into the identity management system.  Of course, timing is critical.  If an assertion is not acquired and available before the references that depend on it are processed, then their equivalence will not be recognized and they will not be linked.

In the next post, I plan to discuss the role of ER in entity-based information exchange systems,  sometimes called “information hubs.”

Surface Web, Dark Web, and Social Media

Thursday, August 26th, 2010

By Mike Betron, Infoglide Software Director of Marketing

A recent article in Bank Systems & Technology says that financial services institutions are discovering increasingly sophisticated attempts to defraud their customers – more sophisticated in how they gather information and employ it in their criminal schemes. “As fraudsters increasingly seek to exploit weaknesses in consumers’ defenses through social engineering schemes rather than hack vulnerabilities in banks’ security systems, the need for enterprisewide solutions to detect fraud across channels is greater than ever.”

The sources of information about individual consumers are rapidly growing and increasingly accessible. Most of us concerned about our personal information think first about that portion of the World Wide Web that is indexable by conventional search engines, sometimes called the “surface web.” That information is more easily monitored and managed than information on social media sites (e.g. Facebook) and other dark web sources such as local, state, and other government databases.

What is needed is a comprehensive picture of our personal online reputation, but it’s not a simple task. It requires the ability to tie together diverse data from a multitude of sources in a variety of formats. Using similarity searching, advanced filtering, and sophisticated scoring, federated searching across disparate data sources can produce a unified view of an individual’s “identity” for ongoing monitoring. Ideally, that unified identity can be refreshed using automated rather than manual processes.
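A minimal sketch of the scoring-and-filtering step, under our own assumptions: each source is an in-memory list of records, attribute similarity is Python's `difflib.SequenceMatcher.ratio()`, and the weights and 0.7 threshold are illustrative values, not taken from any product.

```python
from difflib import SequenceMatcher

# Illustrative attribute weights (assumed, not from any product); they sum to 1.0.
WEIGHTS = {"name": 0.5, "city": 0.2, "email": 0.3}


def similarity(a, b):
    """Similarity in [0, 1]; a missing value on either side contributes nothing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(target, candidate):
    """Weighted sum of per-attribute similarities."""
    return sum(w * similarity(target.get(k), candidate.get(k)) for k, w in WEIGHTS.items())


def federated_view(target, sources, threshold=0.7):
    """Search every source, score each candidate, and keep those above the threshold."""
    hits = []
    for source_name, records in sources.items():
        for record in records:
            s = score(target, record)
            if s >= threshold:  # filtering step
                hits.append((round(s, 2), source_name, record))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


target = {"name": "Mary Smith", "city": "Little Rock", "email": "mary@example.com"}
sources = {
    "surface_web": [{"name": "Mary Smith", "city": "Little Rock", "email": "mary@example.com"}],
    "social_media": [
        {"name": "Marty Smyth", "city": "Dallas"},
        {"name": "Bob Jones", "city": "Austin", "email": "bob@example.com"},
    ],
}
view = federated_view(target, sources)
```

Rerunning `federated_view` on a schedule is one simple way to get the automated refresh the paragraph calls for; adding a source means adding an entry to `sources`, not rewriting the scoring code.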

Identity resolution offers the ideal core technology for any solution designed to present a unified picture of personal identity across surface web, dark web, and social media sources. Since identity resolution engines are designed to incorporate new data sources without requiring system rewrites, they offer the best hope for deployment of extensible systems for online identity management.

Identity Resolution Daily Links 2010-05-01

Saturday, May 1st, 2010

[Post from Infoglide] Architectures for Entity Resolution-Part 3

“In the last two posts we reviewed the basic architectures used to implement entity resolution (ER) systems.  We started with the most basic systems, the merge/purge  and heterogeneous join processes. In the last post, we discussed identity resolution systems, the first of two types of ER architectures that perform identity management.  By retaining identity information, these systems are able to recognize the same identity over time and to assign it a persistent identifier.”

PR-USA.net: Houston Medical Equipment Company Owner, Operator and Patient Recruiter Plead Guilty to Health Care

“Onward began billing Medicare for fraudulent durable medical equipment in 2003, according to court documents. Vinitski and Lachman admitted they paid kickbacks, sometimes $1,000 per patient, to recruiters who brought patients to Onward. Lachman and Vinitski then would bill Medicare for durable medical equipment that these patients did not need or never received.”

Inside Louisiana News: Governor Jindal Addresses First Annual Parish Leadership Summit

“Governor Jindal also said GOHSEP worked with the Louisiana State Police to develop the Fusion Center which ensures Louisiana is constantly connected to the U.S. Department of Homeland Security. The state’s Fusion Center also established the first cyber branch in the country with a focus on identifying, mitigating, and thwarting a cyber security attack on any of the state’s critical infrastructure.”

ExpertVision: What is Identity Resolution?

“What is the difference between an entity and the names used to refer to that entity? One individual, product or company might be known by a number different names, each of which make sense in context.”

Architectures for Entity Resolution-Part 3

Thursday, April 29th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last two posts we reviewed the basic architectures used to implement entity resolution (ER) systems.  We started with the most basic systems, the merge/purge and heterogeneous join processes. In the last post, we discussed identity resolution systems, the first of two types of ER architectures that perform identity management.  By retaining identity information, these systems are able to recognize the same identity over time and to assign it a persistent identifier.

The distinguishing characteristic of identity resolution systems is that they start with a given set of identities to which input references are resolved.  An example would be a customer recognition system where the starting identities are the customers of the business. However, there are many situations where the identities are not necessarily known in advance. In some cases, it is not because the entities are unknown, but simply that they are not organized in a way that can be easily pre-loaded.

For example, suppose two companies merge.  Each company has its own customer database, but the customers are identified in different ways.  The same situation can arise within a single company through poor systems and practices, leaving no confidence that the master records are free of duplicates across business lines or company locations.

The type of system often used to address these situations is called an “identity capture” system. Identity capture systems resemble a cross between a “smart” merge/purge system and an identity resolution system.  They support identity management and persistent identifiers, but start without a preloaded set of identities.

Here is how they work.  As references are resolved, the system saves what it has learned rather than discarding it, so identities are built on the fly as references are processed.  For example, suppose an elementary school has 10 years of enrollment records, i.e. for each year, it has records of all the students who were in grades 1 through 6.  Each year some students leave grade 6 for middle school or transfer from any of the grades to another school.  At the same time some new students enter at first grade or transfer into upper grades from another school.  In an identity capture system, the identity master starts out empty.  When the first enrollment file is processed, almost all of the enrollment records will represent new identities.  The identity characteristics in each record are captured and stored to create a new identity master record.

When the next year of enrollment is processed, the system should recognize students re-enrolling from the previous year, so it only captures as new identities those students entering the school that year.  However, in many identity capture systems, the process of capture goes beyond simply adding new identities and can also be used to enhance existing identities.  For example, suppose that from the first year enrollment,  an identity was created for student Edgardo Mendez with a 7/12/2000 date-of-birth (DOB).  Then in the next year of enrollment the system is presented with the record of Eddie Mendez with a 7/12/2000 DOB.  Based on the resolution rules (including conflict rules), the embedded identity resolution process may decide that these are both references to the same student.  If that were the case, it would enhance the identity master record to include the second year first name variant, so that going forward it would recognize the same identity with a first name of either Edgardo or Eddie.
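The capture-and-enhance behavior can be sketched in a few lines of Python. The match rule here (same last name and DOB, plus a first name that is already known or shares its first two letters with a known variant) is a deliberately crude, hypothetical stand-in for real resolution and conflict rules; `IdentityCapture` and `MasterRecord` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    identity_id: int
    last_name: str
    dob: str
    first_names: set[str] = field(default_factory=set)


class IdentityCapture:
    """Hypothetical identity capture engine: the identity master starts out empty."""

    def __init__(self):
        self.masters: list[MasterRecord] = []

    @staticmethod
    def _matches(master, first, last, dob):
        # Crude stand-in for real resolution and conflict rules.
        if master.last_name != last or master.dob != dob:
            return False
        return any(first == known or first[:2] == known[:2] for known in master.first_names)

    def process(self, first, last, dob):
        """Resolve one reference, enhancing or capturing an identity; return its persistent ID."""
        first, last = first.lower(), last.lower()
        for master in self.masters:
            if self._matches(master, first, last, dob):
                master.first_names.add(first)  # enhance: remember the new name variant
                return master.identity_id
        master = MasterRecord(len(self.masters) + 1, last, dob, {first})
        self.masters.append(master)  # capture: create a new identity master record
        return master.identity_id


capture = IdentityCapture()
year1 = capture.process("Edgardo", "Mendez", "7/12/2000")
year2 = capture.process("Eddie", "Mendez", "7/12/2000")
newcomer = capture.process("Ana", "Lopez", "3/4/2001")
```

With this rule, Eddie Mendez resolves to the identity created for Edgardo Mendez in the first year, and that master record now carries both first-name variants.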

The advantage of collecting identity information on the fly is offset to some extent by the problem of splits and consolidations.  The order in which references are processed can sometimes affect the system’s identity decisions.  Information that connects two references may arrive after the two references have created separate identity master records (false negative).  This requires the two identity master records to be consolidated or merged.  Although the master records can be corrected, it defeats the idea of the persistent identifier in that many previously processed references could have been assigned the identifier associated with the retired master record while others were assigned the identifier of the surviving master record.
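One common way to soften the consolidation problem, which the post does not prescribe, is to keep a registry that maps each retired identifier to its survivor, so that references already holding an old identifier can still be redirected. A minimal, hypothetical sketch:

```python
class IdentifierRegistry:
    """Hypothetical registry mapping retired identifiers to their surviving identifiers."""

    def __init__(self):
        self._survivor_of = {}  # retired identifier -> identifier it was merged into

    def current(self, identifier):
        """Follow the chain of consolidations to the identifier that is still live."""
        while identifier in self._survivor_of:
            identifier = self._survivor_of[identifier]
        return identifier

    def consolidate(self, keep, retire):
        """Merge two identities after late-arriving evidence links them."""
        keep, retire = self.current(keep), self.current(retire)
        if keep != retire:  # mapping only live identifiers prevents cycles
            self._survivor_of[retire] = keep


registry = IdentifierRegistry()
registry.consolidate(7, 9)  # evidence arrives: identities 7 and 9 are the same
registry.consolidate(3, 9)  # later: identity 3 is also the same (9 now means 7)
```

This keeps old identifiers resolvable, but it does not restore true persistence: downstream systems must still look up `current()` before comparing identifiers.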

Splits are the reverse situation, where references to two different entities are mistakenly used to create a single master identity record (false positive) that must later be split apart.  Splits are harder to correct than consolidations, and for this reason, ER systems that manage identity tend to err on the side of false negatives rather than false positives.

In the next post we will discuss the four most common strategies for linking references.

Architectures for Entity Resolution-Part 2

Wednesday, March 10th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last post we examined how entity resolution (ER) systems are actually implemented, starting with the most basic merge/purge process and heterogeneous join systems. Both of these approaches focus on collecting equivalent references from among the sources provided, either as a large batch of references in a single file, or through queries against a federation of databases.  The entity identities found by these ER systems are transient in the sense that they depend upon the sources input into the process.  When different sources are provided, different identities will emerge.

On the other hand, there are ER systems that retain and manage identity information.  By doing this they are able to “recognize” the same identity over time and assign that identity the same entity identifier (sometimes called “persistent identifiers” or “persistent links”).  In Customer Data Integration (CDI) applications, these kinds of systems are sometimes called Customer Recognition Systems.

Two major types of ER systems perform identity management.  The first type is the “identity resolution” system.  It is most effective in situations where a fairly stable set of known identities of interest exists, such as the set of vendors or customers of a company, a set of products, or the students enrolled in a school.  The attributes of these identities are pre-loaded into the system and assigned identifiers.  When a reference is given to the system, it then decides whether the reference is to one of the known identities, and if so, returns the identifier of that identity.

Identity resolution systems can operate in either batch or transactional mode.  In cases where there are a large number of pre-stored identities, the performance of batch operations can be improved through distributed processing where the identities are partitioned over multiple processors and resolved in parallel.
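The sketch below illustrates both modes under stated assumptions: the pre-loaded identities, the exact name-plus-ZIP match rule, and the first-letter partitioning key are all hypothetical. `resolve` handles one reference at a time (transactional mode), while `resolve_batch` partitions the identities by key, routes each reference to the matching partition, and resolves the partitions concurrently. Threads keep the example self-contained; because of CPython's global interpreter lock, a real deployment would more likely spread partitions across processes or machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pre-loaded identities of interest (e.g., a company's customers).
IDENTITIES = {
    "C001": {"name": "acme corp", "zip": "72201"},
    "C002": {"name": "zenith llc", "zip": "77002"},
    "C003": {"name": "apex inc", "zip": "72201"},
}


def block_key(name):
    # Hypothetical partitioning key: first letter of the normalized name.
    return name.lower()[:1]


def resolve(reference, identities):
    """Return the identifier of the known identity the reference points to, or None."""
    name = reference["name"].lower()
    for identity_id, attrs in identities.items():
        if attrs["name"] == name and attrs["zip"] == reference["zip"]:
            return identity_id
    return None


def resolve_batch(references, identities, workers=4):
    """Partition identities by key, route references to their partition, resolve in parallel."""
    partitions = defaultdict(dict)
    for identity_id, attrs in identities.items():
        partitions[block_key(attrs["name"])][identity_id] = attrs
    work = defaultdict(list)
    for index, ref in enumerate(references):
        work[block_key(ref["name"])].append((index, ref))

    def run(key):
        candidates = partitions.get(key, {})
        return [(index, resolve(ref, candidates)) for index, ref in work[key]]

    results = [None] * len(references)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for chunk in pool.map(run, list(work)):
            for index, identity_id in chunk:
                results[index] = identity_id
    return results


# Transactional mode: one reference at a time.
single = resolve({"name": "Zenith LLC", "zip": "77002"}, IDENTITIES)
# Batch mode: many references, partitioned and resolved concurrently.
batch = resolve_batch(
    [{"name": "ACME Corp", "zip": "72201"},
     {"name": "Apex Inc", "zip": "72201"},
     {"name": "Nobody Ltd", "zip": "10001"}],
    IDENTITIES,
)
```

A reference that matches no pre-loaded identity comes back as `None`; an identity resolution system, unlike an identity capture system, does not create a new identity for it.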

However, there are many situations where the identities are not necessarily known in advance, or in some cases  the entities are known but simply not organized in such a way that they can be easily pre-loaded.  For example, suppose two companies merge and each company has its own customer database. The customers are identified in different ways in each database, and furthermore, for the customers of one company, poor systems and practices prevent having any confidence that the master records are unduplicated across business lines or company locations.

The type of system often applied in these situations is an “identity capture” system.  The identity capture architecture can be seen as a hybrid of  merge/purge and identity resolution systems.  It supports identity management and persistent identifiers, but without starting with a preloaded set of identities.  In my next post, we’ll delve deeper into the identity capture process.

Identity Resolution Daily Links 2010-02-13

Saturday, February 13th, 2010

[Post from Infoglide] Architectures for Entity Resolution

“In the last post we looked at a formal model for describing entity-based integration. Now let’s turn our attention to how entity resolution (ER) systems are actually implemented.  One of the most important design decisions is whether the system will perform entity identity management.  Systems perform identity management when they create and store the attributes values for the identities that they process.”

tdwi: IBM and Informatica Acquire MDM Capabilities

“The two acquisitions focus the spotlight on two of the hottest functions today, in terms of user organizations adopting them, namely: MDM and identity resolution. More than ever, organizations need trusted data, in support of regulatory reporting, compliance, business intelligence, analytics, operational excellence, and other data-driven requirements. MDM and identity resolution are key enablers for these requirements, so it’s no surprise that two leading vendors have chosen to acquire these at this time.”

PoliceGrantsHelp.com: Building fusion centers for the next decade

“Serrao says that in the time he has spent in a dozen different fusion centers in the United States — coupled with his own background in law enforcement — he’s gleaned several ‘best practices’ for consideration. Ideally, he says, leadership should ’set a specific strategic mission before the center is even built. Everything else follows. Determine the role of the center and whether strategic intelligence analysis will be part of the mix. Then, it will be easier to define what processes will be developed, what reporting mechanisms are needed, what technology is appropriate, and what types of personnel are needed.’”

Prudent Press Agency: Kansas Takes Action Against Lottery Fraud

“The state of Kansas has been conducting sting operations to prevent this kind of theft by lottery terminal clerks. Law enforcement agents fanned out across the state and presented ‘winning’ tickets at several retail lottery outlets. In five separate cases clerks told the agents the tickets were worthless and then tried to redeem the ‘winning’ lottery tickets. The undercover investigation led to charges of attempted theft and computer crime against five people across the state.”

Architectures for Entity Resolution

Wednesday, February 10th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last post we looked at a formal model for describing entity-based integration. Now let’s turn our attention to how entity resolution (ER) systems are actually implemented.  One of the most important design decisions is whether the system will perform entity identity management.  Systems perform identity management when they create and store the attribute values for the identities that they process.  Identity management is necessary for systems that assign persistent entity identifiers, i.e. the system must give all of the references to the same entity the same identifier value from one resolution process to the next.

The most basic form of ER is the merge/purge process.  A merge/purge process reads a large batch of references and systematically makes pair-wise comparisons between them.  During the process, it assigns a group identifier to all of the references it determines to be for the same entity.  However, these identifiers are transient, existing only during the processing of a particular batch of references, since the end result is to create a single, merged record (called a “survivor” record) in place of each reference group.  As a result, references to the same entity occurring in two different merge/purge processes will likely be given different group identifiers from one process to the next.  For example, the references for John Doe in the first batch of references processed might be given the group ID of 213, but references to the same John Doe in a batch of references processed the next day might be given a group ID of 634.  The merge/purge process can still correctly resolve the entity references in each batch, but the values of the group IDs don’t persist or carry over for the same entities from batch to batch.
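To make the transient-identifier point concrete, here is a small Python sketch of a merge/purge pass. It compares every pair in the batch, groups matches with union-find, numbers the groups in order of first appearance, and emits one survivor record per group. The match rule (identical name and date of birth) and the "first non-empty value wins" survivorship rule are hypothetical simplifications.

```python
def merge_purge(batch, same_entity):
    """Resolve one batch: pair-wise comparison, transient group IDs, one survivor per group."""
    parent = list(range(len(batch)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Systematically compare every pair of references in the batch.
    for i in range(len(batch)):
        for j in range(i + 1, len(batch)):
            if same_entity(batch[i], batch[j]):
                parent[find(j)] = find(i)

    # Group IDs are numbered by first appearance, so they mean nothing outside this batch.
    group_id_of_root, groups = {}, {}
    for i, ref in enumerate(batch):
        gid = group_id_of_root.setdefault(find(i), len(group_id_of_root) + 1)
        groups.setdefault(gid, []).append(ref)

    # Replace each group with a single survivor record (first non-empty value wins).
    survivors = {}
    for gid, refs in groups.items():
        survivor = {}
        for ref in refs:
            for key, value in ref.items():
                if value and not survivor.get(key):
                    survivor[key] = value
        survivors[gid] = survivor
    return survivors


def same_person(a, b):
    # Hypothetical match rule: identical name and date of birth.
    return (a["name"], a["dob"]) == (b["name"], b["dob"])


day1 = [
    {"name": "John Doe", "dob": "1980-01-01", "phone": ""},
    {"name": "Jane Roe", "dob": "1975-05-05", "phone": "555-0101"},
    {"name": "John Doe", "dob": "1980-01-01", "phone": "555-0199"},
]
day2 = [
    {"name": "Jane Roe", "dob": "1975-05-05", "phone": "555-0101"},
    {"name": "John Doe", "dob": "1980-01-01", "phone": "555-0199"},
]
day1_survivors = merge_purge(day1, same_person)
day2_survivors = merge_purge(day2, same_person)
```

John Doe is group 1 in the first batch and group 2 in the second. Each batch is resolved correctly, but, as in the 213 versus 634 example, the group identifiers do not carry over.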

Another characteristic of the merge/purge ER process is that it is designed to operate in batch mode.  However, there are transactional or “on-demand” versions of merge/purge that are sometimes referred to as heterogeneous database join systems.  Instead of combining all of the reference sources into a single file for batch processing, each reference source is loaded as a database table.  The application is connected to all of the source tables and has metadata that describes the structure of each reference source.  This allows a single query or “join request” to be submitted to the application, which then translates the request into an appropriate query for each source.  The individual query responses are collected and processed into a single view that is provided as the query result for the initial query.  Just as in the merge/purge process, the groups of references brought together for an entity (a query) are transient.  These types of query-based ER systems are common in law enforcement and other hypothesis testing applications.
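A rough sketch of the heterogeneous join idea using Python's built-in `sqlite3` module, with two tables standing in for separate sources. The `SOURCES` metadata and the table and column names are hypothetical, and matching is exact for brevity where a real system would use approximate matching. Table and column names come only from the trusted metadata, and the search values are bound as query parameters.

```python
import sqlite3

# Hypothetical metadata describing how each source names the same attributes.
SOURCES = {
    "dmv": {"table": "dmv_licenses", "name": "full_name", "dob": "birth_date"},
    "court": {"table": "court_cases", "name": "defendant", "dob": "dob"},
}


def join_request(conn, name, dob):
    """Translate one request into a query per source and merge the answers into one view."""
    view = []
    for source, meta in SOURCES.items():
        sql = (f"SELECT * FROM {meta['table']} "
               f"WHERE lower({meta['name']}) = lower(?) AND {meta['dob']} = ?")
        cursor = conn.execute(sql, (name, dob))
        columns = [d[0] for d in cursor.description]
        for row in cursor.fetchall():
            view.append({"source": source, **dict(zip(columns, row))})
    return view  # transient: nothing about this grouping is stored


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dmv_licenses (full_name TEXT, birth_date TEXT, license_no TEXT)")
conn.execute("CREATE TABLE court_cases (defendant TEXT, dob TEXT, case_no TEXT)")
conn.execute("INSERT INTO dmv_licenses VALUES ('John Doe', '1980-01-01', 'D123')")
conn.execute("INSERT INTO court_cases VALUES ('JOHN DOE', '1980-01-01', 'CR-42')")
result = join_request(conn, "John Doe", "1980-01-01")
```

Because the combined view is rebuilt for each request and never stored, the grouping is transient in exactly the way the merge/purge groups are.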

On the other hand, there are other ER architectures designed to retain and manage entity identity information.  By doing this they are able to “recognize” references to the same entity over time and assign those references the same entity identifier, i.e. maintain persistent entity identifiers.  In CRM applications these kinds of systems are sometimes called Customer Recognition Systems.

There are two major types of ER system architectures that perform identity management - “identity resolution” systems and “identity capture” systems. In the next post, I will pick up here with a discussion of how these systems manage identity and maintain persistent entity identifiers.

Healthcare Identity Resolution Confusion

Wednesday, January 20th, 2010

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

Confusion about medical records can lead to chaos. We’ve all heard horror stories about hospital tragedies caused by misidentification of a patient, such as performing unnecessary surgery. It’s hard to overemphasize the importance of correct, unambiguous information in the practice of medicine. Knowing as much as possible about a patient enables a practitioner to reach a correct diagnosis and the proper treatment regimen in the least amount of time.

Underscoring the importance that accurate information plays in effective treatment, the American Recovery and Reinvestment Act (ARRA) passed in 2009 includes incentives for hospitals and doctors to adopt and support certified electronic health record (EHR) technology. In fact, the Act set aside $20 billion to encourage health care organizations to improve their recordkeeping through healthcare information technology.

Today’s hot healthcare industry topic, therefore, is electronic health records. While an EHR can create the potential for interoperability, it can’t deliver interoperability without robust identity resolution. High-quality health care depends on complete, unambiguous patient information being available at all times, so identity resolution technology has become a crucial component of a well-designed healthcare identification infrastructure.

Applied to patient identification, identity resolution can prevent common medical record errors:

  1. Duplicates, the simplest case, where two records exist for the same person within a single facility.
  2. Overlaps, a more complex error, where more than one record exists for the same person across two facilities within a single organization.
  3. Overlays, where information for two different people is combined under a single record.

The rush to respond to ARRA resulted in overstatements of the identity resolution capabilities of many products. For example, most master data management (MDM) systems include matching and de-duplication capabilities that have come to be labeled “identity resolution,” while in fact they lack the critical requirements for identity resolution. Dan Power of Hub Solution Designs has pointed out the growing role of identity resolution in MDM and the need for MDM vendors to move beyond “not invented here” thinking to incorporate true identity resolution into their offerings.

Confusion about medical records can lead to chaos. Clearing up confusion about identity resolution clears a path out of the chaos that will lead to better solutions.

Identity Resolution Daily Links 2009-12-11

Friday, December 11th, 2009

[Post from Infoglide] State Agencies Adopting Entity Resolution

“Significant opportunities to apply identity resolution and entity analytics exist at the state level. State agencies interact with citizens and corporations across many domains, including collection of tax revenues (e.g. oil and gas – I’m from Texas!), licenses (e.g. motor vehicles, hunting, fishing), housing programs, lotteries, child protective services, health care, workers’ compensation, the court system, law enforcement, and homeland security.”

thestar.com: Store owner guilty in $5.75M lottery fraud

“A former convenience store owner has pleaded guilty to defrauding the Ontario Lottery Corporation after misrepresenting a winning ticket worth $5.75 million as his own.”

ZDNet: Cloud computing, so much more than multi-tenancy

“The trouble with talking about multi-tenancy itself is that it draws you into an abstract debate with conventional software vendors over the relative merits of alternative deployment platforms for a given application. This immediately brings the debate onto their home ground — a place where applications are discrete, deployments happen as a batch process and you have to get the system up-and-running before you even start thinking about meeting the business requirement. That’s not where the cloud is at.”

Liliendahl on Data Quality: Phony Phones and Real Numbers

“There are plenty of data quality issues related to phone numbers in party master data. Despite that a phone number should be far less fuzzy than names and addresses I have spend lots of time having fun with these calling digits.”

UALR: UALR Joins National Identity Management Center

“Dr. John R. Talburt, the Acxiom Chair of Information Quality at UALR, is an expert in the fields of information quality and entity resolution and will represent UALR at the center. ‘Dr. Talburt is a widely recognized, well-respected expert in the field of information quality and identity resolution. His vast knowledge in these areas of identity management will be an incredible asset for CAIMR and the research we are undertaking this coming year,’ said Dr. Gary R. Gordon, CAIMR’s executive director.”

EDITOR’S NOTE: Infoglide Corporation maintains a partnership with Dr. Talburt and his Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ). The Lab conducts research addressing important problems related to entity resolution and information quality.

