Archive for the ‘Customer Data Integration’ Category

Architectures for Entity Resolution-Part 3

Thursday, April 29th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last two posts we reviewed the basic architectures used to implement entity resolution (ER) systems.  We started with the most basic systems, the merge/purge and heterogeneous join processes. In the last post, we discussed identity resolution systems, the first of two types of ER architectures that perform identity management.  By retaining identity information, these systems are able to recognize the same identity over time and to assign it a persistent identifier.

The distinguishing characteristic of identity resolution systems is that they start with a given set of identities to which input references are resolved.  An example would be a customer recognition system where the starting identities are the customers of the business. However, there are many situations where the identities are not necessarily known in advance. In some cases, it is not because the entities are unknown, but simply that they are not organized in a way that can be easily pre-loaded.

For example, two companies merge.  Each company has its own customer database, but the customers are identified in different ways.  The same situation can arise in one company through poor systems and practices, resulting in no confidence that the master records are not duplicated across business lines or company locations.

The type of system often used to address these situations is called an “identity capture” system. Identity capture systems resemble a cross between a “smart” merge/purge system and an identity resolution system.  They support identity management and persistent identifiers, but start without a preloaded set of identities.

Here is how they work.  As references are resolved, the system saves what it has learned rather than discarding it, so identities are built on the fly as references are processed.  For example, suppose an elementary school has 10 years of enrollment records, i.e., for each year, it has records of all the students who were in grades 1 through 6.  Each year some students leave grade 6 for middle school or transfer from any of the grades to another school.  At the same time some new students enter at first grade or transfer into upper grades from another school.  In an identity capture system, the identity master starts out empty.  When the first enrollment file is processed, almost all of the enrollment records processed will represent new identities.  The identity characteristics in each record are captured and stored to create a new identity master record.

When the next year of enrollment is processed, the system should recognize students re-enrolling from the previous year, so it only captures as new identities those students entering the school that year.  However, in many identity capture systems, the process of capture goes beyond simply adding new identities and can also be used to enhance existing identities.  For example, suppose that from the first-year enrollment, an identity was created for student Edgardo Mendez with a 7/12/2000 date-of-birth (DOB).  Then in the next year of enrollment the system is presented with the record of Eddie Mendez with a 7/12/2000 DOB.  Based on the resolution rules (including conflict rules), the embedded identity resolution process may decide that these are both references to the same student.  If that were the case, it would enhance the identity master record to include the second-year first-name variant, so that going forward it would recognize the same identity with a first name of either Edgardo or Eddie.
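The capture-and-enhance behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the identifier format, the matching rule (same last name and DOB, plus a known or nickname-linked first name), and the nickname table are all hypothetical and stand in for the much richer resolution and conflict rules a real system would apply.

```python
from itertools import count


class IdentityCapture:
    """Toy identity capture system: the identity master starts out empty."""

    def __init__(self, nicknames=None):
        self.master = {}                  # identifier -> captured identity record
        self.nicknames = nicknames or {}  # hypothetical nickname -> formal name table
        self._next = count(1)

    def _same_first_name(self, first, known):
        # Match a first name already captured, or one linked by the nickname table.
        return first in known or any(
            self.nicknames.get(first) == k or self.nicknames.get(k) == first
            for k in known
        )

    def process(self, first, last, dob):
        """Return the persistent identifier for one enrollment record."""
        first, last = first.strip().lower(), last.strip().lower()
        for identifier, rec in self.master.items():
            if (rec["last"], rec["dob"]) == (last, dob) and self._same_first_name(
                first, rec["first_names"]
            ):
                rec["first_names"].add(first)  # enhance the existing identity
                return identifier
        identifier = f"ID{next(self._next):04d}"  # capture a new identity
        self.master[identifier] = {"first_names": {first}, "last": last, "dob": dob}
        return identifier


system = IdentityCapture(nicknames={"eddie": "edgardo"})
year1 = system.process("Edgardo", "Mendez", "7/12/2000")
year2 = system.process("Eddie", "Mendez", "7/12/2000")
print(year1 == year2)
print(sorted(system.master[year1]["first_names"]))
```

Both calls return the same identifier, and after the second call the master record carries both first-name variants, so a later "Eddie" record matches directly without consulting the nickname table.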

The advantage of collecting identity information on the fly is offset to some extent by the problem of splits and consolidations.  The order in which references are processed can sometimes affect the system’s identity decisions.  Information that connects two references may arrive only after the two references have created separate identity master records (false negative).  This requires the two identity master records to be consolidated or merged.  Although the master records can be corrected, the correction undermines the idea of the persistent identifier: many previously processed references may have been assigned the identifier associated with the retired master record while others were assigned the identifier of the surviving master record.
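To make the consolidation problem concrete, here is a minimal Python sketch. It assumes one possible bookkeeping approach that the post itself does not describe: the system keeps a table mapping each retired identifier to its surviving identifier, so that identifiers issued before the merge can still be translated. The class and method names are hypothetical.

```python
class IdentityMaster:
    """Toy identity master that supports consolidating two master records."""

    def __init__(self):
        self.records = {}  # identifier -> set of captured attribute values
        self.retired = {}  # retired identifier -> surviving identifier

    def add(self, identifier, attributes):
        self.records[identifier] = set(attributes)

    def consolidate(self, surviving, retired):
        # Fold the retired record into the survivor and remember the redirect.
        self.records[surviving] |= self.records.pop(retired)
        self.retired[retired] = surviving

    def current_id(self, identifier):
        # Follow redirects, since a surviving record may itself be retired later.
        while identifier in self.retired:
            identifier = self.retired[identifier]
        return identifier


master = IdentityMaster()
master.add("ID0001", {"edgardo mendez", "7/12/2000"})
master.add("ID0002", {"e. mendez", "7/12/2000"})

# Later evidence shows both records describe the same student.
master.consolidate("ID0001", "ID0002")
print(master.current_id("ID0002"))
```

Any downstream file that stored ID0002 before the merge still holds a stale identifier; the redirect table only helps consumers that come back and ask. That is the sense in which consolidation weakens the persistence of the identifier.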

Splits are the reverse situation, where references to two different entities are mistakenly used to create a single master identity record (false positive).  Splits are harder to correct than consolidations, and for this reason, ER systems that manage identity tend to err on the side of false negatives rather than false positives.

In the next post we will discuss the four most common strategies for linking references.

Architectures for Entity Resolution-Part 2

Wednesday, March 10th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last post we examined how entity resolution (ER) systems are actually implemented, starting with the most basic merge/purge process and heterogeneous join systems. Both of these approaches focus on collecting equivalent references from among the sources provided, either as a large batch of references in a single file, or through queries against a federation of databases.  The entity identities found by these ER systems are transient in the sense that they depend upon the sources input into the process.  When different sources are provided, different identities will emerge.

On the other hand, there are ER systems that retain and manage identity information.  By doing this they are able to “recognize” the same identity over time and assign that identity the same entity identifier (sometimes called “persistent identifiers” or “persistent links”).  In Customer Data Integration (CDI) applications, these kinds of systems are sometimes called Customer Recognition Systems.

Two major types of ER systems perform identity management.  The first type is the “identity resolution” system.  It is most effective in situations where a fairly stable set of known identities of interest exists, such as the set of vendors or customers of a company, a set of products, or the students enrolled in a school.  The attributes of these identities are pre-loaded into the system and assigned identifiers.  When a reference is given to the system, it then decides whether the reference is to one of the known identities, and if so, returns the identifier of that identity.
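As a rough illustration of this flow, the Python sketch below pre-loads a set of known identities and resolves references against it. The names and the matching rule are assumptions made for the example: it matches on a whitespace- and case-normalized name plus an exact DOB, where a production system would apply far more tolerant rules.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Identity:
    identifier: str
    name: str
    dob: str


class IdentityResolver:
    """Resolves references against a fixed, pre-loaded set of identities."""

    def __init__(self, identities: Iterable[Identity]):
        # Index the known identities by a normalized (name, dob) key.
        self._index = {self._key(i.name, i.dob): i.identifier for i in identities}

    @staticmethod
    def _key(name: str, dob: str):
        return (" ".join(name.lower().split()), dob)

    def resolve(self, name: str, dob: str) -> Optional[str]:
        """Return the identifier of the matching known identity, or None."""
        return self._index.get(self._key(name, dob))


resolver = IdentityResolver([
    Identity("S001", "Edgardo Mendez", "7/12/2000"),
    Identity("S002", "Ana Lopez", "3/2/2001"),
])
print(resolver.resolve("  edgardo   MENDEZ ", "7/12/2000"))
print(resolver.resolve("Eddie Mendez", "7/12/2000"))
```

The second lookup returns None because the reference does not match any pre-loaded identity under this strict rule. Unlike an identity capture system, the resolver does not create a new identity for it.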

Identity resolution systems can operate in either batch or transactional mode.  In cases where there are a large number of pre-stored identities, the performance of batch operations can be improved through distributed processing where the identities are partitioned over multiple processors and resolved in parallel.
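The partitioning idea can be sketched as follows. This is a simplified illustration, not a description of any particular product: identities are assigned to partitions by a stable hash of a normalized match key, references are routed by the same hash, and worker threads stand in for the separate processors. The function names are hypothetical, and exact-key lookup is assumed in place of real matching rules.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition_of(key: str, n: int) -> int:
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return crc32(key.encode("utf-8")) % n


def build_partitions(identities, n):
    """identities: iterable of (identifier, match_key) pairs."""
    partitions = [{} for _ in range(n)]
    for identifier, key in identities:
        partitions[partition_of(key, n)][key] = identifier
    return partitions


def resolve_batch(references, partitions):
    """Resolve a batch of reference keys, one worker per partition."""
    n = len(partitions)
    buckets = [[] for _ in range(n)]
    for ref in references:
        buckets[partition_of(ref, n)].append(ref)  # route by the same hash

    def resolve_bucket(i):
        return {ref: partitions[i].get(ref) for ref in buckets[i]}

    results = {}
    with ThreadPoolExecutor(max_workers=n) as pool:
        for partial in pool.map(resolve_bucket, range(n)):
            results.update(partial)
    return results


partitions = build_partitions(
    [("V1", "acme supply"), ("V2", "bolt works"), ("V3", "cog co")], n=2
)
resolved = resolve_batch(["bolt works", "unknown inc", "acme supply"], partitions)
print(resolved)
```

Routing by an exact-key hash only works when equivalent references normalize to the same key. With fuzzier matching, the partitioning scheme has to guarantee that candidate matches land in the same partition.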

However, there are many situations where the identities are not necessarily known in advance, or in some cases the entities are known but simply not organized in such a way that they can be easily pre-loaded.  For example, suppose two companies merge and each company has its own customer database. The customers are identified in different ways in each database, and furthermore, for the customers of one company, poor systems and practices prevent having any confidence that the master records are unduplicated across business lines or company locations.

The type of system often applied in these situations is an “identity capture” system.  The identity capture architecture can be seen as a hybrid of merge/purge and identity resolution systems.  It supports identity management and persistent identifiers, but without starting with a preloaded set of identities.  In my next post, we’ll delve deeper into the identity capture process.

Is MDM Dead?

Wednesday, March 3rd, 2010

By Mike Shultz, Infoglide Software CEO

Andrew White of Gartner recently posed a question about whether master data management (MDM) is dead. He didn’t actually suggest that the demise of master data management is imminent. He was challenging whether our current terminology adequately clarifies the current reality about MDM and associated product areas.

Certainly the terms describing many markets and types of products are being associated with MDM. Jackie Roberts of DATAForge pointed out that the definition of MDM now seems to include “data integrity, data quality, entity resolution, matching, data integration, governance, metrics and analysis.”

While entity resolution was mentioned in her list, our obsessive focus on entity resolution (aka identity resolution) leads to the conclusion that, rather than being subsumed, its role is growing. Wayne Eckerson at TDWI seems to agree that identity resolution is a critical component of the recent MDM acquisitions. In his post about the acquisitions by Informatica and IBM of Siperian and Initiate Systems, respectively, he described the two transactions this way:

“You could say that Siperian is mostly MDM, but with identity resolution and other capabilities, whereas Initiate is mostly about identity resolution, but with MDM and other capabilities.”

Identity resolution is becoming an integral part of many product areas. Within MDM itself, creating a single-entity view is best done with an identity resolution engine. Data mining is greatly enhanced by the addition of entity resolution. Dan Power of Hub Solution Designs wrote about how key identity resolution is to data matching. We’ve talked about how social CRM can resolve identities of individuals across multiple disparate data sources using identity resolution, as well as “rationalize multiple variations and errors and anomalies that block finding existing customers within their systems”.

Although identity resolution technology has been years in the making, it has only recently risen into the consciousness of most analysts and customers. Because of its ability to bring enhanced clarity to ambiguous data, advanced identity resolution is now beginning to have a significant impact across many data-centered disciplines.

Identity Resolution Daily Links 2010-03-01

Monday, March 1st, 2010

By the Infoglide Team

IT-Director.com: The Informatica Event

[Philip Howard] “To begin with, the company talked about its acquisition of Siperian. I have already commented on this but one point that emerged at the conference was the way that Informatica describes Siperian as infrastructure MDM as opposed to application MDM. This is a hitherto unrecognised distinction (with respect to terminology) in the MDM market. Informatica distinguishes the former from the latter by saying that infrastructure MDM is domain and data model independent.”

Workforce Management: Medical Clinic Owners Plead No Contest to $60 Million Workers’ Compensation Fraud

“Investigators alleged that the pair purchased thousands of workers’ compensation client referrals from an attorney television advertising service. Clients were then sent to doctors who had a relationship with Premier, which would handle billing and collection work in return for a 50 percent fee for money they collected. Clients were then sent to attorneys who had a business relationship with Fish and Bacino, investigators allege. ‘Getting kickbacks for referring medical payments is illegal and drives up the costs in the system,’ California Insurance Commissioner Steve Poizner said in a statement.”

SignalScape: DC Police Chief Cathy Lanier Describes How Technology Is Changing Police Work in the Capitol

“The MPD also established a fusion center, which is responsible for the national capitol region. From a homeland security perspective, Chief Lanier said that the center collects and stores crime and terror alerts into a data warehouse.”

Injured Workers’ Law Firm Blog: Insurance Fraud Is a Huge Crime

“The fraudulent claims that can be made through insurance companies are categorized as being soft or hard. Soft fraud is the most common type of fraud and usually takes place when someone exaggerates a claim being made. Hard fraud takes place when someone deliberately plans a deceptive act such as a collision or the theft of their vehicle.”

Identity Resolution Daily Links 2010-02-09

Tuesday, February 9th, 2010

By the Infoglide Team

ovum: Informatica finally plugs MDM gap

“MDM now creates another competitive front for Informatica against rivals and complicates some partial relationships - notably Oracle, which includes Informatica’s identity resolution software as part of its Siebel Universal Customer Master (UCM) MDM engine, as well as some parts of its data quality software. Informatica also has OEM relationships with IBM and DataFlux for address cleansing that might need revisiting.”

ovum: IBM acquires Initiate Systems to strengthen healthcare solutions

“Being acquired by a large player such as IBM also raises the question of whether Initiate will be able to unfold its potential under the large IBM umbrella, or whether it will wither and sink into oblivion alongside the multitude of applications in IBM’s broad portfolio. This will be a test of how well IBM integrates small but high-performing companies.”

TMCnet Healthcare Technology: ECRI Guides Hospitals on Electronic Health Record Implementation

“Electronic health records, or ‘EHRs,’ are the future of medical record keeping. The American Recovery and Reinvestment Act, or “ARRA,” includes incentive payments for hospitals that adopt an EHR, but the timetable for implementation is tight. To qualify for the full payment, hospitals will require proving ‘meaningful use’ by October 2012.”

2010 TDI Fraud Conference: Texas Workers’ Compensation Fraud

“Workers’ comp fraud indicators… Frequent additions and cancellations of coverage, especially if several business entities appear to be owned or controlled by the same person or group”


Identity Resolution Daily Links 2010-02-05

Friday, February 5th, 2010

[Post from Infoglide] And Then There Were Two

“IBM announced today that it plans to buy MDM vendor Initiate Systems.  As hypothesized here in this blog last week, the move was not entirely unexpected, but on the heels of last week’s announcement by Informatica to purchase Siperian, it certainly creates yet another wave in the marketplace.  More moves are certain to take place as competing companies align – and realign – their Single Entity View (SEV) strategies.  The key to this realignment will be for current industry players to maximize their functionality beyond ‘playing with matches’.  That dated view of fuzzy matching is no longer enough.  Not for the large data quality vendors.  Certainly not for the customer.”

Information Week: Global CIO: IBM Data Strategy Is Flawed, Say Kalido And Informatica

“Noting that Initiate’s product is specifically designed to handle only certain types of data—customer data and product data—Kalido CEO Hewitt says, ‘Where they have struggled is in mastering multiple domains, even though they advertise their products as such. The problem is that as you add domains, the complexity of the data relationships expands exponentially. So one domain might have 100 relationships, two domains 300 relationships, 10 domains 3,000 relationships. So when one master data element changes, hundreds of relationships could change, which requires a governance process to manage it.’”


Columbia Daily Tribune: Networks advance child-trafficking investigation

“Watson called up a contact at the El Paso Intelligence Center (EPIC), a fusion center that combines intelligence from federal law enforcement and state and military sources. Watson also called a friend at U.S. Immigration and Customs Enforcement and asked him to prepare a ‘serious incident report.’ ICE mobilized an officer specializing in human trafficking within minutes, Watson said.”

ITBusinessEdge: How Big Deals Affect MDM Competitors, Customers

“But the general upheaval in MDM aside, the IBM deal is interesting in another way. IBM has downplayed this as an MDM acquisition, positioning it more as buying into two verticals, health care and a government. Gartner’s Andrew White writes that at one point during the briefing, IBM was asked what the Initiate acquisition meant for MDM. IBM responded it reflects a ‘verticalization of MDM.’ White writes that’s good news for health care customers, but ‘troubling for IBM MDM product strategy.’”

Identity Resolution Daily Links 2009-09-28

Monday, September 28th, 2009

[Post from Infoglide] Social CRM, CDI, and Identity Resolution

“In her well-read book on CDI, Jill Dyché offers a definition of CDI that also seems to describe social CRM. Try reading her definition of CDI, replacing ‘CDI’ with ‘social CRM’: CDI is a set of procedures, controls, skills and automation that standardize and integrate customer data originating from multiple sources.”

Concord Monitor: Don’t play games when giving your name

“What do they want? Your date of birth, your gender and your middle initial. This information will be relayed to the TSA, and the TSA will match the information against information maintained by the Terrorist Screening Center (an arm of the FBI that gathers and consolidates watch lists). The theory is that a 12-year-old boy named John X. Doe can more easily be separated from John Z. Doe, who happens to be a 37-year-old man with a history of making bombs, if additional information is collected during the booking process. Once TSA has cleared you, you’ll be issued a boarding pass.”

pressdemocrat.com: Achieving paperless health care

“Medical record-keeping, until recently, relied on rooms full of paper files that were easily misplaced and filled with hurried, handwritten entries that could be hard to read. Electronic records hold orderly, keyboard-entered data that never leaves a hard drive and have the potential to move seamlessly from a primary care provider’s office to an emergency room or specialist’s suite.”

ebizQ: MDM Becoming More Critical in Light of Cloud Computing

[David Linthicum] “We’re moving from complex federated on-premise systems, to complex federated on-premise and cloud-delivered systems.   Typically, we’re moving in these new directions without regard for an underlying strategy around MDM, or other data management issues for that matter.”

Homeland Security: I&A Reconceived: Defining a Homeland Security Intelligence Role

“There are currently 72 fusion centers up and running around the country (a substantial increase from 38 centers in 2006).  I&A has deployed 39 intelligence officers to fusion centers nationwide, with another five in pre-deployment training and nearly 20 in various stages of administrative processing.  I&A will deploy a total of 70 officers by the end of FY 2010, and will complete installation of the Homeland Secure Data Network (HSDN), which allows the federal government to share Secret-level intelligence and information with state and local partners, at all 72 fusion centers.”

Identity Resolution Daily Links 2009-09-25

Friday, September 25th, 2009

By the Infoglide Team

[Post from Infoglide] Social CRM, CDI, and Identity Resolution

“In her well-read book on CDI, Jill Dyché offers a definition of CDI that also seems to describe social CRM. Try reading her definition of CDI, replacing ‘CDI’ with ‘social CRM’: CDI is a set of procedures, controls, skills and automation that standardize and integrate customer data originating from multiple sources(1).”

Charleston Daily Mail: Former owner of WVa trucking company sentenced

“Leonard Cline formerly owned H & H Trucking. The insurance commissioner says he defrauded the old state workers’ compensation system of more than $500,000 in unpaid premiums, penalties and claims for benefits over about 10 years.”

WTVQ: Eight People Indicted for Insurance Fraud

“The US attorney’s office says the suspects intentionally damaged insured automobiles owned by other conspirators then filed claims.”

KansasCity.com: Push for electronic medical records picks up steam

“With or without health care reform this year, electronic medical records are picking up steam. Recent technological advances are easing the transition for doctors and hospitals, and there’s the little matter of the Health Information Technology for Economic and Clinical Health Act. The act, part of last spring’s stimulus package, included billions of dollars to ‘advance the use of health information technology.’ There’s plenty of advancing to do, with one group estimating that less than half the hospitals and only one in five physicians are equipped to fully use electronic records. ‘The United States is far more advanced in grocery store technology than in medical records technology,’ said Steve Lieber, president and chief executive officer of the Healthcare Information and Management Systems Society in Chicago.”

pnj.com: Man charged with workers’ comp fraud

“Florida Chief Financial Officer Alex Sink announced the arrest today in a news release. In the release, Sink said her Division of Insurance Fraud said Soto is charged with falsifying employment numbers with the intent of avoiding higher workers’ compensation premium payments.”

Federal News Radio: Update: Identity management in the Obama administration

“The alphabet soup of identity management programs from the Bush administration — HSPD-12, TWIC, Real ID, and many more — have gotten little attention publicly during the first nine months of the Obama presidency. But that doesn’t mean identity management has been ignored totally, says one senior administration official.”

London Evening Standard: Lloyd’s chief warns of more insurance fraud

“Lloyd’s of London’s chief executive Richard Ward today warned the deep recession would increase the number of fraudulent claims being made against the insurance market.”

Computerworld: Laptop searches at airports infrequent, DHS privacy report says

“The U.S. Department of Homeland Security’s annual privacy report card revealed more details on the agency’s controversial policy involving searches of electronic devices at U.S. borders. . . . For instance, numbers released in the report indicate that warrantless searches of electronic devices at U.S. borders are occurring less frequently than some privacy and civil rights advocates might have feared. Of the more than 144 million travelers that arrived at U.S. ports of entry between Oct. 1, 2008 and May 5, 2009, searches of electronic media were conducted on 1,947 of them, the DHS said. Of this number, 696 searches were performed on laptop computers, the DHS said. Even here, not all of the laptops received an ‘in-depth’ search of the device, the report states. A search sometimes may have been as simple as turning on a device to ensure that it was what it purported to be. U.S. Customs and Border Protection agents conducted ‘in-depth’ searches on 40 laptops, but the report did not describe what an in-depth search entailed. . . . The report chronicled similar efforts to monitor the privacy implications of a range of projects that privacy groups are also watching. Examples include Einstein 2.0 network monitoring technology that improves the ability of federal agencies to detect and respond to threats, and the Real ID identity credentialing program. The DHS’s terror watch list program, its numerous data mining projects and the secure flight initiative were also mentioned in the report.”

Social CRM, CDI, and Identity Resolution

Wednesday, September 23rd, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

In her well-read book on CDI, Jill Dyché offers a definition of CDI that also seems to describe social CRM. Try reading her definition of CDI, replacing “CDI” with “social CRM”:

CDI is a set of procedures, controls, skills and automation that standardize and integrate customer data originating from multiple sources(1).

In fact, Ray Wang of A Software Insider’s Point of View suggests that social CRM initiatives could be more effective by leveraging MDM technology. In a recent post he listed key questions that social CRM and other relationship management initiatives like CDI have to answer:

1.    Do we know the identity of the individual?
2.    Can we tell if there are any apparent and potential relationships?
3.    Are they advocates or detractors?
4.    How do we know whether or not we have a false positive?
5.    What products and services have been purchased in the past?
6.    Have we assessed how much credit risk we can be exposed to?
7.    What pricing and entitlements are customers eligible for?

So how exactly can social CRM systems resolve identities of individuals across multiple disparate data sources? How can they rationalize multiple variations and errors and anomalies that block finding existing customers within their systems?

The obvious answer is identity resolution. We highlighted in an earlier post that Dyché declared that identity resolution supports and enhances five of the eight core MDM functions enumerated in her book with Evan Levy. Similarly, identity resolution is critical in accurately answering key questions about identity in social CRM.

Ray’s list of questions can be divided into two sets. Accurately answering the first set related to identity and relationships (questions 1, 2, and 4) is critical to answering the rest of the questions. If we blow it on identity, it is impossible to make sense of social CRM data.

Social media marketing and social CRM are becoming more and more mainstream. If you want to get more familiar with social media marketing and social CRM, Paul Gillin’s recent book is a great way to get started.

If you’re already familiar and want to comment or take issue with this post, let us hear from you.

(1)Dyché, Jill and Levy, Evan. Customer Data Integration: Reaching a Single Version of the Truth. John Wiley & Sons, Inc. 2006. Page 274.

Identity Resolution Daily Links 2009-08-14

Friday, August 14th, 2009

[Post from Infoglide] Vetting Sharks and Whales

“If you’re not in the casino industry, the title of this post may be meaningless, but for casino managers, “sharks” are the bad guys and “whales” are the good guys. Sharks are people who try to defraud the casino through illegal activities, while whales are the high rollers who are apt to win $20,000 one trip and lose $25,000 the next. If there’s any environment where you’d be motivated as a businessperson to know as much as you can about who you’re dealing with, it’s a casino.”

DATAWARE HOUSING: Business Intelligence and Identity Recognition—IBM’s Entity Analytics

“This article will define master data management (MDM) and explain how customer data integration (CDI) fits within MDM’s framework. Additionally, this article will provide an understanding of how MDM and CDI differ from entity analytics, outline their practical uses, and discuss how organizations can leverage their benefits.”

Workers’Comp Kit Blog: Failure to Pay Workers Compensation Premiums

“A New York asbestos contractor failed to pay $1.6 Million in workers’ compensation premiums and will serve four years in prison. Upon his release he will be deported to his home country as he is an illegal immigrant… He repeatedly changed the name of his company.”

The TSA Blog: Secure Flight Q&A II

“Each one of these layers alone is capable of stopping a terrorist attack. In combination their security value is multiplied, creating a much stronger, formidable system. A terrorist who has to overcome multiple security layers in order to carry out an attack is more likely to be pre-empted, deterred, or to fail during the attempt.”

