Archive for the ‘Data Governance’ Category

Internal and External Views of Identity

Thursday, August 27th, 2009

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In an earlier post, I stated my view that identity resolution and entity resolution are somewhat different processes.  In particular, I consider identity resolution as a special form of entity resolution in which entity references are resolved by comparing them to the characteristics of a given set of known entities.  Regardless of the approach, identity plays an important role in all forms of entity resolution.

The identity of an entity is a set of attributes and rules for comparing the attribute values that allow it to be distinguished from all other entities of the same type in a given context.  A key feature is that identity is context-dependent, i.e., it depends upon the total set of entities under consideration.  For example, a common scheme for creating email addresses in an organization uses a person’s first two initials and last name, e.g. jrtalburt.  In a small organization, this is usually sufficient to make a unique address for each employee.  However, applying the same scheme to a much larger pool of users, such as the yahoo.com or gmail.com domains, quickly shows that these attributes are insufficient to distinguish one person from another.
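To make the context dependence concrete, here is a minimal sketch of the initials-plus-last-name scheme.  The function names and the sample people are hypothetical and chosen only for illustration; the point is simply that the same attributes can distinguish everyone in a small pool and fail in a larger one.

```python
from collections import Counter

def local_part(first: str, middle: str, last: str) -> str:
    """Build an address local part from the first two initials and the last name."""
    return (first[0] + middle[0] + last).lower()

def distinguishes_all(people) -> bool:
    """Return True if the scheme gives every person in the pool a distinct local part."""
    counts = Counter(local_part(*person) for person in people)
    return all(n == 1 for n in counts.values())

# Hypothetical names, used only to illustrate the collision.
small_org = [("Jane", "Rose", "Taylor"), ("Mary", "Ann", "Smith")]
large_pool = small_org + [("John", "Ray", "Taylor")]

print(distinguishes_all(small_org))   # the scheme is sufficient in this context
print(distinguishes_all(large_pool))  # two people now map to "jrtaylor"
```

Nothing about the attributes changed between the two calls; only the set of entities under consideration did.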

For a more relevant business example, consider the case of a customer, Mary Smith.  For simplicity, assume that the totality of her adult residential address history comprises:
1. Mary Smith, 123 Oak St, Anytown, NY, 1998-06 to 2000-03
2. Mary Jones, 234 Elm St, Anytown, NY, 2000-04 to 2002-11
3. Mary Jones, 345 Pine St, Anytown, NY, 2002-12 to present

Although they involve two names and three addresses, these are all references to the same person. There are two ways to view the issue of identity as illustrated by this history.

One is to start with the identity based on vital statistics, e.g. Mary Smith, a female born on December 3, 1980, in Anytown, NY, to parents Robert and Susan Smith, then to follow that identity through its various representations of name and address as shown above.  This “internal view of identity” is the view of Mary Smith herself and might well be the view of a sibling or other close relative, someone with complete knowledge about her address history.  The internal view of identity represents a closed universe model in which all of the possible occupancy variants are known to the internal viewer (system) and any occupancy record not equivalent to one of the known variants must belong to some other identity.

On the other hand, an external view of identity is one in which some number of address records for a customer’s identity have been linked, but the viewer (system) does not know if it is the complete history.  Given another customer address record not equivalent to one of the records in the history, it must be determined if it does or does not belong to Mary’s history.

Suppose that a system has only the first two address records of Mary’s history.  In this case, the system’s knowledge of Mary’s identity would be incomplete.  It may be incomplete either because the third address record is not in the system (has not been acquired) or because the system hasn’t linked it to the first two records.  In the latter case, the system would assume that the third record is part of a different customer’s identity.  Even though an internal viewer would know that the third address record should also be part of Mary’s complete history, the external viewer has not made that determination.

Conversely, an external viewer may assemble an inaccurate view of Mary’s history by linking the first two records of her address history to an address for a different Mary Smith.  These entity resolution failures, incomplete and inaccurate histories, correspond to the information quality dimensions of completeness and accuracy, and they indicate why the areas of entity resolution and information quality are so closely related. (Several classes of failures were discussed in another recent post.)

In an external view, the identity of the customer is equivalent to the set of occupancy records that have been resolved (i.e. linked).  The known address records comprise the external viewer’s (or system’s) entire knowledge of the customer’s identity.  If additional occupancy records are acquired and are correctly determined to be for this same customer, then the system’s knowledge about this identity increases.

The external view of identity reflects the experience of a business or government agency using entity resolution tools and processes in an effort to link disparate records into a single view of a customer or agency client.  The “external view of identity” represents an open universe model because if the system is presented with a new occupancy record, it does not necessarily follow that the new record must be a part of a different identity.  It may or may not be part of an existing identity, something that the ER process must decide.
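The difference between the closed and open universe models can be sketched in a few lines.  This is only an illustration under simplifying assumptions: “equivalent” is reduced to exact equality of a (name, address) pair, and the names Verdict, internal_view and external_view are hypothetical rather than part of any ER product.

```python
from enum import Enum

class Verdict(Enum):
    SAME = "same identity"
    DIFFERENT = "different identity"
    UNDECIDED = "needs an ER decision"

# Mary's complete history from the example above, as (name, address) pairs.
MARY_HISTORY = frozenset({
    ("Mary Smith", "123 Oak St, Anytown, NY"),
    ("Mary Jones", "234 Elm St, Anytown, NY"),
    ("Mary Jones", "345 Pine St, Anytown, NY"),
})

def internal_view(record, complete_history):
    """Closed universe: a record that is not a known variant belongs to someone else."""
    return Verdict.SAME if record in complete_history else Verdict.DIFFERENT

def external_view(record, linked_records):
    """Open universe: an unknown record may or may not belong to this identity."""
    return Verdict.SAME if record in linked_records else Verdict.UNDECIDED

third_record = ("Mary Jones", "345 Pine St, Anytown, NY")
first_two = MARY_HISTORY - {third_record}

print(internal_view(third_record, MARY_HISTORY))  # Verdict.SAME
print(external_view(third_record, first_two))     # Verdict.UNDECIDED
```

In a real system the UNDECIDED branch is where the matching rules do their work; the sketch only shows that the external viewer cannot rule a record out merely because it is new.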

The major point to note is that an internal viewer is in a position to judge the quality of an external view.  With complete knowledge, the internal viewer can determine if any particular external viewer has omitted some records (completeness) or has linked records from different identities or failed to link records for the same identity (accuracy).
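As a rough sketch of how an internal viewer might grade an external view, the function below compares the true history with what a system has acquired and what it has linked.  The record identifiers (R1 to R3 for Mary’s three address records, X9 for a record belonging to a different Mary Smith) and the function name assess are assumptions made for illustration, and this simple set comparison is not the similarity index mentioned in the next paragraph.

```python
def assess(true_history: set, acquired: set, linked: set) -> dict:
    """Compare an external view against the complete (internal) history.

    true_history: record ids that really belong to the identity
    acquired:     record ids the external system holds at all
    linked:       record ids the external system has linked to the identity
    """
    return {
        # Completeness: true records the system never acquired.
        "omitted": true_history - acquired,
        # Accuracy: true records the system holds but failed to link.
        "unlinked": (true_history & acquired) - linked,
        # Accuracy: records linked in from a different identity.
        "wrongly_linked": linked - true_history,
    }

truth = {"R1", "R2", "R3"}

# The system lacks R3 and has linked in a record for a different Mary Smith.
print(assess(truth, acquired={"R1", "R2", "X9"}, linked={"R1", "R2", "X9"}))

# The system holds R3 but has not linked it to the first two records.
print(assess(truth, acquired={"R1", "R2", "R3"}, linked={"R1", "R2"}))
```

Only a viewer who knows the full history can supply the first argument, which is why the external viewer cannot run this check on its own.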

Along with Dr. Wang at MIT, I have introduced a quality metric in the form of an index for assessing the similarity of two identity resolutions.  In cases where one resolution represents an internal view (correct) and the other is an external view, the index provides a metric for entity resolution accuracy. I plan to explain this metric in my next post.

Identity Resolution Daily Links 2009-08-24

Monday, August 24th, 2009

By the Infoglide Team

CRMBuyer: The BI Outlook: A Bright Spot of Growth in a Gloomy Economy

“Investing in business intelligence is important for a company now more than ever, agreed Bill Barberg, president of Insightformation and an expert in Balanced Scorecard methodology. Sound business intelligence helps companies make fact-based decisions as they try to navigate in today’s stormy economy, he told CRM Buyer. ‘Business intelligence can help companies make much better decisions,’ he said.”

OCDQ Blog: Adventures in Data Profiling (Part 3)

“In Part 3, you will continue your adventures by using a combination of field values and field formats to begin your analysis of the following fields: Birth Date, Telephone Number and E-mail Address.”

SearchSOA.com: SOA with MDM prevents messaging confusion

“Increasingly, organizations are designing SOA into the MDM architecture from the beginning, says Dan Power, president and founder of consulting firm Hub Solution Designs Inc. in Hingham, Mass. This creates challenges in meshing the real-time realities with the need to keep the data accurate.”

iHealthBeat: Privacy and Security: Experts Focus on Legal Issues Surrounding EHR Use at AHIMA Summit

“Linda Kloss, AHIMA CEO, said many vendors have not focused on developing legally defensible EHR systems. In addition, health care providers have not created a demand for such functionality.”

Identity Resolution Daily Links 2009-07-31

Friday, July 31st, 2009

[Post from Infoglide] Data Finds Data in Real-Time Entity Resolution

“Jeff Jonas of IBM recently quoted from a chapter called “Data Finds Data”  that he co-wrote for a book entitled Beautiful Data: The Stories Behind Elegant Data Solutions, and I was impressed by how well this passage describes the effective use of entity resolution software (e.g., IRE 2.2)…”

IT-Director.com: GRC is not enough

[Philip Howard] “If you think about these different forms of risk, they can mostly be managed within existing GRC frameworks: business risk, data and IT governance and compliance cover five of these seven types of risk. But they don’t cover fraud or cyber attacks or similar security issues.”

SunSentinel.com: Roofer ducked $400,000 in worker’s comp premiums

“Investigators with the state’s Division of Insurance Fraud said Robert McDonald, owner of Gulfstream Roofing Inc., funneled $3 million in payroll through several fake companies between 2002 and 2006, claiming the money was being paid to insured subcontractors instead of his own workers.”

BNET Healthcare: What Can US Learn From European Health IT Experience?

“The three countries also use universal patient identification numbers in health care. This is much easier to do in Europe than it is in the U.S., where the mistrust of government is so high that the issue of having a single patient identifier number is no longer even under discussion. There’s also the small matter of our low EHR adoption rate, which is less than 20 percent for physicians and lower for hospitals. By contrast, most physicians in the three European countries are using some kind of EHR.”

Identity Resolution Daily Links 2009-07-27

Monday, July 27th, 2009

By the Infoglide Team

information management: Multidomain Master Data Management for Business Success

“All data that flows through an enterprise can be categorized into six different types: who, what, when, where, how and why. Master data is about who, what, when and where. ‘Who’ data is about the parties of interest that matter most to a business or organization including stakeholders, benefactors, customers, suppliers, owners, providers, partners, etc.”

HSToday: DHS Highlights Intelligence Improvements in Report Marking 9/11 Report Anniversary

“To date, 72 fusion centers have been designated throughout the country, with DHS having provided more than $340 million from fiscal years 2004-2009 to state and local governments to support these centers. DHS also deployed the Homeland Security Data Network to 29 fusion centers, which allows the federal government to share information and intelligence with states and provides fusion center staff access to the most current terrorism-related information.”

The Healthcare IT Guy: Guest Article: Why Doctors Hate Electronic Medical Records

“The fact is that doctors love high-tech. They have reason to hate EMRs but not computers and iPhones.”

DecisionStats: Interview Jim Harris Data Quality Expert OCDQ Blog

“Jim Harris - ‘I know that Gartner has reported that 25% of critical data within large businesses is somehow inaccurate or incomplete and that 50% of implementations fail due to lack of attention to data quality issues.’”

Identity Resolution Daily Links 2009-07-24

Friday, July 24th, 2009

[Post from Infoglide] Entity Resolution as Data Mining

“In my last post, I suggested that entity resolution in the broadest sense (“Big ER”) really encompasses three activities.  The first is locating and collecting entity references from unstructured sources (entity extraction), the second is resolving and merging references to the same entity (“Little ER”), and the third is analyzing associations among entities.  Not every ER process involves all three activities.”

BeyeNETWORK: Some Perspectives on Quality

[Bill Inmon] “There are then very legitimate circumstances where incorrect data is best left in the database or data warehouse. Stated differently, there is no circumstance where correcting data or not correcting data is the right thing to do. In order to determine which approach is proper, the context of the corrections has to be known. Only then can it be determined whether correcting errors is the proper thing to do.”

Homeland Security Watch: How To Improve Homeland Security: Give the ODNI Oversight Responsibility for Fusion Centers

“To me, fusion centers are a fine example of Darwinian logic in homeland security.  There was no comprehensive national plan to create fusion centers.  In original intent, Founding-Fathers-federalism fashion, states and cities decided they were not getting the intelligence they wanted.  Arizona, Georgia, Illinois, New York and a handful of other jurisdictions took responsibility for processing - or “fusing” - their own intelligence.”

ITBusinessEdge: Master Data Management and the CIO’s Strategic Plan

“If we look at MDM as a collection of techniques providing enterprise-wide data requirements analysis and subsequent implementation of best practices in data management, then the savvy IT manager might cherry-pick from the tools offered by vendors to provide the optimal solution that unifies the view of critical data concepts while satisfying the data quality requirements imposed by a horizontal information solution.”

I, Cringely: Medical Records R Us

“So medical records are an area where IT could make us healthier and, if done correctly, ought to save lots of money, too.  What we need is some form of centralized medical record keeping that preserves patient privacy yet, at the same time, keeps us from shopping all over town for bogus Oxycontin prescriptions.”

Identity Resolution Daily Links 2009-07-10

Friday, July 10th, 2009

[Post from Infoglide] What’s the Data Quality Business Message?

“What’s it going to take to move the data quality space forward in the future? That’s the question recently addressed by Ted Friedman of Gartner as reported in an article in destinationCRM.com. He suggests that the real answer may be messaging.”

CiOZone: Master Data Management Ready For Prime Time

“As with many application areas, Microsoft’s sweeping move into MDM signals a mainstreaming of the field, according to Aaron Zornes, founder and chief research officer of The MDM Institute, Burlingame, Calif. ‘As a practice, MDM has been going on in some industries since 1980s, but it’s only been formalized with a growing, purpose-built vendor base in recent years,’ he says. ‘In the time since the institute was founded in 2004, the industry has matured considerably.’”

data quality PRO: Data Quality Blog Roundup - June 2009 Edition

“Another marked increase in online publishing this month for the data quality sector. A smattering of new entrants means there is a steady flow of fresh ideas and insight in this months blog roundup.”

Government Security News: OPINION / Analyzing intelligence data: Matching information in foreign languages

“While the challenge is great, there is technology specifically designed to “connect the dots” among persons, places and things of interest. Called “entity resolution,” this technology is coming into the mainstream, specifically in light of the growing urgency to track down terrorists and stop terrorist threats before they happen.”

Life as a Healthcare CIO: International EHR Adoption

“The most widely implemented are England, Denmark, Netherlands, and certain regions of Spain which are close to 100%. Sweden, Norway are at 80% and behind and Germany/France are at 50%. The US is somewhere between 2 and 20%, depending on how you classify a comprehensive EHR.”

What’s the Data Quality Business Message?

Wednesday, July 8th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

What’s it going to take to move the data quality space forward in the future? That’s the question recently addressed by Ted Friedman of Gartner as reported in an article in destinationCRM.com. He suggests that the real answer may be messaging.

“Vendors have done a reasonably poor job in that they could get better at articulating the true business value [of data quality solutions],” he says. The Gartner analyst notes that vendors tend to talk about functionality in terms of technological advances, rather than conveying how that technology actually supports the business infrastructure. Friedman also notes that, in general, vendors could get better at articulating how tools support initiatives such as information governance and regulatory compliance — two notable industry trends.

Ted is right to call out vendors to improve our messaging. At Identity Resolution Daily, we get down into technical details fairly often. For example, our bloggers have talked about data matching, its relationship to identity resolution, and critical requirements for identity resolution, and we’ve had a series on data quality. Professor John Talburt of UALR’s Center for Advanced Research in Entity Resolution and Information Quality (ERIQ) is a regular contributor who talks about technical definitions and issues surrounding these topic areas.

That’s not to say that business issues around entity resolution and information quality have been ignored. Real world problems like lottery retailer fraud have been a frequent topic, as has organized retail crime. Another business problem we’ve talked about is employers trying to cheat workers’ compensation laws, and we’ve actually discussed regulatory compliance (OK, so we did get a bit technical on that one).

A huge issue related to information governance is preserving the rights of individual privacy. Because of our involvement in TSA’s Secure Flight program, we’ve written about this issue repeatedly since Identity Resolution Daily was created. A recent post captures the essence of the issue.

So at best, I’d have to say we get a C+ or a B- on our messaging. With our upcoming release of IRE 2.2, we’ll make every effort to respond to Ted’s constructive criticism of the data quality space.

Identity Resolution Daily Links 2009-07-06

Monday, July 6th, 2009

[Post from Infoglide] Entity Extraction

“In my last post I discussed my definitions for entity resolution, entity identification, entity disambiguation, and anonymous entity resolution.  (And I reiterate that these are just my definitions and are not binding on anyone except possibly my students.) Let’s go back to the overarching term entity resolution (ER).  In its broadest sense, I see ER as encompassing three major activities…”

Article Marketing: Electronic Medical Records – Are There Reasons For Low Implementation?

“Doctors may soon have little choice but to implement computerized medical billing and patient record systems. HIPAA’s scope recently expanded to health care providers with less than $5 million in revenue.”

OCDQ Blog: Worthy Data Quality Whitepapers (Part 1)

“It is about the data – the quality of the data… This is the subtitle of two brief but informative data quality whitepapers freely available (no registration required) from the Electronic Commerce Code Management Association (ECCMA): Transparency and Data Portability.”

SmartData Collective: Moving BI Into The Cloud Part 1

“Over the last year I have been reading a lot about cloud computing and trying to predict how this can be used in business intelligence and analytics. I believe that the cloud is becoming relevant for a number of reasons…”

Identity Resolution Daily Links 2009-06-22

Monday, June 22nd, 2009

By the Infoglide Team

intelligent enterprise: They Better Get This MDM Program Right

“As reported in The New York Times and on the TSA Web site, the Secure Flight program will improve upon current practices in matching passenger identities to watch lists in many ways. At first glance, this appears to be a well thought-out program that conforms to several basic tenets of Master Data Management (in bold below), in this case for the ‘Customer’ entity.”

EHRWMS: Georgia’s Best EMR Used By Three of Top Ten Pediatricians

“Of approximately 100 respondents, 28 used an EMR, of which 40% used the EncounterPRO Pediatric EMR. There were only three other EMRs used more than once, and they were used by only 10%, 7%, and 7% of the survey respondents respectively.”

Government Executive: Enforcement agencies boost cooperation on drug investigations

“In addition, ICE agents for the first time will fully participate in the Organized Crime Drug Enforcement Task Force Fusion Center. The center allows participating federal, state and local law enforcement agencies, including DEA and the FBI, to share information and analytical resources to enhance their overall investigative capacity.”

SmartData Collective: The Data-Information Continuum

“Data could be considered a constant while information is a variable that redefines data for each specific use. Data is not truly a constant since it is constantly changing. However, information is still derived from data and many different derivations can be performed while data is in the same state (i.e. before it changes again).”

The Growing Role of Identity Resolution in MDM

Wednesday, May 20th, 2009

By Dan Power, President and Founder, Hub Solution Designs

There definitely seems to be a trend lately with small companies in the master data management (MDM) and data quality space being purchased (as in the asset acquisition of Exeros by IBM) or partnering with larger firms (such as Silver Creek Systems’ OEM relationship with Oracle).

I think this is a good thing. Using the classic “build, buy or ally” strategy, it isn’t surprising that sometimes companies will conclude that it’s faster and/or cheaper to buy a technology, or partner with another company that has that technology, rather than build it themselves internally.

A lot of companies do tend to suffer from the “not invented here” syndrome, where anything not developed inside their four walls tends to be regarded with disdain. But that tendency leads to a much slower pace of innovation. In very competitive industries like enterprise software, getting there faster is a very definite advantage.

Since I’ve been working with the identity resolution experts at Infoglide, I’ve become much more aware of the role identity resolution technology plays in our daily lives. Every time you get on an airplane, file an insurance claim, apply for a job / mortgage / credit card, or even shop in a retail store or on a web site, your identity is probably being evaluated by an Identity Resolution Engine.

A lot of people in the MDM space refer to this as “matching”, but there’s considerably more to Identity Resolution than the sophisticated pattern matching that most MDM hub platforms use today. The more robust form – Identity Resolution – is mostly used currently for sophisticated applications like terrorist screening and anti-money laundering, where big consequences or big dollar amounts are at stake.

But that technology is gradually filtering down to more routine commercial applications like master data management for customers. The large MDM vendors like Oracle, IBM and SAP – and the smaller vendors like Siperian, Initiate Systems and D&B/Purisma – will follow the “build, buy or ally” pattern, with some opting to create their own more sophisticated Entity Resolution capabilities, some buying smaller firms who already have those advanced products, or perhaps partnering as a middle ground between building and buying.

Either way, this trend is good both for specialized companies like Infoglide and for the general public. We’ll all be a little safer getting on a plane, a little less likely to suffer from identity theft or confusion, and perhaps save a little money through reduced incidence of various types of fraud.

Full-fledged Identity Resolution is a capability that most MDM hubs should plan on adding in the next revision cycle or two, as MDM customers become more discriminating and more demanding of their hub’s ability to identify individuals and businesses from an ever-growing stream of data.

Dan Power is president of Hub Solution Designs, a consulting firm specializing in master data management and data governance. He writes a popular blog and a column for Information Management magazine, speaks frequently at technology conferences, and regularly advises clients on developing & implementing high impact MDM and data governance strategies.

