Archive for the ‘Entity Resolution and Analysis’ Category

Entity Identity Management

Friday, January 14th, 2011

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

First, let me wish everyone a Happy and Prosperous New Year.  Also, since my last post, my book Entity Resolution and Information Quality has been published and is now available from Morgan Kaufmann Publishing (http://mkp.com/news/entity-resolution-and-information-quality).

What is entity identity management? It simply means that an ER system can store and maintain a record of identity information that persists over time.  Entity identity management is essential for an ER engine to operate in identity resolution or identity capture mode and for it to maintain persistent entity identifiers.

As you may recall from previous discussions, an identity resolution ER system starts with a set of known (asserted) identities and attempts to determine if a given entity reference refers to one of these known entities.  On the other hand, an identity capture ER system starts with a blank slate and tries to construct an identity based on the (equivalent) references it processes.

Two important concepts here bear further discussion.  The first is the structure used to represent the identity of an entity; the second, somewhat more philosophical, is the question of what constitutes entity identity.

There are two commonly used approaches to representing identity in ER systems – one is an attribute-level structure sometimes called a “merge identity” and the other is a reference-level structure sometimes called a “cluster identity.”  The difference between a merge identity and a cluster identity can be illustrated by a simple example.

Suppose we have a system where entity references have three attributes A, B, and C, and that we are given two specific entity references R1=(a1, b1, c1) and R2=(a2, b2, c1), where a1 and a2 are values for attribute A, b1 and b2 values for attribute B, and c1 a value for attribute C.  Finally assume that references R1 and R2 are determined to be equivalent references (i.e. references to the same real-world entity).  In the merge identity approach, the entity identity EM referenced by R1 and R2 would be represented as

EM=[A:{a1, a2}, B:{b1, b2}, C:{c1}]

This means that for identity EM the A attribute can take on either the value a1 or a2, the B attribute can take on either the value b1 or b2, and the C attribute only the value c1.  In a merge identity the binding between the values a1 and b1 that was expressed by their co-occurrence in the reference R1 is lost.  Similarly the binding between a2 and b2 expressed by R2 is no longer present in EM.

In a cluster identity structure, the original reference binding between attribute values is preserved.  In the cluster identity approach, the entity identity EC referenced by R1 and R2 would be represented as

EC=[(A:a1, B:b1, C:c1), (A:a2, B:b2, C:c1)]

Thus, for identity EC the attributes A, B, and C can only take on the combinations of values given by the original references R1 and R2. There are advantages and disadvantages to both approaches, but most significantly they can lead to different resolutions for the same set of references.

To illustrate, let’s continue with the preceding example by supposing that the systems using the merge identity and the cluster identity both use the same two resolution rules.  Rule 1 is that the two references are considered equivalent if they agree (exact match) on Attribute C.  Rule 2 is that they are equivalent if they agree (exact match) on both Attributes A and B.

Now suppose that each system processes a third entity reference R3=(a1, b2, c2).  Using the two rules just discussed, the merge identity system would resolve R3 as equivalent to the identity EM represented by references R1 and R2.  By Rule 2, R3 agrees with EM on both attribute A (a1 is one of EM's values for A) and attribute B (b2 is one of EM's values for B).  On the other hand, R3 would not resolve to the identity EC in the cluster identity system.  R3 does not satisfy either Rule 1 or Rule 2 with respect to either of the references R1 and R2 that comprise the cluster identity EC: its C value c2 matches neither reference, and it agrees with R1 only on A and with R2 only on B.
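The example can be traced in a few lines of Python. This is only a minimal sketch, not how any particular ER engine is implemented: the function names are hypothetical, and it assumes that a reference "agrees" with a merge identity on an attribute when its value is among the values stored for that attribute, and that a reference resolves to a cluster identity when it satisfies a rule against at least one member reference.

```python
from typing import Dict, List, Set

Reference = Dict[str, str]

# The three references from the example above.
R1: Reference = {"A": "a1", "B": "b1", "C": "c1"}
R2: Reference = {"A": "a2", "B": "b2", "C": "c1"}
R3: Reference = {"A": "a1", "B": "b2", "C": "c2"}


def build_merge_identity(refs: List[Reference]) -> Dict[str, Set[str]]:
    """Attribute-level structure: one set of values per attribute."""
    merged: Dict[str, Set[str]] = {}
    for ref in refs:
        for attr, value in ref.items():
            merged.setdefault(attr, set()).add(value)
    return merged


def build_cluster_identity(refs: List[Reference]) -> List[Reference]:
    """Reference-level structure: the original references kept intact."""
    return [dict(ref) for ref in refs]


def matches_merge(identity: Dict[str, Set[str]], ref: Reference) -> bool:
    # Assumption: "agree" means the value is in the identity's value set.
    rule1 = ref["C"] in identity["C"]
    rule2 = ref["A"] in identity["A"] and ref["B"] in identity["B"]
    return rule1 or rule2


def matches_reference(a: Reference, b: Reference) -> bool:
    rule1 = a["C"] == b["C"]                       # exact match on C
    rule2 = a["A"] == b["A"] and a["B"] == b["B"]  # exact match on A and B
    return rule1 or rule2


def matches_cluster(identity: List[Reference], ref: Reference) -> bool:
    # Assumption: matching any single member reference is sufficient.
    return any(matches_reference(member, ref) for member in identity)


EM = build_merge_identity([R1, R2])
EC = build_cluster_identity([R1, R2])

print(matches_merge(EM, R3))    # True: a1 and b2 are both present in EM
print(matches_cluster(EC, R3))  # False: no single reference matches R3
```

The two structures are built from the same references and checked with the same two rules; the only difference is whether the pairing of A and B values within each reference is retained.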

Merge identities and cluster identities both represent valid, but different, approaches to identity management.  To some extent they also represent two different ways of thinking about entity identity.  I plan to discuss the concept of the entity identity further in the next post.

Identity Resolution Daily Links 2011-01-11

Tuesday, January 11th, 2011

By the Infoglide Team

BND.com: Insurance fraud investigators begin probe into workers’ comp claims at Menard

“A total of 389 guards and other workers have filed more than 500 claims, including about 290 still pending. About 230 of these claimed injury for the underlying cause of ‘repetitive trauma,’ including carpal tunnel syndrome, an injury of the wrist. The prison employs about 760 workers, of which 567 are guards. ‘The Department of Insurance is investigating recent questions raised in connection with workers’ compensation claims filed against the state of Illinois at the Menard Correctional Center,’ department spokesman Louis Pukelis said Tuesday in a written statement.”

HSToday: Fusion Centers: Tough Tightrope 

“As states and localities have put up fusion centers designed precisely to overcome this, however, they’ve had to face a different challenge: ensuring not only the quantity but the quality of information they collect and report. In candid conversations with Homeland Security Today, leading privacy advocates, scholars and state law enforcement and federal officials addressed some of the key facets of this challenge, as well as steps that can be taken to ensure that fusion centers live up to their full potential as a counterterrorism tool.”

StarNewsOnline: North Carolina collects big from Medicaid fraudsters

“North Carolina’s Medicaid fraud investigators pulled in millions last year through dozens of cases of fraud and patient abuse, the state’s attorney general’s office reported Monday. The office’s Medicaid Investigations Unit prosecuted 22 criminal convictions and 18 civil settlements, recovering $53.5 million, during the federal fiscal year that ended Sept. 30, according to a press release from N.C. Attorney General Roy Cooper.”

ReadWriteWeb: What Cloud Computing Means For Small Businesses

“Needless to say, it’s a huge deal. Gartner recently put cloud computing at the top of its list of top strategic technologies for 2011 and it’s far from the only expert extolling the glory of the Web-hosted software and infrastructure. For small businesses, the significance of this primarily comes down to cost. In many cases, using cloud-based infrastructure is cheaper than running and maintaining one’s own physical servers.”

Identity Resolution Daily Links 2011-01-09

Sunday, January 9th, 2011

[Post from Infoglide] You Can’t Handle the Truth

“We have a new Congress and a new House majority leader as of this week’s swearing in ceremony. The current House majority party (R) plans to pass a bill to repeal the ‘Obamacare’ bill passed during the last session by the former House majority party (D).  Both parties make ‘fact based’ arguments about why killing or keeping the bill will reduce the deficit, yet both can’t be right. This isn’t a political blog, and I’m not going to take a side on this issue. What struck me is how often we use ‘facts’ to bolster our argument, with ‘facts’ defined as any real data that can be massaged or misinterpreted to suggest that our desired outcome appears to be the best one.”

The Washington Post: The Navigator: Does Secure Flight program mean more money for the airlines?

“When she arrived at the screening area, her husband’s incorrect name had already been checked against a list of potential security threats and had passed. Once passengers receive their boarding passes, the Secure Flight process is already complete, according to the TSA.”

LinkedIn: Data Quality of Gender / Sex Codes and the Impacts on Identity Data Matching

“Identity matching requires matching practitioners to decide which collection of fields best allows the correct matching of one record with another. The choice can be made from fields such as name, date of birth, address details, sex / gender, and even unique identifier values (when they exist). The use of sex / gender in that process might be seen in a slightly different light.”

nj.com: Bill would allow people to buy New Jersey Lottery tickets electronically

“Under the bill, the commission would establish procedures for the payment of winning tickets holders, which may include crediting amounts won to a player’s account or direct deposit into a player’s account at a financial institution… The commission would also be directed to ensure that the program includes security measures to protect against fraud, prevent wagering by underage persons and protect the personal and financial information of players.”

You Can’t Handle the Truth

Friday, January 7th, 2011

By Mike Shultz, Infoglide Software CEO

We have a new Congress and a new House majority leader as of this week’s swearing in ceremony. The current House majority party (R) plans to pass a bill to repeal the “Obamacare” bill passed during the last session by the former House majority party (D).  Both parties make “fact based” arguments about why killing or keeping the bill will reduce the deficit, yet both can’t be right.

This isn’t a political blog, and I’m not going to take a side on this issue. What struck me is how often we use “facts” to bolster our argument, with “facts” defined as any real data that can be massaged or misinterpreted to suggest that our desired outcome appears to be the best one. Actual data is often plentiful but our preference for one alternative keeps us from embracing and promoting reality.

So mishandling the truth when you have all the facts you need is a conscious action. What happens when you think you have the data needed to make a rational decision but you aren’t conscious of important information that could totally change your perception? For example, we may have access to what look like sufficient pieces of information to reach a rational business decision, such as a driver’s license with a photo ID or a computed credit score based on the person’s history of business transactions.

However, what’s often missing from the decision process is knowledge about relationships between people. Understanding these relationships – who’s who, who knows who, and other non-obvious connections – can increase beneficial decisions in a colossal way, yet awareness of these relationships is rarely incorporated into the process.

Since entity resolution can increase the accuracy of business processes by an order of magnitude, our New Year’s resolution here at Infoglide is to introduce as many people as possible to its benefits.

Happy New Year!

Identity Resolution Daily Links 2011-01-04

Tuesday, January 4th, 2011

By the Infoglide Software Team

ebiz: Relevance of Enterprise Architecture to Cloud Computing

“Strategic decisions about cloud computing should both draw upon and inform the EA. An organization must have a mature and well formed understanding of its architecture components (e.g., business processes, services, applications and data) to make meaningful decisions related to cloud computing, such as whether a move to the cloud is advantageous, what services most lend themselves to a cloud deployment, and what cloud deployment model (e.g., private, public) makes the most sense. There are three key roles for EA in facilitating cloud computing strategy and planning…”

WRAL.com: State roots out $53M in Medicaid fraud

“‘Medicaid cheaters rob taxpayers, hurt needy patients and push medical costs higher for all of us,’ Cooper said in a statement. ‘We’re stopping the waste and abuse and making violators pay.’ During the federal fiscal year that ended Sept. 30, the Medicaid Investigations Unit of the state Attorney General’s Office won 22 criminal convictions and negotiated 18 civil settlements worth $53.5 million.”

AvStop.com: All Airline Passengers Now Screened Against Government Watchlists

“Under Secure Flight, the Transportation Security Administration (TSA) prescreens passenger name, date of birth and gender against terrorist watchlists before passengers receive their boarding passes. In addition to facilitating secure travel for all passengers, the program helps prevent the misidentification of passengers who have names similar to individuals on government watchlists. Prior to Secure Flight, airlines held responsibility for checking passengers against watchlists.”

Looking Back on 2010

Thursday, December 23rd, 2010

Looking back over the past year, we’re especially grateful for relationships we’ve built and grown with customers and partners. Despite a less than stellar economy, 2010 provided another good year of growth for Infoglide Software.

2010 also proved to be a year of accelerated visibility for identity resolution and entity analytics in general. Industry consolidation moves (e.g., IBM’s March acquisition of Initiate Systems) demonstrate the critical importance of entity resolution in the new era of Big Data that has been developing.

For the readers of IdentityResolutionDaily, please accept our thanks for your continuing interest and participation in the exciting growth of this market. 2011 promises to be a year of continued change and challenge, and we look forward to the opportunities it offers.

We’ll start with new posts again in January.

Happy Holidays, and Best Wishes for a Wonderful 2011!

Mike Shultz
CEO, Infoglide Software

Identity Resolution Daily Links 2010-12-21

Tuesday, December 21st, 2010

By the Infoglide Software Team

Cliffview Pilot: Fighting crime with modern tools amid budget cuts

“Professional analysts and law enforcement officers from more than 15 different agencies including the FBI, ATF, DEA, US Marshall’s, Homeland Security, and state and county partners work from one large room to put out intelligence products in a truly collaborative environment that defines New Jersey’s fusion center. Products include crime mapping with predictive analysis to help local departments know when and where crimes are likely to occur in the future.”

Thomasville Times-Enterprise: Pharmacist fraud

“Morgan’s prison sentence will be followed by three years of supervised release. Morgan was ordered to pay restitution of $2,804,462. Morgan, 64, was convicted in October 2008, of 69 counts of health care fraud, following a two-week jury trial in Albany. Michael J. Moore, U.S. attorney for the Middle District of Georgia, said the indictment charged that for a period of several years ending in August 2007, Morgan, a registered pharmacist and the owner of Thrift Center Pharmacy in Camilla, executed a scheme to defraud the Georgia Medicaid program, which is jointly funded with state and federal funds.”

FATF: Money Laundering Using Trusts and Company Service Providers [PDF]

“TCSPs are often involved in some way in the establishment and administration of most legal persons and arrangements; and accordingly in many jurisdictions they play a key role as the gatekeepers for the financial sector. This report provides a number of case studies which demonstrate that TCSPs have often been used, wittingly or unwittingly, in the conduct of money laundering activities.”

Identity Resolution Daily Links 2010-12-19

Sunday, December 19th, 2010

[Post from Infoglide] Big Data and Entity Resolution (part 2)

“We talked a week ago about the rapidly emerging market space called Big Data. One statistic that opened my eyes is Gartner’s prediction that the volume of new data generated by enterprises will grow by 650% in the next five years, and 80% of that will be unstructured data! The 451Group’s definition of Big Data describes a growing need for non-traditional processes that can treat massive amounts of data as a whole, thereby making it impossible to use many traditional tools and techniques.”

KXAN.com: A look inside new crime-fighting tool

InformationWeek Healthcare: Medicare Expands Analytic Tools To Fight Fraud

“These tools will integrate many of the agency’s pilot programs into the National Fraud Prevention Program and complement the work of the joint HHS and Department of Justice Health Care Fraud Prevention and Enforcement Action Team (HEAT). ‘Preventing fraud is more effective than the old “pay and chase” model of fighting fraud after a sham provider has been paid and disappeared,’ CMS administrator Donald Berwick said in a statement. ‘By using new predictive modeling analytic tools we are better able to expand our efforts to save the millions – and possibly billions – of dollars wasted on waste, fraud, and abuse.’”

InformationWeek: The Morphing IT Budget: It’s About More Than Opex

“Concerns that internal initiatives, and the CIO’s clout, will be gutted and most funds redirected to the cloud are overstated–for now. But we are at an inflection point: IT has money to spend, but it can’t be allocated using the same old budget process that’s kept us in a rut of dedicating a third or more of our resources to keeping the lights on. Business leaders have little patience for high-priced, long-term IT slogs. They’ve seen massive 18-month projects fail and experienced success with lightweight software-as-a-service offerings. CIOs must look at each expenditure and think, ‘Will this buy us flexibility and advance the business?’”

Big Data and Entity Resolution (part 2)

Thursday, December 16th, 2010

By Mike Betron, Infoglide Software Director of Marketing

We talked a week ago about the rapidly emerging market space called Big Data. One statistic that opened my eyes is Gartner’s prediction that the volume of new data generated by enterprises will grow by 650% in the next five years, and 80% of that will be unstructured data!

The 451Group’s definition of Big Data describes a growing need for non-traditional processes that can treat massive amounts of data as a whole, thereby making it impossible to use many traditional tools and techniques. Data is voluminous, complex, and very dynamic, yet business drivers demand that it be captured, managed, and harnessed to benefit the organization.

While entity resolution (ER) software is technologically mature, the evolving requirements for managing Big Data fit ER perfectly. For example, Infoglide’s Identity Resolution Engine (IRE) scales to meet Big Data requirements, and its flexibility in handling ambiguous unstructured and structured data with missing elements makes it an ideal solution for wringing value from the “data deluge” we increasingly find ourselves in.

One of the unique problems associated with Big Data is its multiple disparate sources that include email, Word documents, spreadsheets, and social media such as IM, newsfeeds, Facebook, and LinkedIn, just to name a few. Again, entity resolution systems like IRE now include support for multiple data forms and have created special ways to incorporate social media.

So, while Big Data presents a daunting challenge for many organizations, flexible technologies like entity resolution represent a key element of any solution.

Identity Resolution Daily Links 2010-12-14

Tuesday, December 14th, 2010

By the Infoglide Software Team

American Medical Software: Electronic Medical Records Use Over Majority

“Results from the National Ambulatory Medical Care Survey (NAMCS) show that between 2009 and 2010, the percentage of physicians reporting having an electronic medical record/electronic health record (EMR/EHR) system that meets the criteria of a basic system increased by 14% and a fully functional system increased by 46%.”

avanade: Global Survey: The Impact of Big Data

“In the global marketplace, businesses, suppliers and customers are creating and consuming vast amounts of information. Gartner predicts that enterprise data in all forms will grow 650 percent over the next five years. According to IDC, the world’s volume of data doubles every 18 months. This flood of data, often referred to as “information overload,” “data deluge” and “big data,” clearly creates a challenge for business leaders.”

Gartner: Technology Trends You Can’t Afford to Ignore

  1. Virtualization
  2. Data Deluge
  3. Energy and Green IT
  4. Complex Resource Tracking
  5. Consumerization and Social Software
  6. Unified Communications
  7. Mobile and Wireless
  8. System Density
  9. Mashups and Portals
  10. Cloud Computing
