
Archive for April, 2010

Architectures for Entity Resolution-Part 3

Thursday, April 29th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last two posts we reviewed the basic architectures used to implement entity resolution (ER) systems.  We started with the most basic systems, the merge/purge and heterogeneous join processes. In the last post, we discussed identity resolution systems, the first of two types of ER architectures that perform identity management.  By retaining identity information, these systems are able to recognize the same identity over time and to assign it a persistent identifier.

The distinguishing characteristic of identity resolution systems is that they start with a given set of identities to which input references are resolved.  An example would be a customer recognition system where the starting identities are the customers of the business. However, there are many situations where the identities are not necessarily known in advance. In some cases, it is not because the entities are unknown, but simply that they are not organized in a way that can be easily pre-loaded.

For example, two companies merge.  Each company has its own customer database, but the customers are identified in different ways.  The same situation can arise within one company through poor systems and practices, leaving little confidence that master records are free of duplicates across business lines or company locations.

The type of system often used to address these situations is called an “identity capture” system. Identity capture systems resemble a cross between a “smart” merge/purge system and an identity resolution system.  They support identity management and persistent identifiers, but start without a preloaded set of identities.

Here is how they work.  As references are resolved, the system saves what it has learned rather than discarding it, so identities are built on the fly as references are processed.  For example, suppose an elementary school has 10 years of enrollment records, i.e. for each year, it has records of all the students who were in grades 1 through 6.  Each year some students leave grade 6 for middle school or transfer from any of the grades to another school.  At the same time some new students enter at first grade or transfer into upper grades from another school.  In an identity capture system, the identity master starts out empty.  When the first enrollment file is processed, almost all of the enrollment records processed will represent new identities.  The identity characteristics in each record are captured and stored to create a new identity master record.

When the next year of enrollment is processed, the system should recognize students re-enrolling from the previous year, so it only captures as new identities those students entering the school that year.  However, in many identity capture systems, the process of capture goes beyond simply adding new identities and can also be used to enhance existing identities.  For example, suppose that from the first-year enrollment, an identity was created for student Edgardo Mendez with a 7/12/2000 date-of-birth (DOB).  Then in the next year of enrollment the system is presented with the record of Eddie Mendez with a 7/12/2000 DOB.  Based on the resolution rules (including conflict rules), the embedded identity resolution process may decide that these are both references to the same student.  If that were the case, it would enhance the identity master record to include the second-year first-name variant, so that going forward it would recognize the same identity with a first name of either Edgardo or Eddie.
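The capture-and-enhance loop described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the class and field names are hypothetical, and the match rule (same last name and same DOB) is a stand-in for whatever resolution and conflict rules a real system would apply.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Identity:
    """One identity master record with a persistent identifier."""
    identifier: int
    last_name: str
    dob: str
    first_names: set = field(default_factory=set)


class IdentityCapture:
    """Hypothetical identity capture system; the identity master starts empty."""

    def __init__(self):
        self.master = []
        self._next_id = count(1)

    def resolve(self, first, last, dob):
        """Return the persistent identifier for one input reference."""
        for identity in self.master:
            # Assumed match rule: same last name and same date of birth.
            if identity.last_name == last and identity.dob == dob:
                # Enhance the existing identity with the new first-name variant.
                identity.first_names.add(first)
                return identity.identifier
        # No match: capture the reference as a new identity.
        new = Identity(next(self._next_id), last, dob, {first})
        self.master.append(new)
        return new.identifier


system = IdentityCapture()
year1 = system.resolve("Edgardo", "Mendez", "2000-07-12")  # first-year enrollment
year2 = system.resolve("Eddie", "Mendez", "2000-07-12")    # second-year enrollment
```

Under this assumed rule both enrollment records receive the same identifier, and the single master record now carries both first-name variants.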

The advantage of collecting identity information on the fly is offset to some extent by the problem of splits and consolidations.  The order in which references are processed can sometimes affect the system’s identity decisions.  Information that connects two references may come after the two references have created separate identity master records (false negative).  This requires the two identity master records to be consolidated or merged.  Although the master records can be corrected, it defeats the idea of the persistent identifier in that many previously processed references could have been assigned the identifier associated with the retired master record while others were assigned the identifier of the surviving master record.
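The effect of processing order can be seen in a small Python sketch.  Everything in it is assumed for illustration: references are modeled as sets of attribute values, two records match when they share at least two values, and the sample values are made up.  When more than one master record matches, the sketch only records that a consolidation is needed; it does not perform the merge.

```python
def capture(references, min_shared=2):
    """Process references in order; return (assignments, consolidations)."""
    master = {}          # identifier -> attribute values captured so far
    assignments = []     # identifier handed out for each reference, in order
    consolidations = []  # (retired identifier, surviving identifier) pairs
    for ref in references:
        # Assumed match rule: at least min_shared attribute values in common.
        matches = [i for i, attrs in master.items() if len(attrs & ref) >= min_shared]
        if not matches:
            ident = len(master) + 1
            master[ident] = set(ref)
        else:
            ident = matches[0]
            master[ident] |= ref
            # Any further match means two master records describe one entity.
            for other in matches[1:]:
                consolidations.append((other, ident))
        assignments.append(ident)
    return assignments, consolidations


a = {"name:Edgardo Mendez", "dob:2000-07-12"}
b = {"guardian:R. Mendez", "phone:555-0100"}
c = a | b  # the reference that connects the other two

late_link = capture([a, b, c])   # connecting information arrives last
early_link = capture([c, a, b])  # connecting information arrives first
```

With the connecting reference last, `a` and `b` create separate master records and `b` has already been handed an identifier that later has to be retired.  With the connecting reference first, all three references share one identifier and no consolidation is needed.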

Splits are the reverse situation, where references to two different entities are mistakenly used to create a single master identity record (false positive).  Splits are harder to correct than consolidations, and for this reason, ER systems that manage identity tend to err on the side of false negatives rather than false positives.

In the next post we will discuss the four most common strategies for linking references.

Identity Resolution Daily Links 2010-04-27

Tuesday, April 27th, 2010

 Community Impact: Controversial fusion center moves forward

“The center is not yet operating, but by as early as midsummer it will be one of more than 70 working fusion centers across the country collecting data from financial, health care, retail, energy, electronic and education sectors.”


The News Tribune: Sumner clerk arrested in undercover Lottery sting

“The ticket indicated it was a winner, and that the person who possessed it was due $20,000, Coe said. But an employee told the compliance officer the ticket was worth a $50 payout, Coe said. The woman kept the ticket after giving the Lottery employee $50, she said. On Monday, the store employee and a companion tried to claim the $20,000 ticket at Washington’s Lottery headquarters on Fourth Avenue in Olympia, according to Coe. Lottery officials called Olympia police, and officers arrested the two women.”

HSToday: Secure Flight On-Budget and On-Time

“The Transportation Security Administration (TSA) generally has fulfilled congressional requirements for bringing the Secure Flight program in for a landing on budget and on schedule, congressional investigators reported Tuesday.”

Identity Resolution Daily Links 2010-04-25

Sunday, April 25th, 2010

[Post from Infoglide] Solving Medicare Fraud

“Health care reform, aka health insurance reform, is a broad and complex issue with many “moving parts” in need of repair.  With this in mind, I would like to look at one area of health care that has received some attention in the press — Medicare fraud. It’s been estimated that 10% of Medicare’s spend is waste due to fraud.”

Houston: A target-rich environment for big-dollar Medicare, Medicaid fraud

“The dollar amounts are staggering: allegations that people who live in some of Houston’s finest neighborhoods owe some of their wealth to bilking the federal Medicare and Medicaid programs out of tens of millions of dollars. ‘What was off the charts was Miami, Los Angeles and Houston,’ said James Buchanan, who heads up the white-collar fraud unit at the U.S. Attorney’s Office in Houston.”

Government Security News: Using good technology to find bad people

“Entity resolution technology is able to handle hundreds of millions of records in sub-second response times and provides unsurpassed matching and linking technology that identifies and resolves information routinely, even when there is duplicate, fragmented, incomplete or dirty data. By disambiguating distributed data sets into resolved entities of high confidence, government agencies charged with keeping us safe can help detect and defeat threats before they occur.”

Health Care News: Pennsylvania Medicaid Waste Estimated at $1/4 Billion a Year

“An audit by Pennsylvania Auditor General Jack Wagner found improper Medicaid eligibility determinations on nearly 2,000 randomly selected Medicaid applications between 2005 and 2009. The audit found a 14.7 percent fraud rate, three times the rate anticipated by the Pennsylvania Department of Welfare (DPW)… ‘The audit is looking only at a sample but determining the percentage of fraudulent payments—from which we can extrapolate that upwards of $1 billion was spent in this period on those who were not eligible for Medicaid.’”

Solving Medicare Fraud

Thursday, April 22nd, 2010

By Haroon Alvi, CEO Southlake Medical Supplies, Inc.

Health care reform, aka health insurance reform, is a broad and complex issue with many “moving parts” in need of repair.  With this in mind, I would like to look at one area of health care that has received some attention in the press — Medicare fraud.

It’s been estimated that 10% of Medicare’s spend is waste due to fraud.  In 2009, the Congressional Budget Office (CBO) reported that Medicare was paying $504 billion annually in health care benefits (see pie chart below).  This would imply that there is $50 billion in Medicare fraud.

[Pie chart: Medicare benefit payments by type of provider and Medicare program (Parts A, B, C and D)]

Note: Does not include administrative expenses such as spending to administer the Medicare drug benefit and the Medicare Advantage Program.  SOURCE: CBO Medicare Baseline, March 2009

The problem with detecting or even measuring Medicare fraud is that the existing tools are limited in their scope and capabilities.  They typically look for patterns in billing, and to be effective, the patterns tend to be specific to a narrow segment of the pie above.  Yet, Medicare fraud is a broad problem that spans many types of providers who bill for a vast variety of services and goods.  The pie chart above highlights the major types of providers as well as the four Medicare programs (Part A, B, C and D).

The process is further complicated by Medicare’s complex fee schedules and billing requirements that are unique to each type of provider such as a hospital, a pharmacy, a physician, a home health agency or a durable medical equipment supplier.  And fraud is not just committed by providers, but is also committed by beneficiaries.

And finally, throw in a critical requirement to minimize “false positives” to avoid damaging Medicare’s credibility and its relationship to the community of providers and population of beneficiaries.  Given the vast complexity of this issue, it’s clear that solving Medicare fraud will require a systematic approach that utilizes multiple tools rather than a “one size fits all” or brute force attack on the problem.

Identity resolution is a tool that is relatively new to the Medicare fraud space.  Identity resolution adds another dimension to current pattern matching tools, and should improve our ability to find Medicare fraud.

Identity Resolution Daily Links 2010-04-20

Tuesday, April 20th, 2010

By the Infoglide Team

The Miami Herald: Medicare’s fraud hot line begins to root out billing scams

“By September, Feliberto Ramos was arrested on fraud charges accusing him and his company, Miracle Group Rehabilitation Center, of falsely billing the federal healthcare program $3.1 million over just three months. Medicare paid Ramos $1.9 million for rehab services never provided to angry beneficiaries.”

OCDQ: Data, data everywhere, but where is data quality?

“Data matters because everything—and not just the rows in our relational databases and spreadsheets, but also our status updates from Facebook and Twitter, our blog posts, and even most of our daily conversations—is data. The growing challenge is can we extract meaningful insights from these vast and veritable oceans of unrelenting data volumes, and use those insights to make better decisions in near real-time in order to positively impact the various aspects of our lives.”

eBusiness Tweets: Microsoft entering the electronic medical record (EMR) software market

“You would think Microsoft would be in such a promising industry, but you won’t find a Microsoft EHR available. The primary reason why is that EHRs are highly specialized, and Microsoft’s main products (Dynamics, CRM, and SharePoint) don’t come anywhere near the needs of physician practices. It would be very difficult for Microsoft to build an EHR from scratch and introduce it to the market. So what should Microsoft do to enter the industry? Acquire a current player.”

Identity Resolution Daily Links 2010-04-17

Saturday, April 17th, 2010

[Post from Infoglide] Medicaid Fraud In the News

“Medicaid is in the news almost daily, as states take steps to crack down on fraudulent claims. For example, Maryland is in the process of passing stiffer laws to support its efforts to reduce the 5-10% of fraudulent claims made that draw from the $6.2 billion it pays out annually. Other states like Texas, New Jersey, New York, Florida, and Missouri  as well as the District of Columbia are enacting new laws and supporting stronger efforts to catch fraudulent claims. How are fraudsters caught?”

LegislativeGazette.com: GOP outlines plans to tackle Medicaid fraud

“Sen. George Winner Jr., R-Elmira, said the Medicaid system was clearly out of control, but there are some possibilities that are coming to light, such as the use of proper data analysis technology that would give localities and the state the ability to get a handle on the system. ‘The technology is there,’ said Winner, ‘and the technology can be utilized to bring this monster under control, saving taxpayers millions if not billions of dollars.’”

Federal Computer Week: GAO on board with Secure Flight plans

“TSA was working with 74 U.S. air carriers and 19 foreign carriers on the program as of March 31, according to the report. Secure Flight had so far assumed the watch-list matching function for 39 U.S. air carriers for domestic flights only, and for 5 foreign air carriers for international flights departing to and from the United States, according to TSA, GAO said.”

The Miami Herald: Miami-Dade clinic operator pleads guilty to Medicare fraud

“Between 2005 and 2007, he and his partners raked in $22 million from the taxpayer-funded healthcare program. Marquez, a Miami-Dade resident who could face more than 20 years in prison, ranks as a big spender among the hundreds of local Medicare-licensed operators accused of ripping off the government program for the elderly and disabled.”

Medicaid Fraud In the News

Tuesday, April 13th, 2010

By Mike Betron, Infoglide Director of Marketing

Medicaid is in the news almost daily, as states take steps to crack down on fraudulent claims. For example, Maryland is in the process of passing stiffer laws to support its efforts to reduce the 5-10% of fraudulent claims made that draw from the $6.2 billion it pays out annually. Other states like Texas, New Jersey, New York, Florida, and Missouri as well as the District of Columbia are enacting new laws and supporting stronger efforts to catch fraudulent claims. How are fraudsters caught?

In some cases, “whistle blowers” alert officials to those making fraudulent claims. Someone working at a clinic, hospital, or other business finds out about the fraud being perpetrated and reports it. Other cases are caught by governmental anti-fraud units that find suspicious patterns like one provider in North Carolina who bilked the system out of $45,000 for “miscellaneous” prosthetic limbs.

Increasingly, enforcers are discovering that they already have access to data that, properly analyzed, can highlight wrongdoing. Neil Versel recently wrote in FierceHealthIT that “the Obama administration is pushing high-tech in its pursuit of fraudsters, sending out ‘bounty hunter auditors’ to find waste, fraud and abuse in Medicare and Medicaid.” The “bounty hunters” are planned to be private auditors who would electronically analyze billing data for signs of fraud. They will be paid with a portion of the funds that the government recovers.

It’s not hard to predict that identity resolution technology will become a key weapon in this war against healthcare fraud. We’ve decided to start covering the latest efforts of the U.S. and state government agencies to stop the fraud and restore the funds to government coffers.

Identity Resolution Daily Links 2010-04-11

Saturday, April 10th, 2010

By the Infoglide Team

Liliendahl on Data Quality: What is a best-in-class match engine?

“I don’t think anyone knows what product is the best match engine, because I don’t think that all match engines have been benchmarked with a representative set of data.”

ITBusinessEdge: SOA Spending on the Rise. Surprised? Here’s Why

“It’s important to realize that SOA is really a rather loose collection of best practices. It’s not necessarily a well-defined list where you have some checklist of things to do SOA and if you miss one, you’re not doing SOA. What’s happening is architecture teams are incorporating SOA best practices into various other initiatives.”

BTNonline.com: TSA To Assume All Watchlist Matching For U.S. Carriers By June, All Carriers By January

“The U.S. Transportation Security Administration is on track to assume watchlist matching from all U.S. carriers by the end of May, only slightly behind its March 31 U.S. implementation target for the Secure Flight passenger prescreening system, according to a U.S. Government Accountability Office report. The Secure Flight program also calls for TSA to assume watchlist matching from foreign carriers, and the agency already is working with 19 airlines outside the United States to do so. Five of those carriers are fully functional within the program, and an additional 14 are testing, GAO reported.”

[video] KENS5.com: UT Health Science Center helps bring medicine into computer age

“Currently 80 to 90 percent of all medical records are stored on paper.  The goal is to have an electronic health record for everyone in the U.S. by 2014. Electronic health records are expected to greatly reduce the number of medical errors, which is significant.  Each year in the United States, as many as 100,000 people die in hospitals because of such errors.  That’s the equivalent of one major airline crash every single day of every single year.”

Identity Resolution Daily Links 2010-04-06

Tuesday, April 6th, 2010

By the Infoglide Team

ITBusinessEdge: TIBCO Makes MDM Move, but Where Is Oracle?

“There are still options, Karel writes, including S3 Matching Technologies, Syslore or identity resolution/matching vendor Infoglide Software. But if Oracle plans to just use its own matching engine from the Oracle Customer Hub in Oracle Universal Content Management – ‘that would be a step backwards in my opinion,’ he writes.”

The Austin Chronicle: Drug Trafficking Gets Intense?

“But both FBI Agent Royce Curtain and the DPS’ Tom Ruocco said that communication among law enforcement agencies in the area is good – and, said Ruocco, the addition of a local Austin Regional Intelligence Center (a.k.a. a ‘fusion center’) would be an asset to getting information needed to detect if there is an increase in local drug trafficking activity. Getting involved in the fusion center is ‘proactive on the city’s part,’ said Ruocco. When there are trends ‘coming forward’ the city will be in a ‘better position to react.’”

Insurance Journal: Kentucky Coal Mine Operator Charged with Workers’ Compensation Fraud

“The indictment alleged that between May 2004 and May 2005, Allen underreported monthly payroll and the number of miners working for her to Kentucky Employers Mutual Insurance. She did this by creating a sham trucking company and placing many of her mining employees on that payroll.”

Liliendahl on Data Quality: Breaking through an open door

“Why are some people always reminding us that this and that must be seen in a business context? Of course everything we do in our professional life within data quality, master data management, business intelligence and so on must be seen in a business context.”

Identity Resolution Daily Links 2010-04-03

Saturday, April 3rd, 2010

[Post from Infoglide] Unobtrusive Measures and Identity Resolution

“For decades, researchers in the social sciences have used “unobtrusive measures” as defined originally in a 1966 book by Webb, Campbell, Schwartz, and Sechrest. The idea is to collect and analyze data without disturbing the subjects of the study. For example, instead of surveying subjects to find out how many candy bars they eat each day, the subjects’ garbage is searched and the number of candy wrappers is tallied.”

Information Management: TIBCO Software Acquires Netrics

“Gartner Research VP Andrew White highlighted the continued frenzy of acquisitions in the master data management space in a blog on the latest acquisition. ‘… this new acquisition highlights the dwindling set of data quality tools for master data management (and other interested) vendors to partner with, and/or acquire,’ White wrote. ‘The acquisition seems logical, and good, for packaged MDM (TIBCO offers one) though; but as the music dies down, who will be left standing without a partner…’”

Workers’ Comp Kit Blog: Business Owners and Secretary Facing Prison For Lying to Wiggle Out of High Premiums

“The Ventura County District Attorney’s office recently arraigned the owners, along with the company’s secretary, on five felony counts of insurance premium fraud and two counts of conspiracy to commit insurance fraud. According to authorities, the three lied to their insurer to save an estimated $500,000, making it appear their employees were more experienced than they actually were.”

SmartDataCollective: MDM Can Challenge Traditional Development Paradigms

“Dealing with imperfect data has traditionally been unacceptable because it slowed down processing; ignoring it or returning an error was a best practice. The difference about MDM development is the focus on data content (and value-based) processing.  The whole purpose of MDM is to deal with all data, including the unacceptable stuff. It assumes that the data is good enough.”

