
Archive for the ‘Data Matching’ Category

The Real Test of Identity Resolution

Wednesday, June 24th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

So the title “Catching Terrorists and Making the World a Safer Place” certainly caught my eye! And the content of the post did not disappoint, as the author Chris Boorman of Informatica did a great job of crystallizing the issue that drove the creation of this blog over two years ago: “So how do we balance the freedom of movement we have come to expect as hard-working citizens with the need to spot terrorists?” His answer is “technology” and of course we agree.

When Identity Resolution Daily first began in the summer of 2007, we pointed out the constant tension between freedom and privacy versus the need for security:

In the US, the debate between personal privacy (and perhaps liberties in general) versus security is a long-standing one with roots in the very founding of the nation itself. Folks interested in obtaining data often wonder how much people are willing to give up in the name of greater security or convenience. On the other hand, those more focused on privacy worry about how data is obtained, what it’s used for and where it ends up.

Infoglide CEO Mike Shultz also discussed the responsibility that comes with providing technology that deals with identity:

It was important to all of us here that we didn’t create some sort of Big-Brother-enabling technology. As a result, we designed software that can resolve identities across multiple sources while protecting data privacy and security.

The point he made about the design of the software is vital, and the Center for Digital Government’s white paper, “Resolving Identity: The Importance of Who’s Who and the Search for the Perfect Engine,” delves into what technology can do to answer questions like “who’s who” and “who’s related to whom.”

In a more recent post, we talked about the components needed for an effective identity resolution solution. It’s not enough to have great similarity matching algorithms, and it’s not even enough to be able to find hidden connections in real time across millions of rows of data, although both those capabilities are obviously required. The real test in catching terrorists and making the world a safer place using identity resolution is how decision-making is automated and integrated into existing business processes.

Identity Resolution Daily Links 2009-06-22

Monday, June 22nd, 2009

By the Infoglide Team

intelligent enterprise: They Better Get This MDM Program Right

“As reported in The New York Times and on the TSA Web site, the Secure Flight program will improve upon current practices in matching passenger identities to watch lists in many ways. At first glance, this appears to be a well thought-out program that conforms to several basic tenets of Master Data Management (in bold below), in this case for the ‘Customer’ entity.”

EHRWMS: Georgia’s Best EMR Used By Three of Top Ten Pediatricians

“Of approximately 100 respondents, 28 used an EMR, of which 40% used the EncounterPRO Pediatric EMR. There were only three other EMRs used more than once, and they were used by only 10%, 7%, and 7% of the survey respondents respectively.”

Government Executive: Enforcement agencies boost cooperation on drug investigations

“In addition, ICE agents for the first time will fully participate in the Organized Crime Drug Enforcement Task Force Fusion Center. The center allows participating federal, state and local law enforcement agencies, including DEA and the FBI, to share information and analytical resources to enhance their overall investigative capacity.”

SmartData Collective: The Data-Information Continuum

“Data could be considered a constant while information is a variable that redefines data for each specific use. Data is not truly a constant since it is constantly changing. However, information is still derived from data and many different derivations can be performed while data is in the same state (i.e. before it changes again).”

Identity Resolution Daily Links 2009-06-19

Friday, June 19th, 2009

[Post from Infoglide] Speaking of Narrative Fallacy

Nassim Nicholas Taleb’s book The Black Swan: The Impact of the Highly Improbable uses “narrative fallacy” to describe how we humans tend to enhance ex post facto our ability to predict events that in fact are extremely complex and random. A recent post on Netrics HD attempts to leverage this argument to demonstrate the superiority of “Machine Learning” (i.e. probabilistic analysis) over “data matching” (i.e. deterministic analysis).

advance: Security and Privacy Challenges to EHR Adoption

“Lest we forget, our country is trying to establish similar capabilities with the widespread initiative to implement electronic health records (EHRs). My health history should travel with me — just as easily as my financial information. With some sort of authentication process, a “core” set of data should be easily available to assist in my receipt of health services.”

New York Times: Flying? Don’t Book Under a Nickname

“The government’s aim is to streamline the process of checking travelers’ names against its watch lists — a task currently handled separately by each airline — and to collect more detailed information so passengers with names similar to those on the watch list are less likely to be mistakenly detained. Asking for a birth date, for instance, decreases the likelihood that a child with a name close to one on the list would be subject to an additional search — one example of a false match that has led to complaints.”

Integrated Solutions for Retailers: Organized Retail Crime: Scope, Solutions

“Popular targets of organized retail crime rings include Crest Whitestrips, Rogaine, Similac baby formula, razor blades, and pregnancy tests. Having not been stored or managed properly, these items can pose serious health risks for innocent shoppers looking for a good bargain. And, because most of these items are sold “new in box,” well-meaning consumers are unaware that what they purchased may be spoiled or expired  —  and stolen.”

Speaking of Narrative Fallacy

Wednesday, June 17th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

Nassim Nicholas Taleb’s book The Black Swan: The Impact of the Highly Improbable uses “narrative fallacy” to describe how we humans tend to enhance ex post facto our ability to predict events that in fact are extremely complex and random. A recent post on Netrics HD attempts to leverage this argument to demonstrate the superiority of “Machine Learning” (i.e. probabilistic analysis) over “data matching” (i.e. deterministic analysis).

Product managers have a long history of creating oversimplified comparisons to competing products and technologies to demonstrate the superiority of their own. A favorite technique is to set up a straw man that can then be knocked down. In the case under discussion, the straw man is a “rules-based” system that is unwieldy to use and requires huge amounts of time to tune, with an underlying premise that each new application of a rules-based system starts from scratch with no accumulated domain-specific intelligence. (Of course, this doesn’t work if you choose a more intelligent identity resolution system for comparison.)

We’ve spent time here before talking about the differences between these two approaches, so I’m not going to restate the details again. Truthfully, probabilistic systems like that from Netrics have their place in screening large amounts of data, but like any system, they have their limitations. While they can reach a certain level of performance in emulating users’ decisions, they typically don’t leave a trail for an investigator to follow, they don’t support a rational drill-down into possible suspect transactions the way that deterministic systems do, and they don’t allow attribute-specific tweaking so you can leverage the information and better understanding that you’ve gained over time.
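To make the "trail" and "attribute-specific tweaking" points concrete, here is a minimal sketch of a deterministic, attribute-weighted comparison that records how each attribute contributed to the final score. The weights, field names, and sample records are hypothetical, and the standard library's SequenceMatcher stands in for whatever similarity functions a real identity resolution product would use.

```python
from difflib import SequenceMatcher

# Hypothetical per-attribute weights; an analyst could tune these over time.
WEIGHTS = {"name": 0.5, "birth_date": 0.3, "city": 0.2}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity; a missing value on either side scores 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_pair(left: dict, right: dict, weights: dict = WEIGHTS):
    """Return (total, trail); trail lists (attribute, similarity, contribution)."""
    trail = []
    total = 0.0
    for attr, weight in weights.items():
        sim = similarity(left.get(attr, ""), right.get(attr, ""))
        contribution = weight * sim
        trail.append((attr, round(sim, 2), round(contribution, 2)))
        total += contribution
    return round(total, 2), trail


# Hypothetical records that differ only in the spelling of the first name.
a = {"name": "Jon Smith", "birth_date": "1970-01-01", "city": "Austin"}
b = {"name": "John Smith", "birth_date": "1970-01-01", "city": "Austin"}
total, trail = score_pair(a, b)
for attr, sim, contribution in trail:
    print(f"{attr}: similarity={sim}, contribution={contribution}")
print("total:", total)
```

Because every attribute's contribution is kept, an investigator can see why two records were linked, and adjusting a single weight changes the behavior for that attribute only.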

The larger issue is whether a solution can take advantage of appropriate technologies in appropriate circumstances (e.g. using both probabilistic and deterministic analytics in one solution), rather than being forced into an either/or, one-size-fits-all scenario. Solutions like those offered by identity resolution companies supply a framework that can incorporate all of them.

Identity Resolution Daily Links 2009-06-12

Friday, June 12th, 2009

[Post from Infoglide] Data Source Disintermediation?

“According to Wikipedia, ‘disintermediation is the removal of intermediaries in a supply chain: ‘cutting out the middleman’… Buyers bypass the middlemen (wholesalers and retailers) in order to buy directly from the manufacturer and thereby pay less.’”

[Jim Harris] OCDQ Blog: The Two Headed Monster of Data Matching

“Data matching is commonly defined as the comparison of two or more records in order to evaluate if they correspond to the same real world entity (i.e. are duplicates) or represent some other data relationship (e.g. a family household). Data matching is commonly plagued by what I refer to as The Two Headed Monster…”

CorpWatch: CorpWatch announces release of the CrocTail application and open CorpWatch API

“CrocTail provides an interface for browsing information about several hundred thousand U.S. publicly traded corporations and their many foreign and domestic subsidiaries. Information from company Securities and Exchange Commission (SEC) filings has been parsed and annotated by CorpWatch to highlight specific corporate accountability issues. CrocTail also serves as a demonstration of the features and data available through the CorpWatch API.”

Vos Is Neias: Washington - TSA Advising Travelers To Book Airline Tickets Using Full Real Names

“While the T.S.A. has announced Aug. 15 as a target date for the airlines to begin asking for each passenger’s full name, gender and date of birth, and has already begun publicizing the program, called Secure Flight, the agency acknowledged that it would go into effect in phases as the airlines update their systems.”

Data Source Disintermediation?

Wednesday, June 10th, 2009

By Robert Barker, Infoglide Senior VP & Chief Marketing Officer

According to Wikipedia, “disintermediation is the removal of intermediaries in a supply chain: ‘cutting out the middleman’… Buyers bypass the middlemen (wholesalers and retailers) in order to buy directly from the manufacturer and thereby pay less.” Some famous disintermediation examples are:

•    Bookselling (e.g., Amazon’s long-tail marketing of millions of books online)
•    Travel (e.g., Southwest Airlines selling tickets direct to consumers on the web)
•    Computers (e.g., Dell selling computers direct to consumers and businesses over the internet).

Disintermediation was THE hot topic during the dot com boom, but the heady prediction that virtually every industry would be disintermediated has yet to become a reality. Nevertheless, over the past decade or so we’ve all tracked the news as one business model after another is attacked by competitors who seek a way to “disintermediate” a particular sector.

Part of the power of identity resolution solutions derives from the data sources upon which they’re based, and both the quantity and quality of data sources can affect the results. One challenging identity resolution problem we’ve written about that relies on a variety of data sources is insider trading (see Leveraging Identity Resolution Data Sources). Drawing on multiple internal and external, public and private data sources, identity resolution unwinds multiple degrees of business, friendship, and familial relationships to uncover likely illegal stock market gains.
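As a rough illustration of what unwinding "multiple degrees" of relationships can look like, the sketch below runs a breadth-first search over a small relationship graph to find the shortest chain linking two people. The names, relationship types, and the three-degree limit are all hypothetical; in a real deployment the edges would come from resolved identities across many data sources.

```python
from collections import deque

# Hypothetical relationship edges: (person, person, relationship type).
EDGES = [
    ("Alice Insider", "Bob Broker", "business"),
    ("Bob Broker", "Carol Cousin", "family"),
    ("Carol Cousin", "Dave Trader", "friendship"),
]


def build_graph(edges):
    """Build an undirected adjacency list keyed by person."""
    graph = {}
    for a, b, kind in edges:
        graph.setdefault(a, []).append((b, kind))
        graph.setdefault(b, []).append((a, kind))
    return graph


def find_path(graph, start, goal, max_degrees=3):
    """Breadth-first search; return the shortest list of (relationship, person) hops, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        person, path = queue.popleft()
        if person == goal:
            return path
        if len(path) >= max_degrees:
            continue  # do not look beyond the allowed number of degrees
        for neighbor, kind in graph.get(person, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(kind, neighbor)]))
    return None


graph = build_graph(EDGES)
path = find_path(graph, "Alice Insider", "Dave Trader")
print(path)
```

Keeping the relationship type on each hop means the result reads as an explanation (business, then family, then friendship) rather than only a yes/no answer.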

Now potential disintermediation plays related to data sources are emerging. CrunchBase is a well-known example, offering a free database of technology companies, people, and investors that anyone can edit. San Francisco-based CorpWatch is a non-profit engaged in “investigative research and journalism to expose corporate malfeasance and to advocate for multinational corporate accountability and transparency”. They’ve just announced an API that makes it easier to search SEC data:

“Although the SEC provides a search interface for locating company filings (EDGAR / IDEA), the subsidiary information is not presented in a standardized format suitable for automated use or insertion into a database. The CorpWatch API uses parsers to “scrape” the subsidiary relationship information from Exhibit 21 of the 10-K filings and provides a well-structured interface for programs to query and process the subsidiary data.”

The free CorpWatch API enables identity resolution and other applications to look up the formal names of corporations, ascertain their relationships to other corporations, find their locations around the world, learn their alternate and formal names, and access other useful information. Up to now, you could only get this kind of information from relatively expensive paid subscriptions from commercial data providers.

Is it possible that the efforts of organizations like CorpWatch point to a future in which an abundance of new, free sources of data will make it even easier to create identity resolution applications?

Identity Resolution Daily Links 2009-06-05

Friday, June 5th, 2009

[Post from Infoglide] Entity Resolution vs. Entity Identification

“In entity resolution, as in any new research area, different authors or practitioners may use the same term but intend different meanings. You always have to be careful to understand exactly what a writer means when he or she uses a particular term.”

Ramon Chen: Shared Musings: Informatica acquires AddressDoctor GmbH - adds another MDM component

[Ramon Chen] “Outside of Informatica’s purchases, over the last few years there have been several purchases of supporting MDM products including IBM’s acquisition of Exeros, SAP buying Business Objects, who prior to that bought FirstLogic for $69M in 2006, IBM acquiring Ascential QualityStage and DataStage for $1.1B, D&B acquiring Purisma for $48M. This is a fast moving market and commodity components of the MDM lifecycle are being snapped up by the big boys faster than you can say Master D…”

Health Newstrack: Patients want computer consultations, electronic health record

“‘It seems that as the population ages and finds itself facing more illness and serious medical conditions, privacy of health information becomes much less important to patients than it is when they are healthy,’ she notes. ‘Patients are willing to trade some privacy in order to have records fully available in emergency settings and available to new caregivers as well as to multiple clinicians.’”

Information Week’s Analytics Blog: IT Fusion Centers

“The Fusion Center consolidates, analyzes, and distributes information through the many different organizations in order to enhance the ability to foresee and hopefully forestall terrorist activities. Many IT organizations are seeking to adopt the Fusion Center model as a means of obtaining a better overall view of their operations. They want to maximize resources and streamline operations just as their peers in the field of counterterrorism have done.”

Workers Comp Insider: Aging America: A Looming Catastrophe?

“The IAIABC Journal is published two times per year by the International Association of Industrial Accident Boards and Commissions (IAIABC), an association of government agencies that administer and regulate their jurisdiction’s workers’ compensation acts. It’s a peer-reviewed Journal, and one of a few remaining venues that publishes original research papers and in-depth treatment of workers compensation issues and opinions.”

Identity Resolution Featured in IAIABC Journal

Wednesday, May 27th, 2009

If you’re not familiar with the International Association of Industrial Accident Boards & Commissions (IAIABC), it’s a very active non-profit organization of government agencies that administer workers’ compensation programs in the U.S., Canada, and other countries. In addition to sponsoring a large number of industry events including conferences and training seminars, they publish an excellent journal twice yearly that provides articles about education, research, and management of workers’ compensation issues.

The April issue of IAIABC Journal includes an article authored by Infoglide’s Charles Clendenen. “Introducing Identity Resolution: A New Approach to Workers’ Compensation Fraud” discusses three types of workers’ compensation fraud and how identity resolution (aka entity analytics or entity resolution) is being applied to make the process of finding potential employer fraud easier and more cost-effective.

While medical fraud and employee fraud are significant problems, “employer premium fraud, while less publicized, can involve millions of dollars in unpaid or underpaid premiums and can cause much more damage to the insuring agency.”

Employer premium fraud can take several forms. In order to avoid paying premiums, a company’s owners may illegally classify permanent employees as contractors. Alternately, they may operate for some time without paying their premiums, and then when the insurer is about to take action, they simply shut down the company on paper and reconstitute it under another name. Companies also use this “going out of business” ploy in cases where their experience (or modification) rating has gone up due to multiple injuries, thereby resulting in higher premiums. By reopening as another company, they can effectively reset their experience rating. 

Clendenen goes on to introduce identity resolution technology and discuss its origins, then talks about how it can be applied to solve workers’ comp employer fraud.

While identity resolution technologies can be applied to employee and provider fraud, they are particularly effective at uncovering employer premium fraud. Finding companies who are not registered for workers’ compensation involves comparing databases where companies are advertising themselves as open for business to lists of businesses registered with state workers’ compensation programs. The results can highlight companies who have not registered or are not paying premiums, companies who have changed their name often, and companies involved in hidden contractor/subcontractor relationships.
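The comparison described above can be sketched in a few lines: normalize company names, fuzzy-match each advertised business against the registry, and flag the ones with no close match. This is an illustrative sketch only, not the method from the article; the suffix list, the 0.85 threshold, and the sample company names are assumptions, and a production system would also compare addresses, owners, and phone numbers.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "co", "corp", "company", "ltd"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def best_match(name, registry, threshold=0.85):
    """Return (registry entry or None, score) for the closest registry name."""
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in registry:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


def unregistered(listings, registry, threshold=0.85):
    """Return advertised businesses with no sufficiently close registry entry."""
    return [n for n in listings if best_match(n, registry, threshold)[0] is None]


# Hypothetical data: businesses advertising publicly vs. the state registry.
listings = ["Acme Roofing, Inc.", "Lone Star Framing LLC", "Hill Country Drywall"]
registry = ["ACME ROOFING INC", "Hill Country Dry Wall Co."]
flagged = unregistered(listings, registry)
print(flagged)
```

Names that fall just under the threshold are good candidates for manual review, since a company that reopens under a slightly altered name may score high but not high enough to match automatically.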

The rest of the article talks in more detail about how identity resolution can be applied and the potential return on investment (ROI) agencies can expect.

Click here to read the full article, and to learn more about IAIABC, check out their web site.




The Growing Role of Identity Resolution in MDM

Wednesday, May 20th, 2009

By Dan Power, President and Founder, Hub Solution Designs

There definitely seems to be a trend lately with small companies in the master data management (MDM) and data quality space being purchased (as in the asset acquisition of Exeros by IBM) or partnering with larger firms (such as Silver Creek Systems’ OEM relationship with Oracle).

I think this is a good thing. Using the classic “build, buy or ally” strategy, it isn’t surprising that sometimes companies will conclude that it’s faster and/or cheaper to buy a technology, or partner with another company that has that technology, rather than build it themselves internally.

A lot of companies do tend to suffer from the “not invented here” syndrome, where anything not developed inside their four walls tends to be regarded with disdain. But that tendency leads to a much slower pace of innovation. In very competitive industries like enterprise software, getting there faster is a very definite advantage.

Since I’ve been working with the identity resolution experts at Infoglide, I’ve become much more aware of the role identity resolution technology plays in our daily lives. Every time you get on an airplane, file an insurance claim, apply for a job / mortgage / credit card, or even shop in a retail store or on a web site, your identity is probably being evaluated by an Identity Resolution Engine.

A lot of people in the MDM space refer to this as “matching”, but there’s considerably more to Identity Resolution than the sophisticated pattern matching that most MDM hub platforms use today. The more robust form – Identity Resolution – is currently used mostly for sophisticated applications like terrorist screening and anti-money laundering, where big consequences or big dollar amounts are at stake.

But that technology is gradually filtering down to more routine commercial applications like master data management for customers. The large MDM vendors like Oracle, IBM and SAP – and the smaller vendors like Siperian, Initiate Systems and D&B/Purisma – will follow the “build, buy or ally” pattern, with some opting to create their own more sophisticated Entity Resolution capabilities, some buying smaller firms who already have those advanced products, or perhaps partnering as a middle ground between building and buying.

Either way, this trend is good both for specialized companies like Infoglide and for the general public. We’ll all be a little safer getting on a plane, a little less likely to suffer from identity theft or confusion, and perhaps save a little money through reduced incidence of various types of fraud.

Full-fledged Identity Resolution is a capability that most MDM hubs should plan on adding in the next revision cycle or two, as MDM customers become more discriminating and more demanding of their hub’s ability to identify individuals and businesses from an ever-growing stream of data.

Dan Power is president of Hub Solution Designs, a consulting firm specializing in master data management and data governance. He writes a popular blog and a column for Information Management magazine, speaks frequently at technology conferences, and regularly advises clients on developing & implementing high impact MDM and data governance strategies.

Identity Resolution Daily Links 2009-05-18

Monday, May 18th, 2009

By the Infoglide Team

e-patients.net: Meaningful Use: The Elephant IS In The Room

“A recent NPR/Kaiser Family Foundation poll shows that the American public is surprisingly more positive about the potentials of EHRs than most professionals. People already are familiar with computerized information and accept its risks.”

IT-Director.com: Trends in Master Data Management

“The interesting question is how much pressure this puts on the other MDM players with data quality solutions (like Dataflux and SAP/Business Objects) to build out their data profiling capabilities into the area of data discovery.”

NationalSecurity.org: MYTHBUSTER: TSA’S WATCH LIST IS MORE THAN 1 MILLION PEOPLE STRONG

“There are less than 400,000 individuals on the consolidated terrorist watch list and less than 50,000 individuals on the no-fly and selectee lists. Individuals on the no-fly and selectee lists are identified by law enforcement and intelligence partners as legitimate threats to transportation requiring either additional screening or prohibition from boarding an aircraft.”

OCDQ Blog: TDWI World Conference Chicago 2009

“TDWI World Conference Chicago 2009 was held May 3-8 in Chicago, Illinois at the Hyatt Regency Hotel and was a tremendous success.  I attended as a Data Quality Journalist for the International Association for Information and Data Quality (IAIDQ). I used Twitter to provide live reporting from the conference.  Here are my notes from the courses I attended…”

