Archive for May, 2010

Identity Resolution Daily Links 2010-05-30

Sunday, May 30th, 2010

Identity Resolution Daily Links 2010-05-29

[Post from Infoglide] Reference Linking Methods - Part 1

“In the last few posts, we reviewed the basic architectures used to implement entity resolution (ER) systems.  Although this gives us the big picture at the systems level, ER really takes place at the reference (record) level where the system must ultimately decide whether two references are for the same or for different real-world objects, i.e. to link or not to link.  In this series I’ll discuss some of the most common methods for making these linking decisions.”

P&C National Underwriter: New York City Listed As No.1 In Staged Auto Accident Fraud

“NYAAIF noted that the New York-based Insurance Information Institute reported that fraud and abuse in the New York no-fault system accounts for roughly 20 percent of every no-fault claim paid—or about $1,561 per claim. Spread across the state, that amounted to nearly $230 million in ‘fraud taxes’ in 2009, according to the Alliance.”

Bank Info Security: 6 Signs of Business Loan Fraud

“Three states (Wyoming, Nevada and Delaware) do not require any proof of identification to set up a business. Another 26 states allow a limited liability corporation (LLC) to be set up without showing beneficial ownership. ‘When banks try to cross-reference within their own business customers, they’ll find the connection,’ she says. ‘But when they distribute it across several banks, it’s not clearly visible. It’s hard to do pattern relationships because banks don’t compare notes, so that’s how [the fraudsters] dilute and avoid detection.’”

GovMonitor: Homeland Security Outlines 2010 Summer Travel Tips

“The Secure Flight watch-list matching process occurs before a passenger even gets to the airport so if you get a boarding pass, the Secure Flight matching process is done. In other words, you are clear once you get that pass.”

Reference Linking Methods - Part 1

Thursday, May 27th, 2010

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

In the last few posts, we reviewed the basic architectures used to implement entity resolution (ER) systems.  Although this gives us the big picture at the systems level, ER really takes place at the reference (record) level where the system must ultimately decide whether two references are for the same or for different real-world objects, i.e. to link or not to link.  In this series I’ll discuss some of the most common methods for making these linking decisions.

I classify these methods into four categories:

  1. Direct matching
  2. Transitive matching
  3. Association analysis
  4. Assertion

In this post, we’ll consider the first and most familiar category, direct matching.  Here we decide to link based on the degree of similarity between the values of corresponding identity attributes.  For example, if the identity attributes are first name, last name, and date-of-birth in a certain context, then in direct matching we would compare the values for these attributes between two references.  In its simplest form, deterministic matching, the decision is yes if and only if all three values match exactly, i.e. two references should be linked (are equivalent) only when the first names are the same, the last names are the same, and the dates-of-birth are the same.  Otherwise they are judged to be references to different persons.
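As a minimal sketch of deterministic matching, assuming each reference is a simple record holding the three identity attributes from the example (the `Reference` type and the function name are hypothetical, chosen only for illustration):

```python
from typing import NamedTuple


class Reference(NamedTuple):
    """Hypothetical reference record with the three identity attributes."""
    first_name: str
    last_name: str
    date_of_birth: str  # assumed to be an ISO-formatted string


def deterministic_match(a: Reference, b: Reference) -> bool:
    """Link only when every identity attribute agrees exactly."""
    return all(x == y for x, y in zip(a, b))


john = Reference("John", "Doe", "1989-08-13")
same = Reference("John", "Doe", "1989-08-13")
jon = Reference("Jon", "Doe", "1989-08-13")

print(deterministic_match(john, same))  # exact agreement on all three values
print(deterministic_match(john, jon))   # one differing character blocks the link
```

Note that the comparison is strictly all-or-nothing: a single differing character in any attribute is enough to keep two references apart.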

Deterministic matching is very easy to implement, but it is not usually very effective.  Its lack of effectiveness stems from the pervasiveness of information quality (IQ) issues.  If the reference values are inaccurate, inconsistent, or missing, then direct matching creates too many false negatives, i.e. references to the same entity that really should be linked but don’t satisfy the deterministic matching criteria.  When names are misspelled, nicknames are used, values are missing, or date formats are inconsistent, the direct match between references can fail even though the references were intended to reference the same person. It should be clear that IQ is closely related to ER.

To address these issues, most systems rely upon some level of probabilistic matching.  In this form of matching, we link records even if some attributes’ values are different as long as the values of certain other attributes are the same.  Using the previous example, we might decide that the context in which we are working only requires an exact match on last name and date-of-birth in order to link the records.  Generally this incurs a certain amount of risk of creating a false positive link, i.e. references to different entities that match on certain attributes but should not be linked.  This risk is expressed as the probability that this might happen, hence the term “probabilistic” matching.
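The relaxed rule from this example might be sketched as follows. The dictionary layout, the `REQUIRED` tuple, and the function name are assumptions made for illustration, not part of any particular ER system:

```python
# Hypothetical rule: link when last name and date-of-birth agree exactly,
# whatever the first name says.
REQUIRED = ("last_name", "date_of_birth")


def relaxed_match(a: dict, b: dict, required=REQUIRED) -> bool:
    """Link when all required attributes agree; other attributes are ignored."""
    return all(a[attr] == b[attr] for attr in required)


john = {"first_name": "John", "last_name": "Doe", "date_of_birth": "1989-08-13"}
jon = {"first_name": "Jon", "last_name": "Doe", "date_of_birth": "1989-08-13"}
mary = {"first_name": "Mary", "last_name": "Doe", "date_of_birth": "1989-08-13"}

print(relaxed_match(john, jon))   # linked despite the misspelled first name
print(relaxed_match(john, mary))  # also linked: a possible false positive
```

The second call shows the risk described above: two references that may well belong to different people are linked because they happen to agree on the required attributes.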

Probabilistic matching has been the subject of extensive research.  Most modern practice in probabilistic matching is based on the work of two Canadian statisticians, I. P. Fellegi and A. B. Sunter, who published A Theory for Record Linkage in 1969.  Their model, the Fellegi-Sunter Model, provides a systematic way of creating a probabilistic matching scheme that is optimal with respect to a given level (tolerance) of false positive and false negative risk.
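To give a feel for how such a scheme is commonly put into practice, here is a rough sketch of the usual agreement-weight formulation: each attribute contributes log2(m/u) when the values agree and log2((1-m)/(1-u)) when they disagree, and the summed weight is compared against two thresholds. The m and u probabilities, the thresholds, and all names below are made-up assumptions for illustration; in a real system they would be estimated from data and tuned to the tolerated error rates.

```python
import math

# Hypothetical per-attribute probabilities (illustrative values only):
#   m = P(values agree | references are for the same entity)
#   u = P(values agree | references are for different entities)
M_U = {
    "first_name": (0.90, 0.05),
    "last_name": (0.95, 0.02),
    "date_of_birth": (0.98, 0.01),
}


def match_weight(agreements: dict) -> float:
    """Sum the log-likelihood-ratio weights, assuming attributes are independent."""
    total = 0.0
    for attr, agrees in agreements.items():
        m, u = M_U[attr]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 8.0) -> str:
    """Two hypothetical thresholds split weights into three decisions."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


all_agree = match_weight({"first_name": True, "last_name": True, "date_of_birth": True})
none_agree = match_weight({"first_name": False, "last_name": False, "date_of_birth": False})
print(classify(all_agree), classify(none_agree))
```

Pairs whose weight falls between the two thresholds are typically set aside for clerical review rather than decided automatically.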

Probabilistic matching is binary in the sense that attribute values either match or don’t match.  In our example, we could represent the case where we only require the last name and date-of-birth to match by the binary string “011” where the zero in the first position means that the first name doesn’t match, while the ones in the second and third positions mean that the last name and date-of-birth must match exactly.  When there are three attributes there would be 8 possible binary combinations to consider.  The problem with the binary model is that it doesn’t account for similarity.  Intuitively we would feel much more confident that the references “John, Doe, 1989-08-13” and “Jon, Doe, 1989-08-13” should be linked than we would the references “John, Doe, 1989-08-13” and “Mary, Doe, 1989-08-13”.
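The binary agreement patterns can be sketched in a few lines. The tuple layout (first name, last name, date-of-birth) and the function name are assumptions for illustration:

```python
from itertools import product


def agreement_pattern(a: tuple, b: tuple) -> str:
    """Return one character per attribute: '1' if the values agree exactly, else '0'."""
    return "".join("1" if x == y else "0" for x, y in zip(a, b))


john = ("John", "Doe", "1989-08-13")
jon = ("Jon", "Doe", "1989-08-13")
mary = ("Mary", "Doe", "1989-08-13")

print(agreement_pattern(john, jon))   # 011
print(agreement_pattern(john, mary))  # 011

# Every pattern that three binary attributes can produce.
all_patterns = ["".join(bits) for bits in product("01", repeat=3)]
print(len(all_patterns))  # 8
```

Both pairs produce the same pattern, which is exactly the limitation noted above: the binary model cannot tell a near-miss like “Jon” from a completely different name like “Mary”.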

Therefore, a common extension of probabilistic matching is to allow for intermediate levels of similarity between values, i.e. accounting for the fact that attribute values may not be the same but are similar.  For example, if the name values differ only by one character, we could say that the names are similar, or if the dates-of-birth differ by less than 10 days, we could say the dates are similar.  We have now moved from a binary model to a ternary (base 3) model so that in our previous example, the first pair of references would fit the pattern “122” and the second pair the pattern “022” where 1 represents similar values and 2 represents the same value.  The downside is that there are now more patterns to analyze and evaluate.  For 3 attributes there are now 27 cases to consider instead of 8.
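A sketch of the three-level patterns, using the two similarity rules from the example (names differing by one character, dates less than 10 days apart). The function names, the tuple layout, and the assumption that dates are ISO-formatted strings are all illustrative choices:

```python
from datetime import date


def one_edit_apart(s: str, t: str) -> bool:
    """True when s and t differ by exactly one inserted, deleted, or substituted character."""
    if s == t or abs(len(s) - len(t)) > 1:
        return False
    if len(s) > len(t):
        s, t = t, s  # make s the shorter (or equal-length) string
    i = 0
    while i < len(s) and s[i] == t[i]:
        i += 1
    if len(s) == len(t):
        return s[i + 1:] == t[i + 1:]  # one substitution at position i
    return s[i:] == t[i + 1:]          # one extra character in t at position i


def name_level(a: str, b: str) -> int:
    """2 = same, 1 = similar (one character apart), 0 = different."""
    if a == b:
        return 2
    return 1 if one_edit_apart(a, b) else 0


def date_level(a: str, b: str) -> int:
    """2 = same day, 1 = less than 10 days apart, 0 = different."""
    days = abs((date.fromisoformat(a) - date.fromisoformat(b)).days)
    if days == 0:
        return 2
    return 1 if days < 10 else 0


def similarity_pattern(a: tuple, b: tuple) -> str:
    """Pattern over (first name, last name, date-of-birth)."""
    return f"{name_level(a[0], b[0])}{name_level(a[1], b[1])}{date_level(a[2], b[2])}"


john = ("John", "Doe", "1989-08-13")
print(similarity_pattern(john, ("Jon", "Doe", "1989-08-13")))   # 122
print(similarity_pattern(john, ("Mary", "Doe", "1989-08-13")))  # 022
```

With three levels per attribute, the two pairs that were indistinguishable in the binary model now fall into different patterns.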

Probabilistic matching that allows for intermediate levels of similarity is sometimes called fuzzy matching.  Although the term fuzzy implies that there is some leeway, in practice we must always set a discrete threshold that limits the amount of dissimilarity we are willing to tolerate.  Fuzzy matching also introduces a plethora of schemes for measuring similarity between two values.  In the cases where the values are character strings, such as for names, the schemes are called approximate string matching (ASM) algorithms.

One of the most often used is the Levenshtein Edit Distance that counts the minimum number of character transformations (usually insertion, deletion, and substitution) that will transform one string into another.  For example the edit distance between “Smythe” and “Smith” is 2 because in the first string you can substitute “i” for “y” and delete the “e” to create the second string in 2 transformations.

Typically ASM outputs are normalized to a scale from 0 to 1.  To normalize edit distance, divide the edit distance by the number of characters in the longest string.  In this example, the normalized edit distance would be 2/6 or 0.33.  Many other ASM algorithms have been developed such as Jaro, Jaro-Winkler, q-grams, Soundex, Smith-Waterman, and Ukkonen, just to mention a few.
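For readers who want to try this, here is a small sketch of the Levenshtein edit distance using the standard dynamic-programming approach, followed by the normalization described above. The function names are illustrative, and treating two empty strings as distance 0.0 is an assumption made to avoid dividing by zero:

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning the first i characters of s into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or keep) the character
        prev = curr
    return prev[-1]


def normalized_edit_distance(s: str, t: str) -> float:
    """Edit distance divided by the length of the longer string (0.0 if both are empty)."""
    longest = max(len(s), len(t))
    return edit_distance(s, t) / longest if longest else 0.0


print(edit_distance("Smythe", "Smith"))                       # 2
print(round(normalized_edit_distance("Smythe", "Smith"), 2))  # 0.33
```

Note that this normalized value is a distance, so 0 means identical strings; subtracting it from 1 gives a similarity score on the same 0 to 1 scale.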

In the next post we will discuss transitive matching.

Identity Resolution Daily Links 2010-05-25

Tuesday, May 25th, 2010

By the Infoglide Team

Information Management: 10 Key Trends In MDM

“During 2010, independent/standalone data quality vendors (Clavis, Pitney Bowes, Human Inference and Trillium) will focus on name and address cleansing as they struggle against better-funded match/merge and data profiling capabilities increasingly integrated with megavendor MDM. Also at this time, a dearth of non-aligned matching algorithms (such as those from Digital Trowel, Infoglide, Omikron and Uniserve) will engender ‘algorithm envy’ among disenfranchised MDM providers.”

NewCityPatch: Legislator: Rockland Should Review Medicaid Spending

“Rockland County Legislator Ed Day, R-New City, has called for a review of Medicaid spending by the county that would also determine whether enough is being done to prevent and detect Medicaid fraud. ‘Medicaid expenditures represent an amount that is 110 percent of all the property taxes collected here in Rockland,’ said Day.”

Canadian Immigration: Canada should improve its AML efforts according to US report

“The most significant area of concern is organized crime. Canadian Security Intelligence Service estimates that there are about 750 organized crime groups operating in Canada and 80% of them are involved in the illicit drug trade. The cross-border movement of currency was identified as a continued concern.”

Identity Resolution Daily Links 2010-05-22

Saturday, May 22nd, 2010

[Post from Infoglide] Customer Authentication and Identity Resolution

“The accepted meaning of ‘multi-factor authentication’ is employing at least two of the three standard factors used to authenticate identities:

  1. something the user knows (e.g., PIN or password)
  2. something the user has (e.g., ATM or smart card)
  3. something the user is (e.g., biometric such as fingerprint)

Building upon this well understood concept in the banking and financial services world, I’d like to describe how identity resolution technology extends and greatly enhances the value of authentication systems to the enterprise.”

LexisNexis Workers’ Compensation Law Community: NY: Owner of Manhattan Temp Agency Hit With $25M Comp Fraud

“Mr. Goldstein also failed to cooperate to allow NYSIF to audit the companies’ payrolls, wherein NYSIF would simply raise premium rates on the policies in effect. To avoid paying higher rates, Mr. Goldstein allowed NYSIF to cancel policies for non-payment, and repeated this pattern by allegedly obtaining other policies from NYSIF under false pretenses.”

CRMBuyer: The Big Business of Electronic Health Records, Part 2

“The federal EHR program authorized under the American Recovery and Reinvestment Act of 2009 consists of two parts. The first provides financial assistance through Medicare and Medicaid to healthcare providers who implement EHR systems. In the second phase, instead of receiving financial assistance, providers who fail to comply with EHR implementation requirements will be penalized by reductions in their Medicare or Medicaid reimbursements.”

Technology Review: TR10: Cloud Programming

“Today, many developers are converting existing programs to run on clouds, rather than creating new types of applications that could work nowhere else. And they are held back by difficulties in keeping track of data and getting reliable information about what’s going on across a cloud.”

Customer Authentication and Identity Resolution

Thursday, May 20th, 2010

By Mike Betron, Infoglide Director of Marketing

The accepted meaning of “multi-factor authentication” is employing at least two of the three standard factors used to authenticate identities:

  1. something the user knows (e.g., PIN or password)
  2. something the user has (e.g., ATM or smart card)
  3. something the user is (e.g., biometric such as fingerprint)

Building upon this well understood concept in the banking and financial services world, I’d like to describe how identity resolution technology extends and greatly enhances the value of authentication systems to the enterprise.

A tacit assumption of multi-factor authentication is that the user mentioned above is legitimate: how else could he or she have the password, smart card, or biometric data? While this assumption may be enough to protect against stolen cards, it doesn’t guard against the user who finds a way to open an account through legitimate means with the intent to defraud. Let me explain.

Billions of dollars are laundered through banks and other financial institutions each year. Accounts (and account owners) that appear legitimate to the institution often move money into and out of the financial system undetected by various means, including trade-based money laundering. Presumably this activity happens despite the presence of authentication measures.

To catch the perpetrators, institutions often focus on improving data quality, either through simple measures like de-duplication or more sophisticated master data management systems. We’ve talked many times here about how these “data quality” efforts can actually harm the process of identifying multiple identities and hidden relationships held by bad actors.

Consider instead the benefits of integrating authentication systems with high-powered identity resolution systems tied to multiple data sources. Existing multiple identities and hidden relationships become another layer of authentication and incorporate fraud identification into the process. If Joe Blow has multiple related identities, for example, the system can pose a question during authentication drawn from one of the identities that would validate whether the user was legitimate.

Food for thought, and we’d like to hear your thoughts!

Identity Resolution Daily Links 2010-05-18

Tuesday, May 18th, 2010

By the Infoglide Team

Consumer Traveler: TSA announces Secure Flight will be complete within a month

“Speaking at U.S. Travel Association’s Pow Wow conference to encourage foreign tourism, Leyh noted that TSA is about to complete their mission of taking back the watchlist matching. This is part of maintaining control of the actual list for security and of relieving the airlines of the responsibility of performing the matches prior to allowing passengers to board.”

iHealthBeat: Reform Law Calls for Use of New Technology To Fight Medicare Fraud

“At the report’s release, HHS and Department of Justice officials said a new CMS program called the Center for Program Integrity would help implement the anti-fraud provisions of the reform law by using sophisticated techniques to uncover improper payments. Officials said CMS also would work with the private health care sector to combat fraud.”

Journal of Online Business: What Is There To Learn About Cloud Computing?

“One big advantage to a small business of accessing only what is needed at the time that it is needed is that initial capital outlay for end user licenses, individual work stations, and the like is considerably reduced. Thus there is little reason to outright purchase something that is merely required for special circumstances, monthly, or year end reports.”

Identity Resolution Daily Links 2010-05-15

Saturday, May 15th, 2010

[Post from Infoglide] Trade-Based Money Laundering

“Who’d have thought that iTunes could be used for money laundering? Yet that is exactly what five men in Great Britain were recently jailed for the other day. Using stolen credit card numbers, they bought £750,000 in vouchers, then sold them at cheaper prices over eBay. Methods of money laundering continue to evolve.”

Liliendahl on Data Quality: Big Time ROI in Identity Resolution

“So the question is if authorities may have avoided losing 5 billion taxpayer Euros if some identity resolution including automated fuzzy connection checks and real world checks was implemented. I know that you are so much more enlightened on what could have been done when the scam is discovered, but I actually think that there may be a lot of other billions of Euros (Pounds, Dollars, Rupees) to avoid losing out there by making some decent identity resolution.”

LISTA: The Privacy and Security Challenges of Electronic and Personal Health Records: Is Your Business Prepared?

“In a 2008 study conducted by Kroll Fraud Solutions/HIMSS Analytics to better understand the status of patient data security at hospitals, the hospitals surveyed reported an average level of preparedness to deal with a security breach of 5.88 on a one to seven ascending scale. Yet the same study indicated that only 56 percent of these hospitals had notified patients whose information was compromised as a result of a security breach.”

Newsweek: Intel Paper Says Al Qaeda’s Yemeni Affiliate More Determined Than Ever to Attack Inside U.S.

“The ‘official use only’ bulletin, produced by the Northern California Regional Intelligence Center, a partnership of federal, state, and local agencies originally set up to deal with drug trafficking, is entitled ‘Al-Qa’ida in the Arabian Peninsula’s Online Rhetoric Signals Shift in Intentions.’”

Trade-Based Money Laundering

Friday, May 14th, 2010

By Mike Betron, Infoglide Director of Marketing

Who’d have thought that iTunes could be used for money laundering? Yet that is exactly what five men in Great Britain were recently jailed for. Using stolen credit card numbers, they bought £750,000 in vouchers, then sold them at cheaper prices over eBay.

Methods of money laundering continue to evolve. When authorities constrain certain types of money laundering, perpetrators migrate to other methods. Since law enforcement has focused its efforts on two methods – (1) the movement of value through the financial system using checks and wire transfers, and (2) the physical movement of banknotes via cash couriers and bulk cash smuggling – a third method called “trade-based” money laundering is growing in popularity.

Trade-based money laundering is defined by the Financial Action Task Force (FATF) as “the process of disguising the proceeds of crime and moving value through the use of trade transactions in an attempt to legitimize their illicit origins.” Kenneth Rijock, Financial Crime Consultant for international anti-money laundering risk intelligence firm World-Check, recently commented that disguising funds as goods is now the way a significant portion of laundered money is moved illicitly. “If I can move $100 million from New York to Colombia via Venezuela, I’m certainly not going to smuggle it down there when I can move it through trade-based money laundering.”

The newly revised Bank Secrecy Act and Anti-Money Laundering Examination Manual contains an expanded section on trade-based money laundering. These operations are successful because of the difficulty in detecting complex relationships between trading operations, operators, and money movements. Three key barriers make it tough to detect trade-based money laundering:

  1. The tremendous volume of trade makes it easy to hide individual transactions;
  2. The complexity that is often involved in multiple foreign exchange transactions; and
  3. The limited resources available to agencies wanting to detect the fraud.

These barriers are difficult if not impossible for traditional methods to address. The volume of trade means that highly scalable automated methods are needed, but the complexity of sifting through multiple transactions and finding hidden connections is beyond the capabilities of normal methods.

For those familiar with identity resolution (e.g., Identity Resolution Engine [IRE]) technology, its strengths address these barriers directly:

  1. Volume - IRE can process millions upon millions of transactions daily in the largest and most demanding application environments.
  2. Complexity - Non-Obvious Relationship Analysis finds hidden relationships, or neural networks, across multiple disparate and remote data sources, including both internal and external data.
  3. Resources - Configurable, automated processes optimize the use of available human resources by eliminating “clean” transactions and prioritizing potential “dirty” ones.

Identity Resolution Daily Links 2010-05-11

Tuesday, May 11th, 2010

By the Infoglide Team

HealthLeaders Media: Detroit Doc Gets Six Years for Medicare Fraud

“Myint, of Bloomfield Hills, MI, was also ordered to pay more than $3.1 million in restitution, jointly with co-defendants, and to serve two years of supervised release following his prison term. Terrence Hicks, of Jackson, MI, the patient recruiter, was ordered to pay more than $4.9 million in restitution, jointly with co-defendants, and to serve three years of supervised release following his prison term.”

AolTravel: Is the No-Fly List Working?

“The TSA is hoping to smooth glitches with the new Secure Flight program — a system by which the ‘TSA will conduct uniform prescreening of passenger information against federal government watchlists,’ according to an official statement. ‘The TSA is taking over this responsibility from the airlines.’ The TSA says the Secure Flight system will be in effect for all domestic flights by mid-2010 and all international flights by the end of 2010, at which time the latest two-hour notification rule will become moot (since the airlines will no longer be responsible). Meanwhile, in the case of Shahzad, Kahn says it’s important to remember that the current system — for all its perceived faults related to his near escape — ultimately did what it was meant to do.”

ITBusinessEdge: Baby Steps to Master Data Management

“If you want to start small with master data management, you’ve got to start with a noun, says Evan Levy, a partner at Baseline Consulting  and an instructor with The Data Warehousing Institute… The problem is, IT doesn’t think in nouns. IT is all about the verb: Defining, coding, testing, supporting. What’s more, IT departments tend to view the world in terms of projects – fulfilling this feature request, upgrading to this release, migrating to this server.”

Liliendahl on Data Quality: Aadhar (or Aadhaar)

“In Denmark we have had such an identifier (one for citizens and one for companies) for many years. It is not used by everyone everywhere – so you still are able to make money being a data quality professional specializing in data matching. The main reason that the unique citizen identifier is not used all over is of course privacy considerations.”

Identity Resolution Daily Links 2010-05-07

Friday, May 7th, 2010

[Post from Infoglide] The Big Short: How the Credit Scoring World Has Shifted

“The hottest non-fiction book at the moment is The Big Short: Inside the Doomsday Machine. Best-selling author Michael Lewis explores and explains what went on behind the scenes during the years leading up to the big stock market crash in 2008 and answers a crucial question: “Who understood the risk inherent in the assumption of ever-rising real estate prices, a risk compounded daily by the creation of those arcane, artificial securities loosely based on piles of doubtful mortgages?” While misguided government policies together with greed and stupidity provide the larger answer, events during that time beg certain questions about the specific ways in which credit risk is evaluated.”

BANK INFO SECURITY: 22 Banking Breaches So Far in 2010

“There have been 173 reported data breaches so far in 2010, and 34 of these involve financial services companies. This means that in less than one quarter of the year, we already have seen more than one-third of the 62 banking-related breaches reported in all of 2009… If the breach trends do continue as they did in 2009, then financial service companies will continue to experience malicious hacking and insider theft. The challenge for organizations such as the ITRC is that many organizations fail to report their breaches.”

nbc4i: Clerk Faces Felony Charges After Alleged Lottery Fraud

“Both tickets were presented to Ikhlayel by undercover lottery investigators posing as customers. In both instances, Ikhlayel told the investigators the tickets were not winning tickets. An investigation indicated both tickets were validated at the Downtowner Marathon shortly after being presented to Ikhlayel, authorities said.”

San Francisco Examiner: Posh Bagel’s managers charged with workers’ comp fraud

“Employers that aim to lower their workers’ comp expense through dishonest means try all sorts of tricks, from under-reporting payroll to lying about the state in which their employees work. Such ruses seldom succeed since insurers regularly audit insureds for premium fraud, and they’ve seen every trick in the book. Moreover, the modest boost that premium fraud gives the bottom line is hardly worth the risk.”

 

