Archive for the ‘Identity Matching’ Category

Big Oil and Big Data

Saturday, February 5th, 2011

By Mike Betron, Infoglide Software Director of Marketing

In “Mining the Tar Sands of Big Data”, Matthew Driscoll and Roger Ehrenberg draw an apt parallel between the earth’s vast oil reserves and big data: until recently it wasn’t economically and technically feasible to mine these resources efficiently. In both cases, that’s changing.

The authors trace the growth in the amount of data generated to “advances in three principal areas: sensor networks, cloud computing, and machine learning.” Both physical (e.g., RFID) and software (e.g., tweets) sensors exist, and multiple forms are being deployed in products and processes each day, thus generating a tsunami of data that grows exponentially each year.

In fact, the growth in big data is even affecting the consumption of energy:

Just as these devices have multiplied, so have the data centers that they communicate with. Housed in climate-controlled warehouses, they consume an estimated 2 percent — and represent the fastest growing segment — of the United States energy budget.

We’ve covered and written about the impact and potential of cloud computing here before. By treating computing resources as a utility served up “by the drink”, cloud computing is another enabler of today’s spectacular increase in data generation.

Machine learning is the third promising factor listed by the authors that is related to the big data explosion:

Its algorithms lie at the heart of spam filters, self-driving cars, and movie recommendation systems, including one to which Netflix awarded its million-dollar prize to in 2009. While data storage and distributed computing technologies are being commoditized, machine learning is increasingly a source of competitive advantage among data-driven firms.

Those who want to exploit the availability of big data have another powerful tool at their disposal – entity resolution. The ability to search across multiple databases with disparate forms residing in different locations can tame large amounts of data very quickly, efficiently resolving multiple entities into one and finding hidden connections without human intervention in many application areas, including detecting financial fraud.

By exploiting advancing technologies like entity resolution, systems can give organizations a distinct competitive advantage over those who lag in technology adoption.

Identity Resolution Daily Links 2011-02-01

Tuesday, February 1st, 2011

By the Infoglide Software Team

Proactive Investors UK: Westminster Group secures distribution rights for ID fraud detection software in UK and Middle East

“Westminster Group PLC (LON:WSG) said it has secured distribution rights in the UK and the Middle East for the identity fraud detection software of Texas-based Infoglide Software. The commercial details of the deal were not disclosed. Westminster expects the product to be popular with national security agencies across the Middle East in particular. Infoglide’s identity resolution technology searches, matches and links entities across multiple, disparate data sources using over 50 algorithms.”

WSJ: Confidentiality Cloaks Medicare Abuse

“Dr. Wayne, a 50-year-old osteopath, denies abusing the system and hasn’t been accused of wrongdoing by authorities. He says his regimen ‘does wonders’ if used correctly. He adds that he gave physical therapy to ‘patients who needed it, with appropriate diagnoses, and I should get paid for it.’ Medicare administrators apparently felt otherwise. In 2009 he says he was placed on heightened scrutiny and eventually sold his business. But not until he had received more than $2.6 million from Medicare between 2007 and 2009, according to the person familiar with the matter.”

GigaOm: Mining the Tar Sands of Big Data

“Unlike oil reserves, data is an abundant resource on our wired planet. Though much of it is noise, at scale and with the right mining algorithms, this data can yield information that can predict traffic jams, entertainment trends, even flu outbreaks. These are hints of the promise of big data, which will mature in the coming decade, driven by advances in three principal areas: sensor networks, cloud computing, and machine learning.”

Criminal Justice Online: West Haven Woman Admits Role in Mortgage Fraud Scheme

“Specifically, on October 1, 2009, MARTINEAU purchased a residence at 211 Lloyd Street in New Haven after working with others to obtain an FHA-insured loan to buy the house at the fraudulently inflated price of $160,000. The loan package for this transaction included false information about the MARTINEAU’s employment, assets and liabilities, and MARTINEAU’s intention to occupy the property as her principal residence. The loan application also was supported by false documentation, including earning statements and fraudulent bank records.”

Identity Resolution Daily Links 2011-01-23

Sunday, January 23rd, 2011

[Post from Infoglide] Financial Services Has a Growing Problem: Internal Fraud

“The Aite Group recently authored a report entitled ‘Internal Fraud: The Devil Within.’ After surveying 35 fraud and product executives at financial institutions across the U.S. and Canada, they concluded that internal fraud is a severe and growing problem that often goes undetected and almost always flies under the radar of public scrutiny.”

Bloor: There’s identity resolution and then there’s identity resolution

“The second type of identity resolution is similar but different. The classic example is in police work. Here you want to know that some particular criminal has fifteen different aliases, say. Moreover, under each of those identities he or she will have multiple contacts and you may want to do social network analysis against those contacts to see who else might have criminal tendencies.”

Chicago Sun Times: Police sensing crime before it happens

“In October, the Chicago Police Department’s new crime-forecasting unit was analyzing 911 calls for service and produced an intelligence report predicting a shooting would happen soon on a particular block on the South Side. Three minutes later, it did, police officials say. That got police Supt. Jody Weis thinking. He wondered if the department could produce intelligence reports even quicker. Next time, officers might have an hour’s notice before a shooting — instead of just a few minutes.”

KERO23: Ten People Indicted In Wide-Ranging Real Estate Scam

“The indictment alleges that, from approximately January 2004 to September 2007, the defendants perpetrated a scheme to defraud mortgage lenders by submitting fraudulent loan applications with material misrepresentations, including misrepresentations concerning the borrower’s income, assets, employment status, and intent to use the home as the borrower’s primary residence… The scheme involved more than $20 million in losses to lenders.”

Identity Resolution Daily Links 2011-01-16

Sunday, January 16th, 2011

[Post from Infoglide] Entity Identity Management

“What is entity identity management? It simply means that an ER system can store and maintain a record of identity information that persists over time.  Entity identity management is essential for an ER engine to operate in identity resolution or identity capture mode and for it to maintain persistent entity identifiers.”

InformationWeek: 4 Companies Getting Real Results From Cloud Computing

“Why are they moving to the cloud? Rarely because it’s considered cheaper. In some cases, the cloud represents a faster, more flexible way to get a new system up and running. Oftentimes, it’s the ease of integration afforded by the cloud servers, using standard Web service practices, that lets a company launch a new mobile application faster or run a business process that cuts across many partners more efficiently.”

MindHealthBiz: Consumer ID

“A little over a year ago, Rand Corporation said that the Unique Patient Identifier would cost $11 billion, and pay off nationwide in reducing these sorts of medical errors, and in simplifying the nationwide effectiveness of the Electronic Health Record (EHR), which in turn can introduce a high level of efficiency, and a way to enforce patient privacy.”

pi newswire: US Sues NYC for Medicaid Fraud

“Enrollment in Personal Care Service requires approval from a qualified health care professional. This approval is missing in the cases of a ‘substantial percentage’ of the 17,500 individuals who have received 24-hour care since 2000, claims the lawsuit. The lawsuit lists several examples of patients allegedly not properly assigned to the Personal Care Service. A 65-year-old woman was deemed to only need limited care, being of sound mind and body. Instead, she was provided with 24-hour care on the federal government’s bill.”

Entity Identity Management

Friday, January 14th, 2011

By John Talburt, PhD, CDMP, Director, UALR Laboratory for Advanced Research in Entity Resolution and Information Quality (ERIQ)

First, let me wish everyone a Happy and Prosperous New Year.  Also, since my last post, my book Entity Resolution and Information Quality has been published and is now available from Morgan Kaufmann Publishing (http://mkp.com/news/entity-resolution-and-information-quality).

What is entity identity management? It simply means that an ER system can store and maintain a record of identity information that persists over time.  Entity identity management is essential for an ER engine to operate in identity resolution or identity capture mode and for it to maintain persistent entity identifiers.

As you may recall from previous discussions, an identity resolution ER system starts with a set of known (asserted) identities and attempts to determine if a given entity reference refers to one of these known entities.  On the other hand, an identity capture ER system starts with a blank slate and tries to construct an identity based on the (equivalent) references it processes.
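
The difference between the two modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `IdentityStore` class, the `equivalent` rule (exact match on a `name` field), and the `E1`, `E2` identifier format are all hypothetical and are not taken from any particular ER product.

```python
import itertools


def equivalent(ref_a, ref_b):
    """Hypothetical matching rule: exact match on the 'name' field."""
    return ref_a["name"] == ref_b["name"]


class IdentityStore:
    """Keeps identities (lists of references) under persistent identifiers."""

    def __init__(self, known=None):
        # Identity resolution mode starts with known (asserted) identities;
        # identity capture mode starts with an empty store.
        self.identities = {k: list(v) for k, v in (known or {}).items()}
        self._counter = itertools.count(1)

    def _new_id(self):
        # Skip any identifier that is already in use.
        while True:
            candidate = f"E{next(self._counter)}"
            if candidate not in self.identities:
                return candidate

    def resolve(self, ref):
        """Identity resolution: return the matching known identifier, or None."""
        for ident, refs in self.identities.items():
            if any(equivalent(ref, r) for r in refs):
                return ident
        return None

    def capture(self, ref):
        """Identity capture: attach ref to a match, or create a new identity."""
        ident = self.resolve(ref)
        if ident is None:
            ident = self._new_id()
            self.identities[ident] = []
        self.identities[ident].append(ref)
        return ident


# Resolution mode: the store is seeded with an asserted identity.
known = IdentityStore({"E1": [{"name": "Ann Lee"}]})
hit = known.resolve({"name": "Ann Lee"})
miss = known.resolve({"name": "Bob Ray"})

# Capture mode: the store starts as a blank slate and builds identities.
blank = IdentityStore()
first = blank.capture({"name": "Bob Ray"})
second = blank.capture({"name": "Bob Ray"})
third = blank.capture({"name": "Cy Fox"})
```

Because the store persists between calls, the second reference to "Bob Ray" receives the same identifier as the first, which is the point of maintaining persistent entity identifiers.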

Two important concepts here bear further discussion.  The first is the structure used to represent the identity of an entity.  The second, somewhat more philosophical, is the question of what constitutes entity identity.

There are two commonly used approaches to representing identity in ER systems – one is an attribute-level structure sometimes called a “merge identity” and the other is a reference-level structure sometimes called a “cluster identity.”  The difference between a merge identity and a cluster identity can be illustrated by a simple example.

Suppose we have a system where entity references have three attributes A, B, and C, and that we are given two specific entity references R1=(a1, b1, c1) and R2=(a2, b2, c1), where a1 and a2 are values for attribute A, b1 and b2 are values for attribute B, and c1 is a value for attribute C.  Finally, assume that references R1 and R2 are determined to be equivalent references (i.e., references to the same real-world entity).  In the merge identity approach, the entity identity EM referenced by R1 and R2 would be represented as

EM=[A:{a1, a2}, B:{b1, b2}, C:{c1}]

This means that for identity EM the A attribute can take on either the value a1 or a2, the B attribute can take on either the value b1 or b2, and the C attribute only the value c1.  In a merge identity, the binding between the values a1 and b1 that was expressed by their co-occurrence in the reference R1 is lost.  Similarly, the binding between a2 and b2 expressed by R2 is no longer present in EM.

In a cluster identity structure, the original reference binding between attribute values is preserved.  In the cluster identity approach, the entity identity EC referenced by R1 and R2 would be represented as

EC=[(A:a1, B:b1, C:c1), (A:a2, B:b2, C:c1)]

Thus, for identity EC the attributes A, B, and C can only take on the permutations given by the original references R1 and R2. There are advantages and disadvantages to both approaches, but most significantly they can lead to different resolutions for the same set of references.
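
The two structures can be made concrete with a short Python sketch. The function names `merge_identity` and `cluster_identity` and the choice of sets and dictionaries are assumptions made for illustration, not a prescribed implementation.

```python
ATTRIBUTES = ("A", "B", "C")


def merge_identity(references):
    """Attribute-level structure: one set of observed values per attribute."""
    identity = {name: set() for name in ATTRIBUTES}
    for ref in references:
        for name, value in zip(ATTRIBUTES, ref):
            identity[name].add(value)
    return identity


def cluster_identity(references):
    """Reference-level structure: every original reference is kept intact."""
    return [dict(zip(ATTRIBUTES, ref)) for ref in references]


R1 = ("a1", "b1", "c1")
R2 = ("a2", "b2", "c1")

EM = merge_identity([R1, R2])
EC = cluster_identity([R1, R2])

print(EM)  # value sets per attribute; the a1-b1 pairing is no longer visible
print(EC)  # two records; each keeps the pairing from its source reference
```

In this sketch EM no longer records which A value occurred with which B value, while EC still holds R1 and R2 as separate records.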

To illustrate, let’s continue with the preceding example by supposing that the systems using the merge identity and the cluster identity both use the same two resolution rules.  Rule 1 is that the two references are considered equivalent if they agree (exact match) on Attribute C.  Rule 2 is that they are equivalent if they agree (exact match) on both Attributes A and B.

Now suppose that each system processes a third entity reference R3=(a1, b2, c2).  Using the two rules just discussed, the merge identity system would resolve R3 as equivalent to the identity EM represented by references R1 and R2.  Rule 1 is not satisfied, because c2 is not a value of attribute C in EM, but Rule 2 is: R3 agrees with EM on attribute A (a1) and also on attribute B (b2).  On the other hand, R3 would not resolve to the identity EC in the cluster identity system.  R3 does not satisfy either Rule 1 or Rule 2 with respect to either of the references R1 and R2 that make up the cluster identity EC: it shares only the value a1 with R1 and only the value b2 with R2.
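
The same comparison can be run as a small Python sketch. One assumption is labeled here and in the code: for a merge identity, "agrees on an attribute" is read as "the reference's value is among the values stored for that attribute". The function names are hypothetical.

```python
ATTRIBUTES = ("A", "B", "C")

# Identities built from R1=(a1, b1, c1) and R2=(a2, b2, c1).
EM = {"A": {"a1", "a2"}, "B": {"b1", "b2"}, "C": {"c1"}}
EC = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c1"},
]


def resolves_to_merge(identity, ref):
    """Apply both rules against the per-attribute value sets.

    Assumption: agreement means membership in the attribute's value set.
    """
    r = dict(zip(ATTRIBUTES, ref))
    agrees_on_c = r["C"] in identity["C"]
    agrees_on_a_and_b = r["A"] in identity["A"] and r["B"] in identity["B"]
    return agrees_on_c or agrees_on_a_and_b


def resolves_to_cluster(cluster, ref):
    """Apply both rules against each stored reference individually."""
    r = dict(zip(ATTRIBUTES, ref))
    return any(
        r["C"] == member["C"]
        or (r["A"] == member["A"] and r["B"] == member["B"])
        for member in cluster
    )


R3 = ("a1", "b2", "c2")

merge_result = resolves_to_merge(EM, R3)
cluster_result = resolves_to_cluster(EC, R3)

print("merge identity resolves R3:", merge_result)
print("cluster identity resolves R3:", cluster_result)
```

The merge identity accepts R3 because a1 and b2 are both present in its value sets, even though they never occurred together in one reference. The cluster identity rejects R3 because no single stored reference contains both values.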

Merge identities and cluster identities both represent valid, but different, approaches to identity management.  To some extent they also represent two different ways of thinking about entity identity.  I plan to discuss the concept of the entity identity further in the next post.

Identity Resolution Daily Links 2011-01-11

Tuesday, January 11th, 2011

By the Infoglide Team

BND.com: Insurance fraud investigators begin probe into workers’ comp claims at Menard

“A total of 389 guards and other workers have filed more than 500 claims, including about 290 still pending. About 230 of these claimed injury for the underlying cause of ‘repetitive trauma,’ including carpal tunnel syndrome, an injury of the wrist. The prison employs about 760 workers, of which 567 are guards. ‘The Department of Insurance is investigating recent questions raised in connection with workers’ compensation claims filed against the state of Illinois at the Menard Correctional Center,’ department spokesman Louis Pukelis said Tuesday in a written statement.”

HSToday: Fusion Centers: Tough Tightrope 

“As states and localities have put up fusion centers designed precisely to overcome this, however, they’ve had to face a different challenge: ensuring not only the quantity but the quality of information they collect and report. In candid conversations with Homeland Security Today, leading privacy advocates, scholars and state law enforcement and federal officials addressed some of the key facets of this challenge, as well as steps that can be taken to ensure that fusion centers live up to their full potential as a counterterrorism tool.”

StarNewsOnline: North Carolina collects big from Medicaid fraudsters

“North Carolina’s Medicaid fraud investigators pulled in millions last year through dozens of cases of fraud and patient abuse, the state’s attorney general’s office reported Monday. The office’s Medicaid Investigations Unit prosecuted 22 criminal convictions and 18 civil settlements, recovering $53.5 million, during the federal fiscal year that ended Sept. 30, according to a press release from N.C. Attorney General Roy Cooper.”

ReadWriteWeb: What Cloud Computing Means For Small Businesses

“Needless to say, it’s a huge deal. Gartner recently put cloud computing at the top of its list of top strategic technologies for 2011 and it’s far from the only expert extolling the glory of the Web-hosted software and infrastructure. For small businesses, the significance of this primarily comes down to cost. In many cases, using cloud-based infrastructure is cheaper than running and maintaining one’s own physical servers.”

Identity Resolution Daily Links 2011-01-09

Sunday, January 9th, 2011

[Post from Infoglide] You Can’t Handle the Truth

“We have a new Congress and a new House majority leader as of this week’s swearing in ceremony. The current House majority party (R) plans to pass a bill to repeal the ‘Obamacare’ bill passed during the last session by the former House majority party (D).  Both parties make ‘fact based’ arguments about why killing or keeping the bill will reduce the deficit, yet both can’t be right. This isn’t a political blog, and I’m not going to take a side on this issue. What struck me is how often we use ‘facts’ to bolster our argument, with ‘facts’ defined as any real data that can be massaged or misinterpreted to suggest that our desired outcome appears to be the best one.”

The Washington Post: The Navigator: Does Secure Flight program mean more money for the airlines?

“When she arrived at the screening area, her husband’s incorrect name had already been checked against a list of potential security threats and had passed. Once passengers receive their boarding passes, the Secure Flight process is already complete, according to the TSA.”

LinkedIn: Data Quality of Gender / Sex Codes and the Impacts on Identity Data Matching

“Identity matching requires matching practitioners to decide which collection of fields best allows the correct matching of one record with another. The choice can be made from fields such as name, date of birth, address details, sex / gender, and even unique identifier values (when they exist). The use of sex / gender in that process might be seen in a slightly different light.”

nj.com: Bill would allow people to buy New Jersey Lottery tickets electronically

“Under the bill, the commission would establish procedures for the payment of winning tickets holders, which may include crediting amounts won to a player’s account or direct deposit into a player’s account at a financial institution… The commission would also be directed to ensure that the program includes security measures to protect against fraud, prevent wagering by underage persons and protect the personal and financial information of players.”

You Can’t Handle the Truth

Friday, January 7th, 2011

By Mike Shultz, Infoglide Software CEO

We have a new Congress and a new House majority leader as of this week’s swearing in ceremony. The current House majority party (R) plans to pass a bill to repeal the “Obamacare” bill passed during the last session by the former House majority party (D).  Both parties make “fact based” arguments about why killing or keeping the bill will reduce the deficit, yet both can’t be right.

This isn’t a political blog, and I’m not going to take a side on this issue. What struck me is how often we use “facts” to bolster our argument, with “facts” defined as any real data that can be massaged or misinterpreted to suggest that our desired outcome appears to be the best one. Actual data is often plentiful but our preference for one alternative keeps us from embracing and promoting reality.

So mishandling the truth when you have all the facts you need is a conscious action. What happens when you think you have the data needed to make a rational decision but you aren’t conscious of important information that could totally change your perception? For example, we may have access to what look like sufficient pieces of information to reach a rational business decision, such as a driver’s license with a photo ID or a computed credit score based on the person’s history of business transactions.

However, what’s often missing from the decision process is knowledge about relationships between people. Understanding these relationships – who’s who, who knows whom, and other non-obvious connections – can greatly improve decisions, yet awareness of these relationships is rarely incorporated into the process.

Since entity resolution can increase the accuracy of business processes by an order of magnitude, our New Year’s resolution here at Infoglide is to introduce as many people as possible to its benefits.

Happy New Year!

Identity Resolution Daily Links 2011-01-04

Tuesday, January 4th, 2011

By the Infoglide Software Team

ebiz: Relevance of Enterprise Architecture to Cloud Computing

“Strategic decisions about cloud computing should both draw upon and inform the EA. An organization must have a mature and well formed understanding of its architecture components (e.g., business processes, services, applications and data) to make meaningful decisions related to cloud computing, such as whether a move to the cloud is advantageous, what services most lend themselves to a cloud deployment, and what cloud deployment model (e.g., private, public) makes the most sense. There are three key roles for EA in facilitating cloud computing strategy and planning…”

WRAL.com: State roots out $53M in Medicaid fraud

“‘Medicaid cheaters rob taxpayers, hurt needy patients and push medical costs higher for all of us,’ Cooper said in a statement. ‘We’re stopping the waste and abuse and making violators pay.’ During the federal fiscal year that ended Sept. 30, the Medicaid Investigations Unit of the state Attorney General’s Office won 22 criminal convictions and negotiated 18 civil settlements worth $53.5 million.”

AvStop.com: All Airline Passengers Now Screened Against Government Watchlists

“Under Secure Flight, the Transportation Security Administration (TSA) prescreens passenger name, date of birth and gender against terrorist watchlists before passengers receive their boarding passes. In addition to facilitating secure travel for all passengers, the program helps prevent the misidentification of passengers who have names similar to individuals on government watchlists. Prior to Secure Flight, airlines held responsibility for checking passengers against watchlists.”

Looking Back on 2010

Thursday, December 23rd, 2010

Looking back over the past year, we’re especially grateful for relationships we’ve built and grown with customers and partners. Despite a less than stellar economy, 2010 provided another good year of growth for Infoglide Software.

2010 also proved to be a year of accelerated visibility for identity resolution and entity analytics in general. Industry consolidation moves (e.g., IBM’s March acquisition of Initiate Systems) demonstrate the critical importance of entity resolution in the new era of Big Data that has been developing.

For the readers of IdentityResolutionDaily, please accept our thanks for your continuing interest and participation in the exciting growth of this market. 2011 promises to be a year of continued change and challenge, and we look forward to the opportunities it offers.

We’ll start with new posts again in January.

Happy Holidays, and Best Wishes for a Wonderful 2011!

Mike Shultz
CEO, Infoglide Software

