In “Mining the Tar Sands of Big Data”, Michael Driscoll and Roger Ehrenberg draw an apt parallel between the earth’s vast oil reserves and big data: until recently it wasn’t economically and technically feasible to mine these resources efficiently. In both cases, that’s changing.
The authors trace the growth in the amount of data generated to “advances in three principal areas: sensor networks, cloud computing, and machine learning.” Both physical (e.g., RFID) and software (e.g., tweets) sensors exist, and multiple forms are being deployed in products and processes each day, thus generating a tsunami of data that grows exponentially each year.
In fact, the growth in big data is even affecting the consumption of energy:
Just as these devices have multiplied, so have the data centers that they communicate with. Housed in climate-controlled warehouses, they consume an estimated 2 percent — and represent the fastest growing segment — of the United States energy budget.
We’ve covered and written about the impact and potential of cloud computing here before. By treating computing resources as a utility served up “by the drink”, cloud computing is another enabler of today’s spectacular increase in data generation.
Machine learning is the third promising factor listed by the authors that is related to the big data explosion:
Its algorithms lie at the heart of spam filters, self-driving cars, and movie recommendation systems, including one to which Netflix awarded its million-dollar prize in 2009. While data storage and distributed computing technologies are being commoditized, machine learning is increasingly a source of competitive advantage among data-driven firms.
Those who want to exploit the availability of big data have another powerful tool at their disposal – entity resolution. By searching across multiple databases that differ in format and reside in different locations, it can tame large amounts of data quickly, resolving multiple records that refer to the same person or organization into a single entity and finding hidden connections without human intervention. It applies in many areas, including detecting financial fraud.
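To make the idea of resolving several records into one entity concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the record fields, the sample data, the 0.85 similarity threshold, and the function names are all assumptions chosen for the example, and the greedy single-pass grouping stands in for the far more sophisticated matching that real systems use.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = value.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Return a 0..1 similarity score between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def same_entity(r1, r2, threshold=0.85):
    """Hypothetical rule: identical birth date plus a close name match."""
    return r1["dob"] == r2["dob"] and similarity(r1["name"], r2["name"]) >= threshold


def resolve(records, threshold=0.85):
    """Greedily group records that appear to describe the same entity."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if any(same_entity(rec, other, threshold) for other in cluster):
                cluster.append(rec)
                break
        else:
            # No existing cluster matched, so this record starts a new entity.
            clusters.append([rec])
    return clusters


# Made-up records from three imaginary sources.
records = [
    {"source": "bank", "name": "Jonathan Q. Smith", "dob": "1970-03-14"},
    {"source": "insurer", "name": "Jonathon Q Smith", "dob": "1970-03-14"},
    {"source": "dmv", "name": "Maria Lopez", "dob": "1985-11-02"},
]

clusters = resolve(records)
for cluster in clusters:
    print([r["source"] for r in cluster])
```

With this sample data, the bank and insurer records fall into one group despite the spelling difference, while the third record stays on its own. A production system would also have to cope with missing fields, conflicting values, and data volumes that rule out comparing every pair of records.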
Systems that exploit advancing technologies like entity resolution can give organizations a distinct competitive advantage over those that lag in technology adoption.
“The Aite Group recently authored a report entitled ‘Internal Fraud: The Devil Within.’ After surveying 35 fraud and product executives at financial institutions across the U.S. and Canada, they concluded that internal fraud is a severe and growing problem that often goes undetected and almost always flies under the radar of public scrutiny.”
“The second type of identity resolution is similar but different. The classic example is in police work. Here you want to know that some particular criminal has fifteen different aliases, say. Moreover, under each of those identities he or she will have multiple contacts and you may want to do social network analysis against those contacts to see who else might have criminal tendencies.”
“In October, the Chicago Police Department’s new crime-forecasting unit was analyzing 911 calls for service and produced an intelligence report predicting a shooting would happen soon on a particular block on the South Side. Three minutes later, it did, police officials say. That got police Supt. Jody Weis thinking. He wondered if the department could produce intelligence reports even quicker. Next time, officers might have an hour’s notice before a shooting — instead of just a few minutes.”
“The indictment alleges that, from approximately January 2004 to September 2007, the defendants perpetrated a scheme to defraud mortgage lenders by submitting fraudulent loan applications with material misrepresentations, including misrepresentations concerning the borrower’s income, assets, employment status, and intent to use the home as the borrower’s primary residence… The scheme involved more than $20 million in losses to lenders.”
“A contentious battle is heating up once again between State Farm Insurance Co. and a group of doctors that the insurer alleges have been involved in a multimillion-dollar insurance fraud scheme, according to a lawsuit. The suit claims the doctors submitted fraudulent claims based on ‘medically unnecessary diagnostic procedures’ used on those in car accidents.”
“A Booz & Company study recently quantified some of the projected benefits from a proposed e-health initiative in Australia: by 2020, the programme could eliminate up to 10,000 deaths caused by medication mistakes, along with up to 310,000 unnecessary hospital admissions, 2 million unnecessary outpatient visits, and 7 million laboratory tests.”
“In 2009, midsize businesses (53%) were mainly consumed with reducing costs and increasing efficiencies. The progress and momentum gained from these efforts continue to yield critical benefits and advantages for midsize businesses. Because of this momentum, they are now in a position to turn their attention to more forward-looking aspects of their business. This is demonstrated by the significant increase in focus on customers (+20 pts), innovation (+7 pts), and revenue growth (+5 pts).”
“A total of 389 guards and other workers have filed more than 500 claims, including about 290 still pending. About 230 of these claimed injury for the underlying cause of ‘repetitive trauma,’ including carpal tunnel syndrome, an injury of the wrist. The prison employs about 760 workers, of which 567 are guards. ‘The Department of Insurance is investigating recent questions raised in connection with workers’ compensation claims filed against the state of Illinois at the Menard Correctional Center,’ department spokesman Louis Pukelis said Tuesday in a written statement.”
“As states and localities have put up fusion centers designed precisely to overcome this, however, they’ve had to face a different challenge: ensuring not only the quantity but the quality of information they collect and report. In candid conversations with Homeland Security Today, leading privacy advocates, scholars and state law enforcement and federal officials addressed some of the key facets of this challenge, as well as steps that can be taken to ensure that fusion centers live up to their full potential as a counterterrorism tool.”
“North Carolina’s Medicaid fraud investigators pulled in millions last year through dozens of cases of fraud and patient abuse, the state’s attorney general’s office reported Monday. The office’s Medicaid Investigations Unit prosecuted 22 criminal convictions and 18 civil settlements, recovering $53.5 million, during the federal fiscal year that ended Sept. 30, according to a press release from N.C. Attorney General Roy Cooper.”
“Needless to say, it’s a huge deal. Gartner recently put cloud computing at the top of its list of top strategic technologies for 2011 and it’s far from the only expert extolling the glory of the Web-hosted software and infrastructure. For small businesses, the significance of this primarily comes down to cost. In many cases, using cloud-based infrastructure is cheaper than running and maintaining one’s own physical servers.”
“When she arrived at the screening area, her husband’s incorrect name had already been checked against a list of potential security threats and had passed. Once passengers receive their boarding passes, the Secure Flight process is already complete, according to the TSA.”
“Identity matching requires matching practitioners to decide which collection of fields best allows the correct matching of one record with another. The choice can be made from fields such as name, date of birth, address details, sex / gender, and even unique identifier values (when they exist). The use of sex / gender in that process might be seen in a slightly different light.”
“Under the bill, the commission would establish procedures for the payment of winning tickets holders, which may include crediting amounts won to a player’s account or direct deposit into a player’s account at a financial institution… The commission would also be directed to ensure that the program includes security measures to protect against fraud, prevent wagering by underage persons and protect the personal and financial information of players.”
We have a new Congress and a new House majority leader as of this week’s swearing in ceremony. The current House majority party (R) plans to pass a bill to repeal the “Obamacare” bill passed during the last session by the former House majority party (D). Both parties make “fact based” arguments about why killing or keeping the bill will reduce the deficit, yet both can’t be right.
This isn’t a political blog, and I’m not going to take a side on this issue. What struck me is how often we use “facts” to bolster our argument, with “facts” defined as any real data that can be massaged or misinterpreted to suggest that our desired outcome appears to be the best one. Actual data is often plentiful but our preference for one alternative keeps us from embracing and promoting reality.
So mishandling the truth when you have all the facts you need is a conscious action. What happens when you think you have the data needed to make a rational decision but you aren’t conscious of important information that could totally change your perception? For example, we may have access to what look like sufficient pieces of information to reach a rational business decision, such as a driver’s license with a photo ID or a computed credit score based on the person’s history of business transactions.
However, what’s often missing from the decision process is knowledge about relationships between people. Understanding these relationships – who’s who, who knows whom, and other non-obvious connections – can substantially improve the quality of decisions, yet awareness of these relationships is rarely incorporated into the process.
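As a rough illustration of what a non-obvious connection can look like in data, the Python sketch below links people who share an address or phone number. Everything in it is hypothetical: the field names, the sample people, and the choice of exact matching on lowercased values are assumptions made for the example, not a description of any specific product.

```python
from collections import defaultdict
from itertools import combinations


def find_links(people, fields=("address", "phone")):
    """Map each pair of names to the set of fields whose values they share."""
    index = defaultdict(set)
    for person in people:
        for field in fields:
            value = person.get(field)
            if value:
                # Compare values case-insensitively, ignoring outer whitespace.
                index[(field, value.strip().lower())].add(person["name"])

    links = defaultdict(set)
    for (field, _value), names in index.items():
        for a, b in combinations(sorted(names), 2):
            links[(a, b)].add(field)
    return dict(links)


# Made-up people drawn from imaginary, separate data sources.
people = [
    {"name": "Applicant A", "address": "12 Oak St", "phone": "555-0101"},
    {"name": "Employee B", "address": "98 Elm Ave", "phone": "555-0101"},
    {"name": "Vendor C", "address": "12 oak st", "phone": "555-0199"},
]

links = find_links(people)
for pair, shared in links.items():
    print(pair, sorted(shared))
```

Here the applicant turns out to share a phone number with an employee and an address with a vendor, connections that a driver’s license or credit score alone would not reveal.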
Since entity resolution can increase the accuracy of business processes by an order of magnitude, our New Year’s resolution here at Infoglide is to introduce as many people as possible to its benefits.
“Strategic decisions about cloud computing should both draw upon and inform the EA. An organization must have a mature and well formed understanding of its architecture components (e.g., business processes, services, applications and data) to make meaningful decisions related to cloud computing, such as whether a move to the cloud is advantageous, what services most lend themselves to a cloud deployment, and what cloud deployment model (e.g., private, public) makes the most sense. There are three key roles for EA in facilitating cloud computing strategy and planning…”
“‘Medicaid cheaters rob taxpayers, hurt needy patients and push medical costs higher for all of us,’ Cooper said in a statement. ‘We’re stopping the waste and abuse and making violators pay.’ During the federal fiscal year that ended Sept. 30, the Medicaid Investigations Unit of the state Attorney General’s Office won 22 criminal convictions and negotiated 18 civil settlements worth $53.5 million.”
“Under Secure Flight, the Transportation Security Administration (TSA) prescreens passenger name, date of birth and gender against terrorist watchlists before passengers receive their boarding passes. In addition to facilitating secure travel for all passengers, the program helps prevent the misidentification of passengers who have names similar to individuals on government watchlists. Prior to Secure Flight, airlines held responsibility for checking passengers against watchlists.”
Looking back over the past year, we’re especially grateful for relationships we’ve built and grown with customers and partners. Despite a less than stellar economy, 2010 provided another good year of growth for Infoglide Software.
2010 also proved to be a year of accelerated visibility for identity resolution and entity analytics in general. Industry consolidation moves (e.g., IBM’s March acquisition of Initiate Systems) demonstrate the critical importance of entity resolution in the new era of Big Data that has been developing.
For the readers of IdentityResolutionDaily, please accept our thanks for your continuing interest and participation in the exciting growth of this market. 2011 promises to be a year of continued change and challenge, and we look forward to the opportunities it offers.
We’ll start with new posts again in January.
Happy Holidays, and Best Wishes for a Wonderful 2011!
“Professional analysts and law enforcement officers from more than 15 different agencies including the FBI, ATF, DEA, U.S. Marshals, Homeland Security, and state and county partners work from one large room to put out intelligence products in a truly collaborative environment that defines New Jersey’s fusion center. Products include crime mapping with predictive analysis to help local departments know when and where crimes are likely to occur in the future.”
“Morgan’s prison sentence will be followed by three years of supervised release. Morgan was ordered to pay restitution of $2,804,462. Morgan, 64, was convicted in October 2008, of 69 counts of health care fraud, following a two-week jury trial in Albany. Michael J. Moore, U.S. attorney for the Middle District of Georgia, said the indictment charged that for a period of several years ending in August 2007, Morgan, a registered pharmacist and the owner of Thrift Center Pharmacy in Camilla, executed a scheme to defraud the Georgia Medicaid program, which is jointly funded with state and federal funds.”
“TCSPs are often involved in some way in the establishment and administration of most legal persons and arrangements; and accordingly in many jurisdictions they play a key role as the gatekeepers for the financial sector. This report provides a number of case studies which demonstrate that TCSPs have often been used, wittingly or unwittingly, in the conduct of money laundering activities.”
“We talked a week ago about the rapidly emerging market space called Big Data. One statistic that opened my eyes is Gartner’s prediction that the volume of new data generated by enterprises will grow by 650% in the next five years, and 80% of that will be unstructured data! The 451Group’s definition of Big Data describes a growing need for non-traditional processes that can treat massive amounts of data as a whole, thereby making it impossible to use many traditional tools and techniques.”
“These tools will integrate many of the agency’s pilot programs into the National Fraud Prevention Program and complement the work of the joint HHS and Department of Justice Health Care Fraud Prevention and Enforcement Action Team (HEAT). ‘Preventing fraud is more effective than the old “pay and chase” model of fighting fraud after a sham provider has been paid and disappeared,’ CMS administrator Donald Berwick said in a statement. ‘By using new predictive modeling analytic tools we are better able to expand our efforts to save the millions – and possibly billions – of dollars wasted on waste, fraud, and abuse.’”
“Concerns that internal initiatives, and the CIO’s clout, will be gutted and most funds redirected to the cloud are overstated–for now. But we are at an inflection point: IT has money to spend, but it can’t be allocated using the same old budget process that’s kept us in a rut of dedicating a third or more of our resources to keeping the lights on. Business leaders have little patience for high-priced, long-term IT slogs. They’ve seen massive 18-month projects fail and experienced success with lightweight software-as-a-service offerings. CIOs must look at each expenditure and think, ‘Will this buy us flexibility and advance the business?’”
Infoglide Software provides entity resolution and analysis solutions for retail, banking, insurance, government, and law enforcement. Without the need for data cleansing or warehousing, Infoglide Software's Identity Resolution Engine™ (IRE) analyzes all of the information relating to individuals and/or entities from multiple sources of data and then applies...