In “Mining the Tar Sands of Big Data”, Matthew Driscoll and Roger Ehrenberg draw an apt parallel between the earth’s vast oil reserves and big data: until recently, it wasn’t economically or technically feasible to mine either resource efficiently. In both cases, that’s changing.
The authors expect big data to mature over the coming decade, “driven by advances in three principal areas: sensor networks, cloud computing, and machine learning.” Sensors exist in both physical (e.g., RFID) and software (e.g., tweets) forms, and more of them are deployed in products and processes every day, generating a tsunami of data that grows exponentially.
The growth of big data is even affecting energy consumption:
Just as these devices have multiplied, so have the data centers that they communicate with. Housed in climate-controlled warehouses, they consume an estimated 2 percent — and represent the fastest growing segment — of the United States energy budget.
We’ve written about the impact and potential of cloud computing here before. By treating computing resources as a utility served up “by the drink”, cloud computing is another enabler of today’s spectacular increase in data generation.
Machine learning is the third area the authors cite as driving big data:
Its algorithms lie at the heart of spam filters, self-driving cars, and movie recommendation systems, including one to which Netflix awarded its million-dollar prize in 2009. While data storage and distributed computing technologies are being commoditized, machine learning is increasingly a source of competitive advantage among data-driven firms.
Those who want to exploit big data have another powerful tool at their disposal: entity resolution. By searching across multiple databases that reside in different locations and use disparate formats, entity resolution can tame large amounts of data quickly, resolving multiple records that describe the same entity into one and finding hidden connections without human intervention. It is useful in many application areas, including financial fraud detection.
Organizations that exploit advancing technologies like entity resolution can gain a distinct competitive advantage over those that lag in adopting them.
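As a rough illustration of resolving records from several sources into entities, here is a minimal Python sketch. It assumes hypothetical records with `name`, `phone`, and `address` fields, links records that share a normalized phone number or an identical normalized name-and-address pair, and uses a union-find structure so that linking is transitive. It is a toy version of the general technique, not Infoglide’s engine or its algorithms.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def digits(text):
    """Keep only the digits (so '(512) 555-0100' and '512-555-0100' agree)."""
    return "".join(ch for ch in text if ch.isdigit())


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def resolve(records):
    """Group records from disparate sources into entities (lists of indices).

    Hypothetical matching rule: two records link when they share a normalized
    phone number or the same normalized (name, address) pair. Because linking
    is transitive, chains of shared attributes expose indirect connections.
    """
    uf = UnionFind(len(records))
    first_seen = {}  # match key -> index of the first record carrying it
    for i, rec in enumerate(records):
        keys = []
        phone = digits(rec.get("phone", ""))
        if phone:
            keys.append(("phone", phone))
        name = normalize(rec.get("name", ""))
        addr = normalize(rec.get("address", ""))
        if name and addr:
            keys.append(("name_addr", name, addr))
        for key in keys:
            if key in first_seen:
                uf.union(first_seen[key], i)
            else:
                first_seen[key] = i
    groups = defaultdict(list)
    for i in range(len(records)):
        groups[uf.find(i)].append(i)
    return list(groups.values())


# Hypothetical records from three sources
records = [
    {"source": "bank", "name": "John Q. Smith", "phone": "(512) 555-0100", "address": "1 Main St"},
    {"source": "insurer", "name": "JOHN Q SMITH", "phone": "", "address": "1 Main St."},
    {"source": "claims", "name": "J. Smith", "phone": "512-555-0100", "address": "9 Oak Ave"},
    {"source": "bank", "name": "Mary Jones", "phone": "512-555-0199", "address": "4 Elm St"},
]
print(resolve(records))  # [[0, 1, 2], [3]]
```

Records 1 and 2 share no field value with each other; they resolve to the same entity only through record 0, which is the kind of hidden connection entity resolution is meant to surface.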
“Westminster Group PLC (LON:WSG) said it has secured distribution rights in the UK and the Middle East for the identity fraud detection software of Texas-based Infoglide Software. The commercial details of the deal were not disclosed. Westminster expects the product to be popular with national security agencies across the Middle East in particular. Infoglide’s identity resolution technology searches, matches and links entities across multiple, disparate data sources using over 50 algorithms.”
“Dr. Wayne, a 50-year-old osteopath, denies abusing the system and hasn’t been accused of wrongdoing by authorities. He says his regimen ‘does wonders’ if used correctly. He adds that he gave physical therapy to ‘patients who needed it, with appropriate diagnoses, and I should get paid for it.’ Medicare administrators apparently felt otherwise. In 2009 he says he was placed on heightened scrutiny and eventually sold his business. But not until he had received more than $2.6 million from Medicare between 2007 and 2009, according to the person familiar with the matter.”
“Unlike oil reserves, data is an abundant resource on our wired planet. Though much of it is noise, at scale and with the right mining algorithms, this data can yield information that can predict traffic jams, entertainment trends, even flu outbreaks. These are hints of the promise of big data, which will mature in the coming decade, driven by advances in three principal areas: sensor networks, cloud computing, and machine learning.”
“Specifically, on October 1, 2009, MARTINEAU purchased a residence at 211 Lloyd Street in New Haven after working with others to obtain an FHA-insured loan to buy the house at the fraudulently inflated price of $160,000. The loan package for this transaction included false information about the MARTINEAU’s employment, assets and liabilities, and MARTINEAU’s intention to occupy the property as her principal residence. The loan application also was supported by false documentation, including earning statements and fraudulent bank records.”
“By 2013 the amount of traffic flowing over the internet annually will reach 667 exabytes, according to Cisco, a maker of communications gear. And the quantity of data continues to grow faster than the ability of the network to carry it all.”
“The federal government pointed to the expansion of Medicare Fraud Strike Force teams as one reason for the increased recovery. In FY 2010, the total number of cities with strike force prosecution teams was increased to seven, all of which have teams of investigators and prosecutors dedicated to fighting fraud. The strike force teams use advanced data analysis techniques to identify high-billing levels in health care fraud hot spots so that interagency teams can target emerging or migrating schemes along with chronic fraud by criminals masquerading as health care providers or suppliers.”
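The quote doesn’t say which analysis techniques the strike force teams use. As a generic illustration of flagging unusually high billing, the Python sketch below computes a median-based (modified) z-score for hypothetical per-provider billing totals within one region and flags providers above a cutoff of 3.5, a common rule of thumb for this score rather than a figure from the source. A flag marks a provider for review; it is not proof of fraud.

```python
from statistics import median


def flag_high_billers(billing, threshold=3.5):
    """Return (provider, score) pairs for unusually high billers, highest first.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. Median-based statistics are less distorted by
    the very outliers we are trying to find than the mean and standard deviation.
    """
    values = list(billing.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing stands out by this measure
    flagged = []
    for provider, amount in billing.items():
        score = 0.6745 * (amount - med) / mad
        if score > threshold:  # only the high side matters here
            flagged.append((provider, round(score, 1)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical annual billing totals (USD) for providers in one region
billing = {"A": 120_000, "B": 95_000, "C": 110_000, "D": 105_000, "E": 900_000}
print(flag_high_billers(billing))  # [('E', 53.3)]
```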
“MDM is the latest attempt to solve the old problem of inconsistent versions of important data at the centre of an organization,” said Andrew White, research vice president at Gartner. “As with any new initiative, there is a lot of hype and confusion, and with hype and confusion comes misunderstanding. Executive sponsors of MDM and MDM program managers must avoid several common mistakes that have been known to derail MDM initiatives in the past.”
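One common way MDM tools reconcile inconsistent versions of the same record is a survivorship rule that assembles a single master (“golden”) record. The Python sketch below assumes a simple rule chosen for illustration, namely that each field keeps the non-empty value from the most recently updated version, and uses hypothetical source records; real MDM products typically let administrators configure such rules per field.

```python
from datetime import date


def golden_record(versions, fields):
    """Merge conflicting versions of one entity into a single master record.

    Assumed survivorship rule: for each field, take the non-empty value from
    the most recently updated version; None if no version has a value.
    """
    newest_first = sorted(versions, key=lambda v: v["updated"], reverse=True)
    return {
        field: next((v[field] for v in newest_first if v.get(field)), None)
        for field in fields
    }


# Hypothetical versions of one customer held by three systems
versions = [
    {"source": "crm", "updated": date(2010, 3, 1),
     "name": "Acme Corp", "phone": "555-0100", "email": ""},
    {"source": "billing", "updated": date(2010, 9, 15),
     "name": "ACME Corporation", "phone": "", "email": "ap@acme.example"},
    {"source": "support", "updated": date(2009, 12, 5),
     "name": "Acme", "phone": "555-0111", "email": "help@acme.example"},
]
print(golden_record(versions, ["name", "phone", "email"]))
# {'name': 'ACME Corporation', 'phone': '555-0100', 'email': 'ap@acme.example'}
```

The phone number survives from the CRM version because the newer billing version left that field empty.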
“The report notes that by 2020, much of this data will be held in cloud environments or will be ‘touched by cloud,’ which means data that transits through a cloud service or is temporarily held in a cloud application. The report estimates that perhaps 15% of all data will be held in the cloud, and that around one-third will live in or pass through the cloud.”
“What is entity identity management? It simply means that an ER system can store and maintain a record of identity information that persists over time. Entity identity management is essential for an ER engine to operate in identity resolution or identity capture mode and for it to maintain persistent entity identifiers.”
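A minimal Python sketch of the persistence idea follows. The `EntityStore` class, its exact-match keys (`email`, `national_id`), and the `E000001`-style identifiers are all hypothetical; the point is only that a record arriving later resolves to the same identifier as earlier records of the same entity. For simplicity it does not merge two existing entities when a new record happens to link them.

```python
import itertools


class EntityStore:
    """Stores identity information so that entity IDs persist over time.

    Hypothetical sketch: a record joins an existing entity when it shares any
    exact match key with it. Real ER engines use far richer matching.
    """

    KEYS = ("email", "national_id")

    def __init__(self):
        self._next_id = itertools.count(1)
        self._key_index = {}  # (key name, normalized value) -> entity ID
        self.entities = {}    # entity ID -> list of records seen for it

    def add(self, record):
        """Attach the record to a matching entity or create a new one; return the ID."""
        keys = [(k, record[k].strip().lower()) for k in self.KEYS if record.get(k)]
        entity_id = next(
            (self._key_index[k] for k in keys if k in self._key_index), None
        )
        if entity_id is None:
            entity_id = f"E{next(self._next_id):06d}"
            self.entities[entity_id] = []
        self.entities[entity_id].append(record)
        for k in keys:
            self._key_index.setdefault(k, entity_id)  # remember keys for later loads
        return entity_id


store = EntityStore()
a = store.add({"name": "Ann Lee", "email": "ann@example.com"})
b = store.add({"name": "Ann M. Lee", "email": "ANN@example.com ", "national_id": "123"})
c = store.add({"name": "A. Lee", "national_id": "123"})
d = store.add({"name": "Bob Roe", "email": "bob@example.com"})
print(a, b, c, d)  # E000001 E000001 E000001 E000002
```

Record `c` shares nothing with record `a`, yet it receives the same persistent identifier because record `b` tied the national ID to that entity.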
“Why are they moving to the cloud? Rarely because it’s considered cheaper. In some cases, the cloud represents a faster, more flexible way to get a new system up and running. Oftentimes, it’s the ease of integration afforded by the cloud servers, using standard Web service practices, that lets a company launch a new mobile application faster or run a business process that cuts across many partners more efficiently.”
“A little over a year ago, Rand Corporation said that the Unique Patient Identifier would cost $11 billion, and pay off nationwide in reducing these sorts of medical errors, and in simplifying the nationwide effectiveness of the Electronic Health Record (EHR), which in turn can introduce a high level of efficiency, and a way to enforce patient privacy.”
“Enrollment in Personal Care Service requires approval from a qualified health care professional. This approval is missing in the cases of a ‘substantial percentage’ of the 17,500 individuals who have received 24-hour care since 2000, claims the lawsuit. The lawsuit lists several examples of patients allegedly not properly assigned to the Personal Care Service. A 65-year-old woman was deemed to only need limited care, being of sound mind and body. Instead, she was provided with 24-hour care on the federal government’s bill.”
“A total of 389 guards and other workers have filed more than 500 claims, including about 290 still pending. About 230 of these claimed injury for the underlying cause of ‘repetitive trauma,’ including carpal tunnel syndrome, an injury of the wrist. The prison employs about 760 workers, of which 567 are guards. ‘The Department of Insurance is investigating recent questions raised in connection with workers’ compensation claims filed against the state of Illinois at the Menard Correctional Center,’ department spokesman Louis Pukelis said Tuesday in a written statement.”
“As states and localities have put up fusion centers designed precisely to overcome this, however, they’ve had to face a different challenge: ensuring not only the quantity but the quality of information they collect and report. In candid conversations with Homeland Security Today, leading privacy advocates, scholars and state law enforcement and federal officials addressed some of the key facets of this challenge, as well as steps that can be taken to ensure that fusion centers live up to their full potential as a counterterrorism tool.”
“North Carolina’s Medicaid fraud investigators pulled in millions last year through dozens of cases of fraud and patient abuse, the state’s attorney general’s office reported Monday. The office’s Medicaid Investigations Unit prosecuted 22 criminal convictions and 18 civil settlements, recovering $53.5 million, during the federal fiscal year that ended Sept. 30, according to a press release from N.C. Attorney General Roy Cooper.”
“Needless to say, it’s a huge deal. Gartner recently put cloud computing at the top of its list of top strategic technologies for 2011 and it’s far from the only expert extolling the glory of the Web-hosted software and infrastructure. For small businesses, the significance of this primarily comes down to cost. In many cases, using cloud-based infrastructure is cheaper than running and maintaining one’s own physical servers.”
“Strategic decisions about cloud computing should both draw upon and inform the EA. An organization must have a mature and well formed understanding of its architecture components (e.g., business processes, services, applications and data) to make meaningful decisions related to cloud computing, such as whether a move to the cloud is advantageous, what services most lend themselves to a cloud deployment, and what cloud deployment model (e.g., private, public) makes the most sense. There are three key roles for EA in facilitating cloud computing strategy and planning…”
“‘Medicaid cheaters rob taxpayers, hurt needy patients and push medical costs higher for all of us,’ Cooper said in a statement. ‘We’re stopping the waste and abuse and making violators pay.’ During the federal fiscal year that ended Sept. 30, the Medicaid Investigations Unit of the state Attorney General’s Office won 22 criminal convictions and negotiated 18 civil settlements worth $53.5 million.”
“Under Secure Flight, the Transportation Security Administration (TSA) prescreens passenger name, date of birth and gender against terrorist watchlists before passengers receive their boarding passes. In addition to facilitating secure travel for all passengers, the program helps prevent the misidentification of passengers who have names similar to individuals on government watchlists. Prior to Secure Flight, airlines held responsibility for checking passengers against watchlists.”
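To see why checking date of birth and gender alongside the name reduces misidentification, consider this Python sketch. It is not TSA’s matching logic; the similarity measure (the standard library’s `difflib.SequenceMatcher`), the 0.85 cutoff, and the match/review/clear outcomes are assumptions made for illustration, and the names are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names after light normalization."""
    def norm(s):
        return " ".join(s.lower().replace(".", "").split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def screen(passenger, watchlist, name_cutoff=0.85):
    """Return 'match', 'review', or 'clear' for one passenger.

    A similar name alone only triggers review; a match also requires date of
    birth and gender to agree, which keeps people who merely share a name
    from being treated as the listed individual.
    """
    result = "clear"
    for entry in watchlist:
        if name_similarity(passenger["name"], entry["name"]) < name_cutoff:
            continue
        if passenger["dob"] == entry["dob"] and passenger["gender"] == entry["gender"]:
            return "match"
        result = "review"
    return result


watchlist = [{"name": "Jon A. Doe", "dob": "1970-01-01", "gender": "M"}]
print(screen({"name": "JON A. DOE", "dob": "1970-01-01", "gender": "M"}, watchlist))  # match
print(screen({"name": "John A Doe", "dob": "1985-06-30", "gender": "M"}, watchlist))  # review
print(screen({"name": "Mary Smith", "dob": "1970-01-01", "gender": "F"}, watchlist))  # clear
```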
“We talked a week ago about the rapidly emerging market space called Big Data. One statistic that opened my eyes is Gartner’s prediction that the volume of new data generated by enterprises will grow by 650% in the next five years, and 80% of that will be unstructured data! The 451Group’s definition of Big Data describes a growing need for non-traditional processes that can treat massive amounts of data as a whole, thereby making it impossible to use many traditional tools and techniques.”
“These tools will integrate many of the agency’s pilot programs into the National Fraud Prevention Program and complement the work of the joint HHS and Department of Justice Health Care Fraud Prevention and Enforcement Action Team (HEAT). ‘Preventing fraud is more effective than the old “pay and chase” model of fighting fraud after a sham provider has been paid and disappeared,’ CMS administrator Donald Berwick said in a statement. ‘By using new predictive modeling analytic tools we are better able to expand our efforts to save the millions — and possibly billions — of dollars wasted on waste, fraud, and abuse.’”
“Concerns that internal initiatives, and the CIO’s clout, will be gutted and most funds redirected to the cloud are overstated–for now. But we are at an inflection point: IT has money to spend, but it can’t be allocated using the same old budget process that’s kept us in a rut of dedicating a third or more of our resources to keeping the lights on. Business leaders have little patience for high-priced, long-term IT slogs. They’ve seen massive 18-month projects fail and experienced success with lightweight software-as-a-service offerings. CIOs must look at each expenditure and think, ‘Will this buy us flexibility and advance the business?’”
“Results from the National Ambulatory Medical Care Survey (NAMCS) show that between 2009 and 2010, the percentage of physicians reporting having an electronic medical record/electronic health record (EMR/EHR) system that meets the criteria of a basic system increased by 14% and a fully functional system increased by 46%.”
“In the global marketplace, businesses, suppliers and customers are creating and consuming vast amounts of information. Gartner predicts that enterprise data in all forms will grow 650 percent over the next five years. According to IDC, the world’s volume of data doubles every 18 months. This flood of data, often referred to as ‘information overload,’ ‘data deluge’ and ‘big data,’ clearly creates a challenge for business leaders.”
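The two growth figures can be compared by converting each to a five-year growth factor and an equivalent compound annual rate. The short Python calculation below assumes “grow 650 percent” means the final volume is 7.5 times the starting volume (read as 6.5 times, the annual rate would be about 45%). Note that the forecasts measure different things, enterprise data versus the world’s data, so they are not strictly comparable.

```python
MONTHS = 5 * 12  # five-year horizon

# IDC: volume doubles every 18 months -> 2^(60/18) over five years
idc_factor = 2 ** (MONTHS / 18)

# Gartner: "grow 650 percent" read as +650%, i.e. 1 + 6.5 = 7.5x (assumption)
gartner_factor = 1 + 6.5


def annual_rate(factor, years=5):
    """Compound annual growth rate that yields `factor` over `years`."""
    return factor ** (1 / years) - 1


for label, factor in [("IDC", idc_factor), ("Gartner", gartner_factor)]:
    print(f"{label}: x{factor:.1f} over five years, about {annual_rate(factor):.0%} per year")
# IDC: x10.1 over five years, about 59% per year
# Gartner: x7.5 over five years, about 50% per year
```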
“The era of Big Data has only just begun. In the latest edition of Database Trends & Applications, I provided a series of predictions about the year ahead, with an emphasis on data management. Here are 10 of them…”
“The segment of the cloud Salesforce leads, SaaS (software as a service), has grown from a tiny sliver of the enterprise software market just a few years ago to 10 percent in 2009, according to Gartner, which predicts that slice will expand to 16 percent by 2014. Even more dramatic is the firm’s projection that 85 percent of all new software will be delivered as a service by 2010.”
“In 2005 the United Kingdom embarked on the largest investment ($18 billion) in health information technology in the world. Yet despite expectations that the system would increase efficiency and reduce medical errors, their efforts neither improved health nor saved money — in fact in some cases, they may have led to patient harm. Britain’s government-run medical system is obviously different from our complex public-private insurance system.”
“This is where Secure Flight, the government-run program that is now vetting passengers before they receive their boarding passes, comes in. It replaces a more ad hoc system run by the carriers. ‘Prior to Secure Flight, the airlines themselves were responsible for matching all of their passengers against the watch lists, so each airline had their own system for doing that,’ said Greg Soule, a spokesman for the Transportation Security Administration. ‘Secure Flight takes the passenger watch list matching process away from the airlines and puts it all in one program under TSA, so it is a more consistent process across the board.’”
“The technology to connect the dots from disparate data sources already exists, and has done for quite some time. It’s called ‘entity resolution,’ and corporations have been using it for years to compile and ensure accuracy in consumer data. Entity resolution can help avoid many of the mistakes that led to the attempted Christmas bombing: it can overcome spelling errors in databases, alert the right people to a threat in real time, and correlate literally billions of records on an ongoing basis.”
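Tolerance for spelling errors usually rests on an approximate string measure. A classic one is Levenshtein edit distance; the Python sketch below computes it with dynamic programming and treats two names as variants when the distance is at most 20% of the longer name’s length, a threshold assumed for illustration. The quote doesn’t say which measures any particular product uses.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character inserts,
    deletes, and substitutions that turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def likely_same(a, b, max_ratio=0.2):
    """Treat two names as spelling variants if their edit distance is small
    relative to the longer name (the 0.2 ratio is an assumed threshold)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return edit_distance(a, b) / longest <= max_ratio


print(edit_distance("kitten", "sitting"))     # 3
print(likely_same("Catherine", "Katherine"))  # True
print(likely_same("Smith", "Jones"))          # False
```

Comparing every pair of records this way grows quadratically, so systems that correlate billions of records typically narrow the candidate pairs first (for example, by indexing on a shared key) before computing distances.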
“So-called false positives, such as when Senator Edward Kennedy of Massachusetts was barred from a flight in 2004 because his name matched an alias on a watch list, are eliminated under the new program, the agency has said. The computer system the government uses is more sophisticated than the one employed by airlines, and more detailed information is now collected from travelers, the security agency has said.”
Infoglide Software provides entity resolution and analysis solutions for retail, banking, insurance, government, and law enforcement. Without the need for data cleansing or warehousing, Infoglide Software's Identity Resolution Engine™ (IRE) analyzes all of the information relating to individuals and/or entities from multiple sources of data and then applies...