The 451 Group’s definition of Big Data describes a growing need for non-traditional processes that can treat massive amounts of data as a whole, because many traditional tools and techniques cannot process it within tolerable time frames. The data is voluminous, complex, and highly dynamic, yet business drivers demand that it be captured, managed, and harnessed to benefit the organization.
While entity resolution (ER) software is technologically mature, the evolving requirements for managing Big Data fit ER perfectly. For example, Infoglide’s Identity Resolution Engine (IRE) scales to meet Big Data requirements, and its flexibility in handling ambiguous structured and unstructured data with missing elements makes it an ideal solution for wringing value from the “data deluge” we increasingly find ourselves in.
One of the unique problems associated with Big Data is that it comes from multiple disparate sources, including email, Word documents, spreadsheets, and social media such as IM, newsfeeds, Facebook, and LinkedIn, to name just a few. Here again, entity resolution systems like IRE now support multiple data formats and have special ways to incorporate social media.
So, while Big Data presents a daunting challenge for many organizations, flexible technologies like entity resolution represent a key element of any solution.
Early this year, Gartner suggested that a “data deluge” has begun. In his recent Dataspora Blog post about “Big Data” and what it means, author Michael Driscoll presents a unique and interesting perspective on the massive amounts of data being generated and stored. According to The 451 Group’s definition,
“Big data is a term applied to data sets that are large, complex and dynamic (or a combination thereof) and for which there is a requirement to capture, manage and process the data set in its entirety, such that it is not possible to process the data using traditional software tools and analytic techniques within tolerable time frames.”
While the term “Big Data” continues to evolve, no one disputes that there are unique problems associated with capturing and using it, and part of the challenge derives from the multiple disparate sources of data.
Source: Avanade Global Survey: The Business Impact of Big Data, November 2010
“The era of Big Data has only just begun. In the latest edition of Database Trends & Applications, I provided a series of predictions about the year ahead, with an emphasis on data management. Here are 10 of them…”
“The segment of the cloud Salesforce leads, SaaS (software as a service), has grown from a tiny sliver of the enterprise software market just a few years ago to 10 percent in 2009, according to Gartner, which predicts that slice will expand to 16 percent by 2014. Even more dramatic is the firm’s projection that 85 percent of all new software will be delivered as a service by 2010.”
“In 2005 the United Kingdom embarked on the largest investment ($18 billion) in health information technology in the world. Yet despite expectations that the system would increase efficiency and reduce medical errors, their efforts neither improved health nor saved money — in fact in some cases, they may have led to patient harm. Britain’s government-run medical system is obviously different from our complex public-private insurance system.”
“This is where Secure Flight, the government-run program that is now vetting passengers before they receive their boarding passes, comes in. It replaces a more ad hoc system run by the carriers. ‘Prior to Secure Flight, the airlines themselves were responsible for matching all of their passengers against the watch lists, so each airline had their own system for doing that,’ said Greg Soule, a spokesman for the Transportation Security Administration. ‘Secure Flight takes the passenger watch list matching process away from the airlines and puts it all in one program under TSA, so it is a more consistent process across the board.’”
“Even if EHRs reduce the risk of errors overall, they may produce entirely new ones, Edward Fotsch, CEO of PDR Network, which will provide network operations for the new reporting system, tells the Health Blog. For example, EHRs may cut the risk of failing to alert a patient to an abnormal test result, but confusing user interfaces may produce their own mistakes and need tinkering.”
“Even with these differences, a human can rapidly determine that they refer to the same individual for two reasons. The first is that the values that differ across the pair of records are not too different from each other, and the second is that there seems to be enough support from across each pair of attributes to assert some degree of similarity.”
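The two reasons given in that excerpt (values that differ only slightly, and enough support across the attribute pairs) can be sketched in a few lines of Python. This is only an illustration, not how IRE or any particular product works: the function names, the sample records, the simple averaging, and the 0.8 threshold are all assumptions made for the example, and the string comparison uses the standard library's `difflib.SequenceMatcher` rather than a purpose-built matching algorithm.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two values, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a: dict, rec_b: dict) -> float:
    """Average similarity over attributes that are non-empty in both records.

    Missing or empty attributes are skipped rather than counted against the pair.
    """
    shared = [key for key in rec_a if rec_a.get(key) and rec_b.get(key)]
    if not shared:
        return 0.0
    total = sum(attribute_similarity(rec_a[key], rec_b[key]) for key in shared)
    return total / len(shared)


def same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.8) -> bool:
    """Treat two records as the same entity when the average clears the threshold.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    return record_similarity(rec_a, rec_b) >= threshold


# Hypothetical records: a and b differ slightly, c is a different person.
a = {"name": "Jonathan Smith", "street": "123 Main Street", "city": "Austin"}
b = {"name": "Jonathon Smith", "street": "123 Main St", "city": "Austin"}
c = {"name": "Maria Lopez", "street": "9 Elm Avenue", "city": "Denver"}

print(same_entity(a, b))
print(same_entity(a, c))
```

Averaging is the crudest possible way to combine attribute scores. A real system would likely weight attributes differently (a matching date of birth says more than a matching city), but even this version shows why small per-field differences need not block a match.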
We’re currently in the heat of the election season. No matter how impeccable the record of any candidate that the major parties put forward, minions of the opposing parties go to great lengths to uncover an embarrassing incident that can be exposed (or even an incident that can be twisted to appear embarrassing) in order to influence voters away from voting for that candidate. While the populace is reasonably good at figuring these tricks out, even more disturbing are the stories involving voter fraud.
Take the case of absentee ballots being requested by someone other than the actual person. According to a recent news story, this has happened in multiple places across the country already this year, including the states of Pennsylvania, New York, Nevada, and Florida (surprise). A common theme is that votes are cast using absentee ballots in the names of people who later say they never requested the ballots and didn’t vote absentee.
Some fraudulent “voters” were found to have made up creative excuses about why they needed to vote absentee. In other cases, a person knocked on the door and convinced the person answering to sign a “petition” that later turned out to be an absentee ballot.
Given what Infoglide does, it’s easy to speculate about how identity resolution technology could help solve these problems. Many data sources found in the surface web, dark web, and social media contain a wealth of information about every voter, so it shouldn’t be that hard to construct a solution. For example, imagine that when someone asked for an absentee ballot, their identifying information was entered and they were then asked a random question about their background that another person would be unlikely to be able to answer.
While anything is possible with existing technology, the harder problem is getting the affected political parties to come to a consensus on a solution. Now that would be something to get behind!
“On the one hand, recognition of the power that entity resolution can bring to bear on challenging problems both in the commercial and public realms continues to increase. On the other hand, resistance to change and lack of budget seem to be inhibiting dramatic increases in productivity and effectiveness that could be gained by a more rapid uptake of this new technology.”
“Cloud computing, social computing, context-aware computing, and pattern-based strategy are the four big trends that will alter IT in the next few years, according to Peter Sondergaard, SVP of Research for Gartner… Each of these trends is disruptive, he said, but the combination is an ‘unimaginable force’ that will transform not just IT, but business and government.”
“The TSA estimates that only about 1 percent of travelers won’t make it through security because of a discrepancy, Kimball says. Although it’s unlikely you won’t be able to fly because of a mishap, you still might be delayed if your ID and ticket don’t match up. That hold-up will likely be less than five or 10 minutes while screeners verify your ID and boarding pass, Orbitz’s Tornatore estimates.”
“Several years ago, identity resolution was almost exclusively tied to detecting fraud. Over time, the ‘identity’ of identity resolution has continued to evolve and broaden. Many areas of commerce are discovering that efficiency can be improved dramatically when you have a clear picture of the individuals you’re dealing with and their social network. Of course, identity resolution is not the only way to gain that efficiency.”
“In a recently published article, Law School Professor Gregory M. Duhl and attorney Jaclyn S. Millner focus on the issues of professional responsibility, discovery, privacy and evidence when social networking factors integrate with a workers’ compensation proceeding. Since the compensation system is theoretically no-fault and the evidentiary system is informal, the authors theorize that the workers’ compensation arena will act as a fertile ground for experimentation in the legal application of this new technology.”
“‘We will continue to work with our state and federal partners to police abuse of the program that so many people depend on,’ Coakley said in prepared remarks. Over the last three years, Coakley’s Medicaid fraud division has recovered approximately $125 million for the state Medicaid program, according to the AG’s office.”
“In the direct matching, transitive linking, and association analysis methods discussed in previous posts, the evidence for establishing a link comes from the references themselves, either as attribute values or relationships with other references. A link created in this way is also called an inferred link. But in almost any ER context, some pairs of equivalent references (i.e. that refer to the same entity) will have insufficient evidence available in the references themselves to make that determination, thereby leaving them as unlinked false negatives.”
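To make the idea of transitive linking in that excerpt concrete, here is a minimal Python sketch. It assumes direct matching has already produced a list of linked pairs, and it groups references with a union-find structure, which is one common way to compute transitive closure and not necessarily the method the quoted author has in mind. The function name and the reference IDs are hypothetical.

```python
def transitive_clusters(references, direct_links):
    """Group references into clusters by following direct links transitively."""
    parent = {ref: ref for ref in references}

    def find(ref):
        # Walk up to the cluster root, shortening the path along the way.
        while parent[ref] != ref:
            parent[ref] = parent[parent[ref]]
            ref = parent[ref]
        return ref

    for left, right in direct_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two clusters

    clusters = {}
    for ref in references:
        clusters.setdefault(find(ref), set()).add(ref)
    return list(clusters.values())


# Hypothetical reference IDs and direct-match results.
refs = ["r1", "r2", "r3", "r4", "r5"]
links = [("r1", "r2"), ("r2", "r3")]

clusters = transitive_clusters(refs, links)
print(sorted(sorted(cluster) for cluster in clusters))
```

Here r1 and r3 end up in the same cluster even though they were never matched directly. r4 and r5 stay on their own, which is the situation the excerpt describes: if either really refers to the same entity as another reference, nothing in the references themselves supplies the evidence, and the pair remains an unlinked false negative.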
“Doing ‘Social Master Data Management’ will become an integrated part of customer master data management offering both opportunities for approaching a ‘single version of the truth’ and some challenges in doing so. Of course privacy is a big issue.”
“Total cloud-related information and communications technology spending among SMBs globally surpassed $52 billion in 2009, representing just 6 percent of total worldwide SMB ICT spending. But AMI predicts that that will nearly double over a five-year period.”
“According to court documents, between Jan. 1, 2004 and Dec. 31, 2008, Lemine, owner of Sorrento Grocery in Sorrento, Fla., cashed more than $4 million in checks from a local construction company in return for a fee of between 1 and 1.5 percent of the checks’ face value. He did so knowing that the owners of the construction company were attempting by cashing the checks through the grocery to conceal their employment of illegal aliens, avoid paying worker’s compensation and employment taxes, and hide income from state and federal tax officials.”
Infoglide Software provides entity resolution and analysis solutions for retail, banking, insurance, government, and law enforcement. Without the need for data cleansing or warehousing, Infoglide Software's Identity Resolution Engine™ (IRE) analyzes all of the information relating to individuals and/or entities from multiple sources of data and then applies...