If Only Data Quality Were That Simple

By Robert Barker, Infoglide Senior Vice President & Chief Marketing Officer

In a set of three recent posts, industry analyst Philip Howard of Bloor Research compares different types of data quality technologies. Setting aside the fact that the focus was on two specific companies, let’s examine the key conclusions.

First, “next generation” data quality solutions must employ “improved matching with less human involvement.” While no one will argue that “better results, less cost” should be (and is!) a goal of all matching technologies, implying that mathematical modeling and semantic analytics alone can solve every problem ignores the breadth of requirements and attribute types across multiple industries. For example, a solution that’s great at matching product data may fail miserably at identifying insider trading on Wall Street.

Another key point made in the posts: most products require the user to “tinker around with your guesses and see if your match percentage improves” and that “means a lot of manual work, not just to begin with but on an on-going basis.” In reality, users often list configurability as a top criterion in choosing a solution for complex problems, and our experience is that in most instances the amount of ongoing adjustment after the initial learning phase is minimal. “One size fits all” works OK with t-shirts but not so well with data.

A final conclusion is that “all the leading products have been built using out-of-date technology that has now been superseded.” In fact, both companies cited have been around for years, and all leading companies (including mine) continually evolve their technology. Perhaps more importantly, a key requirement for all but the simplest problems is a solution that can incorporate newer, better analytics as they emerge, rather than locking the customer into a single “my way or the highway” approach that works well for some classes of data attributes but not so well for others.

The most effective approach blends several best-of-class techniques and scales without compromising performance. A multifaceted solution combines an extensive rules base for nicknames and abbreviations with heuristics, semantics, and a large array of public and proprietary algorithms and other types of analytics. As important as matching is, a strong solution must also integrate easily with existing systems and evolve as requirements grow and new analytics emerge.
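To make the idea of blending techniques concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any vendor's product: the tiny rules tables, the field names, the weights, and the function names are all assumptions made up for this example, and the standard library's `difflib.SequenceMatcher` stands in for the much larger array of algorithms a real solution would use.

```python
from difflib import SequenceMatcher

# Hypothetical, tiny rules bases; a real one would hold many thousands of entries.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(value, rules):
    """Lowercase, drop periods and commas, and expand tokens found in the rules base."""
    tokens = value.lower().replace(".", " ").replace(",", " ").split()
    return " ".join(rules.get(token, token) for token in tokens)


def similarity(a, b):
    """Generic string similarity between 0.0 and 1.0 (a stand-in for other algorithms)."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(rec_a, rec_b, weights=None):
    """Blend rules-based normalization with a similarity algorithm, per attribute.

    The weights are configurable because different attribute types deserve
    different emphasis in different problems; these defaults are assumptions.
    """
    weights = weights or {"name": 0.6, "address": 0.4}
    name_sim = similarity(
        normalize(rec_a["name"], NICKNAMES), normalize(rec_b["name"], NICKNAMES)
    )
    address_sim = similarity(
        normalize(rec_a["address"], ABBREVIATIONS),
        normalize(rec_b["address"], ABBREVIATIONS),
    )
    return weights["name"] * name_sim + weights["address"] * address_sim


if __name__ == "__main__":
    a = {"name": "Bob Smith", "address": "12 Main St."}
    b = {"name": "Robert Smith", "address": "12 Main Street"}
    c = {"name": "Alice Jones", "address": "99 Oak Ave"}
    print(match_score(a, b))  # the rules base makes these two records identical
    print(match_score(a, c))  # an unrelated record scores lower
```

Neither technique would be enough on its own here: plain string similarity scores "Bob" against "Robert" poorly, while the rules base alone cannot handle misspellings. Swapping in a better similarity function or a different rules table changes one piece without disturbing the rest, which is the kind of evolution described above.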

Stimulating conversation about the range of solutions available to address data quality problems is a highly desirable activity. However, considering only one or two vendors (including Infoglide!) for any solution can limit your thinking about how best to address your unique requirements.
