Getting ducks into rows: the complicated world of data synchronization
It’s no secret that the information age is shrinking borders and expanding corporate reach. The means of acquiring that information is becoming simpler, and businesses and laypeople can access vast stores of information in seconds (or faster). Well… sometimes.
The fact of the matter is that information, especially the kind of data gathered by governments and enterprise-sized corporations, is often stored in lots of different places and in a variety of different formats. So maybe a more accurate statement would be that while the information age is definitely creating more information, it’s not necessarily putting that information within easy reach.
Jeff Jonas recently argued on his blog that people are becoming “trapped by their data trail,” referring to the ubiquity of “historical data” and people’s increased access to it. However, he also points out that navigating the vast body of that data means figuring out ways to resolve the contradictions we mentioned above: lots of data, variety of places/formats. Jonas says that “we are really calling for custom crafted lenses to intentionally narrow our perceptions.”
The precision demanded by those kinds of “lenses” is incredible, especially in light of the sheer volume of digitally stored information out there. Bill Bittner, President of BWH Consulting, wonders whether synchronizing data is possible at all, even through specialized lenses. In a recent post on RetailWire, he argues that
“Data synchronization is built on the assumption that there exists one version of the truth, but the fact is that the supply chain is never standing still. The parts are always moving and variations between units will always exist.”
He’s talking predominantly about retail data, but the argument can really be applied to any diverse collection of information.
One solution is the creation of a single data source, as Bittner says HP did in reducing “750 internal data marts into one data warehouse.” However, data warehouses of that kind are built for requirements and uses that can differ from those of other applications, such as fraud detection. Collapsing data into a warehouse through ETL or similar means risks discarding details that are useful in fraud detection. Centralizing data in that way certainly streamlines its storage, but it doesn’t guarantee that the data is complete for every application.
Perhaps a middle ground can exist for identity resolution applications. Instead of relying on an overly specialized “lens” that retrieves only very specific data, or taking on the expensive and complicated endeavor of creating a centralized data warehouse, disparate data can be resolved into a unified collection. As the sources of information grow in size and complexity, that kind of approach will likely become a very important option for identity resolution.
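To make that middle ground a little more concrete, here is a minimal Python sketch of what resolving disparate records into a unified collection might look like. Everything in it is an assumption made for illustration: the source names (`crm`, `billing`), the field names, the sample records, and above all the matching rule, which simply treats a normalized email address as the identity. Real identity resolution relies on far richer matching than this.

```python
from collections import defaultdict


def match_key(record):
    """Hypothetical matching rule: a normalized email identifies a person."""
    return record["email"].strip().lower()


def resolve(sources):
    """Group records from every source under one entity per match key.

    Each original record is kept whole and tagged with its source,
    rather than being collapsed into a single summary row.
    """
    entities = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            entities[match_key(record)].append({"source": source_name, **record})
    return dict(entities)


# Illustrative data only: two systems describing the same person differently.
sources = {
    "crm": [{"name": "Ann Lee", "email": "Ann.Lee@example.com"}],
    "billing": [
        {"name": "LEE, ANN", "email": " ann.lee@example.com", "account_id": "B-102"}
    ],
}

unified = resolve(sources)
```

Here `unified` holds one entity that still carries both original records, including the conflicting name spellings. That is the point of the sketch: the data is brought together without throwing away the variations a fraud-detection application might care about.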
