Identity resolution and the goal of good data
Some interesting claims by Gareth Herschel, research director with Gartner, got us thinking. In a post on Data Management News, Herschel muses about how and why an organization deploys solutions for getting useful data out of potentially huge databases. He says that before an organization gets too deep into data mining, it should ask itself several questions, including:
- What type of questions is an organization hoping to get answered with customer data mining?
- Who is going to be asking those questions — analysts, business users or both?
- How, and where, will the results of analysis be deployed into the business? Will they go into one specific application or potentially to many places, such as different call centers, point-of-sale or retail applications?
- Where is the data? Is it all cleansed in a data warehouse or spread out in different data marts? What form is it in?
Sure, his focus is on customer data mining, but the same issues seem applicable to identity resolution. In retail, the goal of identity resolution might be screening current or future employees, or it might be resolving the fraudulent identities of a repeat shoplifter. In other scenarios, identity resolution might have the goal of accurately targeting security threats, say among mass-transit passengers. The point is that understanding exactly what your goals are in acquiring the data puts you one step closer to responsibly using the data.
That notion of responsibility really speaks to Herschel’s other questions, and they all point to the question of privacy. Regardless of a business’ goals in acquiring data, the privacy of identities within the data should always be maintained. That’s why knowing the answers to “who goes looking for the data” and “how will the data be deployed” is key to protecting the security of the information.
Understanding those issues is even more important if you plan on using (or selling) secondary data. As Bruce Schneier points out in a Wired post, legislation and technology should come together and do what they can to protect our rights in secondary data, but people play an important role too. He reminds us that
“It’s easy to build systems that collect data on everything - it’s what computers naturally do - but it’s far better to take the time to understand what data is needed and why, and only collect that.”
So while acquiring various kinds of data, especially resolving identities across multiple data sources, can be vital to a business’ success, a great burden rests on the shoulders of those doing the acquisition. Figuring out your needs for the data and controlling the flow of that data, through direct human contact, a well-designed vertical search engine, or both, will help ensure not only that you get “good data” but also that you are protecting individuals’ rights to privacy.
