Few would find a problem with large contributions coming to political candidates and committees from a single household. But over $200,000 in contributions since 2004 from a single middle-class household might raise some eyebrows, especially when the head of that household is reportedly a mail carrier.
The Paw family of Daly City, California, was highlighted in a Wall Street Journal article. The Paws were noted for their apparent connections with Norman Hsu of New York, a one-time fugitive and major campaign fundraiser. Hsu’s status as a “bundler” meant that he brought big money to campaigns. While not all of the details are clear yet, it is undeniable that Hsu faces federal fraud charges for his business dealings and campaign funding activities.
Campaigns are required to report the contributions they receive to the Federal Election Commission (FEC). That data is publicly available through several sources, including public interest groups, political websites, and the FEC itself.
But campaign data is like all data. It suffers from the same problems that we see in industries such as retail, insurance, and banking — it is “noisy”. Variations in formatting and mistakes in typing cause straight “equality” comparisons to fail. And when you can’t count on the data to agree, you can’t make decisions.
Traditional, exact searches of the public sources yield conflicting results, with variations in the spelling of contributor names and their employers and even differing cities. Exact matching finds only a portion of the contributions that apparently came from the Paw household.
You cannot count on common database queries, or even “similarity” measures like Soundex, to reveal matches across variations such as:
- Multipart names, nicknames, and misspellings
- Address abbreviations (e.g., “Parkway” versus “Pkwy.”) and misspellings (e.g., “6300 Bridgepoint” versus “6300 Bridge Point”)
- City misspellings and miscodings
- ZIP code misspellings and formatting (e.g., “78730” versus “78703” versus “787303824”)
Advanced similarity search algorithms, however, can resolve most of these inconsistencies.
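To make the difference concrete, here is a minimal sketch in Python that runs the address and ZIP code examples above through a normalize-then-compare step. It is an illustration only, not how any particular product works: the abbreviation table, the function names, and the use of the standard library’s difflib as the similarity measure are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"pkwy": "parkway", "st": "street", "ave": "avenue", "blvd": "boulevard"}


def normalize_address(text):
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", "", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def normalize_zip(zip_code):
    """Keep digits only and reduce ZIP+4 to the five-digit ZIP."""
    return re.sub(r"\D", "", zip_code)[:5]


def similarity(a, b):
    """Similarity ratio between 0 and 1, ignoring spaces."""
    return SequenceMatcher(None, a.replace(" ", ""), b.replace(" ", "")).ratio()


a = normalize_address("6300 Bridgepoint Pkwy.")
b = normalize_address("6300 Bridge Point Parkway")

print(a == b)                          # exact comparison still fails
print(similarity(a, b))                # similarity comparison succeeds
print(normalize_zip("787303824"))      # ZIP+4 collapses to the 5-digit ZIP
print(similarity("78730", "78703"))    # transposed digits still score high
```

Even after normalization, the two addresses are not equal as strings, but the similarity score treats them as the same. In practice, the threshold at which two values count as a match would have to be tuned against real data.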
And in the cases in which field-by-field similarity comparisons fail, all is not lost. Advanced relationship detection and identity resolution techniques can see past inconsistencies and still connect the dots. So contributions attributed to residents of a specific address but related cities can still be resolved to the same individuals. And the connections can be exposed, analyzed, and highlighted.
Here’s a scenario for how identity resolution software might have automatically highlighted the Hsu connection for further scrutiny:
- Similarity search would be applied to group contributions by household, so that variant spellings of the same names and addresses are counted together, and to find the highest-contributing households.
- The total for the household would be above average. It would be flagged as a household of interest.
- Relationships of household members to others would be examined.
- Some contributions report variations of “Next Components, Ltd.” as an employer.
- Another person related to “Next Components, Ltd.” is Norman Hsu.
- Hsu repeatedly made contributions to the same committees on the same days as the household.
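The steps in this scenario can be sketched in a few lines of Python. Everything here is hypothetical: the records are made up, the household keys are assumed to have been produced already by similarity matching, and the function names and the simple “above the mean” rule are illustrative choices rather than a description of any real product.

```python
from collections import defaultdict

# Made-up records: (donor, household, employer, committee, date, amount).
# "household" is assumed to be a key already produced by similarity matching.
RECORDS = [
    ("Donor A", "H1", "Acme Ltd", "Committee X", "2006-03-01", 4000),
    ("Donor B", "H1", "Acme Ltd.", "Committee Y", "2006-05-10", 4000),
    ("Donor C", "H2", "City Schools", "Committee X", "2006-04-02", 250),
    ("Donor D", "H3", "Retired", "Committee Z", "2006-06-15", 500),
    ("Donor E", "H4", "ACME LTD", "Committee X", "2006-03-01", 1000),
    ("Donor E", "H4", "ACME LTD", "Committee Y", "2006-05-10", 1000),
]


def employer_key(name):
    """Crude normalization so 'Acme Ltd.' and 'ACME LTD' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def flag_households(records):
    """Return households whose total exceeds the mean household total."""
    totals = defaultdict(int)
    for _, household, _, _, _, amount in records:
        totals[household] += amount
    average = sum(totals.values()) / len(totals)
    return {h for h, total in totals.items() if total > average}


def linked_donors(records, household):
    """Map each outside donor who shares an employer with the household to
    the number of their contributions made to the same committee on the
    same day as a contribution from the household."""
    inside = [r for r in records if r[1] == household]
    employers = {employer_key(r[2]) for r in inside}
    events = {(r[3], r[4]) for r in inside}
    overlap = defaultdict(int)
    for donor, hh, employer, committee, date, _ in records:
        if hh != household and employer_key(employer) in employers:
            overlap[donor] += (committee, date) in events
    return dict(overlap)


for household in sorted(flag_households(RECORDS)):
    print(household, linked_donors(RECORDS, household))
```

With this sample data, only household H1 is flagged, and Donor E surfaces as a link through the shared employer and two same-day contributions to the same committees. A result like that is a lead for a human investigator, not a conclusion.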
Identity resolution technology, like that of Infoglide Software’s Identity Resolution Engine™, could have detected this as an interesting pattern warranting further investigation. It cannot determine unequivocally whether these patterns indicate criminal behavior or just an innocent coincidence in which two co-workers have similar political leanings. But for organizations and individuals seeking potentially dirty needles in a haystack of dirty data, it could narrow down the search significantly.
All of this is revealed from the data, if enough brute force or hindsight — or the right technology — is brought to bear.
It does not take sophisticated technology to tell some parts of the story, however. When the scandal broke, campaigns had to field uncomfortable questions, return hundreds of thousands of dollars in contributions, and find new contributors to replace the old.
Too bad the campaigns didn’t use technology to help them “know their contributors” and avoid a black eye. But others can learn from their woes and protect their organizations from similar problems.
Other industries face analogous, if not identical, issues.
It’s a good thing that the solution to these problems exists. Is your organization using it? Or is it waiting for an Hsu-sized black eye before it takes action?