<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: The Myth of Matching: Why We Need Entity Resolution</title>
	<link>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/</link>
	<description>All About Identity and Entity Resolution</description>
	<pubDate>Sun, 12 Feb 2012 12:37:46 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2</generator>

	<item>
		<title>By: Steve Sieloff</title>
		<link>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-679</link>
		<author>Steve Sieloff</author>
		<pubDate>Mon, 13 Jul 2009 22:57:17 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-679</guid>
		<description>John --

Another great post and on point! I find it very interesting linking "point in time" occupancies to the current state location of an entity.  Public records, while fruitful, are spotty in availability and lack many standard data quality measures.  Name distributions per a given geography (zip or zip+4) are helping to make links between names with materially different addresses -- Zawarek Timonsky 123 Main St and Zawarek Timonsky 456 Elm Dr in the same zip code where only one Zawarek first name is known and 3 Timonsky surnames are known ... the unique combination creates a high degree of confidence we are talking about the same person -- even with differing addresses.

As for the example of St. in the street not always meaning Street, it is clear that the software causing the incorrect classification and standardization is not looking at both the keyword AND the pattern or semantics in which the keyword or phrase is referenced.  This type of semantic parsing and standardization is gaining traction in document classification and phrase searching (aka Google).

Keep up the thought-provoking articles!

Steve</description>
		<content:encoded><![CDATA[<p>John &#8211;</p>
<p>Another great post and on point! I find it very interesting linking &#8220;point in time&#8221; occupancies to the current state location of an entity.  Public records, while fruitful, are spotty in availability and lack many standard data quality measures.  Name distributions per a given geography (zip or zip+4) are helping to make links between names with materially different addresses &#8212; Zawarek Timonsky 123 Main St and Zawarek Timonsky 456 Elm Dr in the same zip code where only one Zawarek first name is known and 3 Timonsky surnames are known &#8230; the unique combination creates a high degree of confidence we are talking about the same person &#8212; even with differing addresses.</p>
<p>As for the example of St. in the street not always meaning Street, it is clear that the software causing the incorrect classification and standardization is not looking at both the keyword AND the pattern or semantics in which the keyword or phrase is referenced.  This type of semantic parsing and standardization is gaining traction in document classification and phrase searching (aka Google).</p>
<p>Keep up the thought-provoking articles!</p>
<p>Steve</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daragh O Brien</title>
		<link>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-638</link>
		<author>Daragh O Brien</author>
		<pubDate>Sun, 29 Mar 2009 09:47:49 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-638</guid>
		<description>John,

Great post. You may recall the slides I've used at IAIDQ conferences about my name and how it got me into information quality at an early age.

13+ spelling variants, can be male/female, can be miskeyed as Tara, or mangled to be Darren, Daryn, Daryl (also a male/female name), Dora (hence my love of exploring). And let's not get started on my home address as a kid which seems to still confuse data quality tools (here's a hint... St. in an address is not always an abbreviation of "street"). I have other examples...

I think one of the mental gear-shifts that needs to be made when looking at these issues is to remember that data is a representation of a real-world thing (in this case a person). It is not the thing itself. When we are elbows deep in the data it can be all too easy to lose sight of that.

Looking forward to the follow-ups to this.</description>
		<content:encoded><![CDATA[<p>John,</p>
<p>Great post. You may recall the slides I&#8217;ve used at IAIDQ conferences about my name and how it got me into information quality at an early age.</p>
<p>13+ spelling variants, can be male/female, can be miskeyed as Tara, or mangled to be Darren, Daryn, Daryl (also a male/female name), Dora (hence my love of exploring). And let&#8217;s not get started on my home address as a kid which seems to still confuse data quality tools (here&#8217;s a hint&#8230; St. in an address is not always an abbreviation of &#8220;street&#8221;). I have other examples&#8230;</p>
<p>I think one of the mental gear-shifts that needs to be made when looking at these issues is to remember that data is a representation of a real-world thing (in this case a person). It is not the thing itself. When we are elbows deep in the data it can be all too easy to lose sight of that.</p>
<p>Looking forward to the follow-ups to this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jim Harris</title>
		<link>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-637</link>
		<author>Jim Harris</author>
		<pubDate>Sat, 28 Mar 2009 19:18:27 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-637</guid>
		<description>I just finished publishing a five-part series of articles on data matching methodology for dealing with the common data quality problem of identifying duplicate customers.

Topics covered in the series:

•  Why a symbiosis of technology and methodology is necessary when approaching the common data quality problem of identifying duplicate customers 
•  How performing a preliminary analysis on a representative sample of real project data prepares effective examples for discussion 
•  Why using a detailed, interrogative analysis of those examples is imperative for defining your business rules 
•  How both false negatives and false positives illustrate the highly subjective nature of this problem 
•  How to document your business rules for identifying duplicate customers 
•  How to set realistic expectations about application development 
•  How to foster a collaboration of the business and technical teams throughout the entire project 
•  How to consolidate identified duplicates by creating a “best of breed” representative record 

Here is the link to the article series on my blog:

http://www.ocdqblog.com/home/identifying-duplicate-customers.html

Best Regards...

Jim Harris</description>
		<content:encoded><![CDATA[<p>I just finished publishing a five-part series of articles on data matching methodology for dealing with the common data quality problem of identifying duplicate customers.</p>
<p>Topics covered in the series:</p>
<p>•  Why a symbiosis of technology and methodology is necessary when approaching the common data quality problem of identifying duplicate customers<br />
•  How performing a preliminary analysis on a representative sample of real project data prepares effective examples for discussion<br />
•  Why using a detailed, interrogative analysis of those examples is imperative for defining your business rules<br />
•  How both false negatives and false positives illustrate the highly subjective nature of this problem<br />
•  How to document your business rules for identifying duplicate customers<br />
•  How to set realistic expectations about application development<br />
•  How to foster a collaboration of the business and technical teams throughout the entire project<br />
•  How to consolidate identified duplicates by creating a “best of breed” representative record </p>
<p>Here is the link to the article series on my blog:</p>
<p><a href="http://www.ocdqblog.com/home/identifying-duplicate-customers.html" rel="nofollow">http://www.ocdqblog.com/home/identifying-duplicate-customers.html</a></p>
<p>Best Regards&#8230;</p>
<p>Jim Harris</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Talburt</title>
		<link>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-636</link>
		<author>John Talburt</author>
		<pubDate>Fri, 27 Mar 2009 16:38:08 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-636</guid>
		<description>Dan, thanks for your comment.  I find the confusion of ER with matching fairly common.  I am also seeing interest in finding the "concealed" connections in commercial practice starting to catch up with that of the government security agencies. BTW I am planning a graduate course in ER for this fall; if you or any other readers have suggestions for topics, please let me know. -jrt-</description>
		<content:encoded><![CDATA[<p>Dan, thanks for your comment.  I find the confusion of ER with matching fairly common.  I am also seeing interest in finding the &#8220;concealed&#8221; connections in commercial practice starting to catch up with that of the government security agencies. BTW I am planning a graduate course in ER for this fall; if you or any other readers have suggestions for topics, please let me know. -jrt-</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan Power</title>
		<link>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-634</link>
		<author>Dan Power</author>
		<pubDate>Thu, 26 Mar 2009 16:22:29 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-634</guid>
		<description>Thanks for an interesting, thought-provoking piece! 

The "deliberate attempt to conceal a connection" seems to be common in fraud, law enforcement and homeland security applications. 

It's important to remind people about one of your main points (matching being a necessary but not sufficient part of entity resolution). 

There's a lot more going on in true entity resolution than simple matching!</description>
		<content:encoded><![CDATA[<p>Thanks for an interesting, thought-provoking piece! </p>
<p>The &#8220;deliberate attempt to conceal a connection&#8221; seems to be common in fraud, law enforcement and homeland security applications. </p>
<p>It&#8217;s important to remind people about one of your main points (matching being a necessary but not sufficient part of entity resolution). </p>
<p>There&#8217;s a lot more going on in true entity resolution than simple matching!</p>
]]></content:encoded>
	</item>
</channel>
</rss>

