Trouble with the Texas insurance database
Here in Texas, insured drivers spend “nearly $900 million a year to protect themselves against those without coverage,” according to Terrence Stutz of the Dallas Morning News. Texas is trying to sniff out those 3 million or so uninsured motorists by creating a massive database, but it’s been slow going.
The proposed database, with records on the 16 million motorists in Texas, has yet to be put into action. The reasons behind the slow deployment, however, demonstrate some common database concerns: avoiding false positives and protecting individuals’ privacy.
Some potential good news for the Texas insurance industry is that those are great concerns to have. Jerry Hagins, a spokesman for the Texas Department of Insurance, said recently, “We don’t want people flagged for not having insurance when in fact they do.”
Those “flags” lead to false tickets or even arrests, and that leads to big lawsuits once the error comes to light. False positives become an especially large concern since counterfeit insurance cards often make in-the-field verification tricky. One major aim of the database would be removing those fraudulent cards from play.
As we wrote yesterday, one of the biggest factors in a database like this is the sheer volume of individual entries. However, what makes the data in a database like this one even more slippery is what Jeff Jonas calls “semantic reconciliation.” He explains that semantic reconciliation, the ability to determine that two entities are the same even though they are “described” differently, has to be deployed flawlessly even “before other analytical processes (e.g., statistical analysis, market segmentation, link analysis, etc.).”
The Texas insurance database (and similar ones in other states) has to be able to perform identity resolution to determine, well, who is who. If it can, counterfeit insurance cards are less effective, there’s a vastly decreased risk of false positives and the database will meet the “goal of at least 95 percent accuracy” that Hagins calls for. Based on the problems the media reports about the database, it seems that kind of data reconciliation is a long way from being ready.
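To make the idea of identity resolution a little more concrete, here is a minimal Python sketch of how two records that are “described” differently might be reconciled. Everything in it is an assumption made for illustration: the DriverRecord fields, the tiny nickname table and the matching rule are hypothetical and are not drawn from the actual Texas system, which would need far more careful matching than this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverRecord:
    # Hypothetical fields; the real database schema is not public in the source.
    name: str
    date_of_birth: str   # ISO date string, e.g. "1970-01-31"
    license_number: str


# Illustrative nickname table (an assumption, deliberately tiny).
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize_name(name: str) -> tuple:
    """Reduce a name to a canonical, order-independent tuple of tokens."""
    # Turn "Last, First" into "First Last".
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    # Lowercase and strip punctuation and digits.
    tokens = re.sub(r"[^a-z\s]", "", name.lower()).split()
    # Expand known nicknames and drop single-letter initials.
    tokens = [NICKNAMES.get(t, t) for t in tokens if len(t) > 1]
    return tuple(sorted(tokens))


def normalize_license(license_number: str) -> str:
    """Ignore case, spaces and dashes in a license number."""
    return re.sub(r"[^0-9a-z]", "", license_number.lower())


def same_driver(a: DriverRecord, b: DriverRecord) -> bool:
    """Treat two records as one person only if every normalized field agrees."""
    return (
        normalize_name(a.name) == normalize_name(b.name)
        and a.date_of_birth == b.date_of_birth
        and normalize_license(a.license_number) == normalize_license(b.license_number)
    )


# The same person, described differently by two hypothetical sources.
dmv_record = DriverRecord("Smith, Robert J.", "1970-01-31", "TX-1234 5678")
insurer_record = DriverRecord("Bob Smith", "1970-01-31", "tx12345678")
# A different person who happens to share a name.
other_record = DriverRecord("Robert Smith", "1982-05-09", "TX87654321")

matched = same_driver(dmv_record, insurer_record)
mismatched = same_driver(dmv_record, other_record)
```

Requiring every field to agree is a deliberately conservative choice in this sketch: it favors missing a match over merging two different drivers, though which kind of error is costlier in practice depends on how the matches are used.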
