Big Oil and Big Data
Saturday, February 5th, 2011
By Mike Betron, Infoglide Software Director of Marketing
In “Mining the Tar Sands of Big Data”, Matthew Driscoll and Roger Ehrenberg draw an apt parallel between the earth’s vast oil reserves and big data: until recently it wasn’t economically and technically feasible to mine these resources efficiently. In both cases, that’s changing.
The authors trace the growth in the amount of data generated to “advances in three principal areas: sensor networks, cloud computing, and machine learning.” Both physical (e.g., RFID) and software (e.g., tweets) sensors exist, and multiple forms are being deployed in products and processes each day, thus generating a tsunami of data that grows exponentially each year.
In fact, the growth in big data is even affecting the consumption of energy:
Just as these devices have multiplied, so have the data centers that they communicate with. Housed in climate-controlled warehouses, they consume an estimated 2 percent — and represent the fastest growing segment — of the United States energy budget.
We’ve written about the impact and potential of cloud computing here before. By treating computing resources as a utility served up “by the drink”, cloud computing is another enabler of today’s spectacular increase in data generation.
Machine learning is the third factor the authors tie to the big data explosion:
Its algorithms lie at the heart of spam filters, self-driving cars, and movie recommendation systems, including one to which Netflix awarded its million-dollar prize to in 2009. While data storage and distributed computing technologies are being commoditized, machine learning is increasingly a source of competitive advantage among data-driven firms.
Those who want to exploit the availability of big data have another powerful tool at their disposal – entity resolution. Entity resolution searches across multiple databases that hold data in disparate formats and reside in different locations, so it can tame large amounts of data very quickly. It resolves multiple records that describe the same entity into one and finds hidden connections without human intervention. It applies in many areas, including detecting financial fraud.
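To make the idea concrete, here is a minimal sketch of resolving records from two sources into single entities. It is an illustration only, not Infoglide's method: the records, the field names, the `same_entity` rule (matching phone digits plus a name-similarity score from Python's standard `difflib`), and the 0.7 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records from two separate sources; every value is invented.
RECORDS = [
    {"id": "crm-1", "name": "Jonathan Smith", "phone": "512-555-0101"},
    {"id": "crm-2", "name": "Maria Garcia", "phone": "512-555-0199"},
    {"id": "claims-7", "name": "Jon Smith", "phone": "(512) 555-0101"},
    {"id": "claims-9", "name": "Alan Turing", "phone": "512-555-0142"},
]


def digits(text):
    """Strip formatting so differently formatted phone numbers compare equal."""
    return "".join(ch for ch in text if ch.isdigit())


def same_entity(a, b, name_threshold=0.7):
    """Assumed matching rule: identical phone digits and similar names."""
    name_score = SequenceMatcher(
        None, a["name"].lower(), b["name"].lower()
    ).ratio()
    return digits(a["phone"]) == digits(b["phone"]) and name_score >= name_threshold


def resolve(records):
    """Group record ids that refer to the same entity (simple union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Compare every pair; real systems narrow the candidates first.
    for a, b in combinations(records, 2):
        if same_entity(a, b):
            parent[find(b["id"])] = find(a["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


if __name__ == "__main__":
    for group in resolve(RECORDS):
        print(group)
```

In this sketch the "Jonathan Smith" and "Jon Smith" records end up in one group because they share a phone number and have similar names, while the other two records stay separate. Comparing every pair does not scale to big data; production systems typically narrow the candidate pairs first and use far richer similarity measures.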
By exploiting advancing technologies like entity resolution, organizations can gain a distinct competitive advantage over those that lag in technology adoption.

