Data Curation is for Big Data what Data Integration is for Small Data
Reporter: Aviva Lev-Ari, PhD, RN
Tamr is an exciting new startup that aims to solve the data curation problem. It was co-founded in Fall 2012 as Data Tamer by two serial entrepreneurs: Michael Stonebraker, a legendary database researcher for whom it was a ninth startup, and Andy Palmer, who has been involved in founding and/or funding over 50 innovative companies. With such founders, the company has attracted a lot of financing (over $16 million from investors including Google Ventures and New Enterprise Associates (NEA)) and a lot of attention, including the KDnuggets post "Data Tamer startup from Michael Stonebraker, Still in Stealth Mode".
On May 19th, Data Tamer emerged from stealth mode and renamed itself Tamr.
Last week, I stopped by their offices in the heart of Harvard Square, Cambridge, and received a briefing from Andy Palmer, Tamr CEO, and his young team, including Alan Wagner and Nidhi Aggarwal.
Tamr’s approach to solving the Data Curation problem is designed to scale and to improve with more data. The key ideas are:
1. Scalability through automation: The size of the integration problems precludes a human-centric solution. Machine Learning methods are needed.
2. Data Cleaning: Enterprise data sources are inevitably quite dirty.
3. Non-programmer orientation: Current Extract, Transform and Load (ETL) systems have scripting languages that are appropriate for professional programmers. The scale of next-generation problems requires that less skilled employees be able to perform integration tasks.
4. Incremental: New data sources must be integrated incrementally as they are uncovered. Data Curation is never finished!
Tamr also smartly combines automation and human expertise.
It starts by using Machine Learning and Data Analysis algorithms to find relationships between data elements, and tries to automate most data curation tasks. When machine learning is not enough, it has well-defined processes and a UI for asking human experts for help, and uses a smart rewards structure to encourage the experts.
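The combination of automation and human expertise can be illustrated with a minimal sketch. This is not Tamr's actual implementation: plain string similarity from Python's standard library stands in for a learned matching model, and the names `similarity`, `triage`, `AUTO_ACCEPT`, and `AUTO_REJECT`, along with the threshold values, are hypothetical choices made only for illustration. The idea shown is that confident decisions are made automatically, while ambiguous record pairs are queued for a human expert.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen only for illustration.
AUTO_ACCEPT = 0.9   # at or above this score, treat the pair as a match
AUTO_REJECT = 0.4   # at or below this score, treat the pair as distinct


def similarity(a: str, b: str) -> float:
    """Stand-in for a learned matcher: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(pairs):
    """Split candidate record pairs into automatic decisions and an expert queue."""
    matched, rejected, expert_queue = [], [], []
    for a, b in pairs:
        score = similarity(a, b)
        if score >= AUTO_ACCEPT:
            matched.append((a, b))
        elif score <= AUTO_REJECT:
            rejected.append((a, b))
        else:
            # The automated score is inconclusive, so ask a human expert.
            expert_queue.append((a, b))
    return matched, rejected, expert_queue


if __name__ == "__main__":
    candidates = [
        ("IBM", "ibm"),
        ("abc", "xyz"),
        ("Acme Corp", "Acme Corporation"),
    ]
    matched, rejected, expert_queue = triage(candidates)
    print("matched:", matched)
    print("rejected:", rejected)
    print("needs expert review:", expert_queue)
```

In a real system, the experts' answers would presumably be fed back as training data so that fewer pairs need manual review over time; that feedback loop is omitted here.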
SOURCE
http://www.kdnuggets.com/2014/05/tamr-new-frontier-big-data-curation.html