Thursday, 28. March 2019, 10:00-11:00 in CAB E 72
Speaker: Theo Rekatsinas (University of Wisconsin)
Title: A Machine Learning Perspective on Managing Noisy Data
Abstract:
Modern analytics are very dependent on high-effort tasks like data preparation and data cleaning to produce accurate results. It is for this reason that the vast majority of the time devoted on analytics projects is spent on high-effort tasks like data preparation and data cleaning.
This talk describes recent work on making routine data preparation tasks dramatically easier. I will first introduce a noisy channel model to describe the quality of structured data and demonstrate how most work on noisy data management by the database community can be cast as a statistical learning and inference problem. I will then show how this noisy channel model forms the basis of HoloClean, a weakly supervised ML system for automated data cleaning. I will close with additional examples of how a statistical learning view can lead to new insights and solutions to classical database problems such as constraint discovery and consistent query answering.
Short Bio:
Theodoros (Theo) Rekatsinas is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. He is a member of the Database Group. He earned his Ph.D. in Computer Science from the University of Maryland and was a Moore Data Postdoctoral Fellow at Stanford University. His research interests are in data management, with a focus on data integration, data cleaning, and uncertain data. Theo's work has been recognized with an Amazon Research Award in 2018, a Best Paper Award at SDM 2015, and the Larry S. Davis Doctoral Dissertation award in 2015.
----
----