CrowdDB

Description:

The goal of this project is to develop a set of novel techniques that integrate human resources into a database system in order to process queries that neither Google nor Oracle can answer today, and to address some notoriously hard database research problems in a very different way than has been done in the past. Specifically, we plan to build an extended relational database system called CrowdDB. We are working on the following problems:

  • Quality Assurance and Crowd Access Optimization - Ensuring quality is crucial in crowdsourcing because human decisions are uncertain and error-prone. A straightforward way to increase quality is redundancy, i.e., assigning the same task to multiple workers. However, redundancy can be expensive, and it is inefficient if the errors of different workers are correlated. Hence, efficient crowd access optimization techniques are needed. In this work, we introduce a novel crowd model called the Access Path Model, which leverages the notion of access paths as alternative ways of retrieving information. For example, to find a good traditional Swiss restaurant in Zurich, one might ask local people or international friends, and/or consult web recommendations written by tourists. We devise several optimization techniques based on this model that can be applied in practical use cases for large-scale crowdsourcing.
  • Crowdsourcing for Data Integration - There are many tasks at which humans are better than computers, for example finding the same object in pictures taken from different angles, or ranking records by subjective human measures. Still, people make mistakes and can introduce noise and uncertainty. Here, we examine fault-tolerance mechanisms for entity resolution. We also study the trade-offs between cost and quality for general crowdsourcing problems such as sorting and comparisons. Currently, we are also looking at incremental solutions for such data integration and cleaning problems that can adapt and evolve over time.
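Why correlated errors undermine plain redundancy can be illustrated with a small sketch. It assumes, purely for illustration, that each access path returns a list of yes/no answers, and it compares a flat majority vote over all workers with a two-level vote that first aggregates within each access path. This is a simplified illustration, not the Access Path Model or the optimization techniques from the publications; the function names and the example data are hypothetical.

```python
from collections import Counter

def majority(votes):
    """Most common answer; ties go to the answer seen first."""
    return Counter(votes).most_common(1)[0][0]

def flat_vote(answers_by_path):
    """Pool all answers and ignore which access path they came from."""
    pooled = [a for answers in answers_by_path.values() for a in answers]
    return majority(pooled)

def path_vote(answers_by_path):
    """Vote within each access path first, then across paths, so a large
    group of workers with correlated errors counts as a single vote."""
    per_path = [majority(answers) for answers in answers_by_path.values() if answers]
    return majority(per_path)

# Hypothetical answers to "Is this a good traditional Swiss restaurant?"
answers = {
    "locals": [True, True],
    "international_friends": [True],
    "tourist_reviews": [False, False, False, False, False],
}

print(flat_vote(answers))  # False: 5 pooled "no" votes beat 3 "yes" votes
print(path_vote(answers))  # True: 2 of 3 access paths say "yes"
```

Here five correlated tourist reviews outvote three answers from two other access paths in the flat vote, while the per-path vote limits how much weight any single group of correlated workers can carry.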
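One common way to lower the cost of crowdsourced entity resolution is to avoid asking the crowd about pairs whose answer already follows by transitivity. The following sketch assumes a hypothetical `ask_crowd(a, b)` callback that returns the crowd's (already aggregated) match decision, and tracks known matches with a union-find structure. It illustrates the general idea only and is not the specific algorithm used in CrowdDB.

```python
def resolve(records, ask_crowd):
    """Group records into entities, asking the crowd only about pairs whose
    answer cannot be inferred by transitivity.

    ask_crowd(a, b) is a hypothetical callback returning True if the crowd
    says a and b refer to the same entity. Returns (clusters, questions_asked).
    """
    parent = {r: r for r in records}  # union-find forest over record ids

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    non_matches = []  # pairs the crowd said are different entities
    questions = 0
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            ra, rb = find(a), find(b)
            if ra == rb:
                continue  # a = c and c = b already imply a = b
            if any({find(x), find(y)} == {ra, rb} for x, y in non_matches):
                continue  # a's and b's clusters are already known to differ
            questions += 1
            if ask_crowd(a, b):
                parent[rb] = ra  # merge the two clusters
            else:
                non_matches.append((a, b))

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), set()).add(r)
    return list(clusters.values()), questions

# Hypothetical ground truth standing in for (aggregated) crowd answers.
truth = {"a1": "A", "a2": "A", "a3": "A", "b1": "B"}
clusters, asked = resolve(list(truth), lambda x, y: truth[x] == truth[y])
print(clusters, asked)  # 3 questions instead of all 6 pairs
```

The flip side of this saving is that transitivity propagates a single wrong crowd answer to every record in the affected clusters, which is one reason fault tolerance matters for crowdsourced entity resolution.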
          

Project Members:

  • Donald Kossmann, Anja Grünheid, Besmira Nushi

 

Master's thesis proposals:

 

Publications and talks

  • Besmira Nushi, Adish Singla, Anja Gruenheid, Andreas Krause, and Donald Kossmann. "Crowd Access Path Optimization: Diversity Matters." Conference on Human Computation and Crowdsourcing (HCOMP), 2015.

  • Anja Gruenheid, Besmira Nushi, and Donald Kossmann. "Cost-Efficient Querying Strategies for the Crowd." Big Uncertain Data Workshop, SIGMOD 2014.

  • Besmira Nushi, Adish Singla, Anja Gruenheid, Andreas Krause, and Donald Kossmann. "Quality Assurance and Crowd Access Optimization: Why Does Diversity Matter?" ICML Workshop on Crowdsourcing and Human Computing, 2014.

  • Anja Gruenheid, Donald Kossmann, Besmira Nushi, and Yuri Gurevich. "When is A = B?" Bulletin of the EATCS 111 (2013).

  • Anja Gruenheid and Donald Kossmann. "Cost and Quality Trade-Offs in Crowdsourcing." DBCrowd Workshop, VLDB 2013, CEUR Workshop Proceedings 1025 (2013): 43-46.

  • Anja Gruenheid, Donald Kossmann, Sukriti Ramesh, and Florian Widmer. "Crowdsourcing Entity Resolution: When is A=B?" Technical Report No. 785 (2012).

  • Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh, and Reynold Xin. "CrowdDB: Answering Queries with Crowdsourcing." In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp. 61-72. ACM, 2011.

  • Amber Feng, Michael Franklin, Donald Kossmann, Tim Kraska, Samuel Madden, Sukriti Ramesh, Andrew Wang, and Reynold Xin. "CrowdDB: Query Processing with the VLDB Crowd." Proceedings of the VLDB Endowment 4, no. 12 (2011). (demo)

 

Available Datasets

  • Landmarks - A dataset of 266 original pictures grouped into 13 landmark categories, showing buildings and places in Paris, France, and Barcelona, Spain.

 

Opportunities:

We are always looking for students interested in writing their Master's thesis or doing lab projects with us. If you are interested in crowdsourcing or data integration, just send us an email!
 

Former members:

  • Sukriti Ramesh

Master's theses in this project:

  • Lynn Aders - Joins based on the Access Path Model for Crowdsourced Databases (2013)
  • Adiya Abisheva - Crowdsourced Order: Getting Top N Values From the Crowd (2012)
  • Erfan Zamanian - Query Optimization in CrowdDB (2012)
  • Florian Widmer - Memoization of Crowd-sourced Comparisons (2012)
  • Sukriti Ramesh - CrowdDB – Answering Queries with Crowdsourcing (2011)