The goal of AIM is to bridge the gap between write-optimized storage systems used for Online Transactional Processing (OLTP) and read-optimized structures used in Online Analytical Processing (OLAP) with a single shared data store. The motivation is to allow data to be processed in real time, which cannot be achieved by the traditional data warehousing approach, where two separate storage systems are used and data is regularly shifted from the (write-optimized) OLTP storage to the (read-only) OLAP storage. More...
Anzere is a storage system which replicates a user's personal data (photos, music, etc.) across an ensemble of physical and virtual devices owned (or rented on demand from cloud infrastructures) by a single user. With Anzere we show how to flexibly replicate data at scale in response to a complex, user-specified set of replication policies. Anzere is built on the Rhizoma platform, and includes an overlay network, monitoring infrastructure, CLP solver, data replication based on PRACTI, and Paxos for consistency. Anzere currently runs on mobile phones, laptops and desktops, and VMs on PlanetLab and Amazon EC2. More...
VF2x - Distributed network testbeds like GENI aim to support a potentially large number of experiments simultaneously on a complex, widely distributed physical network by mapping each requested network onto a share or “slice” of physical hosts, switches, and links. A significant challenge is network mapping: how to allocate virtual nodes, switches, and links from the physical infrastructure so as to accurately emulate the requested network configurations. More info here
The limitations of today's computing architectures are well known: high power consumption, heat dissipation, network and I/O bottlenecks, and the memory wall. Field-programmable gate arrays (FPGAs), user-configurable hardware chips, are promising candidates to overcome these limitations. With tailor-made, software-configured hardware circuits it is possible to process data at very high throughput rates and with extremely low latency. At the same time, FPGAs consume orders of magnitude less power than conventional systems. Thanks to their high configurability, they can be used as co-processors in heterogeneous multi-core architectures and/or be placed directly in critical data paths to reduce the load that hits the system CPU. More...
Cloud computing has changed the view on data management by focusing primarily on cost, flexibility, and availability instead of consistency and performance at any price, as traditional DBMS do. As a result, cloud data stores run on commodity hardware and are designed to be scalable, easy to maintain, and highly fault-tolerant, often providing relaxed consistency guarantees. The success of key-value stores like Amazon's S3 and the variety of open-source systems reflect this shift. Existing solutions, however, still lack substantial functionality provided by a traditional DBMS (e.g., support for transactions and a declarative query language) and are tailored to specific scenarios, creating a jungle of services. Users have to commit to a specific service and are later locked into it, preventing the evolution of the application and leading to misuse of services and expensive migrations to other services. With Cloudy we have started to build our own highly scalable database, which provides a completely modularized architecture and is not tailored to a specific use case. For example, Cloudy supports stream processing as well as SQL and simple key-value requests. More...
The goal of this project is to develop a set of novel techniques for integrating human resources into a database system, in order to process some of the impossible queries that Google and Oracle cannot answer today and to address some of the notoriously hard database research problems in a very different way than has been done in the past. Specifically, we plan to build an extended relational database system called CrowdDB. More...
The transport mechanisms offered by modern network cards that support remote direct memory access (RDMA) significantly shift the priorities in distributed systems. Complex and sophisticated machinery designed only to avoid network traffic can now be replaced by schemes that can use the available bandwidth to their advantage. One such scheme is Data Cyclotron, a research effort that we pursue jointly with the database group at CWI Amsterdam. Based on a simple ring-shaped topology, Data Cyclotron offers ad-hoc querying over data of arbitrary shape and arbitrary size. More...
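The following is a minimal, illustrative sketch of the ring-based design (all class and method names are invented for this illustration, not Data Cyclotron's actual code): data fragments rotate around the ring, and each node evaluates its pending queries against every fragment that passes by.

    import java.util.*;
    import java.util.function.Predicate;

    // Sketch of a Data-Cyclotron-style ring: fragments circulate and each
    // node tests its registered ad-hoc queries on every passing fragment.
    public class RingSketch {
        static class Node {
            final String id;
            Node next;                                    // ring successor
            final List<Predicate<int[]>> queries = new ArrayList<>();
            final List<int[]> matches = new ArrayList<>();
            Node(String id) { this.id = id; }

            // A fragment arriving from the predecessor: evaluate local
            // queries, then forward it to the next node on the ring.
            void receive(int[] fragment, int hopsLeft) {
                for (Predicate<int[]> q : queries)
                    if (q.test(fragment)) matches.add(fragment);
                if (hopsLeft > 0) next.receive(fragment, hopsLeft - 1);
            }
        }

        public static void main(String[] args) {
            Node a = new Node("a"), b = new Node("b"), c = new Node("c");
            a.next = b; b.next = c; c.next = a;           // close the ring
            b.queries.add(f -> f[0] > 10);                // ad-hoc query at node b
            a.receive(new int[]{42, 7}, 2);               // fragment visits every node once
            System.out.println("matches at b: " + b.matches.size());
        }
    }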
The purpose of this project is to investigate approaches for describing the format of a data stream, both in terms of structural aspects (e.g., event sequences) and dynamic aspects (e.g., rates), and to assess their usefulness. The information derived from these descriptions is used to enable optimizations in a declarative stream processor. These optimizations are targeted at reducing response times, memory overhead, and processing cost.
Project members: Peter Fischer, Kyumars Sheykh Esmaili, Donald Kossmann
The DejaVu project explores scalable complex event processing techniques for streams of events. The goal is to provide a system that can seamlessly integrate pattern detection over live and historical streams of events behind a common, declarative interface. We are investigating various optimization ideas for efficient data access and query execution. More...
flowSGi combines the paradigm of Fluid Computing with the dynamics of the OSGi service platform. Data and applications can be shared and kept synchronized among different peers, including small mobile devices. Offline operations on data are permitted; changes are reconciled as soon as a network connection is available again. flowSGi includes several subprojects.
By building a platform for personal health data, this project aims to empower citizens by giving them control over their own health data. The platform facilitates the exchange of, and the granting of access to, personal health records. This makes it possible to aggregate data from diverse sources, generating a more complete picture of personal health than ever before. For a single person, this enables personalized medicine. When the data of many citizens is aggregated and anonymized, medical research can be conducted at a previously impossible scale. More...
We are building a data appliance for Rack-scale Computers (RaSC) that leverages the benefits of cross-layer optimization and provides support for heterogeneous workloads. To achieve that we separate the storage from the data processing layer. The two layers communicate over a scalable interconnect fabric, at the moment focusing on RDMA over InfiniBand. More...
iMeMex
iMeMex is a first implementation of a Personal DataSpace Management System (PDSMS). It gives users a single graph-based view over all their data in different formats such as email, files and folders, RSS, and databases. Moreover, iMeMex allows users to define information-integration semantics in a pay-as-you-go fashion.
Project members: Lukas Blunschi
JOpera is a rich composition environment for many kinds of services (including Web services). It supports the visual design and autonomic enactment of complex distributed business protocols and conversations. More info here
Today, with the growing use of mobile devices constantly connected to the Internet, the nature of user-generated data has changed: it has become more real-time. People share their thoughts and discuss breaking news on Twitter and Facebook; they share their current locations and activities on location-based social networks such as Foursquare. The difference is that, today, people share more often and the lifespan of the data has become shorter.
Analyzing this data leads to new requirements for analytical systems: real-time processing and database-intensive workloads. Driven by these requirements, we have developed Limmat. Limmat extends a key-value store architecture with push-based processing, transactional task execution, and synchronization. We modified the MapReduce programming model to support push-style data processing, as sketched below.
Project members: Donald Kossmann, Maxim Grinev, Maria Grineva, Martin Hentschel
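The sketch below illustrates the push-style idea in its simplest form (the class and method names are ours, not Limmat's API): instead of running a MapReduce job over a finished data set, each arriving item is mapped immediately and its key's reduction state is updated in place, so results are always current.

    import java.util.*;
    import java.util.function.*;

    // Illustrative push-style MapReduce: per-item map, incremental reduce.
    public class PushMapReduce<I, K> {
        private final Function<I, Map.Entry<K, Integer>> mapper;
        private final BinaryOperator<Integer> reducer;
        private final Map<K, Integer> state = new HashMap<>();

        PushMapReduce(Function<I, Map.Entry<K, Integer>> mapper,
                      BinaryOperator<Integer> reducer) {
            this.mapper = mapper; this.reducer = reducer;
        }

        // Called once per arriving item; the result is updated immediately.
        void push(I item) {
            Map.Entry<K, Integer> kv = mapper.apply(item);
            state.merge(kv.getKey(), kv.getValue(), reducer);
        }

        public static void main(String[] args) {
            // Running word count over a stream, updated per message.
            PushMapReduce<String, String> wc =
                new PushMapReduce<>(w -> Map.entry(w, 1), Integer::sum);
            for (String w : List.of("zurich", "news", "zurich")) wc.push(w);
            System.out.println(wc.state);  // e.g. {news=1, zurich=2}
        }
    }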
Mapping Data to Queries (MDQ) is a radically different approach to processing data with many different schemas. MDQ differs from traditional approaches to data integration by integrating data at the latest possible point in time: at query runtime. This opens up great potential for optimization because at query runtime both the data and the query are known, and we can exploit this knowledge to apply fewer mapping rules than traditional approaches do (see the sketch below). Consequently, MDQ scales well with the number of schemas and outperforms traditional approaches by orders of magnitude in extreme cases.
Project members: Martin Hentschel, Laura Haas (IBM Almaden), Donald Kossmann, and Renée Miller (University of Toronto)
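The following sketch illustrates the core intuition with invented names (it is not MDQ's actual code): mapping rules are kept per source schema, and at query runtime only the rules whose target attribute actually appears in the query are applied to each record.

    import java.util.*;

    // Query-driven application of mapping rules, in the spirit of MDQ.
    public class MdqSketch {
        record Rule(String sourceAttr, String targetAttr) {}

        static Map<String, String> rewrite(Map<String, String> record,
                                           List<Rule> rules,
                                           Set<String> queryAttrs) {
            Map<String, String> view = new HashMap<>();
            for (Rule r : rules) {
                // Skip rules the query cannot observe: this is where late,
                // query-driven integration saves work over eager mapping.
                if (!queryAttrs.contains(r.targetAttr)) continue;
                String v = record.get(r.sourceAttr);
                if (v != null) view.put(r.targetAttr, v);
            }
            return view;
        }

        public static void main(String[] args) {
            List<Rule> rules = List.of(new Rule("zip", "postcode"),
                                       new Rule("tel", "phone"));
            Map<String, String> rec = Map.of("zip", "8092", "tel", "0441234567");
            // The query only asks for "postcode", so the "tel" rule never runs.
            System.out.println(rewrite(rec, rules, Set.of("postcode")));
        }
    }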
MAND
One key difference between Mobile Ad Hoc Networks (MANETs) and other network types is that MANETs lack the central infrastructure components necessary to build common directory-based services such as SLP, SIP, DNS, etc. This poses complicated challenges to protocol design and software architecture for such networks. MAND (Mobile Ad hoc Network Directory) is an infrastructure for the distribution, storage, and lookup of key/value pairs (tuples) in ad hoc networks. The key insight in MAND is to piggyback tuples and requests on the messages that routing protocols exchange to build and maintain routes in the ad hoc network; a sketch of this idea follows below. Using MAND we have built AdSocial, a social networking application running in ad hoc networks of Nokia N810 handheld devices.
Project members: Oriana Riva, Gustavo Alonso
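This sketch shows the piggybacking idea in isolation (all names and the message format are invented for the illustration): directory tuples ride along on the routing messages that ad hoc routing protocols exchange anyway, so the directory is maintained with almost no extra traffic.

    import java.util.*;

    // Directory tuples piggybacked on routing-protocol messages.
    public class PiggybackSketch {
        static final int MAX_PIGGYBACK = 3;  // tuples attached per message

        static class RoutingMessage {
            final String routeInfo;                       // the normal payload
            final Map<String, String> tuples = new HashMap<>();
            RoutingMessage(String routeInfo) { this.routeInfo = routeInfo; }
        }

        static class MandNode {
            final Map<String, String> store = new HashMap<>();

            RoutingMessage sendHello() {
                RoutingMessage m = new RoutingMessage("HELLO seq=17");
                int n = 0;                                // attach a few tuples
                for (var e : store.entrySet()) {
                    if (n++ == MAX_PIGGYBACK) break;
                    m.tuples.put(e.getKey(), e.getValue());
                }
                return m;
            }

            void onReceive(RoutingMessage m) {
                store.putAll(m.tuples);                   // merge directory state
                // ... normal route maintenance would happen here ...
            }
        }

        public static void main(String[] args) {
            MandNode a = new MandNode(), b = new MandNode();
            a.store.put("sip:alice", "10.0.0.5");
            b.onReceive(a.sendHello());                   // b learns alice's address
            System.out.println(b.store);
        }
    }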
MASTER (http://www.master-fp7.eu/) is a collaborative project funded under the EU 7th Research Framework Programme. It aims to provide methodologies and infrastructures which facilitate the monitoring, enforcement, and auditing of security compliance, especially where highly dynamic service-oriented architectures are used to support business process enactment in single, multi-domain, and iterated contexts. ETH's role in MASTER is to investigate the use of event processing systems for monitoring purposes, in particular the aspects of expressiveness, dependability, and lifecycle management.
Project Members: Tahmineh Sanamrad
Former Project members: Peter Fischer, Kyumars Sheykh Esmaili
Modularization of Database Engines
The architecture of current data management systems is mostly monolithic and highly intertwined, and it has not really changed since the relational model was first proposed and implemented. Advances in hardware and computing platforms are making it almost impossible to continue operating large data management systems with such an architecture. Exploiting multi-core or cluster-based systems requires increasing parallelism and a far more loosely coupled architecture. In this project we explore alternative architectures for database systems, architectures that are better suited to the new hardware platforms. Currently we are focusing our efforts on exploiting modular software design as the basis for a component-based database engine that can be dynamically adapted and configured. To see the impact of modularization on current database engines and processors, we are refactoring an existing open-source database engine, porting the modules to R-OSGi, and evaluating the performance and functionality of the resulting system.
Project members: Ionut Subasu, Jan S. Rellermeyer, Gustavo Alonso
With the exponential growth of moving-object data into the gigabyte range, it has become critical to develop effective techniques for indexing, updating, and querying these massive data sets. To meet the high update rates as well as the low query response times required by moving-object applications, this project takes a novel approach to moving-object indexing. The resulting technique, MOVIES, aims to be the first to support high query rates and high update rates at the same time.
Project members: Lukas Blunschi
Multicore computers pose a substantial challenge to infrastructure software such as operating systems or databases. These platforms typically evolve more slowly than the underlying hardware, but with multicore they face structural limitations that can be solved only with radical architectural changes. In this project we argue that, as has been suggested for operating systems, databases could treat multicore architectures as a distributed system rather than trying to hide the parallel nature of the hardware. We first analyze the limitations of database engines when running on multicores, using MySQL and PostgreSQL as examples. We then show how to deploy several replicated engines within a single multicore machine to achieve better scalability and stability than a single database engine operating on all cores. When combined with options like virtualization and the ability to tune the system configuration to the load and the number of available cores, the approach we propose becomes an appealing alternative to entirely redesigning the database engine. More...
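As a rough illustration of the deployment idea (the names below are ours, not the project's code): several independent engines run on disjoint core subsets of one machine, and a thin router spreads queries across them instead of one engine spanning all cores.

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // A thin query router over replicated engines within one machine.
    public class MulticoreRouterSketch {
        record Replica(String name, int firstCore, int lastCore) {}

        private final List<Replica> replicas;
        private final AtomicInteger next = new AtomicInteger();

        MulticoreRouterSketch(List<Replica> replicas) { this.replicas = replicas; }

        // Round-robin dispatch of read-only queries; updates would have to be
        // applied on all replicas (e.g., via a replication middleware).
        Replica route() {
            return replicas.get(next.getAndIncrement() % replicas.size());
        }

        public static void main(String[] args) {
            var router = new MulticoreRouterSketch(List.of(
                new Replica("engine-0", 0, 7),     // cores 0-7
                new Replica("engine-1", 8, 15)));  // cores 8-15
            System.out.println(router.route().name() + " handles the query");
        }
    }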
In this project we demonstrate through experimental analysis of different algorithms and architectures that hardware still matters. Parallel hash join algorithms that are hardware conscious perform better than hardware-oblivious approaches. Through the analysis, we shed light on how modern hardware affects the implementation of data operators and provide the fastest implementation of parallel radix join to date, reaching close to 200 million tuples per second. More...
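As a rough illustration of what "hardware conscious" means here, the sketch below shows the core idea of a radix join in simplified, single-threaded form (the names and structure are ours; the real implementation adds multiple partitioning passes and careful thread placement): both inputs are first partitioned on the low bits of the key so that each partition fits in cache, and a hash table is then built and probed per partition.

    import java.util.*;

    // Simplified radix join: partition on low key bits, then join per partition.
    public class RadixJoinSketch {
        static final int BITS = 4, FANOUT = 1 << BITS;   // 16 partitions

        static List<int[]>[] partition(int[] keys) {
            @SuppressWarnings("unchecked")
            List<int[]>[] parts = new List[FANOUT];
            for (int i = 0; i < FANOUT; i++) parts[i] = new ArrayList<>();
            for (int k : keys)
                parts[k & (FANOUT - 1)].add(new int[]{k});
            return parts;
        }

        static long join(int[] r, int[] s) {
            List<int[]>[] rp = partition(r), sp = partition(s);
            long matches = 0;
            for (int p = 0; p < FANOUT; p++) {           // cache-resident joins
                Set<Integer> ht = new HashSet<>();
                for (int[] t : rp[p]) ht.add(t[0]);       // build
                for (int[] t : sp[p])                     // probe
                    if (ht.contains(t[0])) matches++;
            }
            return matches;
        }

        public static void main(String[] args) {
            System.out.println(join(new int[]{1, 2, 3, 42},
                                    new int[]{3, 42, 99}));  // prints 2
        }
    }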
Rhizoma is a constraint-based runtime system for distributed applications which is self-hosting: the application manages itself to the extent of acquiring and releasing resources (in particular, virtual machines) in response to failures, offered load, or changing policy. Operators developing and deploying applications using Rhizoma specify the desired application deployment using a form of constrained logic programming, and the Rhizoma runtime uses this specification to drive resource requests continuously during the lifetime of the application.
R-OSGI is a transparent extension to the OSGi standard to implement seamless interaction with remote services. In contrast to other protocols and systems, R-OSGi preserves the semantics of OSGi services and deals with the implications on the module layer. Since R-OSGi is itself an OSGi bundle, it can be added to any OSGi application and turn it into a distributed system. R-OSGi is protocol and transport independent and facilitates spontaneous interaction with devices through service discovery and the AlfredO extension.
Project members: Michael Duller, Gustavo Alonso
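The sketch below shows how a plain OSGi service would be exposed for remote access: it is registered as usual, with a marker property telling R-OSGi to proxy it to remote peers. The property key used here is an assumption for illustration; consult the R-OSGi documentation for the exact constant.

    import java.util.Dictionary;
    import java.util.Hashtable;
    import org.osgi.framework.BundleActivator;
    import org.osgi.framework.BundleContext;

    // Registering an ordinary OSGi service with a remote-access marker.
    public class Activator implements BundleActivator {
        public void start(BundleContext context) {
            Dictionary<String, Object> props = new Hashtable<>();
            props.put("service.remote.registration", Boolean.TRUE); // assumed key
            context.registerService(GreetingService.class.getName(),
                                    new GreetingServiceImpl(), props);
        }
        public void stop(BundleContext context) { /* unregistered automatically */ }
    }

    interface GreetingService { String greet(String name); }

    class GreetingServiceImpl implements GreetingService {
        public String greet(String name) { return "Hello, " + name; }
    }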
There are many academic and commercial stream processing engines (SPEs) today, each of them with its own execution semantics. This variation may lead to seemingly inexplicable differences in query results. SECRET takes up this challenge. It is a descriptive model that allows users to analyze the behavior of systems and understand the results of window-based queries (with time- and tuple-based windows) for a broad range of heterogeneous SPEs. More...
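As a small worked example of window-based execution (our own illustration, not part of SECRET), the sketch below enumerates the contents of tuple-based windows for a given size and slide; SECRET's dimensions describe precisely how a given engine scopes such windows and when it evaluates and reports them.

    import java.util.*;

    // Enumerating tuple-based windows over a finite stream prefix.
    public class WindowSketch {
        // Emit the contents of every full window of the given size/slide.
        static List<List<Integer>> windows(List<Integer> stream, int size, int slide) {
            List<List<Integer>> out = new ArrayList<>();
            for (int start = 0; start + size <= stream.size(); start += slide)
                out.add(stream.subList(start, start + size));
            return out;
        }

        public static void main(String[] args) {
            List<Integer> s = List.of(1, 2, 3, 4, 5, 6);
            System.out.println(windows(s, 3, 1)); // sliding: [1,2,3],[2,3,4],...
            System.out.println(windows(s, 3, 3)); // tumbling: [1,2,3],[4,5,6]
        }
    }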
SwissQM is a stack-based virtual machine for wireless sensor networks together with a gateway component. The gateway provides a declarative interface to the sensor network: submitted queries are compiled into short bytecode sequences that are executed by the sensor nodes (a toy sketch of such a bytecode interpreter follows below). SwissQM not only eases the use of sensor networks for field researchers but is also intended as a flexible research platform. SwissQM is freely available for download (under the GPL license).
Project members: Michael Duller, Jan Rellermeyer, Gustavo Alonso
Project page: http://www.swissqm.inf.ethz.ch
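The following toy interpreter conveys the flavor of a node-side query virtual machine; the instruction set and the compilation example are invented for the sketch and are not SwissQM's actual bytecode.

    // A tiny stack machine executing a compiled query against a sensor reading.
    public class TinyQm {
        static final int PUSH = 0, SENSOR = 1, ADD = 2, EMIT = 3;

        // Executes bytecode of (opcode, operand) pairs.
        static void run(int[] code, int sensorValue) {
            int[] stack = new int[8];
            int sp = 0;
            for (int pc = 0; pc < code.length; pc += 2) {
                switch (code[pc]) {
                    case PUSH   -> stack[sp++] = code[pc + 1];
                    case SENSOR -> stack[sp++] = sensorValue;     // read sensor
                    case ADD    -> { sp--; stack[sp - 1] += stack[sp]; }
                    case EMIT   -> System.out.println("emit " + stack[--sp]);
                }
            }
        }

        public static void main(String[] args) {
            // "SELECT temp + 10": push offset, read sensor, add, emit result.
            int[] bytecode = {PUSH, 10, SENSOR, 0, ADD, 0, EMIT, 0};
            run(bytecode, 23);  // prints "emit 33"
        }
    }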
TELL
TELL is a feature-rich platform for efficiently running mixed workloads. It is fast, modular, and scalable, and, most of all, open source. More...
In the last decade, XML has become very popular, as it provides a practical way to format semi-structured information. Since then, two programming worlds have prospered in two separate directions: on the one side, object-oriented languages (C++, Java, C#...) and on the other side, languages that handle information available in the XML format: type definition (DTD, XML Schema) and querying (Quilt, XQuery). This leads to a fundamental impedance mismatch between objects and XML content.
We are extending XQuery so as to give it object-oriented features seamlessly, which increases its modularity and scalability potential. This new language, Unity, can be seen as a progeny of Java, XQuery, and XML Schema at the same time: it is object-oriented and manages XML natively. The fundamental idea behind Unity is to have code in the schema. We built a cross-compiler from Unity to XQuery.
Project members: Peter Fischer, Ghislain Fourny, Donald Kossmann
Most data stream processing systems model streams as append-only sequences of data elements. In this model, the application expects to receive a query answer on the complete stream. However, there are many situations in which each data element in the stream is in fact an update to a previous one, and therefore, the most recent value is all that really matters to the application. In UpStream, we explore how to efficiently process continuous queries under such an update-based stream data model. More...
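The sketch below illustrates the update-based model with a central data structure (an illustration only, not UpStream's actual code): a queue that keeps just the latest value per key, so a slow consumer never processes stale updates.

    import java.util.*;

    // An "update queue": each arriving tuple overwrites the previous value
    // for its key, so consumers only ever see the freshest state.
    public class UpdateQueueSketch {
        private final Map<String, Integer> latest = new LinkedHashMap<>();

        void push(String key, int value) {
            latest.put(key, value);            // overwrite: old update is dropped
        }

        // The consumer drains whatever is current; values overwritten while
        // it was busy are never processed at all.
        Map<String, Integer> drain() {
            Map<String, Integer> snapshot = new LinkedHashMap<>(latest);
            latest.clear();
            return snapshot;
        }

        public static void main(String[] args) {
            UpdateQueueSketch q = new UpdateQueueSketch();
            q.push("vehicle-7", 100);          // position update
            q.push("vehicle-7", 140);          // supersedes the previous one
            System.out.println(q.drain());     // {vehicle-7=140}
        }
    }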
XQuery is a powerful language for processing XML. In the context of the Web, XML data is often provided through Web services, and several XQuery implementations already support access to such services (for both WSDL and REST). We propose an extension to XQuery 1.0 that allows interoperability with WSDL services in a simple and transparent way. The main challenge in this project is mapping between the different type schemas.
Project members: Donald Kossmann
Former Project members: Peter Fischer, Kyumars Sheykh Esmaili
Over the years, the browser has become a complete runtime environment for client-side programs. The main scripting language used for this purpose is JavaScript, which was designed to program the browser. Many extensions and new layers have been built on top of it to allow, e.g., DOM navigation and manipulation. However, JavaScript has become a victim of its own success and is used far beyond its original scope, leading to increased code complexity. We propose to reduce programming complexity by using XQuery as a client-side programming language. We wrote an extension for Microsoft Internet Explorer, based on the Zorba XQuery engine, which allows the execution of XQuery scripts in the browser. An extension for Firefox is on the way as well.
Project members: Peter Fischer, Dana Florescu (Oracle), Ghislain Fourny (28msec), Donald Kossmann
Project page: http://www.xqib.org
XTream
The XTream project extends and generalizes the concept of data streams to a broader range of applications that encompass a wide variety of data sources and devices and that have significantly different requirements than traditional applications on data streams. As part of the project, we are building a lightweight and extensible platform for highly distributed data stream processing.
Project members: Michael Duller, Gustavo Alonso, Timothy Roscoe, Nesime Tatbul
Zorba and MXQuery are XQuery processors written in C++ and Java, respectively. Both systems implement the whole XQuery family of standards (XQuery 1.0, XQuery Update, XQuery Scripting, XQuery Full Text) with some extensions (e.g., REST, Web services, window and streaming capabilities, group by, etc.). Both engines can be embedded into other software systems: Zorba has been embedded in Web browsers, and both Zorba and MXQuery have been embedded in an Eclipse plug-in. Currently, there are efforts to integrate Zorba into a database engine and thereby obtain an integrated database and application server. The overall goal of the project is to make declarative database application programming ubiquitous and to simplify the database application programming stack by providing a uniform programming environment for all application layers (presentation, application logic, and database backend).
Project members: Peter Fischer, Kyumars Sheykh Esmaili, Dana Florescu (Oracle), Ghislain Fourny (28msec), Donald Kossmann
Project pages: