Past Systems Group Projects


AEOLUS - Algorithmic Principles for Building Efficient Overlay Computers.

 - More info here


Analytics in Motion (AIM)

The goal of AIM is to bridge the gap between write-optimized storage systems used for Online Transactional Processing (OLTP) and read-optimized structures used in Online Analytical Processing (OLAP) with a single shared data storage. The motivation is to allow data to be processed in real time, which cannot be achieved by the traditional data warehousing approach, where two separate storage systems are used and data is regularly shifted from the (write-optimized) OLTP storage to the (read-only) OLAP storage. More...


Anzere personal storage systems

Anzere is a storage system that replicates a user's personal data (photos, music, etc.) across an ensemble of physical and virtual devices owned by a single user (or rented on demand from cloud infrastructures). With Anzere we show how to flexibly replicate data at scale in response to a complex, user-specified set of replication policies. Anzere is built on the Rhizoma platform and includes an overlay network, monitoring infrastructure, a CLP solver, data replication based on PRACTI, and Paxos for consistency. Anzere currently runs on mobile phones, laptops, desktops, and VMs on PlanetLab and Amazon EC2. More...


Arosa resource allocator for testbeds (VF2x)

VF2x - Distributed network testbeds like GENI aim to support a potentially large number of experiments simultaneously on a complex, widely distributed physical network by mapping each requested network onto a share or “slice” of physical hosts, switches and links. A significant challenge is network mapping: how to allocate virtual nodes, switches and links from the physical infrastructure so as to accurately emulate the requested network configurations.

More...
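The mapping problem can be made concrete with a toy sketch. The actual VF2x work is different (its name suggests it builds on VF2-style subgraph matching with resource constraints); the greedy first-fit heuristic, the node names, and the single CPU resource below are purely illustrative assumptions:

```python
# Toy greedy sketch of virtual-to-physical node mapping. Not the VF2x
# algorithm; heuristic, names, and the single CPU resource are assumptions.
def map_virtual_network(virtual_nodes, physical_hosts):
    """virtual_nodes: {vnode: cpu_demand}; physical_hosts: {host: cpu_capacity}.
    Returns {vnode: host} or None if the request cannot be embedded."""
    capacity = dict(physical_hosts)              # host -> remaining CPU units
    mapping = {}
    for vnode, demand in sorted(virtual_nodes.items(),
                                key=lambda kv: -kv[1]):  # largest demand first
        host = max(capacity, key=capacity.get)   # host with most free capacity
        if capacity[host] < demand:
            return None                          # no host can fit this node
        capacity[host] -= demand
        mapping[vnode] = host
    return mapping

mapping = map_virtual_network({"a": 2, "b": 1}, {"h1": 2, "h2": 2})
```

The real problem also has to map virtual links onto physical paths with bandwidth constraints, which is what makes network embedding hard in practice.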


ASAP - QoS in multi-hop wireless networks

 - More info here


Avalanche: Data Processing on Bare Metal

The limitations of today's computing architectures are well known: high power consumption, heat dissipation, network and I/O bottlenecks, and the memory wall. Field-programmable gate arrays (FPGAs), user-configurable hardware chips, are promising candidates to overcome these limitations. With tailor-made, software-configured hardware circuits it is possible to process data at very high throughput rates and with extremely low latency. At the same time, FPGAs consume orders of magnitude less power than conventional systems. Thanks to their high configurability, they can be used as co-processors in heterogeneous multi-core architectures, or placed directly in critical data paths to reduce the load that hits the system CPU. More...


Cloudy/Smoky

Cloud computing has changed the view on data management by focusing primarily on cost, flexibility, and availability instead of consistency and performance at any price, as traditional DBMSs do. As a result, cloud data stores run on commodity hardware and are designed to be scalable, easy to maintain, and highly fault-tolerant, often providing relaxed consistency guarantees. The success of key-value stores like Amazon's S3 and the variety of open-source systems reflects this shift. Existing solutions, however, still lack substantial functionality provided by a traditional DBMS (e.g., support for transactions and a declarative query language) and are tailored to specific scenarios, creating a jungle of services. That is, users have to commit to a specific service and are later locked into it, which prevents the evolution of the application and leads to misuse of services and expensive migrations to other services. With Cloudy we have started to build our own highly scalable database, which provides a completely modularized architecture and is not tailored to a specific use case. For example, Cloudy supports stream processing as well as SQL and simple key-value requests. More...


CrowdDB: Integrating Human Input into Databases

The goal of this project is to develop a set of novel techniques that allow human resources to be integrated into a database system in order to process some of the impossible queries that Google and Oracle cannot answer today, and to address some of the notoriously hard database research problems in a very different way than has been done in the past. Specifically, we plan to build an extended relational database system, called CrowdDB. More...


Data Cyclotron

The transport mechanisms offered by modern network cards that support remote direct memory access (RDMA) significantly shift the priorities in distributed systems. Complex and sophisticated machinery designed only to avoid network traffic can now be replaced by schemes that can use the available bandwidth to their advantage. One such scheme is Data Cyclotron, a research effort that we pursue jointly with the database group at CWI Amsterdam. Based on a simple ring-shaped topology, Data Cyclotron offers ad-hoc querying over data of arbitrary shape and arbitrary size. More... 
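As a rough illustration of the ring idea, data fragments can be modeled as hopping from node to node while each node filters whatever passes by. The synchronous rotation and function names below are assumptions for the sake of the example, not Data Cyclotron's actual protocol:

```python
# Minimal sketch of ring-based query processing: fragments circulate around
# a ring of nodes, and each node evaluates its pending query predicate
# against every fragment that passes by during one full rotation.
def rotate_ring(nodes, fragments):
    """nodes: one predicate (query) per node; fragments: one data fragment
    per ring position. Returns, per node, the tuples matching its predicate."""
    results = [[] for _ in nodes]
    for step in range(len(nodes)):
        for pos, pred in enumerate(nodes):
            # fragment held by node `pos` at this step of the rotation
            frag = fragments[(pos + step) % len(nodes)]
            results[pos].extend(t for t in frag if pred(t))
    return results

hot = rotate_ring([lambda t: t > 10, lambda t: t % 2 == 0],
                  [[3, 12], [8, 20]])
```

The point of the design is that no node needs to request data: with cheap RDMA bandwidth, every query eventually sees every fragment simply by letting the data flow.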


Describing Streams

The purpose of this project is to investigate the possibilities and usefulness of approaches for describing the format of a data stream, both in terms of structural (e.g., event sequence) and dynamic (e.g., rates) aspects. The information derived from these descriptions should enable optimisations in a declarative stream processor. These optimisations target reduced response times, memory overhead, and processing cost.

Project members: Peter Fischer, Kyumars Sheikh Esmaili, Donald Kossmann


DejaVu

The DejaVu project explores scalable complex event processing techniques for streams of events. The goal is to provide a system that can seamlessly integrate pattern detection over live and historical streams of events behind a common, declarative interface. We are investigating various optimization ideas for efficient data access and query execution. More...


ECC Projects


 

flowSGI

flowSGI combines the paradigm of Fluid Computing with the dynamics of the OSGi service platform. Data and applications can be shared and kept synchronized among different peers, including small mobile devices. Offline operations on data are permitted; changes are reconciled as soon as a network connection becomes available again. flowSGI includes the following subprojects:

  • Concierge: an implementation of the OSGi R3 specification, optimized for mobile and embedded devices. It runs on all J2SE and J2ME CDC VMs and performs well even on less optimized virtual machines. With a footprint of only 85 kBytes, it is one of the smallest OSGi implementations available. Project members: Gustavo Alonso
     
  • jSLP: a pure Java implementation of RFC 2608 (Service Location Protocol). It provides service discovery at the packet level and can run either in managed environments or in ad-hoc networks using multicast requests. Project members: Gustavo Alonso
  • R-OSGi

Health Data Cooperative (HDC)

By building a platform for personal health data, this project aims to empower citizens by giving them control over their own health data. The platform facilitates exchanging personal health records and granting access to them. This makes it possible to aggregate data from diverse sources, generating a more complete picture of personal health than ever before. For a single person, this enables personalized medicine. And when the data of many citizens is aggregated and anonymized, medical research can be conducted at an unprecedented scale. More...


Rack-scale data processing system

We are building a data appliance for Rack-scale Computers (RaSC) that leverages the benefits of cross-layer optimization and provides support for heterogeneous workloads. To achieve that, we separate the storage layer from the data processing layer. The two layers communicate over a scalable interconnect fabric, currently focusing on RDMA over InfiniBand. More...


iMeMex

iMeMex is a first implementation of a Personal DataSpace Management System (PDSMS). It allows users to handle all their data from different sources, such as email, files and folders, RSS, and databases, through a graph-based data model. Moreover, iMeMex allows users to define information-integration semantics in a pay-as-you-go fashion. Project members: Lukas Blunschi


JOpera

JOpera is a rich composition environment for many kinds of services (including Web services). It supports the visual design and autonomic enactment of complex distributed business protocols and conversations. More info here


Limmat: Analytics for the Real-Time Web

Today, with the growing use of mobile devices constantly connected to the Internet, the nature of user-generated data has changed: it has become more real-time. People share their thoughts and discuss breaking news on Twitter and Facebook; they share their current locations and activities on location-based social networks such as Foursquare. The difference is that, today, people share more often and the lifespan of the data has become shorter.   

Analyzing this data leads to new requirements for analytical systems: real-time processing and database intensive workloads. Driven by these requirements, we have developed Limmat. Limmat extends a key-value store architecture with push-based processing, transactional task execution, and synchronization. We modified the MapReduce programming model to support push-style data processing.

 

Project members: Donald Kossmann, Maxim Grinev, Maria Grineva, Martin Hentschel

 


Mapping Data to Queries 

Mapping Data to Queries (MDQ) is a radically different approach to processing data with many different schemas. MDQ differs from traditional approaches to data integration by integrating data at the latest possible point in time: at query runtime. This opens up great potential for optimization because at query runtime both the data and the query are known, and we can exploit this knowledge to apply fewer mapping rules than traditional approaches. Consequently, MDQ scales well with the number of schemas and outperforms traditional approaches by orders of magnitude in extreme cases.
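A minimal sketch of the late-mapping idea, using a purely hypothetical rule format (target attribute mapped to source attribute): only the rules the query actually needs are consulted, and only at query runtime.

```python
# Hypothetical sketch of "map at query time": given a query over a target
# schema, only the mapping rules whose targets the query touches are applied
# to each incoming record. Rule format and names are illustrative assumptions.
def answer(query_attrs, rules, record):
    """query_attrs: attributes the query asks for; rules: {target: source};
    record: a dict in some source schema. Maps only what the query needs."""
    mapped = {}
    for attr in query_attrs:
        src = rules.get(attr, attr)      # identity mapping if no rule applies
        if src in record:
            mapped[attr] = record[src]
    return mapped

# The "zip" rule is never evaluated because the query only asks for "name".
row = answer(["name"], {"name": "fullName", "zip": "postcode"},
             {"fullName": "Ada", "postcode": "8092"})
```

With thousands of schemas, skipping the rules a given query never touches is exactly where the claimed scalability comes from.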

Project members: Martin Hentschel, Laura Haas (IBM Almaden), Donald Kossmann, and Renée Miller (University of Toronto)


MAND

One key difference between Mobile Ad Hoc Networks (MANETs) and other network types is that MANETs lack the central infrastructure components necessary to build common directory-based services such as SLP, SIP, DNS, etc. This poses significant challenges to protocol design and software architecture for such networks. MAND (Mobile Ad hoc Network Directory) is an infrastructure for the distribution, storage, and lookup of key/value pairs (tuples) in ad hoc networks. The key insight in MAND is to piggyback tuples and requests on the messages that routing protocols exchange to build and maintain routes in the ad hoc network. Using MAND we have built AdSocial, a social networking application running in ad hoc networks of Nokia N810 handheld devices.
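The piggybacking idea can be sketched in a few lines; the message format, the `budget` parameter, and the merge logic below are illustrative assumptions, not MAND's wire protocol:

```python
# Toy sketch of MAND-style piggybacking: directory tuples ride along on the
# periodic routing messages nodes exchange anyway, so no extra messages are
# needed to distribute the directory.
def make_route_msg(sender, route_info, pending_tuples, budget=2):
    """Attach up to `budget` pending key/value tuples to a routing message."""
    piggyback = pending_tuples[:budget]
    del pending_tuples[:budget]          # these tuples are now in flight
    return {"from": sender, "routes": route_info, "tuples": piggyback}

def receive(msg, directory):
    """Merge piggybacked tuples into the local directory replica."""
    directory.update(msg["tuples"])

pending = [("sip:alice", "10.0.0.5"), ("dns:printer", "10.0.0.9")]
msg = make_route_msg("nodeA", ["10.0.0.0/24"], pending)
local = {}
receive(msg, local)
```

The budget matters in practice: routing packets in a MANET are small, so only a bounded number of tuples can hitch a ride on each one.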

Project members: Oriana Riva, Gustavo Alonso

 


 

Managing Assurance, Security and Trust for Services  (MASTER)

MASTER (http://www.master-fp7.eu/) is a collaborative project funded under the EU 7th Research Framework Programme. It aims to provide methodologies and infrastructures that facilitate monitoring, enforcement, and auditing of security compliance, especially where highly dynamic service-oriented architectures are used to support business process enactment in single, multi-domain, and iterated contexts. ETH's role in MASTER is to investigate the use of event processing systems for monitoring purposes, in particular the aspects of expressiveness, dependability, and lifecycle management.


Project Members: Tahmineh Sanamrad

Former Project members: Peter Fischer,  Kyumars Sheikh Esmaili


Modularization of Database Engines

The architecture of current data management systems is mostly monolithic, highly intertwined, and has not really changed since the relational model was first proposed and implemented. Advances in hardware and computing platforms are making it almost impossible to continue operating large data management systems with such an architecture. Exploiting multi-core or cluster-based systems requires increased parallelism and a far more loosely coupled architecture. In this project we explore alternative architectures for database systems, architectures that are better suited to the new hardware platforms. Currently we are focusing our efforts on exploiting modular software design as the basis for a component-based database engine that can be dynamically adapted and configured. To see the impact of modularization on current database engines and processors, we are refactoring an existing open-source database engine, porting the modules to R-OSGi, and evaluating the performance and functionality of the resulting system.

Project members: Ionut Subasu, Jan S. Rellermeyer, Gustavo Alonso

 


MOVIES

With the exponential growth of moving-object data into the gigabyte range, it has become critical to develop effective techniques for indexing, updating, and querying these massive data sets. To meet the high update rates as well as the low query response times required by moving-object applications, this project takes a novel approach to moving-object indexing. The resulting technique, MOVIES, aims to be the first to support both high query rates and high update rates at the same time.
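One way to reconcile high update rates with high query rates is to keep a read-only index and batch incoming position updates into periodic rebuilds. The class below is an assumed simplification for illustration, not the actual MOVIES index structure:

```python
# Sketch of a "short-lived read-only index" (illustrative, not MOVIES itself):
# queries run against a frozen index while updates accumulate in a buffer;
# the index is periodically rebuilt instead of being updated in place.
class ThrowawayIndex:
    def __init__(self):
        self.index = {}       # object id -> position; read-only between rebuilds
        self.buffer = {}      # most recent not-yet-applied position updates

    def update(self, obj, pos):
        self.buffer[obj] = pos            # O(1), no index maintenance cost

    def query(self, obj):
        return self.index.get(obj)        # may be slightly stale by design

    def rebuild(self):
        self.index = {**self.index, **self.buffer}
        self.buffer = {}

idx = ThrowawayIndex()
idx.update("car42", (7, 3))
idx.rebuild()
```

The trade-off is bounded staleness: between rebuilds, queries see positions that are at most one rebuild interval old, which is acceptable for many moving-object applications.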

Project members: Lukas Blunschi

 



Multimed

Multicore computers pose a substantial challenge to infrastructure software such as operating systems or databases. These platforms typically evolve more slowly than the underlying hardware, but with multicore they face structural limitations that can be solved only with radical architectural changes. In this project we argue that, as has been suggested for operating systems, databases could treat multicore architectures as a distributed system rather than trying to hide the parallel nature of the hardware. We first analyze the limitations of database engines when running on multicores, using MySQL and PostgreSQL as examples. We then show how to deploy several replicated engines within a single multicore machine to achieve better scalability and stability than a single database engine operating on all cores. When combined with options like virtualization and the ability to tune the system configuration to the load and number of available cores, the approach we propose becomes an appealing alternative to entirely redesigning the database engine. More...


Parallel Joins

In this project we demonstrate through experimental analysis of different algorithms and architectures that hardware still matters. Parallel hash join algorithms that are hardware conscious perform better than hardware-oblivious approaches. Through the analysis, we shed light on how modern hardware affects the implementation of data operators and provide the fastest implementation of parallel radix join to date, reaching close to 200 million tuples per second. More...
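The core of a hardware-conscious radix join is to partition both inputs by a few key bits so that each per-partition hash table stays cache-resident. A single-threaded, single-pass Python sketch of that structure (the measured implementation is multi-threaded native code with multi-pass partitioning):

```python
# Minimal radix-partitioned hash join: partition R and S by the low `bits`
# of the key, then hash-join each pair of co-partitions with a small table.
def radix_join(R, S, bits=2):
    """R, S: lists of (key, payload). Returns (key, r_payload, s_payload)."""
    mask = (1 << bits) - 1
    parts_r = [[] for _ in range(1 << bits)]
    parts_s = [[] for _ in range(1 << bits)]
    for k, v in R:                            # partitioning pass
        parts_r[k & mask].append((k, v))
    for k, v in S:
        parts_s[k & mask].append((k, v))
    out = []
    for pr, ps in zip(parts_r, parts_s):      # join co-partitions
        ht = {}
        for k, v in pr:                       # build small, cache-sized table
            ht.setdefault(k, []).append(v)
        for k, v in ps:                       # probe
            for rv in ht.get(k, ()):
                out.append((k, rv, v))
    return out

matches = radix_join([(1, "r1"), (5, "r5")], [(5, "s5"), (2, "s2")])
```

Because every probe only touches a hash table built from one partition, the working set fits in cache, which is exactly the hardware-conscious property the project measures.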


Rhizoma runtime for self-managing overlays

Rhizoma is a self-hosting, constraint-based runtime system for distributed applications. The application manages itself to the extent of acquiring and releasing resources (in particular, virtual machines) in response to failures, offered load, or changing policy. Operators developing and deploying applications on Rhizoma specify the desired deployment using a form of constraint logic programming, and the Rhizoma runtime uses this specification to drive resource requests continuously during the lifetime of the application.

More... 


R-OSGI

R-OSGI is a transparent extension to the OSGi standard to implement seamless interaction with remote services. In contrast to other protocols and systems, R-OSGi preserves the semantics of OSGi services and deals with the implications on the module layer. Since R-OSGi is itself an OSGi bundle, it can be added to any OSGi application and turn it into a distributed system. R-OSGi is protocol and transport independent and facilitates spontaneous interaction with devices through service discovery and the AlfredO extension.

Project members: Michael Duller, Gustavo Alonso


SECRET

There are many academic and commercial stream processing engines (SPEs) today, each with its own execution semantics. This variation may lead to seemingly inexplicable differences in query results. SECRET takes up this challenge: it is a descriptive model that allows users to analyze the behavior of systems and understand the results of window-based queries (with time- and tuple-based windows) for a broad range of heterogeneous SPEs. More...
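A small example of why such a model is needed: even in the simple tumbling-window case, tuple-based and time-based windows slice the same stream differently. The two helpers below are illustrative, not part of SECRET itself:

```python
# The same stream, windowed two ways. Stream elements are (timestamp, value).
def tuple_windows(stream, size):
    """Non-overlapping windows of `size` tuples each."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]

def time_windows(stream, length):
    """Non-overlapping windows of `length` time units, aligned at t = 0."""
    wins = {}
    for t, v in stream:
        wins.setdefault(t // length, []).append((t, v))
    return [wins[k] for k in sorted(wins)]

s = [(0, "a"), (1, "b"), (2, "c"), (9, "d")]
by_count = tuple_windows(s, 2)   # groups "c" with "d"
by_time = time_windows(s, 5)     # groups "c" with "a" and "b"
```

An aggregate over these windows (say, a count) would return different answers for the same query text on two SPEs that pick different window semantics, which is precisely the confusion SECRET makes explicit.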


SwissQM

SwissQM is a stack-based virtual machine for wireless sensor networks together with a gateway component. The gateway provides a declarative interface to the sensor network. Submitted queries are compiled into short bytecode sequences that are executed by the sensor nodes. SwissQM not only eases the use of sensor networks for field researchers but is also intended as a flexible research platform. SwissQM is freely available for download (under the GPL license).
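To illustrate the compile-to-bytecode idea, here is a toy stack machine; the opcodes and the example query are invented for this sketch and are not SwissQM's actual instruction set:

```python
# Toy stack machine: a query like "SELECT temp * 2 + 1" could compile to a
# short instruction sequence interpreted on each sensor node.
def run(program, sensors):
    """program: list of (opcode, *args); sensors: {name: current reading}."""
    stack = []
    for op, *arg in program:
        if op == "push":                 # push a constant
            stack.append(arg[0])
        elif op == "read":               # push a sensor reading
            stack.append(sensors[arg[0]])
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

bytecode = [("read", "temp"), ("push", 2), ("mul",), ("push", 1), ("add",)]
value = run(bytecode, {"temp": 21})
```

Shipping a few bytes of bytecode instead of new node firmware is what makes the declarative gateway interface cheap to use on resource-constrained sensor nodes.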

Project members: Michael Duller, Jan Rellermeyer, Gustavo Alonso

Project page: http://www.swissqm.inf.ethz.ch


TELL

TELL is a feature-rich platform for efficiently running mixed workloads. It is fast, modular, and scalable, and, most of all, open source. More...


Unity: Code in the Schema

In the last decade, XML has become very popular, as it provides a practical way to format semi-structured information. Since then, two programming worlds have prospered in separate directions: on the one side, object-oriented languages (C++, Java, C#, ...), and on the other, languages that handle information in the XML format: type definition (DTD, XML Schema) and querying (Quilt, XQuery). This leads to a fundamental impedance mismatch between objects and XML content.
We are extending XQuery to give it object-oriented features seamlessly, which increases its modularity and scalability potential. The new language, Unity, can be seen as a progeny of Java, XQuery, and XML Schema at the same time: it is object-oriented and manages XML natively. The fundamental idea behind Unity is to have code in the schema. We built a cross-compiler from Unity to XQuery.

Project members: Peter Fischer, Ghislain Fourny, Donald Kossmann


UpStream

Most data stream processing systems model streams as append-only sequences of data elements. In this model, the application expects to receive a query answer on the complete stream. However, there are many situations in which each data element in the stream is in fact an update to a previous one, and therefore, the most recent value is all that really matters to the application. In UpStream, we explore how to efficiently process continuous queries under such an update-based stream data model. More...
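The update-based model can be illustrated in a few lines: under these semantics, an engine may discard any update that is superseded before the query reads it. The helper below is a minimal sketch of the model, not UpStream's implementation:

```python
# Update-based stream model: per key, only the most recent value matters, so
# stale updates can be dropped before the (possibly expensive) continuous
# query ever processes them.
def latest_state(updates):
    """updates: iterable of (key, value) in arrival order.
    Returns the current value per key."""
    state = {}
    for key, value in updates:
        state[key] = value               # overwrite: older updates are obsolete
    return state

ticks = [("AAPL", 101), ("MSFT", 55), ("AAPL", 99)]
state = latest_state(ticks)
```

Under the append-only model, all three ticks would have to be processed; under the update model, the first AAPL tick can be skipped entirely if the query has not consumed it yet, which is where the efficiency gains come from.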


Web Service Facility in XQuery

XQuery is a powerful language for processing XML. In the context of the Web, XML data is often provided through Web services. Several XQuery implementations already support access to such services (both WSDL and REST). We are proposing an extension to XQuery 1.0 that allows interoperability with WSDL services in a simple and transparent way. The main challenge in this project is mapping between the different type schemas.

Project members: Donald Kossmann

Former Project members: Peter Fischer, Kyumars Sheykh Esmaili

 


XQuery in the Browser 

 

Over the years, the browser has become a complete runtime environment for client-side programs. The main scripting language used for this purpose is JavaScript, which was designed to program the browser. Many extensions and new layers have been built on top of it to allow, e.g., DOM navigation and manipulation. However, JavaScript has become a victim of its own success and is used far beyond its original scope, leading to increased code complexity. We propose to reduce programming complexity by offering XQuery as a client-side programming language. We wrote an extension for Microsoft Internet Explorer, based on the Zorba XQuery engine, which allows execution of XQuery scripts in the browser. An extension for Firefox is on the way as well.

Project members: Peter Fischer, Dana Florescu (Oracle), Ghislain Fourny (28msec), Donald Kossmann 

Project page: http://www.xqib.org
 

 


XTream  

The XTream project extends and generalizes the concept of data streams to a broader range of applications that encompass a wide variety of data sources and devices and that have significantly different requirements than traditional applications on data streams. As part of the project, we are building a lightweight and extensible platform for highly distributed data stream processing.

Project members: Michael Duller, Gustavo Alonso, Timothy Roscoe, Nesime Tatbul


Zorba and MXQuery

 

Zorba and MXQuery are XQuery processors written in C++ and Java, respectively. Both systems implement the whole XQuery family of standards (XQuery 1.0, XQuery Update, XQuery Scripting, XQuery Full Text) with some extensions (e.g., REST, Web services, window and streaming capabilities, group by, etc.). Both engines can be embedded into other software systems. For instance, Zorba has been embedded in Web browsers, and both Zorba and MXQuery have been embedded in an Eclipse plug-in. Currently, there are efforts to integrate Zorba into a database engine and thereby obtain an integrated database and application server. The overall goal of the project is to make declarative database application programming ubiquitous and to simplify the database application programming stack by providing a uniform programming environment for all application layers (presentation, application logic, and database backend).

Project members: Peter Fischer, Kyumars Sheykh Esmaili, Dana Florescu (Oracle), Ghislain Fourny (28msec), Donald Kossmann

Project pages: