The limitations of today's computing architectures are well known: high power consumption, heat dissipation, network and I/O bottlenecks, and the memory wall. Field-programmable gate arrays (FPGAs), user-configurable hardware chips, are promising candidates to overcome these limitations. With tailor-made and software-configured hardware circuits it is possible to process data at very high throughput rates and with extremely low latency. Yet, FPGAs consume orders of magnitude less power than conventional systems. Thanks to their high configurability, they can be used as co-processors in heterogeneous multi-core architectures, and/or directly be placed in critical data paths to reduce the load that hits the system CPU. More...
During the last few years, there have been radical changes in the hardware and the processor architecture landscape. Intel’s co-founder Gordon E. Moore’s famous conjecture that the number of transistors on integrated circuits doubles roughly every two years still holds today. However, multi-core processors caused a paradigm shift in hardware where increasing number of transistors are invested in parallelism and advanced features on chip designs. In addition to increasing core counts, sophisticated features like advanced instruction sets, SIMD instructions, simultaneous multi-threading (SMT), larger and multiple layers of caches are being introduced in the newer processor designs. All of these game-changing technological trends constitute the driving force behind how new data processing systems and algorithms should be designed and tailored today, albeit making it even more challenging. Moreover, as there are no established designs for the contemporary multicores, there is also no clear consensus on how to design data processing algorithms to take advantage of available features as well as suggest new features to the hardware architects. In this research area, we design novel hardware-conscious algorithms and systems that are targeted to the recent features in hardware to extract the performance premises of modern multi-core machines and which provide insights for upcoming generations of multicores. More...
We are building a data appliance for Rack-scale Computers (RaSC) that leverages the benefits of cross-layer optimization and provides support for heterogeneous workloads. To achieve that we separate the storage from the data processing layer. The two layers communicate over a scalable interconnect fabric, at the moment focusing on RDMA over InfiniBand. More...