Thursday, 12 April 2018, 11:00-12:00 in CAB E 72
Speaker: Christoph Hagleitner (IBM Research, Rüschlikon)
Title: Heterogeneous Computing Systems for Datacenter and HPC Applications
Abstract:
For several decades, the technology roadmap has been driven by technology scaling, but it is evident that this will not be sufficient to economically realize large-scale computing applications. Furthermore, the convergence of data science and HPC leads to significant changes in the workload characteristics of emerging exascale HPC applications when compared to "classic" HPC applications. Therefore, the innovations that sustain the roadmap towards exascale computing come from heterogeneous, dense, workload-optimized systems. In this presentation, I will discuss the current status and show several projects from within IBM Research - Zurich that advance the roadmap towards tomorrow's large-scale computing applications.
Short Bio:
Christoph Hagleitner leads the "Heterogeneous Cognitive Computing Systems" group at the IBM Research – Zurich Lab (ZRL) in Rüschlikon, Switzerland. The group focuses on heterogeneous computing systems for cloud datacenters and HPC. Applications include big-data analytics and cognitive computing. He obtained a diploma degree in Electrical Engineering from ETH Zurich, Switzerland in 1997 and a Ph.D. degree for a thesis on CMOS-integrated microsensors from ETH Zurich, Switzerland in 2002. In 2003 he joined IBM Research to work on the system architecture of a novel probe-storage device (the “millipede” project). In 2008, he started to build up a new research group in the area of accelerator technologies. The team initially focused on on-chip accelerator cores and gradually expanded its research to heterogeneous systems and their applications.
COMPASS: Computing Platforms Seminar Series
Speaker: Jane Hung (MIT)
Title: The Challenges and Promises of Large-Scale Biological Imaging
Abstract:
Microscopy images contain rich information about the state of cells, tissues, and organisms and are an important part of experiments to address a multitude of basic biological questions and health problems. The Broad Institute of MIT and Harvard’s Imaging Platform works with dozens of collaborators around the world to design and execute large-scale microscopy-based experiments in order to identify the causes and potential cures of disease. These experiments, though carried out in a non-profit environment, have led to the discovery of drugs effective in animal models of disease, and the uncovering of mechanisms underlying other diseases and biological processes.
Most recently, we have been working on software to support the increased physiological complexity of modern screening systems, for example, using whole organisms and co-cultured cell types. Our machine learning tools also allow a biologist's intuition to guide the computer to measure subtle phenotypes. We are also working to use patterns of morphological features to group samples by similarity, in order to identify drug targets and gene function. Ultimately, we aim to make microscopy images as computable as other sources of genomic and chemical information.
Short Bio:
Jane received her Ph.D. in the Department of Chemical Engineering at MIT and is interested in how accessible software can make processes more efficient. She had her first computer vision experience during an internship at Novartis in Basel, working on automated monitoring of drug manufacturing. From there, she joined Anne Carpenter's biological image analysis lab at the Broad Institute. She has worked on the machine learning-based software application CellProfiler Analyst in collaboration with David Dao, as well as the deep learning-based object detection software Keras R-CNN in collaboration with Allen Goodman.
---
COMPASS: Computing Platforms Seminar Series
Thursday, 26 April 2018, 11:00-12:00 in CAB E 72
Speaker: Spyros Blanas (Ohio State University, USA)
Title: Scaling database systems to high-performance computers
Abstract:
Processing massive datasets quickly requires warehouse-scale computers. Furthermore, many massive datasets are multi-dimensional arrays stored in formats like HDF5 and NetCDF that cannot be directly queried using SQL. Parallel array database systems like SciDB cannot scale in this environment, which offers fast networking but very limited I/O bandwidth to shared, cold storage: merely loading multi-TB array datasets into SciDB would take days, an unacceptably long time for many applications.
In this talk, we will present ArrayBridge, a common interoperability layer for array file formats. ArrayBridge allows scientists to use SciDB, TensorFlow and HDF5-based code in the same file-centric analysis pipeline without converting between file formats. Under the hood, ArrayBridge manages I/O to leverage the massive concurrency of warehouse-scale parallel file systems without modifying the HDF5 API or breaking backwards compatibility with legacy applications. Once the data has been loaded in memory, the bottleneck in many array-centric queries becomes the speed of data repartitioning between nodes. We will present an RDMA-aware data shuffling abstraction that converses directly with the network adapter through InfiniBand verbs and can repartition data up to 4X faster than MPI. We conclude by highlighting open challenges that must be overcome for data processing to scale to warehouse-scale computers.
Short Bio:
Spyros Blanas is an assistant professor in the Department of Computer Science and Engineering at The Ohio State University. His research interest is in high-performance database systems, and his current goal is to build a database system for high-end computing facilities. He has received the IEEE TCDE Rising Star Award and a Google Research Faculty Award. He received his Ph.D. from the University of Wisconsin–Madison; part of his Ph.D. dissertation was commercialized in Microsoft's flagship data management product, SQL Server, as the Hekaton in-memory transaction processing engine.
---
Date: 30 April 2018
Time: 16:15 - 17:15
Place: ETH Zurich, main campus CAB G 61
Speaker: Prof. Wen-Mei Hwu, University of Illinois at Urbana-Champaign
Host: Prof. Onur Mutlu
ABSTRACT:
We have been experiencing two very important developments in computing. On the one hand, a tremendous amount of resources have been invested in innovative applications such as first-principles-based models, deep learning and cognitive computing. On the other hand, the industry has been taking a technological path where traditional scaling is coming to an end and application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. A “perfect storm” is beginning to form from the fact that data movement has become the dominating factor for both the power and performance of high-valued applications. It will be critical to match the compute throughput to the data access bandwidth and to locate the compute where the data is. Much has been, and continually needs to be, learned about algorithms, languages, compilers and hardware architecture in this movement. What are the killer applications that may become the new driver for future technology development? How hard is it to program existing systems to address the data movement issues today? How will we program future systems? How will innovations in memory devices present further opportunities and challenges in designing new systems? What is the long-term impact on software engineering cost for applications (and legacy applications in particular)? In this talk, I will present some lessons learned as we design the IBM-Illinois C3SR Erudite system inside this perfect storm.
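To make the point about matching compute throughput to data access bandwidth concrete, consider a back-of-the-envelope roofline estimate. The Java sketch below uses hypothetical device numbers, chosen purely for illustration (they are not from the talk):

// Illustrative roofline-style estimate: attainable throughput is capped by
// min(peak compute, arithmetic intensity x memory bandwidth).
// All numbers below are hypothetical, chosen only to illustrate the point.
public class Roofline {
    public static void main(String[] args) {
        double peakGflops = 7000.0;   // peak compute of a hypothetical accelerator (GFLOP/s)
        double bandwidthGBs = 900.0;  // memory bandwidth (GB/s)
        double flopsPerByte = 0.25;   // arithmetic intensity of a memory-bound kernel

        double attainable = Math.min(peakGflops, flopsPerByte * bandwidthGBs);
        // For this kernel: min(7000, 0.25 * 900) = 225 GFLOP/s, only ~3% of peak,
        // which is why data movement, not compute, dominates performance and power.
        System.out.printf("Attainable: %.0f GFLOP/s (%.1f%% of peak)%n",
                attainable, 100.0 * attainable / peakGflops);
    }
}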
BIOGRAPHY:
Wen-mei W. Hwu is a Professor and holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. He is also Chief Scientist of the UIUC Parallel Computing Institute and director of the IMPACT research group (www.crhc.uiuc.edu/Impact). He co-directs the IBM-Illinois Center for Cognitive Computing Systems Research (C3SR) and serves as one of the principal investigators of the NSF Blue Waters petascale supercomputer. For his contributions, he received the ACM SIGARCH Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the IEEE Computer Society Charles Babbage Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of IEEE and ACM. Dr. Hwu received his Ph.D. degree in Computer Science from the University of California, Berkeley.
---
D-INFK Distinguished Colloquium
COMPASS: Computing Platforms Seminar Series
Speaker: Bastian Hossbach (Oracle Labs)
Title: Modern programming languages and code generation in the Oracle Database
Abstract:
In this talk, we will present the Oracle Database Multilingual Engine (MLE). MLE is an experimental feature for the Oracle Database that enables developers to write stored procedures and user-defined functions in modern programming languages such as JavaScript and Python. Special attention was paid to embracing the rich ecosystems of tools and libraries developed for those languages in order to make the developer's experience as familiar as possible. We will show several demos of MLE in action and discuss the challenges of integrating a language runtime with a database system. Under the hood, MLE is powered by the speculative JIT compiler Graal. Having a modern JIT compiler inside a database system not only allows for efficiently running user-defined code, but also for runtime compilation and specialization of SQL expressions and other parts of a query plan to speed up overall query execution.
Short Bio:
Since 2015, Bastian has been a researcher at Oracle Labs in Zurich, Switzerland. He is currently working on a high-performance query execution engine for database management systems that is capable of executing query plans combined with user-defined scripts written in a variety of languages (e.g., JavaScript, Python). Bastian received a PhD degree in computer science from the University of Marburg, Germany, in 2015. Prior to Oracle Labs, he was involved in several projects in the areas of data analytics, data processing and IT security.
Wednesday, 16 May 2018, 11:00-12:00 in CAB E 72
Speaker: Carsten Binnig (TU Darmstadt)
Title: Towards Interactive Data Exploration
Abstract:
Technology has been the key enabler of the current Big Data movement. Without open-source tools like R and Hadoop, as well as the advent of cheap, abundant computing and storage in the cloud, the ongoing trend toward datafication of almost every research field and industry could never have occurred. However, the current Big Data tool set is ill-suited for interactive data exploration of new data, making the knowledge discovery process a major bottleneck in our data-driven society.
In this talk, I will first give an overview of the challenges for interactive data exploration on large data sets and then present current research results that revisit the design of existing data management systems, from the query interface through the execution model down to the storage layer and the underlying hardware, to enable interactive data exploration.
Short Bio:
Carsten Binnig is a Full Professor in the Computer Science department at TU Darmstadt and an Adjunct Associate Professor in the Computer Science department at Brown University. Carsten received his PhD at the University of Heidelberg in 2008. Afterwards, he spent time as a postdoctoral researcher in the Systems Group at ETH Zurich and at SAP working on in-memory databases. Currently, his research focus is on the design of data management systems for modern hardware as well as modern workloads such as interactive data exploration and machine learning. He has recently been awarded a Google Faculty Award and a VLDB Best Demo Award for his research.
CAB E 72
Rodrigo Bruno (INESC-ID Lisboa): "Taming Long Tail Latencies with Allocation Context aware Pretenuring"
Abstract:
Latency-sensitive services such as credit-card fraud detection and website targeted advertisement rely on Big Data platforms which run on top of memory-managed runtimes, such as the Java Virtual Machine. These platforms, however, suffer from unpredictable and unacceptably high pause times due to inadequate memory management decisions. This problem has been previously identified, and results show that current memory management techniques are ill-suited for applications that hold massive amounts of long-lived objects in memory (which is the case for a wide spectrum of Big Data applications). Previous works reduce such application pauses by allocating objects off-heap or in special allocation regions/generations, or by using ultra-low latency Garbage Collectors (GC). However, all these solutions either require a combination of programmer effort and knowledge, source code access or off-line profiling, or impose a significant impact on application throughput in order to reduce application pauses (which is the case for ultra-low latency collectors). To solve this problem, we propose ROLP, a runtime object lifetime profiler that profiles application code at runtime. ROLP is targeted at helping pretenuring GC algorithms decide where to allocate an object in order to reduce overall fragmentation and GC effort, thus reducing application pauses. ROLP is implemented for OpenJDK 8 and was evaluated with a recently proposed pretenuring collector (NG2C). Results show long tail latency reductions of up to 51% for Lucene (search engine), 85% for GraphChi (graph engine), and 69% for Cassandra (key-value store). This is achieved with negligible throughput (< 6%) and memory overhead, with zero programmer effort and no source code access.
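For intuition, here is a minimal, single-threaded Java sketch of the profiling idea behind allocation-context-aware pretenuring. This is illustrative only: ROLP itself lives inside the OpenJDK 8 runtime, not in library code, and its context encoding is more elaborate.

import java.util.HashMap;
import java.util.Map;

// Sketch: track per-allocation-context survival statistics; contexts whose
// objects mostly survive collection are candidates for pretenuring.
public class LifetimeProfiler {
    static final class Stats { long allocated; long survived; }

    private final Map<Integer, Stats> byContext = new HashMap<>();

    // A context id combines the allocation site with a hash of the call path,
    // so the same 'new' executed from different callers is profiled separately.
    static int contextId(int allocationSite, int callPathHash) {
        return 31 * allocationSite + callPathHash;
    }

    void recordAllocation(int ctx) {
        byContext.computeIfAbsent(ctx, k -> new Stats()).allocated++;
    }

    // Called when the collector observes an object from this context surviving.
    void recordSurvival(int ctx) {
        byContext.computeIfAbsent(ctx, k -> new Stats()).survived++;
    }

    // Contexts whose objects mostly survive are good pretenuring candidates:
    // allocating them directly in an old region avoids the repeated copying
    // that inflates GC pause times.
    boolean shouldPretenure(int ctx, double survivalThreshold) {
        Stats s = byContext.get(ctx);
        return s != null && s.allocated >= 1000
                && (double) s.survived / s.allocated >= survivalThreshold;
    }
}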
Short Bio:
Rodrigo Bruno received his BSc (2012) and MSc (2014) degrees in Information Systems and Computer Engineering from Instituto Superior Técnico (IST), University of Lisbon, where he is now pursuing a PhD degree. At the same time, Rodrigo is a researcher in the Distributed Systems Group at INESC-ID Lisboa, and a teaching assistant at IST. His research is mostly focused on Garbage Collection algorithms for large-scale latency-sensitive applications. Over the last years, Rodrigo has collaborated and interned with several companies (such as Microsoft Research and Google) and also contributed to several open-source projects.
COMPASS: Computing Platforms Seminar Series
Speaker: Cagri Balkesen (Oracle Labs)
Title: RAPID: In-Memory Analytical Query Processing Engine with Extreme Performance per Watt
Abstract:
Today, an ever-increasing number of transistors is packed into processor designs with extra features to support a broad range of applications. As a consequence, processors are becoming more and more complex and power hungry. At the same time, they only sustain average performance for a wide variety of applications while not providing the best performance for specific applications. In this talk, we demonstrate, through a carefully designed modern data processing system called RAPID and a simple, low-power processor specially tailored for data processing, that at least an order of magnitude performance/power improvement in SQL processing can be achieved over a modern system running on today's complex processors. RAPID is designed from the ground up with hardware/software co-design in mind to provide architecture-conscious extreme performance while consuming less power in comparison to modern database systems. The talk presents in detail the design and implementation of RAPID, a relational, columnar, in-memory query processing engine supporting analytical query workloads.
Short Bio:
Cagri completed his PhD in 2014 in the Systems Group at ETH Zurich, supervised by Prof. Gustavo Alonso. His broader research interests are data processing on modern computing architectures as well as data stream processing. He holds an MSc in Computer Science from ETH Zurich and a BSc in Computer Engineering from the Middle East Technical University (METU) in Turkey. His PhD thesis at ETH Zurich addresses the design and implementation of in-memory joins on modern hardware architectures with massive multi-core parallelism and the paradigm shift towards in-memory processing. His work on main-memory hash joins received the Best Paper Runner-Up award at IEEE ICDE 2013. Cagri was a recipient of an Excellence Scholarship from ETH Zurich and he holds several US patents based on his work at IBM and Oracle Labs.
COMPASS: Computing Platforms Seminar Series
Friday, 15 June 2018, 11:00-12:00 in CAB E 72
Speaker: Nitin Agrawal (Samsung Research)
Title: Low-Latency Analytics on Colossal Data Streams with SummaryStore
Abstract:
Data empowers learning; but soon, we may have too much of it, from sensors, machines, and personal devices, to store and analyze in a timely and cost-effective manner. In this talk I will present SummaryStore, an approximate storage system designed to support analytical and machine learning workloads such as forecasting, anomaly detection, and traffic monitoring, on large volumes of time-series data (~1 petabyte per node). SummaryStore contributes time-decayed summaries, a novel abstraction for aggressively summarizing data streams while preserving accuracy. I'll also briefly discuss other research opportunities in this area and future work. This work was presented at SOSP '17; more details are available at http://pages.cs.wisc.edu/~nitina/summarystore/
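To illustrate the flavor of time-decayed summaries, here is a minimal Java sketch, assuming a simple count/sum aggregate and windows that grow exponentially with age. It is a toy under those assumptions, not the SummaryStore implementation from the paper:

import java.util.ArrayList;
import java.util.List;

// Each window stores only a small aggregate (count/sum). At most MAX_PER_SIZE
// windows of any given size are kept: older windows get merged into
// exponentially larger ones, so per-value storage decays with age while
// aggregate queries stay answerable with bounded error.
public class DecayedSummaries {
    static final class Window {
        long count; double sum;
        Window(long count, double sum) { this.count = count; this.sum = sum; }
    }

    private static final int MAX_PER_SIZE = 2;
    private final List<Window> windows = new ArrayList<>(); // oldest first

    void append(double value) {
        windows.add(new Window(1, value));
        compact();
    }

    // Repeatedly merge the oldest adjacent pair of equal-sized windows while
    // some window size occurs more than MAX_PER_SIZE times.
    private void compact() {
        boolean merged = true;
        while (merged) {
            merged = false;
            for (int i = 0; i + 1 < windows.size(); i++) {
                Window a = windows.get(i), b = windows.get(i + 1);
                if (a.count == b.count && occurrences(a.count) > MAX_PER_SIZE) {
                    a.count += b.count;
                    a.sum += b.sum;
                    windows.remove(i + 1);
                    merged = true;
                    break;
                }
            }
        }
    }

    private int occurrences(long size) {
        int n = 0;
        for (Window w : windows) if (w.count == size) n++;
        return n;
    }

    // Individual old values are gone, but stream-wide aggregates remain exact
    // and range queries over old data remain answerable approximately.
    double totalSum() {
        double s = 0;
        for (Window w : windows) s += w.sum;
        return s;
    }
}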
Short Bio:
Nitin Agrawal heads the systems research lab at Samsung's Artificial Intelligence Center in Mountain View, CA. His research lies broadly in systems, with an emphasis on storage, mobile, and distributed systems; it has received multiple best-paper awards, an outstanding patent award, and widespread media attention, and has led to commercial and academic impact. He served as the program committee chair for USENIX FAST '18 and earned his doctorate from the University of Wisconsin-Madison in 2009.
CAB E 72
Talk by Ana Klimovic (Stanford University): Elastic Ephemeral Storage for Serverless Computing
Abstract:
Serverless computing is an increasingly popular cloud service, enabling users to launch thousands of short-lived tasks ("lambdas") with high elasticity and fine-grain resource billing. High elasticity and granular resource allocation make serverless computing appealing for interactive data analytics. However, a key challenge is sharing intermediate data between tasks in analytics jobs. Exchanging data directly between short-lived lambdas is difficult, so the natural approach is to store ephemeral data in a common remote data store. Unfortunately, existing storage systems are not designed to meet the elasticity, performance and granular cost requirements of serverless applications. We first characterize the ephemeral I/O requirements of serverless analytics applications. We then present our design and implementation of a distributed data store that elastically and automatically scales to rightsize storage cluster resources across multiple dimensions (storage capacity, CPU cores and network bandwidth). We show the system cost-effectively satisfies dynamic application I/O requirements.
Short Bio:
Ana Klimovic is a final-year Ph.D. student at Stanford University, advised by Professor Christos Kozyrakis. Her research interests are in computer systems and architecture. She is particularly interested in building high-performance, resource-efficient storage and computing systems for large-scale datacenters. Ana has interned at Facebook and Microsoft Research. Before coming to Stanford, Ana graduated from the Engineering Science undergraduate program at the University of Toronto. She is a Microsoft Research Ph.D. Fellow, Stanford Graduate Fellow and Accel Innovation Scholar.
COMPASS: Computing Platforms Seminar Series
Friday, 6 July 2018, 15:00-16:00
Speaker: Martin Burtscher (Texas State University)
Title: Automatic Hierarchical Parallelization of Linear Recurrences
Abstract:
Many important computations from various fields are instances of linear recurrences. Prominent examples include prefix sums in parallel processing and recursive filters in digital signal processing. Later result values depend on earlier result values in recurrences, making it a challenge to compute them in parallel. We present a brand-new work-, space-, and communication-efficient algorithm to compute linear recurrences that is based on Fibonacci numbers, amenable to automatic parallelization, and suitable for GPUs. We implemented our approach in a small compiler that translates recurrences expressed in signature notation into CUDA code. Moreover, we discuss the domain-specific optimizations performed by our compiler to produce state-of-the-art implementations of linear recurrences. Compared to the fastest prior GPU codes, all of which only support certain types of recurrences, our automatically parallelized code performs on par or better in most cases. In fact, for standard prefix sums and single-stage IIR filters, it reaches the throughput of memory copy for large inputs, which cannot be surpassed. On higher-order prefix sums, it performs nearly as well as the fastest handwritten code. On tuple-based prefix sums and 1D recursive filters, it outperforms the fastest preexisting implementations.
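To see why such recurrences parallelize at all, note that y[i] = a[i]*y[i-1] + b[i] is the composition of affine functions f_i(y) = a[i]*y + b[i], and affine composition is associative, which is exactly what a parallel scan exploits. Below is a minimal sequential Java illustration of that identity (it is not the CUDA code the compiler in the talk generates):

// Composing (a1,b1) then (a2,b2) yields (a2*a1, a2*b1 + b2); because this
// combine operation is associative, a parallel scan can evaluate the chain.
public class Recurrence {
    // Combine two affine coefficient pairs: apply 'earlier' first, then 'later'.
    static double[] combine(double[] later, double[] earlier) {
        return new double[] { later[0] * earlier[0],
                              later[0] * earlier[1] + later[1] };
    }

    public static void main(String[] args) {
        double[] a = {0.5, 0.5, 0.5, 0.5};
        double[] b = {1.0, 2.0, 3.0, 4.0};

        // Sequential evaluation of the recurrence (with y[-1] = 0).
        double y = 0;
        for (int i = 0; i < a.length; i++) y = a[i] * y + b[i];

        // Same result by associatively combining coefficient pairs,
        // as a parallel scan would (here done left-to-right for brevity).
        double[] acc = {a[0], b[0]};
        for (int i = 1; i < a.length; i++)
            acc = combine(new double[]{a[i], b[i]}, acc);
        double yScan = acc[0] * 0 + acc[1];

        System.out.println(y + " == " + yScan); // both print 6.125
    }
}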
Short Bio:
Martin Burtscher is a Professor in the Department of Computer Science at Texas State University. He received the BS/MS degree from ETH Zurich and the PhD degree from the University of Colorado at Boulder. Martin's current research focuses on parallelization of complex programs for GPUs as well as on automatic synthesis of data-compression algorithms. He has co-authored over 100 peer-reviewed scientific publications. Martin is a distinguished member of the ACM and a senior member of the IEEE.
HG D 22
Simon Gerber, PhD Defense
Title: Authorization, Protection, and Allocation of Memory in a Large System
COMPASS: Computing Platforms Seminar Series
Wednesday, 15 August 2018, 11:00-12:00 in CAB E 72
Speaker: Leonid Yavits (Technion - Israel Institute of Technology)
Title: Resistive CAM based architectures: Resistive Associative In-Storage Processor and Resistive Address Decoder
Abstract:
I will present two Resistive CAM (RCAM) based architectures:
A typical processing-in-storage architecture places processing cores inside the storage system and allows near-data processing. An RCAM-based Resistive Associative In-Storage Processor functions simultaneously as storage and as a massively parallel SIMD accelerator. It confines the computing to the storage arrays, thus implementing in-data rather than near-data processing. The Resistive Associative In-Storage Processor outperforms the fastest state-of-the-art accelerators, achieving speedups of 9.7x, 5.1x, 3.5x and 2.9x for k-means, k-nearest neighbors, Smith-Waterman sequence alignment and the fully connected layer of a DNN, respectively.
Address decoders are typically hardwired. Replacing the wires with resistive elements allows storing an address alongside the data and comparing it to the input address, thus transforming the address decoder into a CAM and enabling fully associative access. Applications of the resistive address decoder include a fully associative TLB, cache and virtually addressable memory.
Bio:
Leonid Yavits received his MSc and PhD in Electrical Engineering from the Technion. After graduating, he co-founded VisionTech, where he co-designed the world's first single-chip MPEG-2 codec. Following VisionTech's acquisition by Broadcom, he managed Broadcom Israel R&D and co-developed a number of video compression products. Later, Leonid co-founded Horizon Semiconductors, where he co-designed a Set-Top-Box-on-chip for cable and satellite TV. Horizon's chip was among the world's earliest heterogeneous MPSoCs.
Leonid is a postdoctoral fellow in Electrical Engineering at the Technion. He has co-authored a number of patents and research papers. His research interests include non-von Neumann computer architectures; processing in memory and resistive memory based computing; and architectures for computational biology and bioinformatics tasks. Leonid's research work has earned several awards, among them the IEEE Computer Architecture Letters Best Paper Awards for 2015 and 2017.
CAB E 72
Talk by Vasileios Tsoutsouras (Institute of Communication and Computer Systems (ICCS), Greece)
Title: Design Methodologies for Resource Management of Many-core Computing Systems
Abstract:
The complexity and elevated requirements of modern applications have driven the development of computing systems characterized by a high number of processing cores, heterogeneity and a complex communication interconnect. Inevitably, in order for these systems to yield their maximum performance, novel dynamic resource management mechanisms are required. To this end, this presentation outlines the building blocks and design decisions of a run-time resource manager targeting many-core computing systems with Network-on-Chip (NoC) interconnection. Due to the high complexity and fast response requirements of dynamically mapping many concurrently running applications, a novel run-time resource management framework is introduced, aiming to provide a scalable solution based on distributed decision-making mechanisms.
This Distributed Run-Time Resource Management (DRTRM) framework is implemented and evaluated on top of Intel SCC, an actual many-core, NoC-based computing platform. Motivated by the unpredictable dynamics of workloads and application requests, an impact analysis of their arrival rate on DRTRM is performed, showing that a fast and resource-hungry scenario of incoming applications can be the breaking point not only for conventional centralized managers but also for distributed ones. In addition, the distribution of decisions in DRTRM complicates the enforcement of a system-wide mitigation scheme, as it requires the consensus of many agents. This issue is efficiently addressed by an admission control policy that retains distributed features by taking advantage of the resource allocation hierarchy in DRTRM and enforcing Voltage and Frequency Scaling on a few specific distributed agents. This policy is implemented and evaluated as an extension of DRTRM, showing that it can relieve the congestion of applications under stressful conditions while also providing energy consumption gains.
Finally, the increased probability of hardware errors is addressed, a side effect of the tight integration of many processing elements on the same system, which jeopardizes the quality of service provided to the end user. Adhering to the concept of dynamic recovery from manifested errors, SoftRM is introduced, a DRTRM augmented with fault-tolerant features. SoftRM extends the concepts of the well-known Paxos consensus algorithm, providing dynamic self-organization and workload-aware error mitigation. SoftRM policies also refrain from provisioning spare cores for fault tolerance, thus maximizing system throughput both in the presence and in the absence of errors in the processing elements of the SoC.
Short Bio:
Vasileios Tsoutsouras received his Diploma and Ph.D. degree in Electrical and Computer Engineering from the Microprocessors and Digital Systems Laboratory of the National Technical University of Athens, Greece, in 2013 and 2018, respectively. The main topics of his research include dynamic resource management of many-core computing systems, Edge computing in Internet of Things architectures and HW/SW co-design. He has published over 20 technical and research papers in scientific books, international conferences and journals. Since 2013, he has also worked as a research associate of the Institute of Communication and Computer Systems (ICCS) in two EU-funded projects on run-time resource management of medical embedded devices and Cloud infrastructure.
Thursday, 20 September 2018, 10:00-11:00 in CAB E 72
Speaker: Patrick Stüdi (IBM Research)
Title: Data processing at the speed of 100 Gbps using Apache Crail (Incubating)
Abstract:
Once the staple of HPC clusters, high-performance network and storage devices are today everywhere. For a fraction of the cost, one can rent 40/100 Gbps RDMA networks and high-end NVMe flash devices supporting millions of IOPS, tens of GB/s of bandwidth and latencies below 100 microseconds. But how does one leverage the speed of high-throughput, low-latency I/O hardware in distributed data processing systems like Spark, Flink or Tensorflow?
In this talk, I will introduce Apache Crail (Incubating), a fast, distributed data store that is designed specifically for high-performance network and storage devices. Crail's focus is on ephemeral data, such as shuffle data or temporary data sets in complex job pipelines, with the goal of enabling data sharing at the speed of the hardware in an accessible way. From a user perspective, Crail offers a hierarchical storage namespace implemented over distributed or disaggregated DRAM and Flash. At its core, Crail supports multiple storage back ends (DRAM, NVMe Flash, and 3D XPoint) and networking protocols (RDMA and TCP/sockets). In the talk I will discuss the design of Crail, its use cases and performance results on a 100Gbps cluster.
Bio:
Patrick is a member of the research staff at IBM Research Zurich. His research interests are in distributed systems, networking and operating systems. Patrick graduated with a PhD from ETH Zurich in 2008 and spent two years (2008-2010) as a postdoc at Microsoft Research Silicon Valley. The general theme of his work is to explore how modern networking and storage hardware can be exploited in distributed systems. Patrick is the creator of several open-source projects such as DiSNI (RDMA for Java) and DaRPC (low-latency RPC), and a co-founder of Apache Crail (Incubating).
---
Tuesday, 25 September 2018, 14:00-15:00 in CAB E 72
Speaker: Nandita Vijaykumar (Carnegie Mellon University, Pittsburgh, PA, USA)
Title: Expressive Memory: Rethinking the Hardware-Software Contract with Rich Cross-Layer Abstractions
Abstract:
Recent years have seen rapid evolution and advancement at all levels of the computing stack, from application to hardware. Key abstractions and interfaces among the levels, however, have largely stayed the same: hardware and software, for instance, still primarily interact through traditional abstractions (e.g., virtual memory, the instruction set architecture (ISA)). These interfaces are narrow, as hardware is unaware of key program semantics and programmer intent, and rigid, in terms of the fixed roles played by hardware and software. This fundamentally constrains the performance, programmability, and portability we can attain.
In this talk, I will make a case for rethinking the semantic contract between hardware and software and discuss how designing richer hardware-software abstractions can fundamentally change how we optimize for performance today. I will introduce two of our recent works from ISCA 2018 that explore the design and benefits of such cross-layer abstractions in two different contexts. I will first introduce Expressive Memory (XMem), a new cross-layer interface that communicates higher-level program semantics from the application to the underlying OS and hardware architecture. XMem thus enables the OS/architecture to identify the program's data structures and be aware of each data structure's access semantics, data types, etc. We demonstrate that this key, otherwise unavailable, information enables intelligent and much more powerful optimizations in operating systems and hardware architecture that significantly improve overall performance, programmability, and portability.
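For a flavor of what communicating program semantics across layers could look like, here is a purely hypothetical Java sketch. Every name below is invented for illustration; the actual XMem interface in the ISCA 2018 paper differs:

// Hypothetical sketch in the spirit of a cross-layer semantic hint. The idea:
// the application declares a data structure's semantics once, and the
// OS/architecture could use the hint to specialize caching, prefetching, or
// data placement for that structure.
public final class MemoryRegionHint {
    enum AccessPattern { SEQUENTIAL, RANDOM, POINTER_CHASING }

    private final String name;
    private final AccessPattern pattern;
    private final int elementSizeBytes;
    private final boolean readMostly;

    MemoryRegionHint(String name, AccessPattern pattern,
                     int elementSizeBytes, boolean readMostly) {
        this.name = name;
        this.pattern = pattern;
        this.elementSizeBytes = elementSizeBytes;
        this.readMostly = readMostly;
    }

    // In a real cross-layer system this call would cross into the OS/hardware
    // (e.g., via a system call or new instructions); here it only records intent.
    void register() {
        System.out.printf("hint: %s pattern=%s elem=%dB readMostly=%b%n",
                name, pattern, elementSizeBytes, readMostly);
    }

    public static void main(String[] args) {
        // E.g., a hash table's bucket array: randomly accessed, small, read-mostly.
        new MemoryRegionHint("buckets", AccessPattern.RANDOM, 16, true).register();
    }
}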
I will also briefly introduce the Locality Descriptor, a cross-layer abstraction to express and exploit data locality in throughput-oriented architectures, such as modern GPUs. I will discuss how a challenging aspect of programming GPUs can be made much simpler with a rich cross-layer programming abstraction that simultaneously enhances performance and portability.
Bio:
Nandita Vijaykumar is a Ph.D. candidate at Carnegie Mellon University, advised by Prof. Onur Mutlu and Prof. Phil Gibbons. Her research focuses on the interaction between programming models, system software, and hardware architecture, and explores how richer cross-layer abstractions can enhance performance, programmability, and portability. She is excited about rethinking the roles played by different levels of the stack in the modern era of rapidly evolving, specialized, and data-centric computing landscapes. Her industrial experience includes a full-time position at AMD and research internships at Microsoft Research, Nvidia Research, AMD, and Intel Labs. She is currently a visiting student at ETH Zurich.
---
Thursday, 4 October 2018, 10:00-11:00 in CAB E 72
Speaker: Philippe Bonnet (IT University, Copenhagen, Denmark)
Title: Near-Data Processing with Open-Channel SSDs
Abstract:
The advent of microsecond-scale SSDs makes it necessary to streamline the I/O software stack. At the same time, the increasing performance gap between storage and CPU makes it necessary to reduce the CPU overhead associated with storage management. The convergence of these two trends calls for a profound redesign of the I/O stack. In this talk, I will present recent work we have done based on a near-data processing architecture, where low-level storage management and front-end SSD management are combined at a middle tier between the host CPU and Open-Channel SSDs. I will first review recent developments in the area of Open-Channel SSDs, then detail our work on two systems (ELEOS & LightLSM), and conclude with lessons learned and open issues.
Bio:
Philippe Bonnet is professor in the data systems group at the IT University of Copenhagen. He is a Marie Curie fellow. He held positions at ECRC, INRIA, Cornell and University of Copenhagen. Recently, Philippe led the Danish CLyDE project that promoted open-channel SSDs, resulting in patents as well as a contribution to the Linux kernel (lightnvm).
---
Speaker: Mihnea Andrei (SAP HANA)
Title: Snapshot isolation in HANA - the evolution towards production-grade HTAP
Abstract:
Pioneered by SAP HANA, Hybrid Transactional Analytical Processing (HTAP) has today become an industry trend, adopted by most major database vendors. In the past, Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) were executed separately, on dedicated systems, each optimized for the respective workload. Initially, the optimizations pertained mostly to the logical and physical data models; later, specialized database engines emerged, such as row vs. column stores, native cubes, etc. Despite the performance advantage of this "one size does not fit all" approach, it has the practical disadvantage of requiring the user to absorb the complexity of operating two different systems and the data movement between them. This involves many moving parts, each prone to failure, with subtle differences in the semantics of the data models and processing. Conversely, HTAP systems absorb this complexity and offer both OLTP and OLAP processing within the same system, with the same transaction domain guarantees, on the same data set, and with excellent performance. HTAP was made possible by in-memory computing, itself based on new software technology leveraging modern hardware, such as very large DRAM capacity, a large number of processing cores, etc., and more recently NVRAM.
In this talk, I start with an introduction to the internals of HANA, covering both the concept of HTAP and the underlying technologies used in HANA to enable it within a single system. The second part of the talk focuses on HANA's productive implementation of snapshot isolation. After a refresher on transactional consistency principles and techniques, I present HANA's original OLAP-oriented implementation. The main part of the talk covers the implementation's evolution to the current version, which enables HTAP. I conclude with a post-mortem analysis of the good and the bad, and give an outlook on future work.
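As background for the snapshot isolation part, here is a minimal Java sketch of the generic MVCC visibility rule. It illustrates the principle only, under textbook assumptions; it is not HANA's implementation:

// Each row version carries the commit timestamps of its creator and deleter;
// a transaction reading under snapshot isolation sees exactly the versions
// committed before its snapshot timestamp and not yet deleted at that point.
public class SnapshotVisibility {
    static final long NOT_DELETED = Long.MAX_VALUE;

    static final class RowVersion {
        final long createdTs;  // commit timestamp of the creating transaction
        final long deletedTs;  // commit timestamp of the deleting transaction
        RowVersion(long createdTs, long deletedTs) {
            this.createdTs = createdTs;
            this.deletedTs = deletedTs;
        }
    }

    // A version is visible to a snapshot iff it was committed before the
    // snapshot was taken and had not been deleted by then.
    static boolean visible(RowVersion v, long snapshotTs) {
        return v.createdTs <= snapshotTs && v.deletedTs > snapshotTs;
    }

    public static void main(String[] args) {
        RowVersion v1 = new RowVersion(10, 42);          // created at 10, deleted at 42
        RowVersion v2 = new RowVersion(42, NOT_DELETED); // successor version
        long snapshot = 30;
        System.out.println(visible(v1, snapshot)); // true: OLAP scans a stable snapshot
        System.out.println(visible(v2, snapshot)); // false: created after the snapshot
    }
}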
Short Bio:
Mihnea is a Chief Architect at SAP, working on the HANA database system. He covers a broad area, spanning core database technologies: in-memory, NVRAM, and on-disk stores; transaction processing; query optimization and execution; FPGAs; distributed processing and RDMA; federation; and, more recently, DBaaS. His industry focus on databases started in 1993, when he joined Sybase, a Bay Area database vendor. Mihnea worked on both the state-of-the-art row-oriented and column-oriented Sybase database systems, which covered OLTP and OLAP workloads separately. Earlier, he completed his DEA in Machine Learning in 1990 at Université Paris 6, supervised by Prof. Jean-Gabriel Ganascia, and his MSc in Computer Science in 1988 at the Bucharest Polytechnic Institute.
---
Colloquium Talk by Arvind Mithal (MIT)
Title: The Riscy Expedition
16:15 - CAB G 61
Host: Onur Mutlu
Claude Barthels - PhD Defense
Title: Scalable Query and Transaction Processing over High-Performance Networks
Location: HG D 22.
Systems Group Industry Retreat 2019 will take place from 6 to 9 January 2019...
Thursday, 31 January 2019, 10:00-11:00 in CAB E 72
Speaker: Irene Zhang (Microsoft Research, Redmond)
Title: Demikernel: An Operating System Architecture for Hardware-Accelerated Datacenter Servers
Abstract:
As I/O devices become faster, the CPU is increasingly a bottleneck in today's datacenter servers. As a result, servers now integrate a variety of I/O accelerators -- I/O devices with an attached computational unit -- to offload functionality from the CPU (e.g., RDMA, DPDK and SPDK devices). More specifically, many of these devices improve performance by eliminating the operating system kernel from the I/O processing path. This change has left a gap in the datacenter systems stack: there is no longer a general-purpose, device-independent I/O abstraction. Instead, programmers build their applications against low-level device-specific interfaces, which are difficult to use and not portable.
This talk presents the Demikernel, a new operating system architecture for datacenter servers. Demikernel operating systems are split into a control-path kernel and a data-path library OS, which provides a new device-agnostic I/O abstraction for datacenter servers. Each Demikernel library OS implements this I/O abstraction in a device-specific way by offloading some functions to the device and implementing the remainder on the CPU. In this way, datacenter applications can use a high-level interface for I/O that works across a range of I/O accelerators without application modification.
Bio:
Irene Zhang is a researcher at Microsoft Research Redmond. Her current research focuses on new operating systems for datacenter servers and mobile devices. She recently received her PhD from the University of Washington, advised by Hank Levy and Arvind Krishnamurthy. Her thesis focused on distributed programming systems for wide-area applications. She is this year's recipient of the ACM SIGOPS Dennis M. Ritchie Doctoral Dissertation Award.
---
PhD Defense: Reto Achermann
07.02.2020 14:00
Title: On Memory Addressing
Committee: Timothy Roscoe (ETH Zurich), David Basin (ETH Zurich), Gernot Heiser (UNSW and Data61), David Cock (ETH Zurich)
Speaker: Tom Anderson (University of Washington)
Title: A Case for An Open Source CS Curriculum
Host: Timothy Roscoe
Abstract:
Despite rapidly increasing enrollment in CS courses, the academic CS community is failing to keep pace with the demand for trained CS students. Further, the knowledge of how to teach students up to the state of the art is increasingly segregated into a small cohort of schools that mostly cater to students from families in the top 10% of the income distribution. Even in the best case, those schools lack the aggregate capacity to teach more than a small fraction of the nation's need for engineers and computer scientists. MOOCs can help, but they are mainly effective at retraining existing college graduates. In practice, most low- and middle-income students need a human teacher. In this talk I argue for building an open-source CS curriculum, with autograded projects, instructional software, textbooks, and slideware, as an aid for teachers who want to improve the education in advanced CS topics at schools attended by the children of the 90%. I will give as an example our work on replicating the teaching of advanced operating systems and distributed systems.
Bio:
Tom Anderson is the Warren Francis and Wilma Kolm Bradley Chair in the Paul G. Allen School of Computer Science and Engineering at the University of Washington. His research interests span all aspects of building practical, robust, and efficient computer systems, including distributed systems, operating systems, computer networks, multiprocessors, and security. He is a member of the National Academy of Engineering and the American Academy of Arts and Sciences, as well as winner of the USENIX Lifetime Achievement Award, the USENIX STUG Award, the IEEE Koji Kobayashi Computer and Communications Award, the ACM SIGOPS Mark Weiser Award, and the IEEE Communications Society William R. Bennett Prize. He is also an ACM Fellow, past program chair of SIGCOMM and SOSP, and he has co-authored twenty-one award papers and one widely used undergraduate textbook: http://ospp.cs.washington.edu/.
Thursday, 21 February 2019, 10:00-11:00 in CAB E 72
Speaker: Thomas Würthinger (Oracle Labs)
Title: Bringing the Code to the Data with GraalVM
Abstract:
High-performance language runtimes often execute isolated from datastores. Encoding logic in the form of stored procedures requires relying on different execution engines and sometimes even different languages. Our vision of the future of execution runtimes is GraalVM: an integrated, polyglot, high-performance execution environment that can not only run stand-alone but also be efficiently embedded in other systems. It supports shared tooling independent of the specific language and specific embedding. We designed the GraalVM runtime with complete separation of the logical and physical data layout in mind. This allows direct access to custom data formats without marshalling overheads. GraalVM supports dynamic languages such as JavaScript, Ruby, Python and R. Additionally, even lower-level languages such as C, C++, Go, and Rust are integrated into the ecosystem via LLVM bitcode and can execute in a sandboxed and secure manner. We believe this language-level virtualisation will provide major benefits for system performance and developer productivity.
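As an illustration of the embedding model, here is a minimal example using GraalVM's polyglot API (org.graalvm.polyglot); it assumes a GraalVM runtime with the JavaScript language installed:

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

// Host Java code evaluates guest JavaScript and passes values across the
// language boundary without marshalling them into an intermediate format.
public class PolyglotDemo {
    public static void main(String[] args) {
        try (Context context = Context.create("js")) {
            // Evaluate a guest-language function, then call it from Java.
            Value fn = context.eval("js", "(n => n * 2)");
            System.out.println(fn.execute(21).asInt()); // prints 42
        }
    }
}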
Bio:
Thomas Wuerthinger is a researcher at Oracle Labs Switzerland. His research interests include virtual machines, feedback-directed runtime optimizations, and static program analysis. His current focus is the Graal project, which aims at developing a new dynamic compiler for Java. Additionally, he is the architect of the Truffle self-optimizing runtime system, which uses partial evaluation to automatically derive high-performance compiled code from AST interpreters. Before joining Oracle Labs, he worked on the IdealGraphVisualizer, the Crankshaft/V8 optimizing compiler, and the Dynamic Code Evolution VM. He received a PhD degree from the Johannes Kepler University Linz.
---
Thursday, 28 February 2019, 10:00-11:00 in CAB E 72
Speaker: Alberto Lerner (University of Fribourg, Switzerland)
Title: The Case for Network-Accelerated Query Processing
Abstract:
The fastest plans in MPP databases are usually those with the least amount of data movement across nodes, as data is not processed while in transit. The network switches that connect MPP nodes are hard-wired to perform packet-forwarding logic only. However, in a recent paradigm shift, network devices are becoming “programmable.” The quotes here are cautionary. Switches are not becoming general-purpose computers (just yet). But the set of tasks they can perform can now be encoded in software.
In this talk, we explore this programmability to accelerate OLAP queries. We found that we can offload onto the switch some very common and expensive query patterns. Moving data through networking equipment can hence, for the first time, contribute to query execution. Our preliminary results show that we can improve response times of even the best agreed-upon plans by more than 2x using 25 Gbps networks. We also see the promise of linear performance improvement with faster network speeds. The use of programmable switches can open new possibilities for architecting rack- and datacenter-sized database systems, with implications across the stack.
Bio:
Alberto Lerner is a Senior Researcher at the eXascale Infolab at the University of Fribourg, Switzerland. His interests include systems that explore close coupling of hardware and software in order to realize untapped performance and/or functionality. Previously, he spent years in industry consulting for large, data-hungry verticals such as finance and advertisement. He has also been part of the teams behind a few different database engines: IBM's DB2, working on robustness aspects of the query optimizer; Google's Bigtable, on elasticity aspects; and MongoDB, on general architecture. Alberto received his Ph.D. from ENST - Paris (now ParisTech), having done his thesis research at INRIA/Rocquencourt and NYU. He has also done post-doctoral work at IBM Research (both at T.J. Watson and Almaden).
---
Thursday, 21 March 2019, 13:30-14:30 in CAB E 72
Speaker: Marko Vukolic (IBM Research)
Title: Hyperledger Fabric: a Distributed Operating System for Permissioned Blockchains
Abstract:
Fabric is a modular and extensible open-source system for deploying and operating permissioned blockchains and one of the Hyperledger projects hosted by the Linux Foundation (www.hyperledger.org). Fabric supports modular consensus protocols, which allows the system to be tailored to particular use cases and trust models. Fabric is also the first blockchain system that runs distributed applications written in standard, general-purpose programming languages, without systemic dependency on a native cryptocurrency. This stands in sharp contrast to existing blockchain platforms that require "smart contracts" to be written in domain-specific languages or rely on a cryptocurrency. Fabric realizes the permissioned model using a portable notion of membership, which may be integrated with industry-standard identity management. To support such flexibility, Fabric introduces an entirely novel blockchain design and revamps the way blockchains cope with non-determinism, resource exhaustion, and performance attacks. Although not yet performance-optimized, Fabric achieves, in certain popular deployment configurations, end-to-end throughput of more than 3500 transactions per second (of a Bitcoin-inspired digital currency), with sub-second latency, scaling well to over 100 peers. In this talk we discuss the Hyperledger Fabric architecture, detailing the rationale behind various design decisions. We also briefly discuss distributed ledger technology (DLT) use cases to which Hyperledger Fabric is relevant, including the financial industry, manufacturing, supply chain management, government use cases and many more.
Short Biography:
Dr. Marko Vukolić is a Research Staff Member in the Blockchain and Industry Platforms group at IBM Research - Zurich. Previously, he was a faculty member at EURECOM and a visiting faculty member at ETH Zurich. He received his PhD in distributed systems from EPFL in 2008 and his dipl. ing. degree in telecommunications from the University of Belgrade in 2001. His research interests lie in the broad area of distributed systems, including blockchain and distributed ledgers, cloud computing security, distributed storage and fault-tolerance.
---
Thursday, 28 March 2019, 10:00-11:00 in CAB E 72
Speaker: Theo Rekatsinas (University of Wisconsin)
Title: A Machine Learning Perspective on Managing Noisy Data
Abstract:
Modern analytics depend on high-effort tasks like data preparation and data cleaning to produce accurate results. It is for this reason that the vast majority of the time devoted to analytics projects is spent on these tasks.
This talk describes recent work on making routine data preparation tasks dramatically easier. I will first introduce a noisy channel model to describe the quality of structured data and demonstrate how most work on noisy data management by the database community can be cast as a statistical learning and inference problem. I will then show how this noisy channel model forms the basis of HoloClean, a weakly supervised ML system for automated data cleaning. I will close with additional examples of how a statistical learning view can lead to new insights and solutions to classical database problems such as constraint discovery and consistent query answering.
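As a toy illustration of the noisy-channel view, the Java sketch below flags cells that violate a functional dependency (zip determines city), the kind of weak-supervision signal a system like HoloClean combines with statistical inference. It is an illustration of the idea only, not HoloClean code:

import java.util.HashMap;
import java.util.Map;

// Tuples that disagree on city for the same zip violate the dependency
// zip -> city; under a noisy-channel view, the majority value is the most
// likely "clean" observation and minority values are candidate errors.
public class FdViolations {
    public static void main(String[] args) {
        String[][] rows = {
            {"10001", "New York"},
            {"10001", "New York"},
            {"10001", "Boston"},   // likely noisy: conflicts with the majority
        };

        // Count, per zip code, how many rows vote for each city value.
        Map<String, Map<String, Integer>> votes = new HashMap<>();
        for (String[] r : rows) {
            votes.computeIfAbsent(r[0], k -> new HashMap<>())
                 .merge(r[1], 1, Integer::sum);
        }

        // Report zips with conflicting city values as candidate noisy cells.
        votes.forEach((zip, cities) -> {
            if (cities.size() > 1) {
                System.out.println("zip " + zip + " has conflicting cities: " + cities);
            }
        });
    }
}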
Short Bio:
Theodoros (Theo) Rekatsinas is an Assistant Professor in the Department of Computer Sciences at the University of Wisconsin-Madison. He is a member of the Database Group. He earned his Ph.D. in Computer Science from the University of Maryland and was a Moore Data Postdoctoral Fellow at Stanford University. His research interests are in data management, with a focus on data integration, data cleaning, and uncertain data. Theo's work has been recognized with an Amazon Research Award in 2018, a Best Paper Award at SDM 2015, and the Larry S. Davis Doctoral Dissertation Award in 2015.
---