COMPASS: Computing Platforms Seminar Series
Thursday, 26 April 2018, 11:00-12:00 in CAB E 72
Speaker: Spyros Blanas (Ohio State University, USA)
Title: Scaling database systems to high-performance computers
Abstract:
Processing massive datasets quickly requires warehouse-scale computers. Many massive datasets are also multi-dimensional arrays stored in formats like HDF5 and NetCDF that cannot be directly queried using SQL. Parallel array database systems like SciDB cannot scale in this environment, which offers fast networking but very limited I/O bandwidth to shared, cold storage: merely loading multi-TB array datasets in SciDB would take days, an unacceptably long time for many applications.
In this talk, we will present ArrayBridge, a common interoperability layer for array file formats. ArrayBridge allows scientists to use SciDB, TensorFlow and HDF5-based code in the same file-centric analysis pipeline without converting between file formats. Under the hood, ArrayBridge manages I/O to leverage the massive concurrency of warehouse-scale parallel file systems without modifying the HDF5 API and breaking backwards compatibility with legacy applications. Once the data has been loaded in memory, the bottleneck in many array-centric queries becomes the speed of data repartitioning between different nodes. We will present an RDMA-aware data shuffling abstraction that directly converses with the network adapter in InfiniBand verbs and can repartition data up to 4x faster than MPI. We conclude by highlighting open research challenges that must be overcome for data processing to scale to warehouse-scale computers.
Short Bio:
Spyros Blanas is an assistant professor in the Department of Computer Science and Engineering at The Ohio State University. His research interest is high-performance database systems, and his current goal is to build a database system for high-end computing facilities. He has received the IEEE TCDE Rising Star Award and a Google Research Faculty award. He received his Ph.D. at the University of Wisconsin–Madison, and part of his Ph.D. dissertation was commercialized in Microsoft's flagship data management product, SQL Server, as the Hekaton in-memory transaction processing engine.
---