Courses

Datacenter Network Monitoring and Management

Course Description:

The seminar focuses on understanding sources of failure and performance issues in a datacenter network, as well as on solutions for monitoring them. The goal is to gain a broad understanding of tooling and practices in monitoring network health. We will work to establish a problem-solving mindset and build skills to tackle network performance problems.

The course is built around a set of papers that cover the landscape of monitoring solutions in datacenter management. Grading is based on three reports, each covering a subset of the papers. Together we will work on establishing a clear report structure and a set of questions to answer.

Schedule:

Date Topic Documents
Feb 18 Introduction  Slides
Feb 25 Failures  [1,2,3]
Mar 03 SNMP  [4] Slides
Mar 10 Self study  Report A instructions
Mar 17 Counters (Report A submission)  [5] Paper notes, FlowRadar notes
Mar 24 Applications  [6] Paper notes
Mar 31 Tomography  [7] Paper notes
Apr 07 Self study  Report B instructions
Apr 14 Easter break
Apr 21 Probing 1  [8,9]
Apr 28 Probing 2  [10] Paper notes, Slides
May 05 Probing 3 (Report B submission)  [11] Paper notes, Slides
May 12 Self study  Report C instructions
May 19 Presentation discussion  [10 Slide notes], [11 Slide notes]
May 26 Mirroring & Triggers  [12] Paper notes, [13] Paper notes
Jun 02 Report C submission

Support papers:

How to read a paper pdf
How to read a research paper pdf 

Mark Handley animation YouTube video

Overview of discussed papers 

Papers: 

[1] Understanding and Mitigating Packet Corruption in Data Center Networks (2017) Paper Sections: 2,3,4

[2] Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications (2011) Paper Sections: 2,4

[3] Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure (2016) Paper Sections: 4,5,6

[4] SNMP tutorial link 

[5] LossRadar: Fast Detection of Lost Packets in Data Center Networks (2016) Paper Sections: 1-6 and 8

[6] Passive realtime datacenter fault detection and localization (2017) Paper Sections: 1-5

[7] Netscope: practical network loss tomography (2010) Paper Sections: full paper

[8] Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis (2015) Paper Sections: 1-3, 5 and 6.2

[9] NetNORAD: Troubleshooting networks via end-to-end probing (2016) Paper Sections: full blog

[10] NetBouncer: Active Device and Link Failure Localization in Data Center Networks (2019) Paper Sections: see shared doc

[11] Measuring and Troubleshooting Large Operational Multipath Networks with Gray Box Testing (2015) Paper Sections: see shared doc

[12] Packet-Level Telemetry in Large Datacenter Networks (2015) Paper Sections: 1,3-4, 7.1 and 7.2

[13] Trumpet: Timely and Precise Triggers in Data Centers (2016) Paper Sections: 2-4,5.1-5.5, only scan 6

[14] NetPilot: Automating Datacenter Network Failure Mitigation (2016) Paper Sections: 1-4

Optional reading 

FlowRadar: A Better NetFlow for Data Centers (2016) Paper Similar to LossRadar [5]

deTector: a Topology-aware Monitoring System for Data Center Networks (2017) Paper Solution using probing 

Scalable Near Real-Time Failure Localization of Data Center Networks (2014) Paper Solution using probing

Big Data – Fall 2019


 

Latest information

  • Please sign up on myStudies for your exercise session.
  • You can do the exercises of the first week at home: follow the instructions to set up your accounts on Moodle, on our online classroom on Azure, and on the Jupyter notebook server. If you need help, come to one of the support sessions in ML F 36 on Wednesday 18/10 or in CAB G 52 on Friday 20/10. Otherwise, enjoy your free time.

 

Lecture recording

The lectures are recorded by ETH multimedia services and are accessible through the video portal after a few days (with your nETHZ credentials, and only after the first recording has been uploaded).

 

TA session

We offer several slots. Check lecture and exercise times to see which slot is yours.

The lecture has a 2A component, meaning that you will have a lot of practical exposure to technology. The TAs will have plenty of practical exercises for you and will help you get your computers set up in the exercise sessions. Please do not forget to bring your laptop with you to make the most out of it.

 

ETH EduApp

We will try to make the course a bit interactive, using the ETH ticker application during lectures. You can access it as a web app or install it on your smartphone (learn how).

We recommend trying out the ticker app already now so that the answering process goes smoothly in the lectures. One question about your background is already available as a test so that you can familiarize yourself with the app and answer your first question.

Hardware Acceleration for Data Processing (HADP) - Fall 2019



Overview

The seminar is intended to cover recent results in the increasingly important field of hardware acceleration for data science, both in dedicated machines and in data centers. The seminar aims at students interested in the system aspects of data processing who are willing to bridge the gap across traditional disciplines: machine learning, databases, systems, and computer architecture. The seminar should be of special interest to students interested in completing a master thesis or even a doctoral dissertation in related topics.

The seminar will start on September 17th with an overview of the general topics and the intended format of the seminar. Students are expected to present one paper in a 30-minute talk and complete a report (max 4 pages, excluding references) on the main idea of the paper, how it relates to the other papers presented at the seminar, and the discussions around those papers. The presentation will be given during the semester in the allocated time slot. The report is due on the last day of the semester (20.12.2019).

Attendance at the seminar is mandatory to complete the credit requirements. Active participation is also expected, including having read every paper to be presented in advance and contributing to the questions and discussions of each paper during the seminar.


News

1) The first introductory class will take place on 17th September 2019 at 13:15 in ML J 34.1 followed by the opening talk at 14:15.

2) Selection of papers (3 papers max) and presentation dates (3 slots max) are expected to be ready by 24th September 2019. Please send your preferences to user_name[at]inf.ethz.ch, where user_name is amit[dot]kulkarni.

3) The deadline for report submission is 20th December 2019. Please send your report to both email addresses: user_name[at]inf.ethz.ch, where user_name is amit[dot]kulkarni and fabio[dot]maschi.

4) We have received the reports from all students, and all files are intact.


Talks

Speaker Title Date
Prof. Gustavo Alonso Introduction to the seminar 17 Sep 13:15
Cedric Renggli SparCML: High-Performance Sparse Communication for Machine Learning 17 Sep 14:15
Dr. Muhsen Owaida Lowering the Latency of Data Processing Pipelines Through FPGA based Hardware Acceleration 24 Sep 13:15
David Dao Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms 24 Sep 14:15

Schedule

Name Paper Date
Pavllo Dario Crossbow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers 1 Oct 13:15
Athanasiadis Ioannis Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures 8 Oct 13:15
Aeschbacher Tobias Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems 8 Oct 14:15
Pascal Oberholzer HetExchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines 22 Oct 13:15
Jiang Tianjian A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services 22 Oct 14:15
Breitwieser Lukas KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC 29 Oct 13:15
Bonaert Gregory A Cloud-Scale Acceleration Architecture 29 Oct 14:15
Severin Kistler Azure Accelerated Networking: SmartNICs in the Public Cloud 12 Nov 13:15
Sikonja Rok Efficiently Searching In-Memory Sorted Arrays: Revenge of the Interpolation Search? 12 Nov 14:15
Jaggi Akshay Analyzing Efficient Stream Processing on Modern Hardware 19 Nov 13:15
Onus Viviane FPGA-based High-Performance Parallel Architecture for Homomorphic Computing on Encrypted Data 19 Nov 14:15
Kolar Luka Speculative Distributed CSV Data Parsing for Big Data Analytics 26 Nov 13:15
Martsenko Kristina In-RDBMS Hardware Acceleration of Advanced Analytics 26 Nov 14:15
Pasquale Davide Schiavone Federated Learning: Challenges, Methods, and Future Directions 3 Dec 13:15
Alessandro Novello Orthogonal Security With Cipherbase 3 Dec 14:15

Seminar Hours

Tuesdays, 13:00-15:00 in ML J 34.1


People

Lecturers:

Teaching Assistants:

 

Advanced Systems Lab - Fall 2019

Course Organization & Materials

Find below the dates and details of tutorials (T) and exercises (E):

Date/Time Type Description Materials
 17. Sep. T The first tutorial session will take place on Sept. 17th.  introTR
 19. Sep. E The first exercise session will introduce the lab project. (CAB G52)  slides
 24. Sep. E The second exercise session will take place in G61 instead of the tutorial. It will introduce Azure.  slides
 26. Sep. E Exercise session on scripting and plotting. (CAB G52)  slides
 1. Oct. T The second tutorial session will take place on Oct. 1st.  slides
 3. Oct. E Exercise session on good and bad practices in Java middleware development.  slides
 8. Oct. T The third tutorial session will take place on Oct. 8th.  slides
 10. Oct. E Exercise session on baseline measurements.  slides
 15. Oct. T The fourth tutorial session will take place on Oct. 15th.  slides
 17. Oct. E Exercise session on 2k analysis.  slides
 22. Oct. T The fifth tutorial session will take place on Oct. 22nd.  slides
 24. Oct. E Exercise session on queueing theory.  slides
 31. Oct. E Exercise session on queueing networks.  slides

 

Project Details

Project Description: project

Report: report, report.tex

Programming: project-structure

Azure: template
 

 Project Deadline: Monday 16th December 2019, 17:00


Literature

"The Art of Computer Systems Performance Analysis" - Raj Jain
John Wiley & Sons Inc; 2nd rev. edition (September 21, 2015)

"The Art of Computer Systems Performance Analysis" - Raj Jain
Wiley Professional Computing, 1991

From the 1st edition, the following chapters are of particular relevance:

  • Chapters 1, 2, 3 (General introduction, Common terminology)
  • Chapters 4, 5, 6 (Workloads)
  • Chapter 10 (Data presentation)
  • Chapters 12, 13, 14 (Probability and statistics)
  • Chapters 16, 17, 18, 20, 21, 22 (Experimental design)
  • Chapters 30, 31, 32, 33, 36 (Queueing theory)

Lecturer

Gustavo Alonso

 TAs

Kaan Kara

Rodrigo Bruno

Office Hours: Thursdays 17:00-18:30


Course Hours

Tutorials: Tuesday, 17:00 – 19:00, CAB G 61.

Exercises: Thursday, 17:00 - 19:00 CAB G 52.

General Contact: sg-asl [at] lists.inf.ethz.ch


 

Office Hours

(To be announced later) 


 


 

FAQ

Q: The provided RunMW.java file contains an argument for sharded reads. Do we need to implement sharded reads in the Middleware?

A: No, just ignore this flag, i.e., always set it to false.

Q: How do we perform experiments with multiple servers/middlewares? (Confused by the sentence in the report outline: "All clients are connected to a single memcached instance.")

A: To benchmark multiple servers/middlewares you need to start multiple memtier processes.

Q: How do we implement the 'installShutdownHook' method in RunMW.java if we were told not to modify RunMW.java?

A: You can modify RunMW.java to properly implement the shutdown hook.

Q: Memtier configure installation step fails due to a missing ssl library. How should I proceed?

A: Compile Memtier with no ssl support ('./configure --disable-tls').

Q: Are four measurement points enough to reach conclusions about the system?

A: Yes, the given value range is enough to reach meaningful conclusions about the system, and it is the only range that needs to be plotted in the report. However, doing more (shorter) experiments to understand the behavior of the system and support your explanations is encouraged.

Q: In Section 2.3, when selecting maximum throughput, should I consider requests/second or Bytes/second?

A: This is up to you. You need to clearly state which unit you pick when selecting the maximum throughput. The rest of the table has to be filled correspondingly.

Q: Does the title page (containing only the date, name, legi number etc.) count towards 35-page limit?

A: No, it does not.

 

Errata/Updates

  • Report Outline: In Section 2.3 of the project description, it is mentioned that students should collect statistics to produce a histogram. Ignore this paragraph, as later sections do not ask for such a histogram.
  • In the provided RunMW.java file there is a call to Thread.sleep right after the call to install the Shutdown Hook. This Thread.sleep call should be removed.
  • Report Outline: The sentence "All clients are connected to a single memcached instance" has been changed to "Each memtier instance is connected to a single middleware instance." to avoid confusion.
  • Report Outline: In Section 3.1.2, the number of worker threads was wrong (did not match the table). This has been fixed.
  • Report Outline: In the table in Section 2.3, "Minimum Response Time" has been changed to "Corresponding Response Time".




 

 

Information Retrieval - Spring 2019

OUT OF DATE

 


Overview:

This course gives an introduction to information retrieval with a focus on text documents and unstructured data. Main topics comprise document modelling, various retrieval techniques, indexing techniques, query frameworks, optimization, evaluation and feedback.

 

Lecture

Friday, 13:00 to 15:00 in HG E 5
Exercise

Friday, 15:00 to 16:00 in one of the following:
CAB G 52 - Surnames starting with A-Ma (TA: Andrea)
CAB G 56 - Surnames starting with Me-Z (TA: Valentino) 

 

 

  

Objective

We keep accumulating data at an unprecedented pace, much faster than we can process it. While Big Data techniques contribute solutions accounting for structured or semi-structured shapes such as tables, trees, graphs and cubes, the study of unstructured data is a field of its own: Information Retrieval.

After this course, you will have an in-depth understanding of broadly established techniques to model, index and query unstructured data (i.e., text), including the vector space model, Boolean queries, terms, posting lists, and dealing with errors and imprecision.

You will know how to make queries faster and how to make queries work on very large datasets. You will be capable of evaluating the quality of an information retrieval engine. Finally, you will also have knowledge about alternate models (structured data, probabilistic retrieval, language models) as well as basic search algorithms on the web such as Google's PageRank.

 

Literature

C. D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge University Press.

Prerequisites / Notice

Prior knowledge in elementary set theory, logics, linear algebra, data structures, abstract data types, algorithms, and probability theory (at the Bachelor's level) is required, as well as programming skills (we will use Python).

Piazza

We will use Piazza as the official forum for questions.
You can sign up here
Please post your questions there instead of sending e-mails. Many of your questions are good questions that many others also have, and the answers are thus of interest to everybody.

 

 

 

Data Stream Processing and Analytics - Spring 2019


Course Information

Code: 263-3826-00, 6 ECTS credits
Language of instruction: English

Lecturer: Vasiliki Kalavri (kalavriv@inf.ethz.ch) CAB E 73.1
Teaching Assistants: Zaheer Chothia (zchothia@inf.ethz.ch) and Michal Wawrzoniak (michal.wawrzoniak@inf.ethz.ch).

Lectures: Mondays, 10-12, CHN E42
Exercises: Mondays, 13-15, CHN F46
Submissions: Moodle

What is this course about? Lecture 0 is now available.

This course is generously supported by a Google Cloud Platform Education Grant.


Overview

Modern data-driven applications require continuous, low-latency processing of large-scale, rapid data events such as videos, images, emails, chats, clicks, search queries, financial transactions, traffic records, sensor measurements, etc. Extracting knowledge from these data streams is particularly challenging due to their high speed and massive volume.

Distributed stream processing has recently become highly popular across industry and academia due to its capabilities to both improve established data processing tasks and to facilitate novel applications with real-time requirements. In this course, we will study the design and architecture of modern distributed streaming systems as well as fundamental algorithms for analyzing data streams.

Specifically, we will cover the following topics:

  • Distributed streaming systems design and architectures
  • Fault-tolerance and processing guarantees
  • State management
  • Windowing semantics and optimizations
  • Basic data stream mining algorithms (e.g. sampling, counting, filtering)
  • Query languages and libraries for stream processing (e.g. Complex Event Processing, online machine learning)
  • Streaming applications and use-cases 
  • Modern streaming systems: Apache Flink, Apache Beam, Apache Kafka, Timely dataflow
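As a concrete taste of the basic data stream mining algorithms listed above, here is a minimal, illustrative Python sketch (not course material) of reservoir sampling: it maintains a uniform random sample of fixed size k over a stream whose total length is unknown in advance.

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a reservoir element with probability k / (i + 1),
            # which keeps every item seen so far equally likely to survive.
            j = random.randrange(i + 1)
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(1000), 5))
```

The invariant is that after processing n items, each item is in the reservoir with probability k/n, using only O(k) memory.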

Recommended readings


Exercise Sessions

The exercise sessions will be a mixture of (1) reviews, discussions, and evaluation of research papers on data stream processing, and (2) programming assignments on implementing data stream mining algorithms and analysis tasks.


Examination

The course consists of lectures, exercises, and a final semester project. There will be no formal examination at the end of the course. Students are continuously graded based on their participation in class (10%), weekly assignments (50%), and semester project (40%).

 

COMPUTER SYSTEMS - FALL 2018

What's new?

  • Exercise session registration is available.
  • No exercise session on September 21st. First exercise session will be on September 28th. 
  • Added mailing list for corrections, typos, and suggestions.
  • No lecture on September 28th. 
  • Bonus assignment guidelines accessible under course material. First deadline on November 1st. 
  • Small update to Bonus assignment guidelines. We will not discard any questions about the first half of the lecture for the second deadline, but we encourage you to submit a question that covers a topic of the second half. 
  • Chapter 14: Added definition (Def. 14.39) clarifying the difference between global and local clock skew.
  • Fixed Mistake in Virtual Machine & Network Stack exercise (last question). 
  • Uploaded some old exams for the Distributed Systems part (in course material).
  • Added some additional remarks for the Distributed Systems part (exam preparation)
  • There is no lecture or exercise session on December 21st.
  • Uploaded Q&A session questions and answers.


Overview

This course is about real computer systems, and the principles on which they are designed and built. We cover both modern OSes and the large-scale distributed systems that power today's online services. We illustrate the ideas with real-world examples, but emphasize common theoretical results, practical tradeoffs, and design principles that apply across many different scales and technologies. 

Since this is a new course, we are still "debugging" it. If you find any typos or errors, or have suggestions on how to improve the script, you can send them to the following mailing list: compsys-errata@lists.inf.ethz.ch

Lecturer

Staff

  • Manuel Eichelberger (manuelei at ethz) ETZ G97 (En/De)
  • Roni Häcki (roni.haecki at inf) CAB E69 (En/De)
  • Vasiliki Kalavri (vasiliki.kalavri at inf) CAB E73.1 (En)

Student Assistants

  • Daniel Gstöhl (gstoehld at student.ethz.ch) 
  • Jonas Gude (jgude at student.ethz.ch)
  • Jakob Meier (jakmeier at student.ethz.ch)
  • Claudio Ferrari (ferraric at student.ethz.ch)
  • Amray Schwabe (schwabea at student.ethz.ch)

Course Hours

Lecture

  • Mon 10-12h, CAB G  61
  • Fri 10-12h, CAB G 61

Exercise

Time Room Language TA
 Fri 13-15  CHN D 48  en/de  Daniel  Gstöhl
 Fri 13-15  ETZ F 91  en/de  Jonas Gude
 Fri 13-15  ETZ K 91  en/de  Jakob Meier
 Fri 13-15  HG D 3.1  en/de  Claudio Ferrari
 Fri 13-15  HG D 3.3  en/de  Amray Schwabe

 

 

Hardware Acceleration for Data Processing (HADP) - Fall 2018



Overview

The seminar is intended to cover recent results in the increasingly important field of hardware acceleration for data science, both in dedicated machines and in data centers. The seminar aims at students interested in the system aspects of data processing who are willing to bridge the gap across traditional disciplines: machine learning, databases, systems, and computer architecture. The seminar should be of special interest to students interested in completing a master thesis or even a doctoral dissertation in related topics.

The seminar will start on September 18th with an overview of the general topics and the intended format of the seminar. Students are expected to present one paper in a 30-minute talk and complete a 4-page report on the main idea of the paper, how it relates to the other papers presented at the seminar, and the discussions around those papers. The presentation will be given during the semester in the allocated time slot. The report is due on the last day of the semester.

Attendance at the seminar is mandatory to complete the credit requirements. Active participation is also expected, including having read every paper to be presented in advance and contributing to the questions and discussions of each paper during the seminar.


NEWS

1) The first introductory class will take place on 18th September 2018 at 13:15 in ML J 34.1, followed by the opening talk at 13:45.

2) Selection of papers (3 papers max) and presentation dates (3 slots max) are expected to be ready by 25th September 2018. Please send your preferences to user_name[at]inf.ethz.ch, where user_name is amit[dot]kulkarni.

3) A preliminary schedule for the presentations is available under the Schedule section.

4) The first seminar talk starts on 9th October 2018, 13:15. The presentation duration will be 30 mins + 15 mins Q&A.

5) The deadline for report submission is 11th January 2019.


Talks

Speaker Title Date/Time
Prof. Gustavo Alonso Introduction to the seminar 18/09/2018, 13:15
David Sidler Accelerating String Matching Queries in Hybrid CPU-FPGA Architectures 18/09/2018, 13:45
Dr. Tal Ben-Nun Demystifying Parallel and Distributed Deep Learning 25/09/2018, 13:15
Cedric Renggli SparCML: High-Performance Sparse Communication for Machine Learning 25/09/2018, 14:15
Prof. Torsten Hoefler How to survive in this seminar? 02/10/2018, 13:15
Prof. Ce Zhang System Relaxations for First Order Methods: A 45 Minutes Crash Course 02/10/2018, 14:15
 


Schedule

Name Paper Date Mentor
Skanda Koppula "ImageNet Training in Minutes" 9 Oct 13:15 Prof. Torsten Hoefler
Emanuele Esposito "RAPID: In-Memory Analytical Query Processing Engine with Extreme Performance per Watt" 16 Oct 13:15 Prof. Gustavo Alonso
Nikolas Göbel "Ray: A Distributed Framework for Emerging AI Applications" 16 Oct 14:15 Prof. Ce Zhang
Gishor Sivanrupan "Live Video Analytics at Scale with Approximation and Delay-Tolerance" 23 Oct 13:15 Dr. Muhsen Owaida
Taha Shahroodi "A Many-core Architecture for In-Memory Data Processing" 30 Oct 13:15 Prof. Gustavo Alonso
Nikita Lazarev "ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA" 13 Nov 13:15 Dr. Muhsen Owaida
Thomas Lang "UDP: A Programmable Accelerator for Extract-Transform-Load Workloads and More" 20 Nov 13:15 Prof. Gustavo Alonso
Alain Denzler "Toward Standardized Near-Data Processing with Unrestricted Data Placement for GPUs" 20 Nov 14:15 Dr. Muhsen Owaida
Florian Tschopp "GraphR: Accelerating Graph Processing Using ReRAM" 27 Nov 13:15 Dr. Muhsen Owaida
Nicolas Winkler "Caribou: Intelligent Distributed Storage" 27 Nov 14:15 Prof. Gustavo Alonso

 Seminar Hours

Tuesdays, 13:00-15:00 in ML J 34.1


 People

Lecturers:

Facilitator / TA:

 

 

Advanced Systems Lab - Fall 2018

Course Organization & Materials

Find below the dates and details of tutorials (T) and exercises (E):

Date/Time Type Description Materials
 18. Sep. T The first tutorial session will take place on Sept. 18th in CAB G61.  slides
 20. Sep. E The first exercise session will take place on Sept. 20th in CAB G61. It gives an overview of the project.  slides
 25. Sep. E The second exercise session will take place on Sept. 25th in CAB G61. It focuses on Microsoft Azure, Bash and Git.  azure, bash
 27. Sep. E The third exercise session will take place in individual groups. Please look up your assigned group in the table below.  slides
 2. Oct. T Tutorial session on the life cycle of an experiment.  slides, throughput
 4. Oct. E Exercise session on good and bad practices in Java middleware development.  slides
 9. Oct. T There will be no tutorial session on October 9th.
 11. Oct. E The exercise session will cover GnuPlot and baseline experiments without the middleware.  slides
 16. Oct. T Tutorial session on planning experiments.  slides
 18. Oct. E The exercise session will cover good and bad practices when generating plots, and baseline experiments with the middleware.  slides
 23. Oct. T Tutorial session on queueing theory.  slides
 25. Oct. E Exercise session on 2K experiments.  slides
 30. Oct. T Tutorial session on system analysis.  slides
 1. Nov. E Exercise session on queueing theory.  slides
 6. Nov. T All the tutorials are finished for this semester.
 8. Nov. E Exercise session on networks of queues.  slides
 15. Nov. E The remaining exercise sessions will be Q&A. No material presented.

 

Project Details

Project Description: [Project Description]

Report: [Report Outline (pdf), (tex)]

Programming: [Java Main Class] [ANT Build File] [Bash Script Examples]

Azure: [Education Hub] [VM Template] Caution: When creating the VMs, the machines are started automatically. Stop them if you do not run experiments right away.
 

Project Deadline: December 17th, 17:00, 2018. 

NOTE: THE DEADLINE TO DE-REGISTER FROM THE COURSE IS ON 14th OCTOBER 2018.


Literature

"The Art of Computer Systems Performance Analysis" - Raj Jain
John Wiley & Sons Inc; 2nd rev. edition (September 21, 2015)

"The Art of Computer Systems Performance Analysis" - Raj Jain
Wiley Professional Computing, 1991

From the 1st edition, the following chapters are of particular relevance:

  • Chapters 1, 2, 3 (General introduction, Common terminology)
  • Chapters 4, 5, 6 (Workloads)
  • Chapter 10 (Data presentation)
  • Chapters 12, 13, 14 (Probability and statistics)
  • Chapters 16, 17, 18, 20, 21, 22 (Experimental design)
  • Chapters 30, 31, 32, 33, 36 (Queueing theory)

 


Lecturer

Gustavo Alonso

 


Course Hours

Tutorials: Tuesday, 17:00 – 19:00, CAB G 61.

Exercises: Thursday, 17:00 - 19:00

General Contact: sg-asl [at] lists.inf.ethz.ch


Exercise Sessions

Exercise sessions are held on Thursdays from 17:00 to 19:00 in small groups. In the exercise sessions, we answer high-level questions related to the project and the report.

Assistant Room Email Last names assigned
Michel Mueller CHN D42 muellmic [at] inf.ethz.ch A-C
Muhsen Owaida CHN D44 mewaida [at] inf.ethz.ch D-He
Alba Ríos Rodríguez CHN D46 rialba [at] student.ethz.ch Ho-L
David Sidler CAB G56 dasidler [at] inf.ethz.ch M-Sa
Kaan Kara CAB G52 kkara [at] inf.ethz.ch Sc-Z

 


Office Hours

Office hours are intended to provide advice that will help you complete the project and the report. To make an appointment, contact your teaching assistant by email.

  • Make sure you come prepared with concrete and well formulated questions. If possible, include them in your email.
  • We will not complete the assignment for you and neither recommend nor make design decisions on your behalf.
  • We will not debug your code, provide technical support for your setup/scripts/data analysis, or give hints about whether what you have done so far is enough.
  • We will not grade your project in advance, so please avoid questions that try to determine whether what you have done is correct or sufficient for a passing grade.
Time
Assistant
 Friday, 9:00-10:00, 13:00-14:00  Michel Mueller
 Thursday, 15:00 - 17:00   Muhsen Owaida
 Thursday, 9:00-11:00  Alba Ríos Rodríguez
 Thursday, 16:00-17:00, Friday 09:00-10:00  David Sidler
 Friday, 9:00-10:00, 13:00-14:00  Kaan Kara

 


FAQ / Tips

Q: Which Java version am I allowed to use?

A: Use Java 8.

Q: With how many threads should I run memcached?

A: Run memcached with a single thread.

Q: Can I adapt the Ant file to include the log4j library?

A: Yes, you can alter the Ant file as long as we are still able to build your middleware from a clean checkout. Note: You cannot use external libraries other than log4j.

Q: Where can I see how much money I spend on Azure?

A: Go to aka.ms/startedu and click on the "Courses" tab. Next click on the course "ASL 2018", then you should be able to see your lab and the credit assigned and consumed so far.

Q: Why am I getting charged by Azure despite shutting down my VMs?

A: If you use the Azure console interface, you have to use the "deallocate" command to deallocate the VMs; if you only use the "stop" command, the VMs are not deallocated and you are still charged for them. If in doubt, check in the online interface that the VMs are "stopped (deallocated)".

Q: How do I construct the histograms?

A: To construct the histograms from the middleware, you need to record the response time of every request with a precision of 100us. Note that the bucket size is not necessarily 100us; it can be larger. However, you need at least 10 buckets per histogram. To construct the histograms on the client side, use the CDF generated by memtier and decide on the bucket size yourself. Make sure all the histograms from both the clients and the middleware have the same bucket size.
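As an illustration of the bucketing described above (a Python sketch, though the middleware itself is written in Java; the response times and the 100us bucket width are made-up example values):

```python
from collections import Counter

BUCKET_US = 100  # bucket width in microseconds; may be larger, but use the same
                 # width for the client-side and middleware-side histograms

def to_bucket(response_time_us):
    # Map a response time to the lower edge of its bucket.
    return (response_time_us // BUCKET_US) * BUCKET_US

histogram = Counter()
for rt_us in [130, 180, 240, 950, 1020]:  # example response times in us
    histogram[to_bucket(rt_us)] += 1

print(sorted(histogram.items()))  # [(100, 2), (200, 1), (900, 1), (1000, 1)]
```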

Q: I get the error "read error: Connection reset by peer". How to solve it?

A:

- Make sure your Middleware has started and is ready to receive connection requests before you start your memtier clients.

- Increase the backlog queue size (this is the allowed number of new connections waiting to be accepted). See this thread.

- Check if the network thread can handle the amount of incoming connections and does not run out of file descriptors. 

Q: For Section 6 (2K Analysis), do I have to perform two separate analyses for throughput and response time?

A: Yes, you need to solve the linear equation system separately for throughput and response time.
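For reference, the sign-table computation behind a 2^k analysis (here k = 2) can be sketched in Python as follows; the throughput numbers are illustrative, and the same computation would be repeated with your response-time measurements:

```python
# 2^2 sign-table analysis: y[(a, b)] is the measured metric (e.g. throughput)
# with factors A and B at levels -1/+1. The values below are made up.
y = {(-1, -1): 10.0, (+1, -1): 30.0, (-1, +1): 14.0, (+1, +1): 42.0}

def effects_2k(y):
    n = len(y)
    q0 = sum(y.values()) / n                             # overall mean
    qA = sum(a * v for (a, b), v in y.items()) / n       # effect of factor A
    qB = sum(b * v for (a, b), v in y.items()) / n       # effect of factor B
    qAB = sum(a * b * v for (a, b), v in y.items()) / n  # interaction A x B
    return q0, qA, qB, qAB

print(effects_2k(y))  # (24.0, 12.0, 4.0, 2.0)
```

The squared effects, normalized by their sum, then give the fraction of variation each factor explains.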

Q: For Section 7, which exact configuration should I use for network of queues?

A: You need to consider both the 1-middleware and the 2-middleware configuration. You need to analyze both read-only and write-only workloads. You can set the number of worker threads to the constant that delivers the highest throughput.

Q: It is asked in the report outline to consider under-saturated, saturated and over-saturated states of our experiment runs. What if we don't observe these states clearly?

A: You may or may not observe some of these states. You still need to argue which of these states the system is in throughout your experiments.

Q: How do I select the maximum throughput as it is asked in the report outline?

A: When determining the maximum throughput of your system, it is important to take the response time into consideration, and how it is affected by the increase in load. Clearly explain the reasoning behind the selection.

Q: Do I need to plot the interactive law verification in all my figures?

A: No, this is not necessary. However, it is necessary to verify it and state that it holds in the report.

Q: What do I have to consider when calculating the interactive law for the middleware?

A: When calculating the throughput based on the response time measured in the middleware, you have to adapt either a) N (number of clients/requests) or b) Z (think time). a) When only considering the middleware and servers as the system, the number of requests in the system (N) is smaller than the number of clients. Given your middleware measurements, you can determine the number of requests in the system. For this approach the think time is still ~0. b) You can take the RTT between client and middleware as the think time (Z). The RTT can be measured for instance with ping. For this approach N is still the number of clients.
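A numeric sketch of option (b) above, in Python with made-up numbers (48 clients, 4 ms middleware response time, 1 ms RTT used as think time):

```python
def interactive_law_throughput(n, r, z):
    # Interactive response time law: N = X * (R + Z)  =>  X = N / (R + Z)
    return n / (r + z)

N = 48         # number of memtier virtual clients (illustrative)
R_mw = 0.004   # response time measured inside the middleware, in seconds
Z_rtt = 0.001  # client <-> middleware RTT used as think time, in seconds

x = interactive_law_throughput(N, R_mw, Z_rtt)
print(round(x))  # 9600 requests/second
```

If the throughput predicted this way is close to what memtier reports, the interactive law holds for that experiment.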

Errata / Updates

  • Project Description: Section 3.5 (2k Analysis). Changed parameters to be consistent with the Report Outline.
  • Report Outline: Table in Section 6 changed the entries for "Instances of memtier per machine" and "Threads per memtier instance". Reason: typo
  • Report Outline: Section 3 & 3.1 & 3.2 changed first sentence from "Connect one load generator machine.." to "Connect three load generator machines..".
  • Report Outline: Section 5 clarified the use and meaning of the --ratio parameter.
  • Azure Slides: Slide 16 changed from "openjdk-7-jdk" to "openjdk-8-jdk"

 

Big Data for Engineers 2018 Schedule

Lecture

Date Topic Slides Material
 20.02 1. Introduction    
 27.02  2. Lessons learnt