Tutorial: BigDataBench 4.0

A Scalable and Unified Big Data and AI (Artificial Intelligence) Benchmark Suite

This tutorial presents BigDataBench 4.0, a scalable and unified big data and AI (artificial intelligence) benchmark suite. It covers the following four topics:

(1) The challenges in defining modern datacenter workloads;

(2) The eight data motifs that serve as the primitives for characterizing comprehensive big data and AI workloads;

(3) BigDataBench 4.0: a Scalable and Unified Big Data and AI benchmark suite;

(4) The big data and AI proxy benchmarks that shorten execution time by hundreds of times.

Location and Date

We will give a tutorial on BigDataBench at HPCA 2019 in Washington D.C., USA.

February 16, 2019 (Saturday), 09:00 - 12:00 (Half Day)

Room: TBD

Organizers and Presenters

Organizer: Jianfeng Zhan, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Jianfeng Zhan, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Chen Zheng, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
Presenter: Wanling Gao, ICT, Chinese Academy of Sciences, and University of Chinese Academy of Sciences

Abstract

BigDataBench (IISWC '13, HPCA '14, PACT '16, TPDS '17, PACT '18, IISWC '18) is a scalable and unified big data and AI benchmark suite. It is the result of a multidisciplinary research and engineering effort, spanning systems, architecture, data management, and machine learning, with contributors from both industry and academia. By a rough estimate, more than 400 published papers have cited or used BigDataBench since 2014.

In this tutorial, we first present the challenges in defining modern datacenter workloads, which are the foundation of system and architecture research. Second, we summarize our work on identifying eight data motifs from comprehensive big data and AI workloads; each motif captures the common requirements of a class of unit of computation while being reasonably divorced from individual implementations. Third, we present BigDataBench, a scalable and unified big data and AI benchmark suite. The suite includes micro benchmarks, each of which is a single data motif; component benchmarks, which consist of combinations of data motifs; and end-to-end application benchmarks, which are combinations of component benchmarks.

The current version, BigDataBench 4.0, is a significant upgrade and provides 13 representative real-world data sets and 47 benchmarks, covering online services, offline analytics, graph analytics, AI, data warehouse, NoSQL, and streaming workloads. Finally, we propose using combinations of the eight data motifs with different weights to mimic the benchmarks in BigDataBench. These proxy benchmarks shorten execution time by hundreds of times on real systems while remaining qualified for both early architecture design and later system evaluation across different architectures.
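
To make the idea of a weighted motif combination concrete, here is a minimal sketch in Python. It is not the BigDataBench proxy benchmark generator; the function name `allocate_work`, the three motif labels, the weights, and the notion of a "work unit" are all assumptions chosen for illustration. The sketch only shows one simple way a fixed work budget could be split among motifs in proportion to their weights.

```python
from typing import Dict


def allocate_work(weights: Dict[str, int], total_units: int) -> Dict[str, int]:
    """Split total_units of work among motifs in proportion to their weights.

    Hypothetical helper for illustration only; not part of BigDataBench.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Ideal (possibly fractional) share of the budget for each motif.
    raw = {m: total_units * w / total_weight for m, w in weights.items()}
    # Round each share down, then hand the leftover units to the motifs
    # with the largest fractional remainders so the total is preserved.
    alloc = {m: int(r) for m, r in raw.items()}
    leftover = total_units - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda m: raw[m] - alloc[m], reverse=True)
    for m in by_remainder[:leftover]:
        alloc[m] += 1
    return alloc


# Made-up weights for three motif labels; real weights would have to be
# derived from profiling the workload being mimicked.
example_weights = {"sort": 5, "statistic": 3, "graph": 2}
plan = allocate_work(example_weights, 1000)
print(plan)
```

In this sketch a proxy for a workload dominated by sorting would simply carry a larger weight on the sort motif; how weights are actually chosen and tuned is the subject of the proxy benchmark papers listed under Publications.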

Schedule

08:30-09:30 Introduction of BigDataBench 4.0 & Benchmarking Methodology [PDF]
09:30-10:00 How to use BigDataBench 4.0 [PDF]
10:00-10:30 — Coffee break —
10:30-11:15 Big data and AI proxy benchmarks for simulation [PDF]

Publications

BigDataBench: a Scalable and Unified Big Data and AI Benchmark Suite. [PDF]
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Rui Ren, Chen Zheng, Gang Lu, Jingwei Li, Zheng Cao, Shujie Zhang, and Haoning Tang. Technical Report, arXiv preprint arXiv:1802.08254, January 27, 2018.

BOPS, Not FLOPS! A New Metric and Roofline Performance Model For Datacenter Computing. [PDF]
Lei Wang, Jianfeng Zhan, Wanling Gao, Zihan Jiang, Rui Ren, Xiwen He, Chunjie Luo, Gang Lu, Jingwei Li. Technical Report, arXiv preprint arXiv:1801.09212, May 3, 2018.

Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads. [PDF]
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Fei Tang, Biwei Xie, Chen Zheng, Xu Wen, Xiwen He, Hainan Ye and Rui Ren. The 27th International Conference on Parallel Architectures and Compilation Techniques (PACT 2018).

BigDataBench: a Big Data Benchmark Suite from Internet Services. [PDF]
Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, Cheng Zhen, Gang Lu, Kent Zhan, Xiaona Li, and Bizhu Qiu. The 20th IEEE International Symposium On High Performance Computer Architecture (HPCA-2014), February 15-19, 2014, Orlando, Florida, USA.

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads. [PDF]
Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Zhen Jia, Daoyi Zheng, Chen Zheng, Xiwen He, Hainan Ye, Haibin Wang, and Rui Ren. 2018 IEEE International Symposium on Workload Characterization (IISWC 2018).

Understanding Big Data Analytics Workloads on Modern Processors. [PDF]
Zhen Jia, Jianfeng Zhan, Lei Wang, Chunjie Luo, Wanling Gao, Yi Jin, Rui Han and Lixin Zhang. IEEE Transactions on Parallel and Distributed Systems, 28(6), 1797-1810, 2017.

Understanding Processors Design Decisions for Data Analytics in Homogeneous Data Centers. [PDF]
Zhen Jia, Wanling Gao, Yingjie Shi, Sally A. McKee, Jianfeng Zhan, Lei Wang, Lixin Zhang. IEEE Transactions on Big Data, 2017.

A Dwarf-based Scalable Big Data Benchmarking Methodology. [PDF]
Wanling Gao, Lei Wang, Jianfeng Zhan, Chunjie Luo, Daoyi Zheng, Zhen Jia, Biwei Xie, Chen Zheng, Qiang Yang, and Haibin Wang. arXiv preprint arXiv:1711.03229.

Characterizing data analysis workloads in data centers. [PDF]
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award).

Characterizing and Subsetting Big Data Workloads. [PDF]
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo, Ninghui Sun. 2014 IEEE International Symposium on Workload Characterization (IISWC 2014).

Identifying Dwarfs Workloads in Big Data Analytics. [PDF]
W Gao, C Luo, J Zhan, H Ye, X He, L Wang, Y Zhu, X Tian. arXiv preprint arXiv:1505.06872.

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. [PDF]
Zijian Ming, Chunjie Luo, Wanling Gao, Rui Han, Qiang Yang, Lei Wang, and Jianfeng Zhan. In Advancing Big Data Benchmarks (pp. 138-154). Springer International Publishing.

Biographies

Jianfeng Zhan
Jianfeng Zhan is a Professor of Computer Science and Engineering at the Institute of Computing Technology, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. Since 2002, he has been working on datacenter computing benchmarks, operating systems, resource management, programming models, performance optimization, and system availability. He has published over 100 papers in major journals and international conferences in these research areas and has filed 40 patents. From 2004 to 2010, he led the R&D efforts on innovative cluster and cloud systems software for the Dawning-series supercomputers (which ranked top 2 and top 10 on the Top 500 list in 2010 and 2004, respectively). Among that software, GridView was transferred to Sugon, a premier supercomputing company in China, and became one of its popular software products. Currently, he leads the research efforts on modern datacenter software stacks, including BigDataBench, an open-source big data and AI benchmark suite, and RainForest, an operating system for datacenter computing. He received the second-class Chinese National Technology Promotion Prize in 2006, the Distinguished Achievement Award of the Chinese Academy of Sciences in 2005, the IISWC Best Paper Award in 2013, and the Huawei Contribution Prize in 2013. More details about Prof. Zhan are available at http://prof.ict.ac.cn/jfzhan

Chen Zheng
Chen Zheng is a postdoctoral researcher at the Institute of Computing Technology, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. His research focuses on operating systems, virtualization, benchmarks, and datacenter workload characterization. He received his Ph.D. degree in 2017 from the Institute of Computing Technology in China.

Wanling Gao
Wanling Gao is a Ph.D. candidate in computer science at the Institute of Computing Technology, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. Her research interests focus on big data benchmarking and big data analytics. She received her B.S. degree in 2012 from Huazhong University of Science and Technology.

Related Links

BigDataBench http://prof.ict.ac.cn/BigDataBench/