Home

News: Bench19 Call for Benchmarks (deadline extended to August 30; Denver, US). 2019 AI Competitions (500K RMB prize!). New papers on AIBench (TR, Bench18), HPC AI500, AIoT Bench, and Edge AIBench. AI algorithm and system testbed online!

Summary

As the architecture, system, data management, and machine learning communities pay greater attention to innovative big data and AI (machine learning) algorithms, architectures, and systems, the pressure to benchmark them rises.

First, modern industry-scale applications such as Internet services adopt a microservice-based architecture and expand and change very fast; creating a new benchmark or proxy for every possible workload is not scalable, or even possible. Hence, we need to understand the most time-consuming classes of units of computation among big data and AI workloads. Meanwhile, for the co-design of software and hardware, we need simple but elegant abstractions that help achieve both efficiency and generality.

Second, a realistic benchmark suite should be able to run not only collectively, as a whole end-to-end application, to discover the time breakdown of different modules, but also individually, as micro or component benchmarks, for fine-tuning hot-spot functions or kernels. From an architectural perspective, porting a full-scale application to a new architecture at an early stage is difficult or even impossible, while using micro or component benchmarks alone is insufficient to discover the time breakdown of different modules and locate the bottlenecks within a realistic application scenario at a later stage. Third, data sets have a great impact on system and microarchitectural characteristics (our CGO 18 paper), so diverse data inputs should be considered. Last but not least, the benchmarks should be consistent across different communities.

We specify the common requirements of Big Data and AI only algorithmically, in a paper-and-pencil approach, reasonably divorced from individual implementations. We capture the differences and collaborations among IoT, edge, datacenter, and HPC in handling Big Data and AI workloads. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on initial or intermediate data inputs, each of which we call a data motif. For the first time, among a wide variety of big data and AI workloads, we identify eight data motifs (our PACT 18 paper)---including Matrix, Sampling, Logic, Transform, Set, Graph, Sort, and Statistic computation---each of which captures the common requirements of a class of unit of computation. Rather than creating a new benchmark or proxy for every possible workload, we propose using data motif-based benchmarks---combinations of the eight data motifs---to represent the diversity of big data and AI workloads.
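
To make the pipeline view concrete, here is a minimal, hypothetical sketch in which a toy workload is composed from three of the eight motifs (Transform, Sampling, and Statistic computation); the stage functions are illustrative stand-ins, not code from our benchmark suites.

```python
# Hypothetical sketch: a workload modeled as a pipeline of data motifs.
import numpy as np

def transform(data):
    # Transform motif: e.g., a Fourier transform over the input.
    return np.fft.fft(data)

def sampling(data, k=100):
    # Sampling motif: draw a random subset of the data.
    idx = np.random.choice(len(data), size=min(k, len(data)), replace=False)
    return data[idx]

def statistic(data):
    # Statistic computation motif: summary statistics of the (complex) values.
    return {"mean": float(np.mean(np.abs(data))), "std": float(np.std(np.abs(data)))}

def run_pipeline(data, stages):
    """Run a workload expressed as a pipeline of motif stages."""
    for stage in stages:
        data = stage(data)
    return data

if __name__ == "__main__":
    seed = np.random.rand(10_000)
    print(run_pipeline(seed, [transform, sampling, statistic]))
```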

To achieve consistency of the benchmarks across different communities, we absorb state-of-the-art and state-of-the-practice algorithms from the machine learning community, which consider the model's prediction accuracy. For the benchmarking requirements of the system and data management communities, we provide diverse implementations using state-of-the-art and state-of-the-practice techniques.

Consequently, we release an open-source big data benchmark suite, BigDataBench, and four open-source AI benchmark suites for datacenters, HPC, edge, and IoT: AIBench, HPC AI500, Edge AIBench, and AIoT Bench, respectively.

BigDataBench is widely used in both academia and industry. The current version, BigDataBench 5.0, provides 13 representative real-world data sets and 24 benchmarks. The benchmarks cover six workload types (online services, offline analytics, graph analytics, data warehouse, NoSQL, and streaming) from three important application domains: Internet services (including search engines, social networks, and e-commerce), multimedia processing, and bioinformatics. Our benchmark suite includes micro benchmarks, each of which is a single data motif; component benchmarks, which are combinations of data motifs; and end-to-end application benchmarks, which are combinations of component benchmarks. Data variety is also considered, covering the whole spectrum of data types: structured, semi-structured, and unstructured data. Currently, the included data sources are text, graph, table, and image data. Using real data sets as seeds, the data generators---BDGS---generate synthetic data by scaling the seed data while keeping the characteristics of the raw data.
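
BDGS itself is not shown here; the following is a hypothetical sketch of the seed-and-scale idea for text data, where a simple characteristic (a word-frequency distribution) is learned from a small real seed and preserved in a larger synthetic data set. The names and parameters are illustrative, not BDGS code.

```python
# Hypothetical sketch of seed-based synthetic data generation (not the BDGS code).
import collections
import random

def learn_word_distribution(seed_texts):
    # Learn a word-frequency distribution from the seed data set.
    counts = collections.Counter()
    for line in seed_texts:
        counts.update(line.split())
    total = sum(counts.values())
    words = list(counts)
    weights = [counts[w] / total for w in words]
    return words, weights

def generate_synthetic(seed_texts, n_lines, words_per_line=10):
    # Scale the seed data while preserving its word-frequency characteristics.
    words, weights = learn_word_distribution(seed_texts)
    for _ in range(n_lines):
        yield " ".join(random.choices(words, weights=weights, k=words_per_line))

if __name__ == "__main__":
    seed = ["big data benchmarking matters", "ai workloads stress the memory system"]
    for line in generate_synthetic(seed, n_lines=5):
        print(line)
```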

AIBench is the first industry-standard Internet service AI benchmark suite. AIBench provides a highly extensible, configurable, and flexible benchmark framework that contains loosely coupled modules, such as data input, prominent AI problem domains, online inference (i.e., AI-as-a-service), offline training, and automatic deployment tools, which we call the AIBench framework. We identify sixteen prominent AI problem domains, including image classification, image generation, text-to-text translation, image-to-text, image-to-image, speech-to-text, face embedding, 3D face recognition, object detection, video prediction, image compression, recommendation, 3D object reconstruction, text summarization, spatial transformer, and learning to rank, drawn from the three most important Internet service domains: search engine, social network, and e-commerce. Each problem domain forms an AI component benchmark. In total, the current version, AIBench 1.0, provides 16 representative data sets, 12 micro benchmarks, and 16 component benchmarks. The benchmarks are implemented not only on mainstream deep learning frameworks such as TensorFlow and PyTorch, but also on traditional programming models such as Pthreads, to enable apples-to-apples comparisons.
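
As a rough illustration of what "loosely coupled modules" means here, the following is a hypothetical Python sketch (not the actual AIBench code) in which data input, offline training, and online inference are separate, swappable components bound together by a component benchmark; the class and method names are assumptions made for illustration.

```python
# Hypothetical sketch of loosely coupled benchmark-framework modules.
from abc import ABC, abstractmethod

class DataInput(ABC):
    @abstractmethod
    def batches(self):
        """Yield training or inference batches."""

class OfflineTrainer(ABC):
    @abstractmethod
    def train(self, data: DataInput):
        """Train a model on the data input module and return it."""

class OnlineInference(ABC):
    @abstractmethod
    def serve(self, model, request):
        """Answer a single inference request (AI-as-a-service)."""

class ComponentBenchmark:
    """Binds one AI problem domain (e.g., image classification) to concrete modules."""
    def __init__(self, data: DataInput, trainer: OfflineTrainer, server: OnlineInference):
        self.data, self.trainer, self.server = data, trainer, server

    def run_offline(self):
        return self.trainer.train(self.data)

    def run_online(self, model, requests):
        return [self.server.serve(model, r) for r in requests]
```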

Modern Internet service providers widely combine big data and AI techniques to augment their services. Meanwhile, mixed workloads are widely deployed to improve system utilization and save cost, and the throughput of latency-critical workloads is dominated by their worst-case performance---tail latency. To model this important application scenario, we propose DCMix, an end-to-end application benchmark that generates mixed workloads, including big data and AI workloads, whose latencies range from microseconds to minutes, with four mixed execution modes.
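
The sketch below illustrates the tail-latency notion referred to above: rather than the mean, the worst-case (for example, 99th-percentile) latency of a latency-critical request stream is reported. The request function is a hypothetical placeholder, not a DCMix workload.

```python
# Hypothetical sketch: measuring mean versus tail (p99) latency of a request stream.
import time
import statistics

def latency_critical_request():
    # Stand-in for a short, latency-critical service call.
    time.sleep(0.001)

def measure_latency(n_requests=200, percentile=0.99):
    samples = []
    for _ in range(n_requests):
        start = time.perf_counter()
        latency_critical_request()
        samples.append(time.perf_counter() - start)
    samples.sort()
    tail = samples[int(percentile * (len(samples) - 1))]
    return statistics.mean(samples), tail

if __name__ == "__main__":
    mean, p99 = measure_latency()
    print(f"mean latency: {mean * 1e3:.2f} ms, p99 latency: {p99 * 1e3:.2f} ms")
```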

Modern Internet service workloads are notoriously complex in terms of industry-scale architecture fueled by big data and machine learning algorithms. On the basis of the AIBench framework, and abstracting real-world data sets and workloads from Alibaba, we design and implement the first end-to-end Internet service AI benchmark, which contains the primary modules in the critical paths of an industry-scale application and is scalable to deploy on clusters of different sizes.

As the first HPC AI benchmark suite using real-world data sets and HPC AI applications, HPC AI500 covers three real-world HPC applications: extreme weather analysis, high energy physics, and cosmology. In total, it includes 3 representative scientific data sets, 3 micro benchmarks, and 4 component benchmarks. The micro benchmarks are implemented using CUDA and MKL, while the component benchmarks are implemented using TensorFlow and PyTorch.

As the first end-to-end edge AI benchmark suite, Edge AIBench covers four application scenarios: ICU patient monitoring, surveillance camera, smart home, and autonomous vehicle. In total, it provides 5 representative real-world data sets and 16 benchmarks, consisting of 8 micro benchmarks and 8 component benchmarks.

Towards benchmarking mobile and embedded device intelligence, AIoT Bench provides 3 representative real-world data sets and 12 benchmarks, consisting of 9 micro benchmarks and 3 component benchmarks. The benchmarks cover 3 application domains: image recognition, speech recognition, and natural language processing. They run on different platforms, including Android devices and Raspberry Pi, and with different development tools, including TensorFlow and Caffe2.

For the architecture community, whether early in the architecture design process or later in system evaluation, running a comprehensive benchmark suite is time-consuming, and the complex software stacks of big data and AI workloads aggravate this issue. To tackle this challenge, we propose data motif-based simulation benchmarks (our IISWC 18 paper) for the architecture community, which speed up runtime by 100 times while preserving system and microarchitectural characteristic accuracy. We also propose another methodology to reduce benchmarking cost: we select a small number of representative benchmarks, called the BigDataBench subset, according to workload characteristics from an architecture perspective. We provide the BigDataBench architecture subset (our IISWC 14 paper) on the MARSSx86, gem5, and Simics simulators, respectively.

Together with several industry partners, including Telecom Research Institute Technology, Huawei, Intel (China), Microsoft (China), IBM CDL, Baidu, Sina, INSPUR, and ZTE, we also released China's first industry-standard big data benchmark suite---BigDataBench-DCA.

Contributors

Prof. Jianfeng Zhan, ICT, Chinese Academy of Sciences, and BenchCouncil
Dr. Wanling Gao, ICT, Chinese Academy of Sciences
Dr. Lei Wang, ICT, Chinese Academy of Sciences    
Dr. Chen Zheng, ICT, Chinese Academy of Sciences, and BenchCouncil    
Dr. Xinhui Tian, ICT, Chinese Academy of Sciences
Rui Ren, ICT, Chinese Academy of Sciences
Dr. Rui Han, ICT, Chinese Academy of Sciences
Chunjie Luo, ICT, Chinese Academy of Sciences
Fan Zhang, ICT, Chinese Academy of Sciences
Cheng Huang, ICT, Chinese Academy of Sciences
Xingwang Xiong, ICT, Chinese Academy of Sciences
Jianan Chen, ICT, Chinese Academy of Sciences
Tianshu Hao, ICT, Chinese Academy of Sciences
Zihan Jiang, ICT, Chinese Academy of Sciences
Fanda Fan, ICT, Chinese Academy of Sciences
Mengjia Du, ICT, Chinese Academy of Sciences
Yunyou Huang, ICT, Chinese Academy of Sciences
Xu Wen, ICT, Chinese Academy of Sciences
Xiwen He, ICT, Chinese Academy of Sciences
Tong Wu, China National Institute of Metrology
Runsong Zhou, China Software Testing Center
Dr. Zheng Cao, Alibaba     
Hainan Ye, Beijing Academy of Frontier Sciences and BenchCouncil     
Dr. Zhen Jia, Princeton University and BenchCouncil
Daoyi Zheng, Baidu     
Shujie Zhang, Huawei     
Haoning Tang, Tencent     
Dr. Yingjie Shi
Zijian Ming, Tencent     
Yuanqing Guo, Sohu    
Yongqiang He, Dropbox
Kent Zhan, Tencent (previously), WUBA (currently)
Xiaona Li, Baidu    
Bizhu Qiu, Yahoo!
Qiang Yang, BAFST    
Jingwei Li, BAFST    
Dr. Gang Lu, BAFST
Xinlong Lin, BAFST    
Jiahui Dai, Beijing Academy of Frontier Sciences and BenchCouncil     
Dr. Biwei Xie, China RISC-V Alliance
Wei Li, Cambricon
Xiaoyu Wang, Intellifusion
Dr. Kai Hwang, Chinese University of Hong Kong, Shenzhen
Dr. Zujie Ren, Zhejiang Lab
Dr. Yuchen Zhang, State University of New York at Buffalo
Dr. Xiaoyi Lu, Department of Computer Science and Engineering, The Ohio State University
Dr. Yunquan Zhang, National Supercomputing Center in Jinan, China
Dr. Shengzhong Feng, National Supercomputing Center in Shenzhen, China
Dr. Kenli Li, National Supercomputing Center in Changsha, China
Dr. Weijia Xu, Texas Advanced Computing Center, The University of Texas at Austin

Numbers

We release two sets of performance numbers, using AIBench and AIoT Bench. The results will be updated monthly.

Benchmark Methodology

We specify the common requirements of Big Data and AI only algorithmically, in a paper-and-pencil approach, reasonably divorced from individual implementations. We capture the differences and collaborations among IoT, edge, datacenter, and HPC in handling Big Data and AI workloads. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on initial or intermediate data inputs, each of which we call a data motif. Rather than creating a new benchmark or proxy for every possible workload, we propose using data motif-based benchmarks---combinations of the eight data motifs---to represent the diversity of big data and AI workloads. Figure 1 summarizes our data motif-based scalable benchmarking methodology.

Figure 1 BenchCouncil Benchmarking Methodology.

Benchmark Models

We provide three benchmark models for evaluating hardware, software systems, and algorithms, respectively.

(1) The BenchCouncil Intact Model Division. This division is for hardware benchmarking. Users must run the provided implementation on their hardware directly, without modification; the only allowed tuning covers hardware, OS, and compiler settings.
(2) The BenchCouncil Constrained Model Division. This division is for software system benchmarking. It specifies the model to be used and restricts the values of hyperparameters, e.g., batch size and learning rate, while users may implement the algorithms on their own software platforms or frameworks (see the sketch after this list).
(3) The BenchCouncil Free Model Division. This division is for algorithm benchmarking. Users are required to use the same data set, with the emphasis on advancing the state of the art of algorithms.
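
The following hypothetical sketch illustrates how a Constrained Model Division specification could pin the model and hyperparameter values that implementers must not change; the benchmark name, model, and values are assumptions made for illustration, not an official BenchCouncil artifact.

```python
# Hypothetical sketch of a Constrained Model Division specification.
CONSTRAINED_DIVISION_SPEC = {
    "benchmark": "image_classification",  # assumed component benchmark name
    "model": "ResNet-50",                 # model fixed by the division (assumed)
    "hyperparameters": {                  # values restricted by the division (assumed)
        "batch_size": 256,
        "learning_rate": 0.1,
        "epochs": 90,
    },
    # The software platform is left free: users implement the algorithm on their
    # own framework (e.g., TensorFlow or PyTorch) and report the results.
}

def validate_submission(reported_hyperparameters: dict) -> bool:
    """Check that a submission kept the restricted hyperparameter values."""
    return reported_hyperparameters == CONSTRAINED_DIVISION_SPEC["hyperparameters"]
```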

Metrics

For the BenchCouncil Intact Model Division, the metrics are the wall clock time and energy efficiency of running the benchmarks.

For the BenchCouncil Constrained Model Division, the metrics are the wall clock time and energy efficiency of running the benchmarks. In addition, the values of the hyperparameters must be reported for auditing.

For the BenchCouncil Free Model Division, the metrics are the accuracy, the wall clock time, and the energy efficiency of running the benchmarks.
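
The sketch below shows one way these metrics could be collected for a single benchmark run. The energy-efficiency definition used here (runs per joule) and the energy-meter callback are assumptions made for illustration only; they are not prescribed by BenchCouncil.

```python
# Hypothetical sketch: reporting wall clock time and an assumed energy-efficiency metric.
import time

def run_and_report(benchmark_fn, read_energy_joules):
    # read_energy_joules is a user-supplied callback returning cumulative energy (J).
    e_start = read_energy_joules()
    t_start = time.perf_counter()
    benchmark_fn()
    wall_clock_s = time.perf_counter() - t_start
    energy_j = read_energy_joules() - e_start
    runs_per_joule = 1.0 / energy_j if energy_j > 0 else float("nan")
    return {"wall_clock_seconds": wall_clock_s,
            "energy_joules": energy_j,
            "runs_per_joule": runs_per_joule}

if __name__ == "__main__":
    # Toy usage with a fake energy meter; a real run would read a power sensor.
    fake_meter = iter([0.0, 42.0])
    print(run_and_report(lambda: sum(range(10**6)), lambda: next(fake_meter)))
```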

Q & A

More questions & answers are available from the handbook of BigDataBench.

Contacts (Email)

  • gaowanling@ict.ac.cn
  • luochunjie@ict.ac.cn
  • wangle_2011@ict.ac.cn
  • zhanjianfeng@ict.ac.cn

License

BigDataBench and the other benchmark suites are available to researchers interested in big data or AI. The software components of BigDataBench, AIBench, and the other benchmark suites are all available as open-source software and governed by their own licensing terms; researchers intending to use them are required to fully understand and abide by the licensing terms of the various components. BigDataBench itself is open source under the Apache License, Version 2.0. Please use all files in compliance with the License.

Software developed externally (not by the BigDataBench group)

Software developed internally (by the BigDataBench group)

BigDataBench_4.0 License

BigDataBench_4.0 Suite. Copyright (c) 2013-2018, ICT, Chinese Academy of Sciences. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistribution of source code must comply with the license and notice disclaimers
  • Redistribution in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimers in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE ICT CHINESE ACADEMY OF SCIENCES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.