What Is DCBench
DCBench is a benchmark suite for data center workloads. The first release provides 19 representative workloads that come from data center systems. The suite covers diverse kinds of workloads (on-line and off-line), different programming models (MPI, MapReduce, etc.), and different programming languages.
Who can use DCBench
DCBench is available to researchers interested in pursuing research on data centers. All software components of DCBench are available as open-source software and are governed by their own licensing terms. Researchers intending to use DCBench must fully understand and abide by the licensing terms of each component. DCBench itself is currently open-source under the Apache License, Version 2.0. Please use all files in compliance with the License.
A benchmark suite must have a target class of machines and a target class of applications. For DCBench, the target class of machines is the data center, and the target class of applications is the applications running on data center systems. Such a benchmark suite should therefore have the following key features.
A benchmark suite should include representative workloads for its target class of machines. Many benchmark suites target a specific field, such as SPEC CPU for processors and SPECweb for Web servers, and the workloads in each are representative of their own field. In data centers, applications range from simple reporting to deep data mining, and only workloads that cover this range can reflect the real performance of the target class of machines.
Diverse Programming Models
In data centers, developers and users write applications with many different programming models, e.g., MapReduce and Dryad, since there is no one-size-fits-all solution. Different programming models have a great effect on performance behavior, so the benchmark suite should cover different programming models in order to characterize the target class of machines more comprehensively.
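To make the idea of a programming model concrete, here is a minimal single-process sketch of the MapReduce model: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase combines each group. It is not Hadoop's API; the function names `map_reduce`, `word_mapper`, and `sum_reducer` and the word-count example are hypothetical illustrations.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a single-process map -> shuffle -> reduce pipeline over records."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: group intermediate values by key.
            groups[key].append(value)
    # Reduce phase: combine all values that share a key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, counts):
    """Hypothetical reducer: add up the counts emitted for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["Data center workloads", "data analysis workloads"]
    print(map_reduce(lines, word_mapper, sum_reducer))
    # {'data': 2, 'center': 1, 'workloads': 2, 'analysis': 1}
```

In a real Hadoop job, map and reduce tasks run on different nodes and the shuffle moves intermediate data across the network, which is part of why a workload's programming model shapes its performance behavior.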
The target class of machines is the data center, which is a large distributed environment; e.g., Google's data centers have about 450,000 nodes. Most data center workloads are distributed across several nodes, so an application running on a single computer cannot represent the applications in real-world data centers.
Employ State-of-the-Art Techniques
Data center workloads change frequently, a phenomenon called workload churn, so the benchmark suite should include recently used and emerging techniques from different domains.
Workloads in data centers fall into two categories: on-line workloads and off-line workloads. On-line workloads are services, which are driven by requests. Off-line workloads are data processing or data analysis workloads, which are driven by their input data sets. Our benchmark suite includes both.
|Category||Workload||Programming model||Language||Source|
| ||Support Vector Machine||MapReduce||Java||Implemented by ourselves|
|Recommendation||Item-based Collaborative Filtering||MapReduce||Java||Mahout|
|Association rule mining||Frequent pattern growth||MapReduce||Java||Mahout|
|Segmentation||Hidden Markov model||MapReduce||Java||Implemented by ourselves|
|Warehouse operation||Database operations||MapReduce||Java||Hive-bench|
|Feature reduction||Principal Component Analysis||MPI||C++||IBM PML|
| ||Kernel Principal Component Analysis||MPI||C++||IBM PML|
|Vector calculation||Paper similarity analysis||All-Pairs||C/C++||Implemented by ourselves|
|Graph mining||Breadth-first search||MPI||C++||Graph500|
|Interactive real-time application||Media streaming||C/S||Java||CloudSuite|
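As an illustration of one workload in the table, here is a minimal single-process sketch of breadth-first search that returns a parent map (a BFS tree), the kind of output the Graph500 BFS kernel produces. It is not the Graph500 MPI reference code; the adjacency-dictionary input format and the function name `bfs_parents` are assumptions for illustration.

```python
from collections import deque


def bfs_parents(adjacency, root):
    """Breadth-first search from root; return {vertex: parent} for reachable vertices."""
    parent = {root: root}  # the root is recorded as its own parent
    frontier = deque([root])  # FIFO queue gives level-by-level visiting order
    while frontier:
        vertex = frontier.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in parent:  # first visit: record a BFS-tree edge
                parent[neighbor] = vertex
                frontier.append(neighbor)
    return parent


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(bfs_parents(graph, 0))  # {0: 0, 1: 0, 2: 0, 3: 1}
```

Vertices that are not reachable from the root never appear in the result. The distributed MPI version in DCBench partitions the graph across nodes, which this single-node sketch does not attempt.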
The complete DCBench manual can be downloaded here: User's Manual
Because the benchmark suite consists of several parts, users can download them separately.
1. Hadoop-based benchmarks in DCBench, including 14 Hadoop workloads and the corresponding data sets. [download]
A mini version of this part is also provided. It includes only a small part of the data sets: where the user manual mentions multiple data sets for a workload, only the small one is included. [download]
2. MPI-based benchmarks in DCBench, including 5 benchmarks written with MPI. [download]
3. All-Pairs-based benchmark in DCBench, a vector calculation application. [download]
4. A search service benchmark based on Nutch. [user manual]
Search benchmark [download]
Index and segment data package [download] (users should download it and put it into the Search home directory before deploying the benchmark)
5. Media streaming
6. XCP image
We also provide some XCP (Xen Cloud Platform) images. Users can download them and deploy them in their own XCP environments. Instructions for deploying and using them can be found in the [user manual].
Characterizing Data Analysis Workloads in Data Centers.
Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo.
2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award)
Characterization of Real Workloads of Web Search Engines.[PDF]
Huafeng Xi, Jianfeng Zhan, et al.
2011 IEEE International Symposium on Workload Characterization (IISWC 2011)
DCBench: A benchmark suite for data center.[PDF]
Zhen Jia, Jianfeng Zhan, Lei Wang, Lixin Zhang, et al.
The 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013) Tutorial
- The paper characterizing DCBench received the Best Paper Award at IISWC 2013. [PDF] [Slides]
- DCBench 1.0 Release
- The paper "Characterizing data analysis workloads in data centers" was selected as a Best Paper nominee at IISWC 2013.
- A tutorial at HPCA 2013 (2013-02-24)
- A paper characterizing the workloads in DCBench was accepted by IISWC 2013.
- Ph.D. candidate Zhen Jia gave a presentation at the 2nd BPOE workshop. [PPT]