What Is DCBench

DCBench is a benchmark suite for data center workloads. The first release provides 19 representative workloads drawn from data center systems. The suite covers diverse kinds of workloads (on-line and off-line), different programming models (MPI, MapReduce, etc.), and different programming languages.

Who can use DCBench

DCBench is available to researchers interested in data center research. The software components of DCBench are all available as open-source software and are governed by their own licensing terms. Researchers intending to use DCBench are required to fully understand and abide by the licensing terms of the various components. For now, DCBench is open source under the Apache License, Version 2.0. Please use all files in compliance with the License.

Key Features

A benchmark suite must have a target class of machines and a target class of applications. For DCBench, the target class of machines is the data center, and the target class of applications is the applications running on data center systems. Such a benchmark suite should therefore have the following key features.

Representative Workloads

A benchmark suite should include workloads that are representative of the target class of machines. Many benchmark suites exist for specific fields, such as SPEC CPU for processors and SPEC Web for Web servers, and the workloads in those suites are all representative of their own fields. In data centers, applications range from simple reporting to deep data mining, and only workloads representative of that range can reflect the real performance of the target class of machines.

Diverse Programming Models

Data centers offer a large number of programming models, e.g., MapReduce and Dryad, for developers and users to write applications, since there is no one-size-fits-all solution. Different programming models have a great effect on performance behavior. The benchmark suite should therefore be based on different programming models, so that it characterizes the target class of machines more comprehensively.
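As an illustration of what a programming model prescribes, the following is a minimal single-process sketch of the MapReduce pattern applied to word counting. It is not the Hadoop implementation shipped with DCBench; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real MapReduce framework the map and reduce calls run in parallel on many nodes and the shuffle moves data over the network, which is one reason the choice of programming model affects performance behavior.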

Distributed

The target class of machines is the data center, which is a large distributed environment; e.g., Google’s data center has about 450,000 nodes. In general, most workloads in data centers are distributed across several nodes. An application running on a single computer cannot represent the applications in real-world data centers.

Employ State-of-the-Art Techniques

In data centers, workloads change frequently, a phenomenon called workload churn. The benchmark suite should therefore include recently used and emerging techniques from different domains.

 

Workloads

The workloads in data centers can be classified into two categories: on-line workloads and off-line workloads. On-line workloads are services, which are driven by requests. Off-line workloads are data processing or data analysis workloads, which are driven by the input data set. Our benchmark suite includes both.
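To make the distinction concrete, here is a minimal single-process sketch; it is not DCBench code. `offline_grep` is driven by a fixed input data set and finishes once the data has been consumed, while `serve` is driven by incoming requests and answers each one as it arrives. All function names, the toy inverted index, and the sample data are hypothetical.

```python
def offline_grep(dataset, pattern):
    """Off-line: the input data set drives the work; one pass, then done."""
    return [line for line in dataset if pattern in line]


def build_index(documents):
    """Build a toy inverted index: word -> list of document ids."""
    index = {}
    for doc_id, text in enumerate(documents):
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(doc_id)
    return index


def serve(index, requests):
    """On-line: each incoming request triggers one lookup and one response."""
    for query in requests:
        yield query, index.get(query.lower(), [])


if __name__ == "__main__":
    docs = ["data center benchmark", "search engine", "data analysis"]
    print(offline_grep(docs, "data"))
    for query, hits in serve(build_index(docs), ["data", "search"]):
        print(query, hits)
```

The off-line function has a natural end, so it would typically be judged by how quickly it finishes the whole data set; the on-line generator has no natural end and would typically be judged per request. That reading of the two categories is an interpretation, not a statement from the DCBench authors.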

| Category | Workload | Programming model | Language | Source |
|---|---|---|---|---|
| Basic operation | Sort | MapReduce | Java | Hadoop |
| Basic operation | Wordcount | MapReduce | Java | Hadoop |
| Basic operation | Grep | MapReduce | Java | Hadoop |
| Classification | Naïve Bayes | MapReduce | Java | Mahout |
| Classification | Support Vector Machine | MapReduce | Java | Implemented by ourselves |
| Cluster | K-means | MapReduce | Java | Mahout |
| Cluster | K-means | MPI | C++ | IBM PML |
| Cluster | Fuzzy k-means | MapReduce | Java | Mahout |
| Cluster | Fuzzy k-means | MPI | C++ | IBM PML |
| Recommendation | Item based Collaborative Filtering | MapReduce | Java | Mahout |
| Association rule mining | Frequent pattern growth | MapReduce | Java | Mahout |
| Segmentation | Hidden Markov model | MapReduce | Java | Implemented by ourselves |
| Warehouse operation | Database operations | MapReduce | Java | Hive-bench |
| Feature reduction | Principal Component Analysis | MPI | C++ | IBM PML |
| Feature reduction | Kernel Principal Component Analysis | MPI | C++ | IBM PML |
| Vector calculation | Paper similarity analysis | All-Pairs | C&C++ | Implemented by ourselves |
| Graph mining | Breadth-first search | MPI | C++ | Graph500 |
| Graph mining | Pagerank | MapReduce | Java | Mahout |
| Service | Search engine[1] | C/S | Java | Nutch |
| Service | Auction | C/S | Java | Rubis |
| Interactive real-time application | Media streaming[2] | C/S | Java | Cloudsuite |

 

Downloads

The complete DCBench manual can be downloaded here: User’s Manual

The benchmark suite consists of several parts, which users can download separately.


1. DCBench-hadoop

Hadoop-based benchmarks in DCBench, including 14 Hadoop workloads and the corresponding data sets. [download]

A mini version of this part is also provided, which includes only a small part of the data set. With the mini version, users can use only the small data set where the user manual mentions multiple data sets. [download]

2. DCBench-MPI

MPI-based benchmarks in DCBench, including 5 benchmarks written with MPI. [download]

3. DCBench-All-pairs

All-Pairs-based benchmark in DCBench. It is a vector calculation application. [download]

4. Search

A search service benchmark based on Nutch. [user manual]

Search benchmark [download]

Index and segment data package [download] (users should download it and put it into the Search home directory before deploying the benchmark)

5. Media streaming

A media streaming benchmark. (We obtained the media streaming benchmark from here; users can find more information on that website.) [download] [user manual]

6. XCP image

We also provide some XCP (Xen Cloud Platform) images. Users can download and deploy them in their own XCP environments. Instructions for deploying and using them can be found in the [user manual]

XCP images: search frontend, search backend, Media streaming

 

Publications

Characterizing data analysis workloads in data centers.[PDF] [Slides] 

Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo.

2013 IEEE International Symposium on Workload Characterization (IISWC 2013) (Best paper award)

Characterization of Real Workloads of Web Search Engines.[PDF]

Huafeng Xi, Jianfeng Zhan, et al.

2011 IEEE International Symposium on Workload Characterization (IISWC 2011)

DCBench: A benchmark suite for data center.[PDF]

Zhen Jia, Jianfeng Zhan, Lei Wang, Lixin Zhang, et al.

The 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013) Tutorial

 

News

  • The paper characterizing DCBench won the best paper award at IISWC 2013. [PDF] [Slides]
  • DCBench 1.0 released
  • The paper "Characterizing data analysis workloads in data centers" has been chosen as a best paper nominee by IISWC 2013
  • A tutorial at HPCA 2013 (2013-02-24)
  • A paper characterizing the workloads in DCBench has been accepted by IISWC 2013
  • Ph.D. candidate Zhen Jia gave a presentation at the 2nd BPOE [PPT]

People

Contact Us

Email:

jiazhen@ict.ac.cn

wl@ncic.ac.cn

 


[1] For search, we also provide an XCP (Xen Cloud Platform) image.

[2] For media streaming, we also provide an XCP (Xen Cloud Platform) image.