Downloads

Downloading user manuals

BigDataBench 3.0 user manual: [user manual 3.0]

Downloading raw data sets

Table 1: Summary of Data Sets

No.  Data Set                          Download               Size
1    Wikipedia Entries                 Wiki.bz2               9.8 GB
2    Amazon Movie Reviews              AMR.tar.gz             3.1 GB
3    Google Web Graph                  GWG.bz2                23 MB
4    Facebook Social Network           FSN.bz2                220 KB
5    E-commerce Transaction Data       ECT.tar.gz             3 MB
6    ProfSearch Person Resumes         PPR.tar.gz             182 MB
7    CALDA Data (synthetic data)       Hive_benchmark.tar.gz  257 KB
8    TPC-DS Web Data (synthetic data)  TPCDS.tar.gz           384 KB
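
The archives in Table 1 come in two formats: tar archives (.tar.gz) and .bz2 files. The following Python sketch unpacks either kind using only the standard library. It is an illustration, not a BigDataBench tool: the function name `unpack` and the output layout are assumptions. Because this page does not say whether each .bz2 file holds a tar archive or a single raw file, the sketch checks for a tar archive first and otherwise decompresses the stream into one file.

```python
import bz2
import shutil
import tarfile
from pathlib import Path


def unpack(archive: str, dest: str = ".") -> list[Path]:
    """Unpack a downloaded data set archive into dest (hypothetical helper).

    Handles tar archives (plain or compressed, e.g. *.tar.gz) and
    single-file .bz2 streams. Returns the paths that were written.
    """
    src = Path(archive)
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)

    # is_tarfile transparently tries gzip, bz2 and xz decompression.
    if tarfile.is_tarfile(str(src)):
        with tarfile.open(str(src), "r:*") as tar:
            names = tar.getnames()
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: refuse absolute paths, "..", device files.
                tar.extractall(out_dir, filter="data")
            else:
                tar.extractall(out_dir)
        return [out_dir / name for name in names]

    if src.suffix == ".bz2":
        # Plain bz2 stream: write it out without the .bz2 suffix,
        # streaming so multi-GB inputs are never held in memory at once.
        target = out_dir / src.stem
        with bz2.open(src, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        return [target]

    raise ValueError(f"unsupported archive type: {src}")
```

For example, `unpack("Wiki.bz2", "data/wiki")` or `unpack("AMR.tar.gz", "data/amr")`. Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for the 9.8 GB Wikipedia file.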

Downloading software packages

We provide two options: download the full software package at once, or download the components one by one. Before using BigDataBench, you need to download and deploy the prerequisite software packages listed below; please refer to the user manual for details. BigDataBench runs on Linux.

Software   Version  Download
Hadoop     1.0.2    http://hadoop.apache.org/#Download+Hadoop
HBase      0.94.5   http://www.apache.org/dyn/closer.cgi/hbase/
Cassandra  1.2.3    http://cassandra.apache.org/download/
MongoDB    2.4.1    http://www.mongodb.org/downloads
Mahout     0.8      https://cwiki.apache.org/confluence/display/MAHOUT/Downloads
Hive       0.9.0    https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
Spark      0.8.0    http://spark.incubator.apache.org/
Shark      0.8.0    http://shark.cs.berkeley.edu/
Impala     1.1.1    http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_install.html
MPICH      2.0      http://www.mpich.org/downloads/
Boost      1.43.0   http://www.boost.org/doc/libs/1_43_0/more/getting_started/unix-variants.html
Scala      2.9.3    http://www.scala-lang.org/download/2.9.3.html
GCC        4.8.2    http://gcc.gnu.org/releases.html
GSL        1.16     http://www.gnu.org/software/gsl/

Full downloading

Full software packages of different implementations are available from the following links:

Separate downloading

You may download the individual components of BigDataBench from the following tables.

BDGS: Big Data Generator Suite in BigDataBench

Description: BDGS generates big data on the basis of six raw data sets.
Data types: Text, Graph, Table
Download: BigDataGeneratorSuite.tar.gz (size: 95 MB)

BigDataBench workloads. Note that the shell scripts for generating data and for running each workload are included in the distribution.

Workloads are grouped by application scenario and then by application type; each entry lists the workloads followed by the package that contains them.

Cloud OLTP
  Micro Benchmarks
    Read, Write, Scan: BasicDatastoreOperations.tar.gz (size: 95 MB)
  Applications
    Search Server: available soon
Offline Analytics
  Micro Benchmarks
    Sort, Grep, WordCount: MicroBenchmarks.tar.gz
      • Hadoop version, size: 4.8 MB
      • MPI version, size: 1.4 MB
      • Spark version, size: 300 KB
    BFS: BFS_MPI.tar.gz (MPI version, size: 4.7 MB)
  Analytics Workloads
    Index, PageRank: SearchEngine.tar.gz
    Kmeans, Connected Components: SNS.tar.gz
    Collaborative Filtering, Naive Bayes: E-commerce.tar.gz
OLAP and Interactive Analytics
  Micro Benchmarks
    Project, Filter, OrderBy, Cross Product, Union, Difference, Aggregation: InteractiveMicroBenchmark.tar.gz
  Analytics Workloads
    Join Query, Select Query, Aggregation Query: InteractiveQuery.tar.gz
    Eight TPC-DS Web Queries: OLAP.tar.gz