DOWNLOAD

Download User Manual, Technical Report and Specification

BigDataBench 3.2 User Manual  [BigDataBench-UserManual]

BigDataBench JStorm User Manual [BigDataBench-JStorm-UserManual]

BigDataBench Spark Streaming User Manual [BigDataBench-SparkStreaming-UserManual]

BigDataBench 3.2 Technical Report  [BigDataBench-TechnicalReport]

BigDataBench 3.2 Specification  [BigDataBench-specification]

Download data sets

Table 1: The Summary of Data Sets

No. | Data set | Data size | Scalable data set
1 | Wikipedia Entries | 4,300,000 English articles (unstructured text) | Text Generator of BDGS
2 | Amazon Movie Reviews | 7,911,684 reviews (semi-structured text) | Text Generator of BDGS
3 | Google Web Graph | 875713 nodes, 5105039 edges (unstructured graph) | Graph Generator of BDGS
4 | Facebook Social Network | 4039 nodes, 88234 edges (unstructured graph) | Graph Generator of BDGS
5 | E-commerce Transaction Data | Table 1: 4 columns, 38658 rows; Table 2: 6 columns, 242735 rows (structured table) | Table Generator of BDGS
6 | ProfSearch Person Resumes | 278956 resumes (semi-structured table) | Table Generator of BDGS
7 | ImageNet (1GB, 10GB) | ILSVRC2014 DET image dataset (unstructured image) | Ongoing development
8 | English broadcasting audio files (1GB, 10GB) | Sampled at 16 kHz, 16-bit linear sampling (unstructured audio) | Ongoing development
9 | DVD Input Streams (712M) | 110 input streams, resolution: 704*480 (unstructured video) | Ongoing development
10 | Image scene (1GB, 10GB, 100GB) | 39 image scene description files (unstructured text) | Ongoing development
11 | Genome sequence data | Cfa data format (unstructured text) | 4 volumes of data sets
12 | Assembly of the human genome | Fa data format (unstructured text) | 4 volumes of data sets
13 | SoGou Data (Search Data processed from SogouT) | The corpus and search query data from So-Gou Labs (unstructured text) | Ongoing development
14 | MNIST | Handwritten digits database with 60,000 training examples and 10,000 test examples (unstructured image) | Ongoing development
15 | MovieLens Dataset | Users' score data for movies, with 9,518,231 training examples and 386,835 test examples (semi-structured text) | Ongoing development

Download software

We provide two options: download the full software package in one go, or download the components individually. Before using BigDataBench, you need to download and deploy the prerequisite software packages; please refer to the user manual for details. The following packages should be installed first. The target platform is Linux.

Software | Version | Download
Hadoop | 1.0.2 | http://hadoop.apache.org/#Download+Hadoop
HBase | 0.94.5 | http://www.apache.org/dyn/closer.cgi/hbase/
Cassandra | 1.2.3 | http://cassandra.apache.org/download/
MongoDB | 2.4.1 | http://www.mongodb.org/downloads
Mahout | 0.8 | https://cwiki.apache.org/confluence/display/MAHOUT/Downloads
Hive | 0.9.0 | https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
Spark | 0.8.0 | http://spark.incubator.apache.org/
Shark | 0.8.0 | http://shark.cs.berkeley.edu/
Impala | 1.1.1 | http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_install.html
JStorm | 0.9.6.3 | https://github.com/alibaba/jstorm/wiki/Downloads
Flink | 0.10.1 | https://flink.apache.org/downloads.html
MPICH | 2.0 | http://www.mpich.org/downloads/
Boost | 1_43_0 | http://www.boost.org/doc/libs/1_43_0/more/getting_started/unix-variants.html
Scala | 2.9.3 | http://www.scala-lang.org/download/2.9.3.html
GCC | 4.8.2 | http://gcc.gnu.org/releases.html
GSL | 1.16 | http://www.gnu.org/software/gsl/
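
Before deploying BigDataBench, it can help to check which of the prerequisites above are already reachable from the shell. The following is a minimal sketch in Python, not part of the BigDataBench distribution. The executable names in `PREREQUISITE_COMMANDS` are assumptions and may differ on your installation. Boost and GSL are libraries rather than commands, so they are left out. The sketch only checks that a command is on `PATH`; it does not verify the version.

```python
import shutil

# Assumed executable names for some of the prerequisites listed above.
# These are illustrative guesses; adjust them to match your installation.
PREREQUISITE_COMMANDS = {
    "Hadoop": "hadoop",
    "HBase": "hbase",
    "Cassandra": "cassandra",
    "MongoDB": "mongod",
    "Mahout": "mahout",
    "Hive": "hive",
    "Spark": "spark-shell",
    "Impala": "impala-shell",
    "MPICH": "mpiexec",
    "Scala": "scala",
    "GCC": "gcc",
}


def find_missing(commands, which=shutil.which):
    """Return the software names whose executable is not found on PATH.

    `commands` maps a software name to its executable name. `which` is
    injectable so the lookup can be replaced in tests.
    """
    return [name for name, exe in commands.items() if which(exe) is None]
```

Calling `find_missing(PREREQUISITE_COMMANDS)` returns the names that could not be located, for example `["Impala"]` on a machine where only `impala-shell` is absent. An empty list means every listed command was found.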

Full download

Full software packages of different implementations are available from the following links:

Separate download

You may download the individual components of BigDataBench from the following tables.

BDGS: Big Data Generator Suite in BigDataBench

BDGS generates big data on the basis of six raw data sets.
Generators: Text, Graph, Table
Download: BigDataGeneratorSuite.tar.gz (Size: 95MB)
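
Each component is distributed as a `.tar.gz` archive, so it has to be unpacked after download. The following is a minimal sketch of doing that in Python with the standard `tarfile` module, assuming the archive has already been downloaded. The function name `extract_archive` and the destination directory are hypothetical, and the layout inside the archive is not assumed. The sketch rejects archive members that would be written outside the destination directory.

```python
import os
import tarfile


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names.

    Raises ValueError if any member would be written outside dest_dir.
    """
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        for member in members:
            # Resolve where this member would land and check it stays inside dest_root.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: " + member.name)
        archive.extractall(dest_root, members=members)
    return [member.name for member in members]
```

For example, `extract_archive("BigDataGeneratorSuite.tar.gz", "bigdatabench")` would unpack the BDGS archive into a hypothetical local directory named `bigdatabench`.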

BigDataBench workloads.  Please note that each shell script for generating data and running workloads is included in the distribution.

In the table below, an empty cell repeats the value from the row above.

Domains | Operations or Algorithm | Types | Data Sets | Software Stacks | ID
SearchEngine | Read | Cloud OLTP | ProfSearch Resumes | BasicDatastoreOperations.tar.gz (Size: 95MB); BasicDatastoreOperations-Source-Code.tar.gz (Size: 212KB) | W1-11-1
 | Write |  |  |  | W1-11-2
 | Scan |  |  |  | W1-11-3
 | Grep | Streaming | Random Generate | SearchEngine-SparkStreaming.tar.gz (Size: 22MB) | W1-1
 | Grep | Offline Analytics | Wikipedia | SearchEngine Offline-Hadoop-1.2.tar.gz (Size: 49MB); SearchEngine Offline-Hadoop-1.2-Source-Code.tar.gz (Size: 48KB); SearchEngine Offline-Hadoop-2.7.tar.gz (Size: 49MB); SearchEngine Offline-Hadoop-2.7-Source-Code.tar.gz (Size: 48KB); SearchEngine Offline-Spark-0.8.tar.gz (Size: 18MB); SearchEngine Offline-Spark-0.8-Source-Code.tar.gz (Size: 8KB); SearchEngine Offline-Spark-1.4.tar.gz (Size: 18MB); SearchEngine Offline-Spark-1.4-Source-Code.tar.gz (Size: 8KB); SearchEngine Offline-MPI.tar.gz (Size: 20MB); SearchEngine Offline-Flink.tar.gz (Size: 230MB) | W1-1
 | WordCount |  |  |  | W1-2
 | Index |  |  |  | W1-4
 | PageRank |  |  |  | W1-5
 | Sort |  |  |  | W1-7
 | Nutch Server | Online Service | SoGou | Search.tar.gz (Size: 385MB) | W1-6-1
 | Search | Streaming | Search Data | SearchEngine-JStorm.tar.gz (Size: 36MB) | W1-6-2
SocialNetwork | Rolling Top Words | Streaming | Random Generate | SocialNetwork-JStorm.tar.gz (Size: 20MB); SocialNetwork-SparkStreaming-topN.tar.gz (Size: 10MB) | W2-1
 | Kmeans | Streaming | Random Generate | SocialNetwork-SparkStreaming-Kmeans.tar.gz (Size: 5.4MB) | W2-8-2
 | Kmeans | Offline Analytics | Facebook Social Network | SocialNetwork Offline-Hadoop-1.2.tar.gz (Size: 6.1MB); SocialNetwork Offline-Hadoop-1.2-Source-Code.tar.gz (Size: 52KB); SocialNetwork Offline-Hadoop-2.7.tar.gz (Size: 6.1MB); SocialNetwork Offline-Hadoop-2.7-Source-Code.tar.gz (Size: 52KB); SocialNetwork Offline-Spark-0.8.tar.gz (Size: 8.7MB); SocialNetwork Offline-Spark-0.8-Source-Code.tar.gz (Size: 8KB); SocialNetwork Offline-Spark-1.4.tar.gz (Size: 8.7MB); SocialNetwork Offline-Spark-1.4-Source-Code.tar.gz (Size: 8KB); SocialNetwork Offline-MPI.tar.gz (Size: 11MB); SocialNetwork Offline-Flink.tar.gz (Size: 658MB) | W2-8-2
 | CC | Graph |  | SocialNetwork-Graph.tar.gz (Size: 100MB; GraphX, GraphLab, Flink); BFS-MPI.tar.gz (Size: 4.8MB) | W2-8-1
 | Label Propagation |  |  |  | W2-8-3
 | Triangle Count |  |  |  | W2-8-4
 | BFS |  |  |  | W2-9
E-commerce | CF | Streaming | MovieLens Dataset | E-commerce-JStorm.tar.gz (Size: 23MB) | W3-4-1
 | CF | Offline Analytics | Amazon Movie Review | E-commerce_Offline-Hadoop-1.2.tar.gz (Size: 95MB); E-commerce_Offline-Hadoop-1.2-Source-Code.tar.gz (Size: 24KB); E-commerce_Offline-Hadoop-2.7.tar.gz (Size: 95MB); E-commerce_Offline-Hadoop-2.7-Source-Code.tar.gz (Size: 24KB); E-commerce_Offline-Spark-0.8.tar.gz (Size: 12MB); E-commerce_Offline-Spark-0.8-Source-Code.tar.gz (Size: 8KB); E-commerce_Offline-Spark-1.4.tar.gz (Size: 12MB); E-commerce_Offline-Spark-1.4-Source-Code.tar.gz (Size: 8KB); E-commerce_Offline-MPI.tar.gz (Size: 13MB); E-commerce Offline-Flink.tar.gz (Size: 230MB) | W3-4-2
 | Bayes |  |  |  | W3-5
 | Project | Data Warehouse | E-commerce | E-commerce_Query-Hive.tar.gz (Size: 21MB); E-commerce_Query-Shark.tar.gz (Size: 21MB); E-commerce_Query-Impala.tar.gz (Size: 21MB) | W3-6-1
 | Filter |  |  |  | W3-6-2
 | Cross Product |  |  |  | W3-6-3
 | OrderBy |  |  |  | W3-6-4
 | Union |  |  |  | W3-6-5
 | Difference |  |  |  | W3-6-6
 | Aggregation |  |  |  | W3-6-7
 | Join Query |  |  |  | W3-3
 | Select Query |  |  |  | W3-1
 | Aggregation Query |  |  |  | W3-2
Multimedia | BasicMPEG | Offline Analytics | Stream Data | MPEG.tar.gz (Size: 1.2MB) | W4-1
 | SIFT |  | ImageNet | Multimedia Offline-MPI.tar.gz (Size: 60MB) | W4-2-1
 | Speech Recognition |  | Audio files |  | W4-3
 | Ray Tracing |  | Scene description files |  | W4-4
 | Image Segmentation |  | ImageNet |  | W4-5
 | Face Detection |  |  |  | W4-6
 | DBN |  | MNIST |  | W4-2-2
Bioinformation | SAND | Offline Analytics | Genome sequence Data | SAND | W5-1
 | BLAST |  | Assembly of the human genome | BLAST | W5-2