Download

Download User Manual, Technical Report and Specification

BigDataBench 3.1 User Manual [BigDataBench-UserManual]

BigDataBench 3.1 Technical Report [BigDataBench-TechnicalReport]

BigDataBench 3.1 Specification [BigDataBench-specification]

Download data sets

Table 1: The Summary of Data Sets

No. | Data set | Data size | Scalable data set
1 | Wikipedia Entries | 4,300,000 English articles (unstructured text) | Text Generator of BDGS
2 | Amazon Movie Reviews | 7,911,684 reviews (semi-structured text) | Text Generator of BDGS
3 | Google Web Graph | 875,713 nodes, 5,105,039 edges (unstructured graph) | Graph Generator of BDGS
4 | Facebook Social Network | 4,039 nodes, 88,234 edges (unstructured graph) | Graph Generator of BDGS
5 | E-commerce Transaction Data | table 1: 4 columns, 38,658 rows; table 2: 6 columns, 242,735 rows (structured table) | Table Generator of BDGS
6 | ProfSearch Person Resumes | 278,956 resumes (semi-structured table) | Table Generator of BDGS
7 | ImageNet (1GB, 10GB) | ILSVRC2014 DET image dataset (unstructured image) | Ongoing development
8 | English broadcasting audio files (1GB, 10GB) | Sampled at 16 kHz, 16-bit linear sampling (unstructured audio) | Ongoing development
9 | DVD Input Streams (712M) | 110 input streams, resolution: 704*480 (unstructured video) | Ongoing development
10 | Image scene (1GB, 10GB, 100GB) | 39 image scene description files (unstructured text) | Ongoing development
11 | Genome sequence data | Cfa data format (unstructured text) | 4 volumes of data sets
12 | Assembly of the human genome | Fa data format (unstructured text) | 4 volumes of data sets
13 | SoGou Data | The corpus and search query data from So-Gou Labs (unstructured text) | Ongoing development
14 | MNIST | Handwritten digits database with 60,000 training examples and 10,000 test examples (unstructured image) | Ongoing development

Download software

We provide two options: download the full software package in one step, or download the components one by one. Before using BigDataBench, you need to download and deploy the prerequisite software packages listed below; please refer to the user manual for details. BigDataBench runs on Linux.

Software | Version | Download
Hadoop | 1.0.2 | http://hadoop.apache.org/#Download+Hadoop
HBase | 0.94.5 | http://www.apache.org/dyn/closer.cgi/hbase/
Cassandra | 1.2.3 | http://cassandra.apache.org/download/
MongoDB | 2.4.1 | http://www.mongodb.org/downloads
Mahout | 0.8 | https://cwiki.apache.org/confluence/display/MAHOUT/Downloads
Hive | 0.9.0 | https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration
Spark | 0.8.0 | http://spark.incubator.apache.org/
Shark | 0.8.0 | http://shark.cs.berkeley.edu/
Impala | 1.1.1 | http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_install.html
MPICH | 2.0 | http://www.mpich.org/downloads/
Boost | 1_43_0 | http://www.boost.org/doc/libs/1_43_0/more/getting_started/unix-variants.html
Scala | 2.9.3 | http://www.scala-lang.org/download/2.9.3.html
GCC | 4.8.2 | http://gcc.gnu.org/releases.html
GSL | 1.16 | http://www.gnu.org/software/gsl/
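
As a rough aid for the deployment step described above, the following Python sketch reports which prerequisite commands cannot be found on PATH. The command names in the mapping (for example `hadoop`, `mongod`, `mpirun`) are assumptions about a typical installation and are not taken from the BigDataBench documentation. The sketch does not check version numbers, and it leaves out Spark, Shark, Impala, Boost and GSL, which are libraries or are often launched from their own installation directories.

```python
import shutil

# Hypothetical mapping from prerequisite name to a command expected on PATH.
# These command names are assumptions; adjust them to match your installation.
PREREQUISITE_COMMANDS = {
    "Hadoop": "hadoop",
    "HBase": "hbase",
    "Cassandra": "cassandra",
    "MongoDB": "mongod",
    "Mahout": "mahout",
    "Hive": "hive",
    "MPICH": "mpirun",
    "Scala": "scala",
    "GCC": "gcc",
}


def find_missing(commands=PREREQUISITE_COMMANDS, which=shutil.which):
    """Return the sorted names whose command cannot be found on PATH."""
    return sorted(name for name, cmd in commands.items() if which(cmd) is None)


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked commands were found on PATH.")
```

A command that is found on PATH may still be a different version from the one listed in the table, so the user manual remains the reference for the actual setup.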

Full download

Full software packages of different implementations are available from the following links:

Separate download

You may download the individual components of BigDataBench from the following tables.

BDGS: Big Data Generator Suite in BigDataBench

Name | Description | Generators | Download
BDGS | Generates big data on the basis of six raw data sets | Text, Graph, Table | BigDataGeneratorSuite.tar.gz (Size: 95MB)


BigDataBench workloads. Please note that each shell script for generating data and running workloads is included in the distribution.

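In the table below, "same as" in the Software Stacks column means the workload is shipped in the packages listed for the referenced ID.

Domains | Operations or Algorithm | Types | Data Sets | Software Stacks | ID
SearchEngine | Read | Cloud OLTP | ProfSearch Resumes | BasicDatastoreOperations.tar.gz (95MB); BasicDatastoreOperations-Source-Code.tar.gz (212KB) | W1-11-1
SearchEngine | Write | Cloud OLTP | ProfSearch Resumes | same as W1-11-1 | W1-11-2
SearchEngine | Scan | Cloud OLTP | ProfSearch Resumes | same as W1-11-1 | W1-11-3
SearchEngine | Grep | Offline Analytics | Wikipedia | SearchEngine Offline-Hadoop-1.2.tar.gz (49MB); SearchEngine Offline-Hadoop-1.2-Source-Code.tar.gz (48KB); SearchEngine Offline-Hadoop-2.7.tar.gz (49MB); SearchEngine Offline-Hadoop-2.7-Source-Code.tar.gz (48KB); SearchEngine Offline-Spark-0.8.tar.gz (18MB); SearchEngine Offline-Spark-0.8-Source-Code.tar.gz (8KB); SearchEngine Offline-Spark-1.4.tar.gz (18MB); SearchEngine Offline-Spark-1.4-Source-Code.tar.gz (8KB); SearchEngine Offline-MPI.tar.gz (20MB) | W1-1
SearchEngine | WordCount | Offline Analytics | Wikipedia | same as W1-1 | W1-2
SearchEngine | Index | Offline Analytics | Wikipedia | same as W1-1 | W1-4
SearchEngine | PageRank | Offline Analytics | Wikipedia | same as W1-1 | W1-5
SearchEngine | Sort | Offline Analytics | Wikipedia | same as W1-1 | W1-7
SearchEngine | Nutch Server | Online Service | SoGou | Search.tar.gz (385MB) | W1-6-1
SocialNetwork | BFS | Offline Analytics | Graph500 Data | BFS-MPI.tar.gz (4.8MB) | W2-9
SocialNetwork | Kmeans | Offline Analytics | Facebook Social Network | SocialNetwork Offline-Hadoop-1.2.tar.gz (6.1MB); SocialNetwork Offline-Hadoop-1.2-Source-Code.tar.gz (52KB); SocialNetwork Offline-Hadoop-2.7.tar.gz (6.1MB); SocialNetwork Offline-Hadoop-2.7-Source-Code.tar.gz (52KB); SocialNetwork Offline-Spark-0.8.tar.gz (8.7MB); SocialNetwork Offline-Spark-0.8-Source-Code.tar.gz (8KB); SocialNetwork Offline-Spark-1.4.tar.gz (8.7MB); SocialNetwork Offline-Spark-1.4-Source-Code.tar.gz (8KB); SocialNetwork Offline-MPI.tar.gz (11MB) | W2-8-2
SocialNetwork | CC | Offline Analytics | Facebook Social Network | same as W2-8-2 | W2-8-1
E-commerce | CF | Offline Analytics | Amazon Movie Review | E-commerce_Offline-Hadoop-1.2.tar.gz (95MB); E-commerce_Offline-Hadoop-1.2-Source-Code.tar.gz (24KB); E-commerce_Offline-Hadoop-2.7.tar.gz (95MB); E-commerce_Offline-Hadoop-2.7-Source-Code.tar.gz (24KB); E-commerce_Offline-Spark-0.8.tar.gz (12MB); E-commerce_Offline-Spark-0.8-Source-Code.tar.gz (8KB); E-commerce_Offline-Spark-1.4.tar.gz (12MB); E-commerce_Offline-Spark-1.4-Source-Code.tar.gz (8KB); E-commerce_Offline-MPI.tar.gz (13MB) | W3-4-2
E-commerce | Bayes | Offline Analytics | Amazon Movie Review | same as W3-4-2 | W3-5
E-commerce | Project | Interactive Analytics | E-commerce | E-commerce_Query-Hive.tar.gz (21MB); E-commerce_Query-Shark.tar.gz (21MB); E-commerce_Query-Impala.tar.gz (21MB) | W3-6-1
E-commerce | Filter | Interactive Analytics | E-commerce | same as W3-6-1 | W3-6-2
E-commerce | Cross Product | Interactive Analytics | E-commerce | same as W3-6-1 | W3-6-3
E-commerce | OrderBy | Interactive Analytics | E-commerce | same as W3-6-1 | W3-6-4
E-commerce | Union | Interactive Analytics | E-commerce | same as W3-6-1 | W3-6-5
E-commerce | Difference | Interactive Analytics | E-commerce | same as W3-6-1 | W3-6-6
E-commerce | Aggregation | Interactive Analytics | E-commerce | same as W3-6-1 | W3-6-7
E-commerce | Join Query | Interactive Analytics | E-commerce | same as W3-6-1 | W3-3
E-commerce | Select Query | Interactive Analytics | E-commerce | same as W3-6-1 | W3-1
E-commerce | Aggregation Query | Interactive Analytics | E-commerce | same as W3-6-1 | W3-2
Multimedia | BasicMPEG | Offline Analytics | Stream Data | MPEG.tar.gz (1.2MB) | W4-1
Multimedia | SIFT | Offline Analytics | ImageNet | Multimedia Offline-MPI.tar.gz (60MB) | W4-2-1
Multimedia | Speech Recognition | Offline Analytics | Audio files | same as W4-2-1 | W4-3
Multimedia | Ray Tracing | Offline Analytics | Scene description files | same as W4-2-1 | W4-4
Multimedia | Image Segmentation | Offline Analytics | ImageNet | same as W4-2-1 | W4-5
Multimedia | Face Detection | Offline Analytics | ImageNet | same as W4-2-1 | W4-6
Multimedia | DBN | Offline Analytics | MNIST | same as W4-2-1 | W4-2-2
Bioinformatics | SAND | Offline Analytics | Genome sequence Data | SAND | W5-1
Bioinformatics | BLAST | Offline Analytics | Assembly of the human genome | BLAST | W5-2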
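
Since the shell scripts for generating data and running workloads ship inside the packages, it can help to see which scripts a downloaded archive contains before unpacking it. The following is a minimal Python sketch that assumes the package is a gzip-compressed tar file and that the scripts use a `.sh` suffix. Neither the helper name `list_shell_scripts` nor the suffix convention comes from the BigDataBench documentation.

```python
import tarfile


def list_shell_scripts(archive_path):
    """Return the sorted names of regular .sh files inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            # Assumption: scripts are regular files with a .sh suffix.
            if member.isfile() and member.name.endswith(".sh")
        )
```

For example, `list_shell_scripts("BigDataGeneratorSuite.tar.gz")` would list the script paths in a downloaded copy of the generator suite, if the archive follows these assumptions.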