Data generated by modern scientific instruments has grown to an unprecedented scale. Moreover, the data formats and computational behaviors of scientific big data workloads are much more complex than those of Internet services. These two facts pose a serious challenge to scientific data management and analytics. Among the many concerns, the first is how to build a comprehensive and representative scientific big data benchmark suite. Previous benchmark efforts either focus on Internet services (e.g., BigDataBench) or target a single scientific domain (e.g., GenBase). We present a comprehensive scientific big data benchmark suite, BigDataBench-S.



There are three data sets and 17 workloads in BigDataBench-S. Table 1 summarizes the real-world data sets and workloads of BigDataBench-S.

| Domains | Datasets | Workload Category | Workloads |
|---|---|---|---|
| High Energy Physics | ATLAS Dataset | Data Manipulation Queries | Selection: select events based on filter conditions |
| | | Classification | SVM |
| | | | k-Nearest Neighbor |
| | | Regression | Boosted decision trees |
| | | | Maximum likelihood fit |
| Astronomy | Simulated Dataset Using Generator from SS-DB | Data Manipulation Queries | Selection: select images in given time and space ranges |
| | | | Aggregation: compute average value of cells of images to find average background noise |
| | | | Join: select values of each cell of images with the average noise of this cell |
| | | Complex Analysis | Intersection of images |
| Genomics | Simulated Dataset Using Generator from GenBase | Data Manipulation Queries | Selection: select genes based on filter conditions |
| | | | Aggregation: compute average expression value of genes |
| | | | Join: join genes with gene ontologies |
| | | Complex Analysis | QR decomposition |
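To make the astronomy data manipulation queries more concrete, here is a minimal Python sketch of the selection, aggregation, and join steps on toy data. It is not the BigDataBench-S implementation. The image layout (fields `t`, `x`, `y`, `cells`), the function names, and the reading of "average value of cells" as a per-cell mean across the selected images are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical toy data: each image has a timestamp, a sky position, and a
# small grid of cell values. Real SS-DB-style images are far larger.
images = [
    {"t": 1, "x": 0, "y": 0, "cells": [[1.0, 2.0], [3.0, 4.0]]},
    {"t": 2, "x": 5, "y": 5, "cells": [[3.0, 2.0], [1.0, 0.0]]},
    {"t": 9, "x": 1, "y": 1, "cells": [[9.0, 9.0], [9.0, 9.0]]},
]


def select_images(images, t_range, x_range, y_range):
    """Selection: keep images inside the given time and space ranges (inclusive)."""
    return [
        img for img in images
        if t_range[0] <= img["t"] <= t_range[1]
        and x_range[0] <= img["x"] <= x_range[1]
        and y_range[0] <= img["y"] <= y_range[1]
    ]


def average_noise(images):
    """Aggregation: per-cell mean across images (assumed background noise)."""
    rows = len(images[0]["cells"])
    cols = len(images[0]["cells"][0])
    return [
        [mean(img["cells"][r][c] for img in images) for c in range(cols)]
        for r in range(rows)
    ]


def join_with_noise(images, noise):
    """Join: pair every cell value with the average noise of that cell."""
    return [
        (img["t"], r, c, value, noise[r][c])
        for img in images
        for r, row in enumerate(img["cells"])
        for c, value in enumerate(row)
    ]


if __name__ == "__main__":
    selected = select_images(images, (0, 5), (0, 10), (0, 10))
    noise = average_noise(selected)
    for record in join_with_noise(selected, noise):
        print(record)  # (timestamp, row, column, cell value, average noise)
```

A production workload would run these steps over array or columnar storage rather than Python lists; the sketch only shows how the three queries relate to one another.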



GCM-Bench: A General Benchmark for RDF Data Management System


For Citations

If you need a citation for BigDataBench-S, please cite the following papers relevant to your work:

BigDataBench-S: An Open-source Scientific Big Data Benchmark Suite

Xinhui Tian, Shaopeng Dai, Zhihui Du, Wanling Gao, Rui Ren, Yaodong Cheng, Zhifei Zhang, Zhen Jia, Peijian Wang, and Jianfeng Zhan. BigDataBench-S: An Open-source Scientific Big Data Benchmark Suite. The 3rd IEEE International Workshop on High-Performance Big Data Computing (HPBDC), IPDPSW 2017.


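  • Jianfeng Zhan (詹剑锋), Institute of Computing Technology, Chinese Academy of Sciences
  • Jianhui Li (黎建辉), Computer Network Information Center, Chinese Academy of Sciences
  • Xiaofeng Meng (孟小峰), Renmin University of China
  • Zhihui Du (都志辉), Tsinghua University
  • Lei Zou (邹磊), Peking University
  • Yong Qi (齐勇), Xi'an Jiaotong University
  • Zhihong Shen (沈志宏), Computer Network Information Center, Chinese Academy of Sciences
  • Peijian Wang (王培健), Xi'an Jiaotong University
  • Li Zha (查礼), Institute of Computing Technology, Chinese Academy of Sciences
  • Yaodong Cheng (程耀东), Institute of High Energy Physics, Chinese Academy of Sciences
  • Jungang Xu (徐俊刚), University of Chinese Academy of Sciences
  • Zhifei Zhang (张知非), Capital Medical University
  • Zhen Jia (贾禛), Princeton University
  • Xinhui Tian (田昕晖), Institute of Computing Technology, Chinese Academy of Sciences
  • Shaopeng Dai (戴绍鹏), Institute of Computing Technology, Chinese Academy of Sciences
  • Wanling Gao (高婉铃), Institute of Computing Technology, Chinese Academy of Sciences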