This is the second workshop addressing the challenges of benchmarking, performance optimization, and emerging hardware for Big Data systems and applications, held in conjunction with CCF HPC China 2013. The theme of this workshop is benchmarking and optimization of Big Data systems and cloud computing. Big Data has emerged as a strategic asset of nations and organizations, and researchers from enterprises and scientific research organizations are distilling meaning and value from it. Big Data consists of high-volume, high-velocity, and high-variety information assets that require new forms of processing, which makes it challenging to extract value from them. Owners of Big Data can hardly choose which system is best suited to their specific requirements; they also face the problems of optimizing data processing and evaluating existing Big Data systems. In addition, as new techniques in system architecture, operating systems, and programming models are put forward, the infrastructures and processing algorithms of Big Data systems are changing accordingly. Research on data management and data processing based on emerging hardware platforms and systems is well worth discussing, for example, analyzing which hardware and software platforms are appropriate for Big Data.
- Bring together big data researchers from the architecture, operating systems, and data management communities. We will discuss the mutual influences of architectures, systems, and data management in the context of big data. The workshop places particular emphasis on specific research and application cases.
- Bridge the gap between industry and academia in big data research and practice. Researchers from universities, institutes, and companies will attend this workshop.
- This workshop is based on high-quality invited talks by pioneers and leaders in the field of big data. There are no papers; all talks and discussions are available on the web page.
This workshop welcomes research and industry work that addresses fundamental issues in benchmarking, characterizing, designing, and optimizing Big Data systems based on novel hardware and software applications.
Topics of interest include, but are not limited to:
- Big Data benchmarking
- Performance and energy efficiency evaluations of big data hardware platforms
- Benchmarks, performance analysis and optimization of cloud computing systems
- Workload characteristics analysis of data centers and CPU design
- Practice reports on evaluating and optimizing industrial big data systems
Jianfeng Zhan Institute of Computing Technology, Chinese Academy of Sciences
Zhibin Yu Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Xiaoyi Lu Ohio State University, USA
Yuqing Zhu Institute of Computing Technology, Chinese Academy of Sciences
Yingjie Shi Institute of Computing Technology, Chinese Academy of Sciences
Wei Zhu Institute of Computing Technology, Chinese Academy of Sciences
|2:00-2:05 pm||Opening remarks||Jianfeng Zhan||Professor, Institute of Computing Technology, Chinese Academy of Sciences||CV, PPT|
|2:05-2:45 pm||Challenges in Benchmarking and Evaluating Big Data Processing Middleware on Modern Clusters
Hadoop and Memcached are currently being used on modern clusters for Big Data analytics and query processing. Designs for these middleware are also being accelerated using features of high-performance interconnects and multi-core architectures. Internal designs of these middleware interact with communication protocols (such as RDMA), storage devices (HDD and SSD) and multi-core processors at the lowest layer. These components also interact with other middleware and applications at the upper layer. Thus, determining the overall performance benefits of new accelerated designs of these middleware in a systematic manner has been full of challenges. This talk will focus on these challenges. A multi-layered and systematic approach to design benchmarks to evaluate the performance of these middleware will be addressed. At the lowest layer, we will introduce a set of micro-benchmarks (OHMB, OSU Hadoop Micro-Benchmarks) to understand and analyze performance and their trade-offs in designing different components of Hadoop (HDFS, MapReduce and RPC). Similar micro-benchmarks will be presented for HBase and Memcached. At the intermediate layer, we will show results of Hadoop designs using some of the benchmarks from BigDataBench/PUMA/SWIM suites. These benchmarks and evaluations will demonstrate interplay between high performance interconnects, storage systems (HDD and SSD) and multi-core platforms to achieve best solutions for these middleware.
|D.K. Panda||Professor, Ohio State University||CV, PPT
Prof. Dhabaleswar K. (DK) Panda
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. His research interests include parallel computer architecture, high performance networking, InfiniBand, exascale computing, programming models, GPUs and accelerators, high performance file systems and storage, virtualization, cloud computing and Big Data. He has published over 300 papers in major journals and international conferences related to these research areas. Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, High-Speed Ethernet and RDMA over Converged Enhanced Ethernet (RoCE). The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X software libraries, developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 2,085 organizations worldwide (in 71 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 188,000 downloads of this software have taken place from the project’s website alone. This software package is also available with the software stacks of many network and server vendors, and Linux distributors. The new Hadoop-RDMA package, consisting of acceleration for HDFS, MapReduce and RPC, is publicly available from http://hadoop-rdma.cse.ohio-state.edu. Dr. Panda’s research has been supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including Intel, Cisco, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda
|2:55-3:35 pm||Big data processing practice of virtual computing environment
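The Virtual Computing Environment (iVCE) is a service-oriented computing system built on open network infrastructure. By integrating and jointly utilizing distributed, autonomous resources, it provides end users and application systems with a harmonious, secure, and transparent integrated service environment. Based on the theoretical model of iVCE, this project built a virtual computing infrastructure and supporting software platform comprising more than 300 physical nodes distributed across provincial capital cities, and developed and deployed a variety of business systems on the platform. It studies the key problems that arise when using iVCE technology to acquire, analyze, and process massive business data, including business modeling, task scheduling, data distribution, and resource management.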
|Xinran Liu||Associate Director, National Computer Network Emergency Response Technical Team/Coordination Center of China||CV, ppt
|3:45-4:10 pm||BigDataBench 2.0: Big Data Benchmarks updates
BigDataBench is a big data benchmark suite derived from Internet services. Unlike the first release, BigDataBench 1.0, which was extracted only from search engines, the current release (BigDataBench 2.0) covers six application scenarios (micro benchmarks, Cloud “OLTP”, relational query, search engine, social network, and e-commerce system) and includes nineteen big data workloads and six data sets. BigDataBench also provides an innovative data generation tool that generates scalable volumes of big data from small-scale real data while preserving the characteristics of the raw data. The synthetic big data sets, including text data, graph data, and table data, are generated by this tool. A full spectrum of system software stacks, covering real-time analytics, offline analytics, and online services, is being included.
|Lei Wang||Senior Engineer, Institute of Computing Technology, Chinese Academy of Sciences||CV, PPT|
|4:15-4:40 pm||The Roadmap of Big Data in NetEase
Over the past few years, with the arrival of the big data era, enterprise data has become increasingly important to various businesses. Big data processing can be divided into batch processing and real-time processing, and the real-time requirement is especially significant. This talk mainly introduces the NetEase big data platforms, including their organization, service types, and technical advantages. Having investigated the existing big data frameworks in depth, we present a big data analytical system for top Internet companies. Comprehensive analysis with big data benchmarking tools shows that our system provides real-time, high-performance computing capability with robustness, extensibility, and compatibility. The system is running in production and serves all of NetEase's businesses.
|Jianzong Wang||Senior Researcher, NetEase163||CV, ppt
Jianzong Wang Ph.D
Senior Scientist at NetEase (www.163.com).
|4:45-5:05 pm||Performance and Energy Efficiency Evaluation of Big Data Platforms
With the development of the information industry, massive data has been produced in various applications, including online transaction data, web access logs, sensor data, and scientific data. According to an IDC white paper, the data size will reach 35ZB by 2020. The era of Big Data has arrived, bringing big challenges to data centers; the performance and energy efficiency of big data systems are the two most important problems. In this report, we will introduce our comprehensive evaluations of three representative big data platforms, Intel Xeon, Atom (low-power processors), and many-core Tilera, using BigDataBench, a big data benchmark suite. We will explore the relative performance and energy efficiency of the three implementation approaches and provide guidance for the construction of big data systems.
|Yingjie Shi||Assistant Professor, Institute of Computing Technology, Chinese Academy of Sciences||CV, PPT
Yingjie Shi is currently an assistant professor at the Institute of Computing Technology, Chinese Academy of Sciences. She manages the platform software group, which researches the evaluation and analysis of emerging hardware and the benchmarking and optimization of basic software platforms. She received her PhD in Computer Software and Theory from Renmin University of China in 2013 and her MS in Computer Architecture from Huazhong University of Science and Technology, China, in 2007. Her research interests include big data management and the evaluation and optimization of software platforms for big data.
|5:10-5:30 pm||Evaluating Task Scheduling in Hadoop-based Cloud System
Nowadays, private clouds are widely used for resource sharing, and Hadoop-based clusters are the most popular implementations of private clouds. However, because workload traces are not publicly available, little previous work compares and evaluates different cloud solutions with publicly available benchmarks. In this paper, we use a recently released cloud benchmark suite, CloudRank-D, to quantitatively evaluate five different Hadoop task schedulers: FIFO, capacity, naïve fair sharing, fair sharing with delay, and HOD (Hadoop On Demand) scheduling. Our experiments show that with an appropriate scheduler, the throughput of a private cloud can be improved by 20%.
|Jungang Xu||Associate Professor, University of Chinese Academy of Sciences||CV, ppt|
|5:35-5:55 pm||DCBench: a Data Center Benchmark Suite
As the amount of data explodes, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data center applications play a significant role in modern computing systems, so it is increasingly important to improve the performance of data center computer systems. Evaluation must come before optimization: researchers frequently use benchmarks to evaluate a computer system, and benchmarks are necessary to determine experimentally the benefits of new designs. In this report we introduce DCBench, a benchmark suite for data center workloads, and characterize those workloads at the microarchitecture level in order to identify the impacts and implications for modern data center systems equipped with commodity hardware.
|Zhen Jia||Ph.D candidate, Institute of Computing Technology, Chinese Academy of Sciences||CV, PPT
Zhen Jia is a student in the Software Group at the Advanced Computer Systems Laboratory, Institute of Computing Technology, Chinese Academy of Sciences. He received his B.S. degree from Dalian University of Technology in 2010, graduating as an outstanding graduate of Liaoning Province. His research interests include workload characterization, benchmarks, and architecture for data center and big data systems. He is also the main contributor to DCBench; his paper “Characterizing data analysis workloads in data centers” won the best paper award at IISWC 2013 in September.
|5:55-6:25 pm||Towards Benchmarking Resource Allocations in Virtualized Cloud Platforms
The resource-centric interface offered by cloud providers allows tenants to simply provision a number of VM instances and enables providers to charge for resources on a pay-as-you-go basis. Resource auto-scaling at the VM level enables elastic application performance. Resource allocation in virtualized cloud platforms raises many system issues, including efficiency, fairness, SLA guarantees, resource isolation, total cost of ownership, and energy consumption. Current cloud benchmarks and studies mainly concentrate on application performance or resource efficiency. In this talk, we will share some experience and our preliminary findings on those system issues in current cloud platforms, and then present our vision of benchmarking virtualized cloud platforms.
|Haikun Liu, Bingsheng He||Nanyang Technological University, Singapore||CV, ppt|