Dr. Jianfeng Zhan is a Full Professor and Director of the Software Systems Lab, Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS), and a Full Professor at the University of Chinese Academy of Sciences. He enjoys building new systems and is keenly interested in collaborating with researchers from different backgrounds. His recent work focuses on different aspects of datacenter computing. He founded the International Symposium on Benchmarking, Measuring, and Optimizing, dedicated to benchmarking, measuring, and optimizing different complex systems.

Research highlights

1) Data motifs: a new unified approach to rebuilding software and hardware systems for big data and AI workloads. (2014---present)

We propose a new approach to characterizing big data and AI workloads. We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs. Each class of unit of computation captures the common requirements while being reasonably divorced from individual implementations, and hence we call it a data motif. For the first time, among a wide variety of big data and AI workloads, we identify eight data motifs that take up most of the run time: Matrix, Sampling, Logic, Transform, Set, Graph, Sort and Statistic.
Interestingly, many other scholars are advocating domain-specific hardware and software systems. We believe the data motif concept provides a new unified approach to rebuilding software and hardware systems for big data and AI workloads. Rather than building or optimizing systems case by case, we can focus on accelerating the eight data motifs.
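
As a rough illustration of the pipeline view described above, the following Python sketch models a workload as a sequence of stages, each tagged with one of the eight motif classes. The names Stage, run_pipeline, motif_profile and toy_workload are hypothetical and chosen only for this sketch; they are not taken from BigDataBench or the PACT 2018 paper, and the toy stages stand in for real implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# The eight motif classes named in the text.
MOTIFS = ("Matrix", "Sampling", "Logic", "Transform",
          "Set", "Graph", "Sort", "Statistic")


@dataclass
class Stage:
    motif: str                 # motif class this unit of computation belongs to
    fn: Callable[[Any], Any]   # hypothetical implementation of the unit

    def __post_init__(self) -> None:
        if self.motif not in MOTIFS:
            raise ValueError(f"unknown motif: {self.motif}")


def run_pipeline(stages: List[Stage], data: Any) -> Any:
    """Feed the initial input through each stage in order; every
    output becomes the intermediate input of the next stage."""
    for stage in stages:
        data = stage.fn(data)
    return data


def motif_profile(stages: List[Stage]) -> Dict[str, int]:
    """Count how many stages of each motif class a workload contains."""
    counts: Dict[str, int] = {}
    for stage in stages:
        counts[stage.motif] = counts.get(stage.motif, 0) + 1
    return counts


# Toy workload (an assumption for illustration): keep every second
# record, sort the sample, then compute its mean.
toy_workload = [
    Stage("Sampling", lambda xs: xs[::2]),
    Stage("Sort", sorted),
    Stage("Statistic", lambda xs: sum(xs) / len(xs)),
]
```

Calling run_pipeline(toy_workload, records) executes the three stages in order, and motif_profile(toy_workload) lists which motif classes the workload is built from. Under this view, speeding up a workload reduces to speeding up the motif implementations it is composed of.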

Selected publications
  • Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Fei Tang, Biwei Xie, Chen Zheng, Xu Wen, Xiwen He and Hainan Ye:
    Data Motifs: A Lens Towards Fully Understanding Big Data and AI Workloads. Parallel Architectures and Compilation Techniques (PACT) 2018.


2) New metrics and benchmarks for datacenter computing (2015---present)

History shows that the FLOPS (FLoating-point Operations Per Second) metric and the HPC benchmarks have defined the concrete R&D objectives and roadmaps for HPC (Gflops in the 1990s, Tflops in the 2000s, Pflops in the 2010s, and Eflops in the 2020s).
Today, more and more organizations worldwide build internal datacenters or rent hosted ones to provide Internet services or to perform big data or AI analytics. Datacenter computing appears to have overtaken HPC in market share (HPC accounts for only 20% of the total).
Unfortunately, in the context of datacenter computing, we are isolated. On one hand, the academic community has no real-world data and workloads, which are owned by different Internet service giants. On the other hand, each giant has only its own data and workloads and knows nothing of the others'.
It is time for us (both the academic and industry communities) to establish unified metrics and benchmarks for datacenter computing.

Selected publications
  • Lei Wang, Jianfeng Zhan, Wanling Gao, ZiHan Jiang, Rui Ren, Xiwen He, Chunjie Luo, Gang Lu and Jingwei Li:
    BOPS, Not FLOPS! A New Metric and Roofline Performance Model For Datacenter Computing. Technical Report, arXiv preprint arXiv:1801.09212, May 3, 2018.


3) BigDataBench: a scalable and unified big data and AI benchmark suite (2009---present)

As a multi-disciplinary research and engineering effort involving the architecture, system, data management and machine learning communities from both industry and academia, we set up an open-source big data and AI benchmark suite: BigDataBench. The current version, BigDataBench 4.0, provides 13 representative real-world data sets and 47 benchmarks. Rather than creating a new benchmark or proxy for every possible workload, we propose using data motif-based benchmarks (combinations of the eight data motifs) to represent the diversity of big data and AI workloads. Our benchmark suite includes micro benchmarks, each of which is a single data motif; component benchmarks, which consist of data motif combinations; and end-to-end application benchmarks, which are combinations of component benchmarks.

For the architecture community, whether early in the architecture design process or later in system evaluation, running a comprehensive benchmark suite is time-consuming. The complex software stacks of big data and AI workloads aggravate this issue. To tackle this challenge, we propose data motif-based simulation benchmarks for the architecture community, which shorten run time by a factor of 100 while preserving the accuracy of system and micro-architectural characteristics.

Selected publications
  • Zhen Jia, Jianfeng Zhan, Lei Wang, Chunjie Luo, Wanling Gao, Yi Jin, Rui Han, Lixin Zhang:
    Understanding Big Data Analytics Workloads on Modern Processors. IEEE Trans. Parallel Distrib. Syst. 28(6): 1797-1810 (2017).
  • Lei Wang, Jianfeng Zhan, Chunjie Luo, Yuqing Zhu, Qiang Yang, Yongqiang He, Wanling Gao, Zhen Jia, Yingjie Shi, Shujie Zhang, Chen Zheng, Gang Lu, Kent Zhan, Xiaona Li, Bizhu Qiu:
    BigDataBench: A big data benchmark suite from internet services. HPCA 2014: 488-499.
  • Xinhui Tian, Shaopeng Dai, Zhihui Du, Wanling Gao, Rui Ren, Yaodong Cheng, Zhifei Zhang, Zhen Jia, Peijian Wang, Jianfeng Zhan:
    BigDataBench-S: An Open-Source Scientific Big Data Benchmark Suite. IPDPS Workshops 2017: 1068-1077.
  • Zhen Jia, Jianfeng Zhan, Lei Wang, Rui Han, Sally A. McKee, Qiang Yang, Chunjie Luo, Jingwei Li:
    Characterizing and subsetting big data workloads. IISWC 2014: 191-201.
  • Jianfeng Zhan, Rui Han, Chuliang Weng:
    Big Data Benchmarks, Performance Optimization, and Emerging Hardware - 4th and 5th Workshops, BPOE 2014, Salt Lake City, USA, March 1, 2014 and Hangzhou, China, September 5, 2014, Revised Selected Papers. Lecture Notes in Computer Science 8807, Springer 2014, ISBN 978-3-319-13020-0.
  • Zijian Ming, Chunjie Luo, Wanling Gao, Rui Han, Qiang Yang, Lei Wang, Jianfeng Zhan:
    BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. WBDB 2013: 138-154.
  • Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo:
    Characterizing data analysis workloads in data centers. IISWC 2013: 66-76.
  • Zhen Jia, Wanling Gao, Yingjie Shi, Sally A. McKee, Zhenyan Ji, Jianfeng Zhan, Lei Wang, Lixin Zhang:
    Understanding Processors Design Decisions for Data Analytics in Homogeneous Data Centers. IEEE Transactions on Big Data, DOI: 10.1109/TBDATA.2017.2758792.
  • Rui Han, Lizy Kurian John, and Jianfeng Zhan:
    Benchmarking Big Data Systems: A Review. IEEE Transactions on Services Computing, 2017.


4) Scientific and medical data systems

We are happy to work with many scientists and doctors on this amazing topic.

1) 10-millisecond computing

Although computation is becoming much more complex and operates on data of unprecedented scale, we argue that computers and smart devices should, and will, consistently deliver information and knowledge to human beings within a few tens of milliseconds. We coin a new term, 10-millisecond computing, to call attention to this class of workloads. 10-millisecond computing raises many challenges for both software and hardware stacks.

Selected publications
  • Gang Lu, Jianfeng Zhan, Tianshu Hao, Lei Wang:
    10-millisecond Computing. CoRR abs/1610.01267 (2016).


2) Benchmarking and Optimizing Electrical Systems.

It seems that we are happy to benchmark and optimize many things. We struggle to find something interesting. Most of the time, we fail...

1) The operating systems for datacenter computing (2012-present)

Traditionally, OS scalability is discussed in terms of average performance. In the context of latency-critical services, however, the worst-case performance (latency) is amplified by the system scale. We must therefore care about OS scalability in terms of both average and worst-case performance.
We present the "isolate first, then share" OS model, in which the machine's processor cores, memory, and devices are divided up between disparate OS instances, and a new abstraction, the subOS, is proposed to encapsulate an OS instance that can be created, destroyed, and resized on the fly. The intuition is that this avoids shared kernel state between applications, which in turn reduces the performance loss caused by contention. We decompose the OS into a supervisor and several subOSes running at the same privilege level: a subOS directly manages physical resources, while the supervisor can create, destroy, and resize a subOS on the fly. The supervisor and the subOSes share little state, but fast inter-subOS communication mechanisms are provided on demand. We present the first implementation, RainForest, which supports unmodified Linux binaries. Our comprehensive evaluation shows that RainForest outperforms Linux with four different kernels, LXC, and Xen in terms of worst-case and average performance most of the time when running a large number of benchmarks.
We submitted this system paper to ASPLOS four times (from 2015 to 2018). In the end, we lost interest in submitting it again, but I would be happy if you are interested in reading it.
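
To make the create, destroy, and resize operations concrete, here is a toy Python sketch of a supervisor that hands out processor cores so that each subOS owns its cores exclusively. It is only a bookkeeping model under simplifying assumptions (cores only, no memory or devices, no inter-subOS communication); the class and method names are hypothetical and do not reflect RainForest's actual interfaces.

```python
class Supervisor:
    """Toy model of exclusive core ownership (illustrative only)."""

    def __init__(self, num_cores: int) -> None:
        self.free = set(range(num_cores))  # cores not owned by any subOS
        self.sub_oses = {}                 # subOS name -> set of owned cores

    def _take(self, n: int) -> set:
        """Remove n cores from the free pool and return them."""
        if n > len(self.free):
            raise RuntimeError("not enough free cores")
        return {self.free.pop() for _ in range(n)}

    def create(self, name: str, cores: int) -> None:
        """Start a subOS that exclusively owns the given number of cores."""
        if name in self.sub_oses:
            raise ValueError(f"subOS {name!r} already exists")
        self.sub_oses[name] = self._take(cores)

    def destroy(self, name: str) -> None:
        """Stop a subOS and return all of its cores to the free pool."""
        self.free |= self.sub_oses.pop(name)

    def resize(self, name: str, cores: int) -> None:
        """Grow or shrink a running subOS to the requested core count."""
        owned = self.sub_oses[name]
        delta = cores - len(owned)
        if delta > 0:
            owned |= self._take(delta)     # grow: claim extra free cores
        else:
            for _ in range(-delta):        # shrink: give cores back
                self.free.add(owned.pop())
```

Because every core is either in the free pool or in exactly one subOS, no two subOSes in this model ever contend for the same core, which is the isolation property the text argues for.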

Selected publications
  • Gang Lu, Jianfeng Zhan, Chongkang Tan, Xinlong Lin, Defei Kong, Tianshu Hao, Lei Wang, Fei Tang, Chen Zheng:
    Isolate First, Then Share: A New OS Architecture for Datacenter Computing. arXiv:1604.01378.


2) Cluster and Cloud system software (2002-2009)

We built three innovative cluster and cloud system software packages: Phoenix, DawningCloud, and PhoenixCloud. Among them, GridView (one component of the Phoenix cluster operating system) was transferred to Sugon, a premier supercomputing company in China, and became one of its popular software products. Not having open-sourced these projects is my deepest regret.

Selected publications
  • Jianfeng Zhan, Lei Wang, Xiaona Li, Weisong Shi, Chuliang Weng, Wenyao Zhang, Xiutao Zang:
    Cost-Aware Cooperative Resource Provisioning for Heterogeneous Workloads in Data Centers. IEEE Trans. Computers 62(11): 2155-2168 (2013).
  • Lei Wang, Jianfeng Zhan, Weisong Shi, Yi Liang:
    In Cloud, Can Scientific Communities Benefit from the Economies of Scale? IEEE Trans. Parallel Distrib. Syst. 23(2): 296-303 (2012).
  • Jianfeng Zhan, Gengpu Liu, Lei Wang, Bibo Tu, Yi Jin, Yang Li, Yan Hao, Xuehai Hong, Dan Meng, Ninghui Sun:
    PhoenixG: A Unified Management Framework for Industrial Information Grid. CCGRID 2006: 489-496.
  • Ying Jiang, Dan Meng, Chao Ren, Jianfeng Zhan:
    An Integrated Adaptive Management System for Cluster-based Web Services. CLUSTER 2006.
  • Jianfeng Zhan, Ninghui Sun:
    Fire Phoenix Cluster Operating System Kernel and its Evaluation. CLUSTER 2005: 1-9.


3) Other systems that may interest you

In collaboration with Tencent, we built a framework for constructing different data-parallel programming models.

  • Peng Wang, Dan Meng, Jizhong Han, Jianfeng Zhan, Bibo Tu, Xiaofeng Shi, Le Wan: Transformer: A New Paradigm for Building Data-Parallel Programming Models. IEEE Micro 30(4): 55-64 (2010).


We are keenly interested in proposing approaches and developing performance analysis tools for large-scale scientific computing, datacenter computing, and cloud computing.

  • Gang Lu, Jianfeng Zhan, Haining Wang, Lin Yuan, Yunwei Gao, Chuliang Weng, Yong Qi:
    PowerTracer: Tracing Requests in Multi-Tier Services to Reduce Energy Inefficiency. IEEE Trans. Computers 64(5): 1389-1401 (2015).
  • Zhen Jia, Chao Xue, Guancheng Chen, Jianfeng Zhan, Lixin Zhang, Yonghua Lin, Peter Hofstee:
    Auto-tuning Spark Big Data Workloads on POWER8: Prediction-Based Dynamic SMT Threading. PACT 2016: 387-400.
  • Rui Ren, Zhen Jia, Lei Wang, Jianfeng Zhan, Tianxu Yi:
    BDTune: Hierarchical correlation-based performance analysis and rule-based diagnosis for big data systems. BigData 2016: 555-562.
  • Biwei Xie, Xu Liu, Sally A. McKee, Jianfeng Zhan, Zhen Jia, Lei Wang, Lixin Zhang:
    Understanding Data Analytics Workloads on Intel(R) Xeon Phi(R). HPCC/SmartCity/DSS 2016: 206-215.
  • Chen Zheng, Jianfeng Zhan, Zhen Jia, Lixin Zhang:
    Characterizing OS Behaviors of Datacenter and Big Data Workloads. HPCC/SmartCity/DSS 2016: 1079-1086.
  • Bibo Tu, Jianping Fan, Jianfeng Zhan, Xiaofang Zhao:
    Performance analysis and optimization of MPI collective operations on multi-core clusters. The Journal of Supercomputing 60(1): 141-162 (2012).
  • Gang Lu, Jianfeng Zhan, Haining Wang, Lin Yuan, Chuliang Weng:
    PowerTracer: tracing requests in multi-tier services to diagnose energy inefficiency. ICAC 2012: 97-102.
  • Xu Liu, Jianfeng Zhan, Kunlin Zhan, Weisong Shi, Lin Yuan, Dan Meng, Lei Wang:
    Automatic performance debugging of SPMD-style parallel programs. J. Parallel Distrib. Comput. 71(7): 925-937 (2011).
  • Bibo Tu, Jianping Fan, Jianfeng Zhan, Xiaofang Zhao:
    Accurate Analytical Models for Message Passing on Multi-core Clusters. PDP 2009: 133-139.


We built several tools for understanding the reliability and availability of large-scale computing systems.

  • Pengfei Zheng, Yong Qi, Yangfan Zhou, Pengfei Chen, Jianfeng Zhan, Michael R. Lyu:
    An Automatic Framework for Detecting and Characterizing Performance Degradation of Software Systems. IEEE Trans. Reliability 63(4): 927-943 (2014).
  • Xiaoyu Fu, Rui Ren, Sally A. McKee, Jianfeng Zhan, Ninghui Sun:
    Digging deeper into cluster system logs for failure prediction and root cause diagnosis. CLUSTER 2014: 103-112.
  • Xiaoyu Fu, Rui Ren, Jianfeng Zhan, Wei Zhou, Zhen Jia, Gang Lu:
    LogMaster: Mining Event Correlations in Logs of Large-Scale Cluster Systems. SRDS 2012: 71-80.
  • Wei Zhou, Jianfeng Zhan, Dan Meng, Zhihong Zhang:
    Online Event Correlations Analysis in System Logs of Large-Scale Cluster Systems. NPC 2010: 262-276.
  • Zhihong Zhang, Jianfeng Zhan, Yong Li, Lei Wang, Dan Meng, Bo Sang:
    Precise request tracing and performance debugging for multi-tier services of black boxes. DSN 2009: 337-346.

Bio

ACADEMIC POSITIONS

Sep 2012-
Full Professor, Institute of Computing Technology, Chinese Academy of Sciences, and University of Chinese Academy of Sciences. Beijing, P. R. China
Mar 2004- Sep 2012
Associate Professor, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P. R. China
Aug 2002- Mar 2004
Assistant Professor, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, P. R. China

EDUCATION BACKGROUND

Ph. D.
Computer Software & Theory, Institute of Software, Chinese Academy of Sciences, Beijing, P. R. China, 2002
M. Sc.
Solid Mechanics, Southwest Jiaotong University, Chengdu, P. R. China, 1999
B. Sc.
Civil Engineering, Southwest Jiaotong University, Chengdu, P. R. China, 1996

Honors and Awards

  • IISWC 2013 Best paper award.
  • Third Class of Cooperation and Contribution Prize, Huawei
  • Second Class of National Science and Technology Promotion Prize, The Central People’s Government of the People’s Republic of China, 2006
  • Outstanding Science and Technology Achievement Prize of the Chinese Academy of Sciences, Chinese Academy of Sciences, P. R. China, 2005
  • Outstanding Faculty Prize, Institute of Computing Technology, Chinese Academy of Sciences, P. R. China, 2012

Group

  • Professor, Jianfeng Zhan (詹剑锋)
  • Senior Engineer, Lei Wang (王磊)
  • Assistant Professor, Chunjie Luo (罗纯杰)
  • Assistant Professor, Rui Ren (任睿)

Teaching

Students

Ph.D. students

  • Xinhui Tian
  • Wanling Gao
  • Shaopeng Dai
  • Yunyou Huang
  • Xiexuan Zhou
  • Tianshu Hao
  • Yatao Li
  • Zihan Jiang
  • Fei Tang

Master's students

  • Mengjia Du
  • Cheng Huang
  • Xu Wen
  • Jianan Chen

Alumni

Professional Activities

TPC member, IPDPS 2018

TPC member, ICDCS 2017

HPBDC Co-Chair, in conjunction with IPDPS’16, ’17, ’18

BPOE chair, in conjunction with ASPLOS 2014, 15, 16, 17, VLDB’14

TPC member, IISWC 2014

TPC member, CCGrid 2014

TPC member, CCF Big Data conference 2013

TPC member, International Conference on Computer Communications and Networks (ICCCN 2014)

Founding Organizer, The First Workshop of Benchmarks, Performance Optimization, and Emerging Hardware of Big Data Systems and Applications (BPOE 2013), In conjunction with IEEE Big Data Conference 2013, October 8, 2013, Silicon Valley, CA, USA

TPC Member, The second IEEE International Conference on Big Data Science and Engineering (BDSE), December 3-5, 2013 in Sydney, Australia.

TPC member, The ACM Cloud and Autonomic Computing Conference (CAC 2013), Miami, Florida, USA August 5-August 9, 2013

Organizer, HPCA 2013 Tutorial, High Volume Computing: The Motivations, Metrics, and Benchmarks Suites for Data Center Computer Systems, Shenzhen, 2013.

Track Chair of Utility Computing, HPCC 2013

PC Member and Publicity Chair, ICAC 2013

PC Member, SOSE 2013

PC member, NPC 2012

Guest editor, Cloud computing special issue of Frontiers of Computer Science, 2012

PC Member, IPDPS 2012, Ph.D. Forum

Publicity Chair for China of ICAC 2012 (The 9th International Conference on Autonomic Computing)

PC Member, ICDCS 2012 (The 32nd International Conference on Distributed Computing Systems)

PC Member, AINA 2012 (The 26th IEEE International Conference on Advanced Information Networking and Applications, Tokyo, Japan, March 26-29, 2012)

PC Member, NPC 2011, CSE 2011, GCC 2011, Cloud 2011