The focus of this project is to build benchmarks and tools for datacenter and cloud computing.
The emergence of popular Internet services, e.g., search, Twitter, and social networks, has accelerated a trend toward cloud or datacenter computing (in short, DC). Driven by the massive scale of data repositories and the large number of users, these Internet services all require a massive computing infrastructure, which Barroso and Hölzle call warehouse-scale machines.
As shown in Table 1, the characteristics of datacenter and cloud computing differ from those of high-end high performance computing (HPC). However, there are no publicly available benchmarks for datacenter and cloud computing.
|System|Workload|Parallelism|Failure handling|Performance metric|
|---|---|---|---|---|
|DC|Loosely coupled: independent tasks or requests; workload churn.|Ample parallelism.|No checkpoint needed for single failures. Reliability requirements depend upon the nature of the data.|High throughput|
|High-end HPC|Tightly coupled: a single job with huge resource demand, depending on collective communication.|Difficult to exploit parallelism.|Checkpoint of a whole application for a single failure.|Turnaround time|
Table 1: The distinguishing differences between datacenter or cloud computing and high-end high performance computing.
Because we lack permission to probe real-world web search engines, we set up a search server in our lab, using Nutch as the search engine and the SoGou web corpus for the index and snapshot data. We have, however, obtained permission to use three real workload traces: one from SoGou and the other two from two of the largest search service providers in China. We have released the search system as a benchmark for datacenter computing, named Search.
- DCAngel: a workload characterization tool
We have developed a comprehensive workload characterization tool, named DCAngel. DCAngel can collect, analyze, and visualize a large number of performance metrics, ranging from performance-counter metrics such as cycles per instruction and average memory access latency to quality-of-service measurements.
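As a small illustration of one metric named above, cycles per instruction (CPI) is the ratio of elapsed core cycles to retired instructions. The Python sketch below is not DCAngel code: the function names, and the assumption that counters arrive as cumulative (cycles, instructions) readings, are hypothetical and chosen only for this example.

```python
def cycles_per_instruction(cycles, instructions):
    """CPI over one measurement window: elapsed cycles / retired instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def cpi_series(readings):
    """Per-interval CPI from cumulative (cycles, instructions) readings.

    Assumption: each reading is a running total, so the CPI of an interval
    is computed from the difference between two consecutive readings.
    Intervals in which no instruction retired are skipped.
    """
    series = []
    for (c0, i0), (c1, i1) in zip(readings, readings[1:]):
        if i1 > i0:
            series.append(cycles_per_instruction(c1 - c0, i1 - i0))
    return series
```

A lower CPI means more instructions complete per cycle; comparing the per-interval series across workloads is one simple way to see how differently they use the processor.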
- PreciseTracer: a request tracing tool
We present a precise and scalable request tracing tool for online analysis of multi-tier services composed of black boxes. Our tool collects activity logs of multi-tier services through kernel instrumentation, which can be enabled or disabled on demand. Because it tolerates log losses, our system supports sampling or tracing on demand, which significantly reduces the volume of logs collected and analyzed and improves system scalability.
★ Please note that our tool makes the following assumptions about applications:
We treat each component in a multi-tier service as a black box, since we can neither obtain the application or middleware source code nor deploy instrumented middleware, and we have no knowledge of the high-level protocols, such as HTTP, used by the services.
We presume that a single execution entity (a process or a kernel thread) of each component serves only one request at a time. To serve an individual request, the execution entities of the components cooperate by sending and receiving messages over a reliable communication protocol such as TCP. An individual request is tracked by monitoring a series of causally related activities.
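The assumptions above suggest how logged activities can be stitched into a causal order without any application knowledge. The Python sketch below is a simplified illustration, not PreciseTracer's actual algorithm. The `Activity` record, its fields, and the function names are hypothetical. It further assumes that the log covers a single request, that each SEND corresponds to exactly one RECEIVE on the same channel, and that timestamps are only compared between activities logged on the same host.

```python
from collections import defaultdict
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Activity:
    ts: float       # local timestamp on the host that logged the activity
    entity: str     # execution entity (process or kernel thread)
    kind: str       # "SEND" or "RECEIVE"
    channel: tuple  # (sender, receiver) endpoints of the TCP connection


def causal_edges(log):
    """Return (earlier, later) pairs of causally related activities."""
    by_entity = defaultdict(list)
    sends, recvs = defaultdict(list), defaultdict(list)
    for act in log:
        by_entity[act.entity].append(act)
        (sends if act.kind == "SEND" else recvs)[act.channel].append(act)

    edges = []
    # Context relation: an entity serves one request at a time, so its
    # consecutive activities belong to the same request.
    for acts in by_entity.values():
        acts.sort(key=lambda a: a.ts)
        edges.extend(zip(acts, acts[1:]))
    # Message relation: TCP delivers in order, so the n-th SEND on a
    # channel is paired with the n-th RECEIVE on that channel.
    for channel, sent in sends.items():
        sent.sort(key=lambda a: a.ts)
        received = sorted(recvs.get(channel, []), key=lambda a: a.ts)
        edges.extend(zip(sent, received))
    return edges


def causal_order(log):
    """Order the activities so that every cause precedes its effects."""
    preds = {act: set() for act in log}
    for earlier, later in causal_edges(log):
        preds[later].add(earlier)
    return list(TopologicalSorter(preds).static_order())


# Hypothetical two-tier request; the "db" host clock is skewed by about 100 s.
EXAMPLE_LOG = [
    Activity(1.0, "web", "RECEIVE", ("client", "web")),
    Activity(2.0, "web", "SEND", ("web", "db")),
    Activity(102.0, "db", "RECEIVE", ("web", "db")),
    Activity(103.0, "db", "SEND", ("db", "web")),
    Activity(5.0, "web", "RECEIVE", ("db", "web")),
    Activity(6.0, "web", "SEND", ("web", "client")),
]
```

Calling `causal_order(EXAMPLE_LOG)` orders the database activities between the web tier's SEND and its later RECEIVE even though the two hosts' clocks disagree, because the ordering relies on message pairing rather than on cross-host timestamps.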
- L. Barroso and U. Hölzle. The datacenter as a computer: An introduction to the design of warehouse-scale machines. Synthesis Lectures on Computer Architecture, 4(1):1–108, 2009.
- J. Zhan, L. Wang, and N. Sun. Performance Evaluation of a Datacenter Computer (高通量计算机的性能评价, in Chinese). Communication of CCF, July 2011.
- H. Xi, J. Zhan, et al. Characterization of Real Workloads of Web Search Engines. 2011 IEEE International Symposium on Workload Characterization (IISWC-2011), 2011.
- Z. Zhang, J. Zhan, et al. Precise Request Tracing and Performance Debugging of Multi-tier Services of Black Boxes. Regular paper, 39th Dependable System and Network (DSN 2009).
- B. Sang, J. Zhan, G. Lu, H. Wang, D. Xu, L. Wang, Z. Zhang, and Z. Jia. Precise, scalable, and online request tracing for multi-tier services of black boxes. IEEE Transactions on Parallel and Distributed Systems, 2011.