Zhen Jia(贾禛)

+86-010-62601026
jiazhen@ncic.ac.cn
NCIC

Zhen Jia focuses on the following related research topics:

Zhen Jia is a student in the software group at the Advanced Computer Systems Laboratory, Institute of Computing Technology, Chinese Academy of Sciences. He received his B.S. degree from Dalian University of Technology in 2010, graduating as an Outstanding Graduate of Liaoning Province.


Research Highlights


Abstract: Search is the most heavily used web application in the world and is still growing at an extraordinary rate. Understanding the behavior of web search engines is therefore becoming increasingly important to the design and deployment of data center systems that host them. In this paper, we study three search query traces collected from real-world web search engines at three different search service providers. The first part of our study uncovers the patterns hidden in the query traces by analyzing the variations, frequencies, and locality of query requests. Our analysis reveals that, contrary to some previous studies, real-world query traces do not follow well-defined probability models such as the Poisson and log-normal distributions. The second part of our study deploys the real query traces, together with three synthetic traces generated using probability models proposed by other researchers, on a Nutch-based search engine. The performance data measured from these deployments further confirm that synthetic traces do not accurately reflect the real traces. We develop an evaluation tool that collects performance metrics online with negligible overhead. The metrics include average response time, CPU utilization, disk accesses, and cycles per instruction. The third part of our study compares the search engine with representative benchmarks, namely Gridmix, SPECweb2005, TPC-C, SPECCPU2006, and HPCC, with respect to basic architecture-level characteristics and performance metrics such as instruction mix, processor pipeline stall breakdown, memory access latency, and disk accesses. The experimental results show that web search engines have a high percentage of load/store instructions but good cache and memory performance.
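One simple way to illustrate the kind of check behind the Poisson claim is the index of dispersion: if requests arrive as a Poisson process, the number of requests per fixed time window has a variance roughly equal to its mean, so the ratio is close to 1. The sketch below is only an illustration under that assumption, not the method used in the paper; the function name `dispersion_index`, the `window` parameter, and the sample timestamps are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def dispersion_index(timestamps, window=1.0):
    """Variance-to-mean ratio of request counts per fixed-size window.

    A value near 1 is consistent with Poisson arrivals; values well above 1
    indicate bursty traffic, values well below 1 indicate regular traffic.
    """
    if not timestamps:
        raise ValueError("empty trace")
    start = min(timestamps)
    # Assign each request to a window index counted from the first request.
    counts = Counter(int((t - start) // window) for t in timestamps)
    n_windows = max(counts) + 1
    # Include empty windows, which matter for the variance.
    per_window = [counts.get(i, 0) for i in range(n_windows)]
    return pvariance(per_window) / mean(per_window)


# Hypothetical traces: timestamps in seconds.
regular = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
bursty = [0.0, 0.1, 0.2, 0.3, 5.0]
print(dispersion_index(regular))
print(dispersion_index(bursty))
```

A single ratio is a coarse screen rather than a goodness-of-fit test; a fuller analysis would compare the whole empirical distribution against the model.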
Abstract: As more and more multi-tier services are developed from commercial off-the-shelf components or heterogeneous middleware without source code available, both developers and administrators need a request tracing tool to (1) know exactly how a user request of interest travels through black-box services and (2) obtain macro-level user request behaviors of services without manually analyzing massive logs. This need is further exacerbated by IT system "agility", which requires the tracing tool to provide online performance data, since offline approaches cannot reflect system changes in real time. Moreover, considering the large scale of deployed services, a pragmatic tracing approach should be scalable in terms of the cost of collecting and analyzing logs. In this paper, we introduce a precise, scalable, and online request tracing tool for multi-tier black-box services. Our contributions are three-fold. First, we propose a precise request tracing algorithm for multi-tier black-box services that uses only application-independent knowledge. Second, we present a micro-level abstraction, the component activity graph, to represent the causal path of each request. On the basis of this abstraction, we use dominated causal path patterns to represent repeatedly executed causal paths that account for significant fractions, and we further present a derived performance metric of causal path patterns, latency percentages of components, to enable debugging performance-in-the-large. Third, we develop two mechanisms, tracing on demand and sampling, to significantly increase system scalability. We implement a prototype of the proposed system, called PreciseTracer, and release it as open source code. In comparison with WAP5, a black-box tracing approach, PreciseTracer achieves higher tracing accuracy and faster response time. Our experimental results also show that PreciseTracer has low overhead and still achieves high tracing accuracy even when an aggressive sampling policy is adopted, indicating that PreciseTracer is a promising tracing tool for large-scale production systems.
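The "latency percentages of components" metric can be pictured with a small sketch. This is not PreciseTracer's implementation: it assumes a causal path has already been reconstructed as a list of non-overlapping `(component, start, end)` activities, and the function name `latency_percentages` and the sample path are hypothetical.

```python
from collections import defaultdict


def latency_percentages(activities):
    """Share of a request's total time spent in each component, in percent.

    `activities` is an iterable of (component, start, end) tuples for one
    causal path; intervals are assumed not to overlap.
    """
    per_component = defaultdict(float)
    for component, start, end in activities:
        if end < start:
            raise ValueError(f"activity in {component!r} ends before it starts")
        per_component[component] += end - start
    total = sum(per_component.values())
    if total == 0:
        return {c: 0.0 for c in per_component}
    return {c: 100.0 * d / total for c, d in per_component.items()}


# Hypothetical three-tier request: the web tier is visited twice.
path = [("web", 0, 1), ("app", 1, 3), ("db", 3, 4), ("web", 4, 5)]
print(latency_percentages(path))
```

Aggregating these percentages over many requests that share the same path pattern points to the component that dominates latency for that pattern.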

Open Source Projects