Open Access Dissertation
Electrical and Computer Engineering
Virtual machines are a primary means of increasing resource utilization in data centers: they encapsulate the multi-resource demands of applications and provide performance isolation. Moreover, their resource configurations can be changed on the fly to satisfy performance targets. Containers are another popular means of fine-grained multi-resource allocation. In this dissertation, we aim to design and implement an automatic resource management system that improves application performance and system efficiency in virtual clusters and reduces job completion times in physical clusters.
For large-scale applications hosted in data centers, automatic resource configuration is crucial to service availability and quality. Workload dynamics, cloud dynamics, and the nonlinear relationship between resource allocation and response time require an automatic and effective resource allocation strategy. To improve the quality of service experienced by clients, we propose a novel adaptive fuzzy control approach that tunes the CPU allocation of applications in the cloud. It not only adapts resource allocations to workload variations but also provides response time guarantees for applications. We conducted extensive experiments to evaluate its performance and compared it against three other control methods in terms of stability, overshoot, and settling time. The results indicate that adaptive fuzzy control achieves the best stability, the shortest settling time, and the smallest overshoot.
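To make fuzzy CPU tuning concrete, here is a minimal sketch of one control step. It assumes a single input (the response-time error normalized to the target), five triangular fuzzy sets, weighted-average defuzzification, and a simple self-tuning output gain that grows while the error keeps its sign and halves after an overshoot. The rule base, the gain-adaptation rule, and names such as `AdaptiveFuzzyCpuController` are hypothetical illustrations, not the controller evaluated in the dissertation.

```python
from dataclasses import dataclass, field


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical rule base on the normalized error e in [-1, 1]; each fuzzy set
# maps to a singleton output (fraction of the maximum CPU change per step).
# Non-uniform outputs make the control law nonlinear: small errors get gentle
# corrections, large errors get aggressive ones.
RULES = [
    ((-2.0, -1.0, -0.5), -1.0),  # NB: far faster than needed -> release a lot
    ((-1.0, -0.5, 0.0), -0.3),   # NS: slightly fast -> release a little
    ((-0.5, 0.0, 0.5), 0.0),     # ZE: on target -> hold
    ((0.0, 0.5, 1.0), 0.3),      # PS: slightly slow -> add a little
    ((0.5, 1.0, 2.0), 1.0),      # PB: far too slow -> add a lot
]


@dataclass
class AdaptiveFuzzyCpuController:
    target_rt: float           # response-time target (e.g. ms)
    gain: float = 10.0         # output scaling: max CPU-cap change (%) per step
    min_gain: float = 1.0
    max_gain: float = 40.0
    min_cpu: float = 5.0       # CPU cap bounds in percent (assumed)
    max_cpu: float = 100.0
    _prev_error: float = field(default=0.0, init=False, repr=False)

    def step(self, cpu: float, measured_rt: float) -> float:
        """Return the new CPU cap given the current cap and measured response time."""
        # Normalized error, clamped to [-1, 1]; positive means too slow.
        e = max(-1.0, min(1.0, (measured_rt - self.target_rt) / self.target_rt))
        # Hypothetical self-tuning: speed up while the error persists,
        # back off after the error changes sign (an overshoot).
        if e * self._prev_error > 0:
            self.gain = min(self.max_gain, self.gain * 1.2)
        elif e * self._prev_error < 0:
            self.gain = max(self.min_gain, self.gain * 0.5)
        self._prev_error = e
        # Fuzzify, fire rules, and defuzzify by weighted average of singletons.
        num = den = 0.0
        for (a, b, c), out in RULES:
            mu = tri(e, a, b, c)
            num += mu * out
            den += mu
        delta = self.gain * (num / den if den else 0.0)
        return max(self.min_cpu, min(self.max_cpu, cpu + delta))
```

In use, `step` would be called once per control interval with the current CPU cap and the measured mean response time, and the returned value applied as the new cap. Halving the gain on sign changes is one common way to damp overshoot; the abstract does not describe the dissertation's own adaptation mechanism.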
Multi-resource configuration provides more room for optimizing resource combinations. Optimizing resource efficiency and utilization is of great significance to IaaS providers. To this end, we propose BConf, a framework for the dynamic, balanced configuration of multiple resources that provides response time guarantees in virtualized clusters. BConf employs an integrated MPC (model predictive control) and adaptive PI (proportional-integral) control approach (IMAP). MPC actively balances multiple resources using a novel resource metric. For performance prediction, a gray-box model is built on generic OS and hardware metrics in addition to resource actuators and performance. We find that resource penalty is an effective metric for measuring the degree of imbalance of a configuration. Using this metric and the model, BConf tunes resources in a balanced way, minimizing the resource penalty while satisfying the response time target. Adaptive PI coordinates with MPC by narrowing the optimization space to a promising region. Within the BConf framework, resources are also coordinated under contention. Experimental results with mixed TPC-W and TPC-C benchmarks show that, compared with a representative partition approach, BConf reduces resource usage by about 50% for TPC-W and 30% for TPC-C, improves stability by more than 35.6%, and achieves a much shorter settling time. The advantages of BConf in resource coordination are also demonstrated.
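The interplay between adaptive PI and MPC can be sketched under strong simplifying assumptions: two resources (CPU and memory shares normalized to (0, 1]), a one-step prediction horizon (so the MPC step reduces to a constrained search over a grid), a made-up analytic response-time model in place of BConf's gray-box model, and a stand-in "resource penalty" equal to total allocation plus the variance across resources. All function names, coefficients, and the gain-adaptation rule are hypothetical, not BConf's actual definitions.

```python
import itertools
from typing import List, Sequence, Tuple

Config = Tuple[float, ...]  # normalized shares, e.g. (cpu, mem), each in (0, 1]


def resource_penalty(cfg: Sequence[float], lam: float = 1.0) -> float:
    """Hypothetical stand-in for the resource penalty: total allocation
    plus lam * variance across resources (larger when shares are imbalanced)."""
    mean = sum(cfg) / len(cfg)
    variance = sum((x - mean) ** 2 for x in cfg) / len(cfg)
    return sum(cfg) + lam * variance


def predict_rt(cfg: Sequence[float]) -> float:
    """Hypothetical response-time model; a real gray-box model would also use OS/hardware metrics."""
    cpu, mem = cfg
    return 20.0 / cpu + 10.0 / mem


class AdaptivePI:
    """PI controller whose proportional gain grows with |error| (hypothetical adaptation rule)."""

    def __init__(self, kp: float = 0.5, ki: float = 0.1) -> None:
        self.kp, self.ki, self.integral = kp, ki, 0.0

    def update(self, error: float) -> float:
        self.integral += error
        kp_eff = self.kp * (1.0 + abs(error))
        return kp_eff * error + self.ki * self.integral


def grid(lo: float, hi: float, step: float) -> List[float]:
    """Evenly spaced candidate shares in [lo, hi]."""
    n = int(round((hi - lo) / step)) + 1
    return [round(min(hi, lo + i * step), 6) for i in range(n)]


def choose_config(prev: Config, rt_measured: float, target: float, pi: AdaptivePI,
                  step: float = 0.05, half_width: float = 0.15) -> Config:
    """One control interval: PI narrows the search box, then a one-step
    (MPC-style) search picks the lowest-penalty configuration the model deems feasible."""
    e = (rt_measured - target) / target  # positive when the application is too slow
    scale = 1.0 + pi.update(e)
    center = [min(1.0, max(step, x * scale)) for x in prev]
    axes = [grid(max(step, c - half_width), min(1.0, c + half_width), step) for c in center]
    best = None
    for cfg in itertools.product(*axes):
        if predict_rt(cfg) > target:
            continue  # model predicts a response-time violation
        p = resource_penalty(cfg)
        if best is None or p < best[0]:
            best = (p, cfg)
    if best is None:  # nothing feasible in the box: take its most generous corner
        return tuple(a[-1] for a in axes)
    return best[1]
```

With a previous configuration of `(0.5, 0.5)`, a measured response time of 60, and a target of 80, the PI output shrinks the search box toward smaller shares, and the search returns the lowest-penalty configuration that the model predicts will still meet the target. Restricting the search to the PI-selected box is what keeps the per-interval optimization small.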
Modern cluster systems, such as Mesos and Yarn, provide fine-grained resource allocation mechanisms to satisfy the heterogeneous demands of various tasks. Moreover, they support diverse programming models, such as MapReduce and MPI, in which jobs are transformed into sets of tasks. In shared clusters, scheduling has three objectives: system efficiency, fairness, and minimal job completion times (JCTs). Under state-of-the-art policies and strategies, any two of these objectives may conflict. To resolve the conflicts among all three, we propose a simple, provably correct policy for online scheduling, called the credit sharing (CS) policy, which guarantees that no job completes later than it would under DRF (Dominant Resource Fairness). The policy allows short jobs to borrow resource credit from long jobs, provided that the long jobs later have enough resource demand to consume the credit that the short jobs pay down or pay off during execution or upon their accelerated completion. In addition to accelerating short jobs, the policy unleashes the power of packing to maximize system efficiency by packing the tasks of short jobs. In multi-resource environments, preserving data locality has evolved into optimizing effective resource utilization. Moreover, the stage barriers of multi-stage jobs, such as MapReduce, may cause imbalanced utilization of network bandwidth across the cluster. Based on the CS policy, we present CANAL (Credit sharing-oriented Network And Locality-aware scheduling) for multi-resource clusters. We conduct a detailed evaluation using both trace-driven simulation and a prototype implementation. Compared with the most competitive scheduler, CANAL speeds up jobs by up to 2.36 times, particularly network-intensive jobs, and reduces makespan by 25%. Moreover, no job under CANAL runs slower than under DRF.
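The borrowing condition can be illustrated with a fluid simulation of a single resource, where DRF reduces to max-min fairness. The sketch below is a hypothetical illustration, not the CS policy, its proof, or CANAL: it lets short jobs borrow by serving jobs in order of remaining work, and accepts that schedule only if no job finishes later than under fair sharing. Job sizes, rate caps (a job's maximum usable rate, standing in for its demand), and function names are assumptions.

```python
from typing import Callable, Dict

Rates = Dict[str, float]
Policy = Callable[[Dict[str, float], Dict[str, float], float], Rates]


def max_min_fair(active: Dict[str, float], caps: Dict[str, float], capacity: float) -> Rates:
    """Water-filling max-min fair split of one resource (DRF with a single resource)."""
    rates = {j: 0.0 for j in active}
    left = capacity
    pending = sorted(active, key=lambda j: caps[j])
    while pending:
        share = left / len(pending)
        j = pending[0]
        if caps[j] <= share:  # job cannot use its equal share: give it its cap
            rates[j] = caps[j]
            left -= caps[j]
            pending.pop(0)
        else:  # everyone remaining can use the equal share
            for k in pending:
                rates[k] = share
            break
    return rates


def shortest_first(active: Dict[str, float], caps: Dict[str, float], capacity: float) -> Rates:
    """Short jobs 'borrow': serve jobs by remaining work, each up to its cap."""
    rates = {}
    left = capacity
    for j in sorted(active, key=lambda j: active[j]):
        rates[j] = min(caps[j], left)
        left -= rates[j]
    return rates


def simulate(work: Dict[str, float], caps: Dict[str, float], capacity: float,
             policy: Policy) -> Dict[str, float]:
    """Event-driven fluid simulation; returns each job's completion time."""
    remaining, t, jct = dict(work), 0.0, {}
    while remaining:
        rates = policy(remaining, caps, capacity)
        # Rates stay constant until the next job finishes.
        dt = min(remaining[j] / rates[j] for j in remaining if rates[j] > 0)
        t += dt
        for j in list(remaining):
            remaining[j] -= rates[j] * dt
            if remaining[j] <= 1e-9:
                jct[j] = t
                del remaining[j]
    return jct


def credit_sharing_ok(work: Dict[str, float], caps: Dict[str, float], capacity: float):
    """Accept borrowing only if no job finishes later than under fair sharing."""
    fair = simulate(work, caps, capacity, max_min_fair)
    cs = simulate(work, caps, capacity, shortest_first)
    return all(cs[j] <= fair[j] + 1e-9 for j in work), fair, cs
```

With two jobs of 10 and 100 units on a capacity of 10, borrowing halves the short job's JCT (from 2 to 1) while the long job still finishes at 11, because it can absorb the full capacity once the short job is done. If the long job can use at most 5 units at a time, it cannot consume the repaid credit, its JCT grows from 20 to 21, and the check rejects borrowing, mirroring the condition stated above.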
Wei, Yudi, "Automatic Resource Management And Performance Optimization In Clusters" (2019). Wayne State University Dissertations. 2231.