Abstract:
A wide range of modern computer systems rely on parallelism to process jobs quickly. Unlike the jobs considered in most classical scheduling problems, parallelizable jobs can be completed more quickly when they are run on multiple servers or cores. However, not all jobs are perfectly parallelizable. Computing workloads are typically composed of a mixture of highly parallelizable elastic jobs and less parallelizable inelastic jobs. Given a fixed number of cores, it is not obvious how to best allocate cores across a stream of arriving elastic and inelastic jobs. We consider the problem of allocating cores to jobs in order to minimize the mean response time across jobs — the average time from when a job arrives to the system until it is complete. To solve this problem, one must balance a tradeoff between prioritizing short jobs and deferring parallelizable work. Completing short jobs before long jobs is known to reduce the mean response time in many systems. However, when jobs are parallelizable, it is also important to keep some elastic jobs in the system to ensure that the available cores remain utilized. Hence, it can also be beneficial to prioritize longer inelastic jobs ahead of shorter elastic jobs. Using coupling arguments and Lyapunov drift arguments from queueing theory, we show how to optimally balance this tradeoff. Specifically, we show how the optimal core allocation policy depends on the number of cores in the system, the system load, and the distributions of elastic and inelastic job sizes.