Publication: A Simulator for Job Co-allocation in Multiple HPC Clusters


Title A Simulator for Job Co-allocation in Multiple HPC Clusters
Authors/Editors J. Qin, M. Bauer
Where published Proceedings of the 18th International Conference on Parallel and Distributed Computing and Systems
How published Proceedings
Year 2006
Pages 308-314
Publisher IASTED
Keywords allocation, multiple HPC clusters, HPC grid
To more effectively use HPC clusters for even larger computations, users are looking to interconnect multiple HPC clusters, creating a grid. To effectively use such grids, it may be desirable to split and co-allocate jobs requiring many processes across multiple clusters. The benefit, in terms of reducing users’ turn-around time, however, ultimately depends on the inter-cluster communication cost. In studies of job co-allocation strategies, previous research commonly used a uniform slowdown ratio and a static communication model to examine the impact on a job’s execution if the job was split across multiple clusters. However, in reality the slowdown ratio is unlikely to be uniform when there is a choice of multiple clusters with different communication links. Moreover, the slowdown ratio may actually change dynamically based on the run-time circumstances. In this paper, we report on a simulator which was developed to simulate the dynamic behavior of jobs across multiple clusters. The simulator has been validated based on experiments across two HPC clusters. The overall objective of the work is to understand the impact of communications on multi-processor jobs in order to develop scheduling and co-allocation strategies which can accommodate communication factors.