Using job scheduler simulator to evaluate the effectiveness of job run time prediction

Journal: Software & Systems (Vol.35, No. 1)

Page : 124-131

Keywords : mvs-10p; simulator; job scheduling; slurm; predictive analytics;

Abstract

The paper investigates the efficiency of queue scheduling when job run times are predicted by pretrained models. A supercomputer cluster uses a scheduler to distribute the incoming job flow among the available computing resources. To place a job in the queue, the scheduler relies on data specified by the user, including the requested run time. However, users often misjudge the run time and specify an overestimate. If a job completes earlier than specified, the scheduler has to reschedule the queue, and a large number of such events can reduce the efficiency of resource allocation. Many recent papers describe the use of machine learning to predict job run time, which makes it possible to use the run time calculated by a pretrained model during scheduling. However, every model has some estimation error, so scheduling efficiency needs to be assessed for a given value of the model error. This paper evaluates the approach by comparing scheduling efficiency in two scenarios: 1) the scheduler uses the time specified by the user, and 2) the scheduler uses the real job run time. For this purpose, a SLURM scheduler simulator runs simulations on statistical data from the MVS-10P OP2 supercomputer installed at the Joint Supercomputer Center of the Russian Academy of Sciences. The results show that in scenario 2 the average waiting time decreased by 25% and the slowdown decreased by 50%, while resource utilization did not change significantly. The experimental results indicate that it is practical to use machine learning algorithms to predict the run time of jobs arriving at a supercomputer cluster. The article thus provides an upper bound on the achievable improvement, since the experiment assumes one hundred percent prediction accuracy, which none of the published works on run time prediction has demonstrated to date.
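
The two scenarios compared in the abstract can be illustrated with a toy model. The sketch below is not the SLURM simulator used in the paper and does not use MVS-10P OP2 data. It assumes a FCFS queue with EASY backfilling (the abstract does not name the scheduling policy), defines slowdown as (wait + run time) / run time (also an assumption), and uses hypothetical names (`Job`, `simulate`, `metrics`) and a made-up three-job workload.

```python
import heapq
from dataclasses import dataclass
from math import inf


@dataclass
class Job:
    submit: float    # arrival time
    nodes: int       # number of nodes requested
    runtime: float   # actual run time
    estimate: float  # user-requested time limit (assumed >= runtime)


def simulate(jobs, total_nodes, use_actual):
    """Toy FCFS + EASY backfilling; returns {job index: start time}.

    use_actual=False plans with the user estimate (scenario 1),
    use_actual=True plans with the real run time (scenario 2).
    """
    assert all(j.nodes <= total_nodes and j.estimate >= j.runtime for j in jobs)
    order = sorted(range(len(jobs)), key=lambda i: jobs[i].submit)
    queue, running, start = [], [], {}  # running: heap of (end, planned_end, nodes)
    free, nxt = total_nodes, 0

    def plan(j):
        return j.runtime if use_actual else j.estimate

    def launch(i, now):
        nonlocal free
        j = jobs[i]
        free -= j.nodes
        start[i] = now
        heapq.heappush(running, (now + j.runtime, now + plan(j), j.nodes))

    while len(start) < len(jobs):
        # Advance to the next event: a job arrival or a job completion.
        next_submit = jobs[order[nxt]].submit if nxt < len(order) else inf
        next_end = running[0][0] if running else inf
        now = min(next_submit, next_end)
        while running and running[0][0] <= now:
            free += heapq.heappop(running)[2]
        while nxt < len(order) and jobs[order[nxt]].submit <= now:
            queue.append(order[nxt])
            nxt += 1
        # Start jobs from the head of the queue while they fit.
        while queue and jobs[queue[0]].nodes <= free:
            launch(queue.pop(0), now)
        if not queue:
            continue
        # Reserve the blocked head job: find its planned start ("shadow") time
        # and the nodes that will be left over at that moment.
        head = jobs[queue[0]]
        avail, shadow, extra = free, inf, 0
        for planned_end, n in sorted((p, n) for _, p, n in running):
            avail += n
            if avail >= head.nodes:
                shadow, extra = planned_end, avail - head.nodes
                break
        # Backfill later jobs only if they do not delay the head job.
        for i in list(queue[1:]):
            j = jobs[i]
            if j.nodes > free:
                continue
            if now + plan(j) > shadow:
                if j.nodes > extra:
                    continue
                extra -= j.nodes
            queue.remove(i)
            launch(i, now)
    return start


def metrics(jobs, start):
    """Mean waiting time and mean slowdown for a finished simulation."""
    waits = [start[i] - j.submit for i, j in enumerate(jobs)]
    slowdowns = [(w + j.runtime) / j.runtime for w, j in zip(waits, jobs)]
    return sum(waits) / len(jobs), sum(slowdowns) / len(jobs)


# Hypothetical workload on a 4-node cluster; the third job is overestimated 4x.
jobs = [Job(0, 2, 10, 10), Job(1, 4, 10, 10), Job(2, 2, 5, 20)]
for label, use_actual in (("user estimate", False), ("actual runtime", True)):
    wait, slowdown = metrics(jobs, simulate(jobs, 4, use_actual))
    print(f"{label}: mean wait {wait:.1f}, mean slowdown {slowdown:.2f}")
# user estimate: mean wait 9.0, mean slowdown 2.50
# actual runtime: mean wait 3.0, mean slowdown 1.30
```

In this toy workload the overestimated job cannot be backfilled into the gap ahead of the 4-node job when the scheduler trusts the user estimate, but it fits when the real run time is known. The numbers are specific to this made-up example and say nothing about the magnitudes reported in the paper.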

Last modified: 2022-07-06 17:45:55