Modelling a supercomputer job bundling system based on the Alea simulator
Journal: Software & Systems (Vol. 35, No. 4)
Publication Date: 2022-12-16
Authors : Baranov A.V.; Lyakhovets D.S.;
Page : 631-643
Keywords : high-performance computing; job management system; simulation; job bundling; Alea;
Abstract
Modern supercomputer job management systems (JMS) are complex software systems that combine many scheduling algorithms, each with its own parameters. The impact of changing these parameters on JMS quality metrics cannot be predicted or calculated analytically, so researchers use simulation modelling to determine suitable parameter values. This article discusses the development of a supercomputer job management system model based on the well-known Alea simulator. The object of study is our scheduling algorithm, which underlies the supercomputer job bundling system. The algorithm bundles jobs with a long initialization time into groups (packets) according to job type. Initialization is performed once for each group, and the jobs of the group are then executed one after another. Bundling reduces the initialization overhead and increases job scheduling efficiency. We implemented the bundling algorithm as a part of the Alea simulator and ran comparative simulations of the implemented algorithm against the FCFS and Backfill scheduling algorithms built into Alea. Several workloads with different intensities were generated for the simulation. From the simulation results, we determined for each workload the minimum share of job time spent on initialization (the threshold) above which the bundling system noticeably improves scheduling efficiency compared to FCFS and Backfill. The study results showed that the developed simulation model can be used as a software tool for the comparative analysis of supercomputer job scheduling algorithms.
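The bundling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Alea implementation: the `Job` class, the function names, the sample job values, and the simplifying assumptions (a single resource, strictly sequential execution, and one shared initialization cost per job type) are all hypothetical and chosen only to show why running initialization once per group reduces overhead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; field names are illustrative only."""
    job_type: str     # jobs of the same type share an initialization step
    init_time: float  # time spent on initialization
    run_time: float   # time spent on useful computation


def unbundled_time(jobs):
    """Total time when every job performs its own initialization."""
    return sum(j.init_time + j.run_time for j in jobs)


def bundle_by_type(jobs):
    """Group jobs into bundles (packets) keyed by job type."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job.job_type].append(job)
    return dict(groups)


def bundled_time(jobs):
    """Total time when initialization runs once per bundle.

    Assumption: all jobs of a type have the same init_time, so the
    first job's value is taken as the bundle's initialization cost.
    """
    total = 0.0
    for group in bundle_by_type(jobs).values():
        total += group[0].init_time               # initialize once
        total += sum(j.run_time for j in group)   # then run jobs in turn
    return total


def init_share(jobs):
    """Fraction of unbundled time spent on initialization."""
    return sum(j.init_time for j in jobs) / unbundled_time(jobs)


# Hypothetical workload: three jobs of type "A", one of type "B".
example_jobs = [
    Job("A", 10, 5),
    Job("B", 4, 6),
    Job("A", 10, 3),
    Job("A", 10, 2),
]

if __name__ == "__main__":
    print("unbundled:", unbundled_time(example_jobs))
    print("bundled:  ", bundled_time(example_jobs))
    print("init share:", init_share(example_jobs))
```

In this toy workload the saving comes entirely from the two repeated type "A" initializations that bundling avoids. The sketch ignores queueing, multiple nodes, and the waiting time needed to accumulate a bundle, which are the effects a full simulator such as Alea is needed to evaluate.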
Last modified: 2023-04-07 16:45:28