mstk.scheduler.Slurm

class mstk.scheduler.Slurm

Slurm job scheduler with support for GPU allocation and MPI/OpenMP hybrid parallelization.

Slurm is a powerful and complicated job scheduler with a large number of configuration options, and mstk does not aim to provide a comprehensive wrapper for it. The job script generated by this class may therefore not fully meet the requirements of a specific computing center. In that case, post-process the generated job script before submitting it.
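The post-processing step mentioned above can be as simple as editing the script text. The following is a minimal sketch, not part of mstk: the helper add_sbatch_directive and the sample script content are hypothetical, and the --account value stands in for whatever a given computing center requires.

```python
def add_sbatch_directive(script: str, directive: str) -> str:
    """Insert an extra '#SBATCH ...' line after the last existing #SBATCH line."""
    lines = script.splitlines()
    # Index of the last #SBATCH line; falls back to 0 (the first line) if there is none
    last = max(
        (i for i, line in enumerate(lines) if line.startswith("#SBATCH")),
        default=0,
    )
    lines.insert(last + 1, f"#SBATCH {directive}")
    return "\n".join(lines) + "\n"


# Stand-in for a generated job script; the real content depends on the mstk setup
generated = (
    "#!/bin/bash\n"
    "#SBATCH --job-name=demo\n"
    "#SBATCH --ntasks=4\n"
    "\n"
    "srun ./run.sh\n"
)
patched = add_sbatch_directive(generated, "--account=my_project")
print(patched)
```

Keeping the new directive next to the existing #SBATCH lines matters because sbatch stops reading directives at the first executable command in the script.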

username

The current user

Type:

str

sh

The default name of the job script

Type:

str

job_parameter

The default parameters for submitting a job

Type:

JobParameter

submit_cmd

The command for submitting the job script. It is sbatch by default, but extra arguments can be provided, e.g. sbatch --qos=debug.

Type:

str

cached_jobs_expire

The lifetime of cached jobs in seconds.

Type:

int
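A submit_cmd value that carries extra arguments, such as sbatch --qos=debug, has to be split into separate arguments before it can be executed without a shell. The sketch below shows one way to do that with the standard library. The helper build_submit_argv and the script name job.sh are hypothetical, and this is not necessarily how mstk invokes the command internally.

```python
import shlex
from typing import List


def build_submit_argv(submit_cmd: str, sh: str) -> List[str]:
    """Split a submit command such as 'sbatch --qos=debug' and append the script path."""
    # shlex.split honours shell-style quoting, so quoted arguments stay intact
    return shlex.split(submit_cmd) + [sh]


print(build_submit_argv("sbatch --qos=debug", "job.sh"))
```

The resulting list can be passed directly to subprocess.run, which avoids shell interpretation of the script path.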

Methods

__init__()

download(**kwargs)

Download the simulation files to the target folder.

generate_sh(commands, name[, parameter, ...])

Generate a shell script for commands to be executed by the job scheduler on compute nodes.

get_job_from_name(name)

Get the job with the specified name.

get_jobs([use_cache])

Retrieve all the jobs that are currently managed by the job scheduler.

is_running(name)

Check whether a job is pending or running (not killed or finished or failed).

is_working()

Check whether Slurm is working normally on this machine.

kill_job(name)

Kill a job which has the specified name.

submit([sh, workdir])

Submit a job script to the scheduler.

upload(**kwargs)

Upload the simulation files to the target folder.

Attributes

is_remote

Whether this is a remote job scheduler

is_remote = False

Whether this is a remote job scheduler

is_working() bool

Check whether Slurm is working normally on this machine.

It calls sinfo --version and checks the output.

Returns:

is

Return type:

bool
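A check of this kind can be sketched with the standard library alone. The functions below are hypothetical illustrations, not the mstk implementation. They assume that sinfo --version prints a line beginning with "slurm" followed by a version number, which is the usual format but may differ between installations.

```python
import shutil
import subprocess


def looks_like_slurm_version(output: str) -> bool:
    """Return True if the text looks like the output of `sinfo --version`."""
    # Assumption: the output starts with 'slurm', e.g. 'slurm 23.02.1'
    return output.strip().lower().startswith("slurm")


def slurm_is_working(timeout: float = 10.0) -> bool:
    """Return True if `sinfo --version` runs successfully and reports a Slurm version."""
    if shutil.which("sinfo") is None:
        return False  # sinfo is not on PATH
    try:
        result = subprocess.run(
            ["sinfo", "--version"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and looks_like_slurm_version(result.stdout)


print(slurm_is_working())
```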

generate_sh(commands, name, parameter=None, workdir=None, sh=None, id_prior=None, **kwargs)

Generate a shell script for commands to be executed by the job scheduler on compute nodes.

Because Slurm configurations are complex, the job script generated here may not be fully valid for a given cluster. In that case, post-process the generated job script before submitting it.

Parameters:
  • commands (list of str) – The commands to be executed step by step

  • name (str) – The name of the job to be submitted

  • parameter (JobParameter, Optional) – The parameters for submitting this job. If not set, the default job_parameter is used.

  • workdir (str, Optional) – Explicitly set the working directory of this job

  • sh (str, Optional) – The name (path) of the shell script to be generated. If not set, the default sh is used.

  • id_prior (int, Optional) – The ID of the prior job that this one depends on
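To make the roles of these parameters concrete, here is a rough sketch of how commands, name, workdir and id_prior could map onto a Slurm job script. It is not the script that mstk generates: the function sketch_job_script is hypothetical, resource requests from JobParameter are omitted, and the afterany dependency type is an assumption chosen for illustration.

```python
from typing import List, Optional


def sketch_job_script(
    commands: List[str],
    name: str,
    workdir: Optional[str] = None,
    id_prior: Optional[int] = None,
) -> str:
    """Build a minimal Slurm job script from a list of shell commands."""
    lines = ["#!/bin/bash", f"#SBATCH --job-name={name}"]
    if workdir is not None:
        # Run the job from this directory
        lines.append(f"#SBATCH --chdir={workdir}")
    if id_prior is not None:
        # Assumption: start once the prior job has ended, whatever its exit state
        lines.append(f"#SBATCH --dependency=afterany:{id_prior}")
    lines.append("")
    lines.extend(commands)  # commands are executed step by step
    return "\n".join(lines) + "\n"


script = sketch_job_script(
    ["echo start", "srun ./a.out"], "demo", workdir="/tmp/demo", id_prior=1234
)
print(script)
```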

submit(sh=None, workdir=None)

Submit a job script to the scheduler.

Parameters:

sh (str, optional) – The file name of the job script. If not set, the default sh is used.

Returns:

id – Job ID. -1 means the submission failed.

Return type:

int
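The convention of returning the job ID on success and -1 on failure can be illustrated by parsing the message that sbatch prints after a successful submission ("Submitted batch job <id>"). The helper parse_job_id below is a hypothetical sketch, and mstk may extract the ID differently.

```python
import re


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from sbatch output; return -1 if no ID is found."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else -1


print(parse_job_id("Submitted batch job 123456\n"))
print(parse_job_id("sbatch: error: Batch job submission failed\n"))
```

Callers should compare the result against -1 before using it, for example before passing it on as id_prior for a dependent job.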

kill_job(name) bool

Kill a job which has the specified name.

Parameters:

name (str) – The name of the job to be killed

Returns:

killed

Return type:

bool
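Killing a job by name involves two steps: resolving the name to a job ID, then cancelling that ID. The sketch below illustrates this with hypothetical helpers that are not part of mstk. It assumes the job list was obtained with something like squeue --noheader --format='%i %j %T', giving one job per line as "<id> <name> <state>", and that job names contain no whitespace.

```python
from typing import List, Optional


def find_job_id(squeue_output: str, name: str) -> Optional[int]:
    """Return the ID of the first listed job with the given name, or None.

    Expects one job per line in the form '<id> <name> <state>'.
    """
    for line in squeue_output.splitlines():
        fields = line.split()
        # Skip malformed lines and non-numeric IDs such as array tasks ('123_4')
        if len(fields) >= 2 and fields[1] == name and fields[0].isdigit():
            return int(fields[0])
    return None


def build_kill_argv(job_id: int) -> List[str]:
    """Build the scancel command for a job ID."""
    return ["scancel", str(job_id)]


sample = "101 equil RUNNING\n102 prod PENDING\n"
job_id = find_job_id(sample, "prod")
if job_id is not None:
    print(build_kill_argv(job_id))
```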