mstk.scheduler.RemoteSlurm

class mstk.scheduler.RemoteSlurm(host, username, remote_dir, port=22)

Slurm job scheduler running on a remote machine.

Parameters:
  • host (str) – The IP address of the remote host that is running Slurm

  • port (int) – The SSH port for logging in to the remote host (22 by default)

  • username (str) – The username for logging in to the remote host

  • remote_dir (str) – The default directory on the remote host for running calculations

host

The IP address of the remote host that is running Slurm

Type:

str

port

The SSH port for logging in to the remote host

Type:

int

username

The username for logging in to the remote host

Type:

str

remote_dir

The default directory on the remote host for running calculations

Type:

str

sh

The default name of the job script

Type:

str

job_parameter

The default parameters for submitting a job

Type:

JobParameter
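The fields of JobParameter are not listed in this section. To give a feel for what a job script built from such parameters looks like, here is a minimal sketch that renders standard #SBATCH directives. SketchJobParameter and its fields (name, n_node, n_proc, time_hours, partition) are hypothetical stand-ins, and render_slurm_script is not mstk's generate_sh.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchJobParameter:
    """Hypothetical stand-in for JobParameter; the field names are assumptions."""
    name: str
    n_node: int = 1
    n_proc: int = 1
    time_hours: int = 24
    partition: Optional[str] = None


def render_slurm_script(param: SketchJobParameter, commands: List[str]) -> str:
    """Build a Slurm batch script: shebang, #SBATCH directives, then the commands."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={param.name}",
        f"#SBATCH --nodes={param.n_node}",
        f"#SBATCH --ntasks={param.n_proc}",
        f"#SBATCH --time={param.time_hours}:00:00",  # hours:minutes:seconds
    ]
    if param.partition is not None:
        lines.append(f"#SBATCH --partition={param.partition}")
    lines.append("")  # blank line between the header and the commands
    lines.extend(commands)
    return "\n".join(lines) + "\n"


# "./run_simulation.sh" is a placeholder command for illustration.
script = render_slurm_script(
    SketchJobParameter(name="npt-300K", n_proc=8, partition="debug"),
    ["./run_simulation.sh"],
)
print(script)
```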

submit_cmd

The command for submitting the job script. It is sbatch by default, but extra arguments can be provided, e.g. sbatch --qos=debug.

Type:

str
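Since submit_cmd may carry extra sbatch options, it has to be split into arguments, combined with the script name and run inside the remote directory. The sketch below assumes the whole command is sent to the remote shell as a single string; build_remote_submit is a hypothetical helper, not part of mstk.

```python
import shlex


def build_remote_submit(submit_cmd: str, sh: str, remote_dir: str) -> str:
    """Return a shell command that changes to remote_dir and submits sh.

    shlex.split keeps any extra options in submit_cmd (e.g. --qos=debug)
    as separate arguments, and shlex.quote protects paths with spaces.
    """
    argv = shlex.split(submit_cmd) + [sh]
    submit = " ".join(shlex.quote(arg) for arg in argv)
    return f"cd {shlex.quote(remote_dir)} && {submit}"


print(build_remote_submit("sbatch --qos=debug", "job.sh", "/home/alice/my jobs"))
```

Quoting each argument separately keeps the extra options intact while still making a directory name that contains spaces safe for the remote shell.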

cached_jobs_expire

The lifetime of cached jobs in seconds.

Type:

int
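The following sketch shows one way a time-limited job cache could behave. It assumes that get_jobs(use_cache=True) reuses the last result while it is younger than cached_jobs_expire seconds and that use_cache=False always queries Slurm. ExpiringJobCache is hypothetical; mstk's actual caching logic may differ.

```python
import time
from typing import Callable, List, Optional


class ExpiringJobCache:
    """Hypothetical cache that reuses a job list for at most `expire` seconds."""

    def __init__(self, fetch: Callable[[], List[str]], expire: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # function that actually queries the scheduler
        self._expire = expire    # lifetime of the cached list, in seconds
        self._clock = clock      # injectable so the logic can be tested
        self._jobs: Optional[List[str]] = None
        self._fetched_at = 0.0

    def get_jobs(self, use_cache: bool = True) -> List[str]:
        now = self._clock()
        fresh = self._jobs is not None and now - self._fetched_at < self._expire
        if not (use_cache and fresh):
            self._jobs = self._fetch()
            self._fetched_at = now
        return self._jobs


# Usage with a fake clock so the expiry can be checked without waiting.
now = [0.0]
fetch_count = [0]


def fake_fetch() -> List[str]:
    fetch_count[0] += 1
    return [f"job-{fetch_count[0]}"]


cache = ExpiringJobCache(fake_fetch, expire=60, clock=lambda: now[0])
first = cache.get_jobs()    # nothing cached yet: queries the scheduler
now[0] = 30.0
second = cache.get_jobs()   # 30 s old: served from the cache
now[0] = 61.0
third = cache.get_jobs()    # 61 s old: expired, queried again
```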

Methods

__init__(host, username, remote_dir[, port])

download([remote_dir, local_dir])

Download all the files in the remote directory to the current local directory.

generate_sh(commands, name[, parameter, ...])

Generate a shell script for commands to be executed by the job scheduler on compute nodes.

get_job_from_name(name)

Get the job with specified name.

get_jobs([use_cache])

Retrieve all the jobs that are currently managed by job scheduler.

is_running(name)

Check whether a job is pending or running (i.e. not killed, finished, or failed).

is_working()

Check whether Slurm is working normally on the remote machine.

kill_job(name)

Kill the job with the specified name.

submit([sh, remote_dir])

Submit a job script to the Slurm scheduler on the remote machine.

upload([local_dir, remote_dir])

Upload all the files in the current local directory to the remote directory.
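get_jobs, get_job_from_name and is_running all depend on the list of jobs reported by Slurm. One plausible way to obtain that list (not necessarily what mstk does) is to run squeue with a custom output format and parse each line. The sketch assumes the command squeue -h -u <username> -o '%i|%j|%T', which prints job ID, job name and long-form state separated by '|' without a header. The Job record and the helper functions are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Job(NamedTuple):
    """Hypothetical record for one line of squeue output."""
    id: str      # kept as a string because array tasks look like "1234_5"
    name: str
    state: str   # long-form state, e.g. "PENDING", "RUNNING", "COMPLETED"


# States treated as "pending or running". Slurm defines other transitional
# states (e.g. CONFIGURING, COMPLETING) that this sketch does not count.
ACTIVE_STATES = {"PENDING", "RUNNING"}


def parse_squeue(output: str) -> List[Job]:
    """Parse the output of: squeue -h -u <username> -o '%i|%j|%T'."""
    jobs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, rest = line.split("|", 1)
        name, state = rest.rsplit("|", 1)  # tolerates '|' inside job names
        jobs.append(Job(job_id, name, state))
    return jobs


def get_job_from_name(jobs: List[Job], name: str) -> Optional[Job]:
    """Return the first job with the given name, or None if there is none."""
    return next((job for job in jobs if job.name == name), None)


def is_running(jobs: List[Job], name: str) -> bool:
    """True if any job with this name is pending or running."""
    return any(job.name == name and job.state in ACTIVE_STATES for job in jobs)


jobs = parse_squeue("101|npt|RUNNING\n102|nvt|PENDING\n103|old|COMPLETED\n")
```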

Attributes

is_remote

Whether this is a remote job scheduler

is_remote = True

Whether this is a remote job scheduler

is_working() → bool

Check whether Slurm is working normally on the remote machine.

It calls sinfo --version and checks the output.

Returns:

is – Whether Slurm is working normally on the remote machine

Return type:

bool
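A minimal sketch of the kind of check is_working could perform on the output of sinfo --version. It assumes the output starts with the package name followed by a version number, for example "slurm 23.02.7"; some distribution packages may print a slightly different package name, which the pattern tolerates. The helper names are hypothetical.

```python
import re
from typing import Optional

# "slurm" plus any non-space suffix, whitespace, then a dotted version number.
_SINFO_VERSION = re.compile(r"^slurm\S*\s+(\d+(?:\.\d+)*)")


def slurm_version(sinfo_output: str) -> Optional[str]:
    """Return the version reported by `sinfo --version`, or None."""
    match = _SINFO_VERSION.match(sinfo_output.strip())
    return match.group(1) if match else None


def looks_working(sinfo_output: str) -> bool:
    """Treat Slurm as working if sinfo reported a version."""
    return slurm_version(sinfo_output) is not None
```

A real check would also look at the exit status of the remote command, since a missing or broken sinfo usually produces an error message instead of a version line.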

upload(local_dir=None, remote_dir=None)

Upload all the files in the current local directory to the remote directory.

Parameters:
  • local_dir (str, optional) – The local directory. If not set, the current directory is used.

  • remote_dir (str, optional) – The remote directory. If not set, the default remote_dir is used.

Returns:

successful – Whether the upload is successful

Return type:

bool

download(remote_dir=None, local_dir=None) → bool

Download all the files in the remote directory to the current local directory.

Parameters:
  • remote_dir (str, optional) – The remote directory. If not set, the default remote_dir is used.

  • local_dir (str, optional) – The local directory. If not set, the current directory is used.

Returns:

successful – Whether the download is successful

Return type:

bool
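upload and download both copy a whole directory between the two machines and share the same defaults. mstk's own transfer mechanism is not described here; as an illustration only, the sketch below builds an rsync command over SSH for either direction and applies the documented defaults (current local directory, default remote_dir). build_rsync_cmd is hypothetical.

```python
import os
from typing import List, Optional


def build_rsync_cmd(direction: str, host: str, username: str, port: int,
                    default_remote_dir: str,
                    local_dir: Optional[str] = None,
                    remote_dir: Optional[str] = None) -> List[str]:
    """Build an rsync argv that copies a whole directory over SSH.

    direction is "upload" (local -> remote) or "download" (remote -> local).
    Unset directories fall back to the current directory and the default
    remote directory.
    """
    local_dir = local_dir or os.getcwd()
    remote_dir = remote_dir or default_remote_dir
    # Trailing slashes make rsync copy the directory contents, not the directory.
    local = local_dir.rstrip("/") + "/"
    remote = f"{username}@{host}:{remote_dir.rstrip('/')}/"
    ssh = ["-e", f"ssh -p {port}"]  # use the configured SSH port
    if direction == "upload":
        return ["rsync", "-az", *ssh, local, remote]
    if direction == "download":
        return ["rsync", "-az", *ssh, remote, local]
    raise ValueError(f"unknown direction: {direction!r}")
```

Running the returned list with subprocess.run(cmd) and checking that returncode == 0 would give a boolean comparable to the documented successful return value.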

submit(sh=None, remote_dir=None)

Submit a job script to the Slurm scheduler on the remote machine.

Parameters:
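  • sh (str, optional) – The job script to be submitted. If not set, the default sh is used.

  • remote_dir (str, optional) – The directory on the remote machine in which the script is submitted. If not set, the default remote_dir is used.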

Returns:

id – The ID of the submitted job

Return type:

int
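submit returns the ID of the submitted job. sbatch reports that ID on standard output in a line such as "Submitted batch job 123456", so extracting it could look like the sketch below. parse_sbatch_job_id is a hypothetical helper; mstk may obtain the ID differently.

```python
import re

_SUBMITTED = re.compile(r"Submitted batch job (\d+)")


def parse_sbatch_job_id(output: str) -> int:
    """Extract the numeric job ID from sbatch's standard output."""
    match = _SUBMITTED.search(output)
    if match is None:
        raise ValueError(f"no job ID found in sbatch output: {output!r}")
    return int(match.group(1))
```

Using search rather than match lets the parser tolerate extra text around the message, such as a trailing cluster name.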

kill_job(name) → bool

Kill the job with the specified name.

Parameters:

name (str) – The name of the job to kill

Returns:

killed – Whether the job was killed

Return type:

bool
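kill_job selects a job by name, while Slurm's scancel is most commonly given a job ID. A minimal sketch, assuming a mapping from job name to job ID is already available (for example parsed from squeue output); build_kill_cmd is hypothetical.

```python
from typing import Dict, List, Optional


def build_kill_cmd(active_jobs: Dict[str, str], name: str) -> Optional[List[str]]:
    """Return an scancel argv for the job called `name`, or None if absent.

    active_jobs maps job name -> Slurm job ID. Returning None lets the caller
    report killed=False without contacting Slurm at all.
    """
    job_id = active_jobs.get(name)
    if job_id is None:
        return None
    return ["scancel", job_id]


cmd = build_kill_cmd({"npt": "101", "nvt": "102"}, "nvt")
```

Looking up the ID first makes it straightforward to return False when no job with that name exists. scancel can also select jobs by name through its --name option.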