Slurm
SLURM is the workload manager and job scheduler for tron.ift.uni.wroc.pl.
Basic usage
Command | Function |
sinfo -alN | Show node information |
squeue | Show job queue |
squeue -u <username> | List all current jobs for a user |
squeue -u <username> -t RUNNING | List all running jobs for a user |
squeue -u <username> -t PENDING | List all pending jobs for a user |
scancel <jobid> | Cancel one job |
scancel -u <username> | Cancel all the jobs for a user |
scancel -t PENDING -u <username> | Cancel all the pending jobs for a user |
scancel --name myJobName | Cancel one or more jobs by name |
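For example, to list your running jobs and cancel a specific one (the username jkowalski and job ID 12345 below are placeholders):
squeue -u jkowalski -t RUNNING
scancel 12345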
Slurm batch
The following parameters can be used as command-line options with *sbatch* and *srun*, or in a job script; see also the Job script example below.
Basic settings
Parameter | Function |
--job-name=<name> | Job name to be displayed by, for example, squeue |
--output=<path> | Path to the file where the job (error) output is written |
--mail-type=<type> | Turn on mail notification; type can be one of BEGIN, END, FAIL, REQUEUE or ALL |
--mail-user=<email_address> | Email address to send notifications to |
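As a sketch, a job script header using these basic settings could look like the following (the job name, output path and e-mail address are placeholders):
#SBATCH --job-name=example
#SBATCH --output=slurm-%j.output
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com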
Resources
Parameter | Function |
--time=<d-hh:mm:ss> | Time limit for the job. The job will be killed by SLURM after the time has run out. Format: days-hours:minutes:seconds |
--nodes=<num_nodes> | Number of nodes. Multiple nodes are only useful for jobs with distributed memory (e.g. MPI). |
--mem=<MB> | Memory (RAM) per node. Number followed by a unit prefix, e.g. 16G |
--mem-per-cpu=<MB> | Memory (RAM) per requested CPU core |
--ntasks-per-node=<num_procs> | Number of (MPI) processes per node. More than one is useful only for MPI jobs. The maximum depends on the node (number of cores). |
--cpus-per-task=<num_threads> | CPU cores per task. Use one for MPI jobs. For shared-memory parallel applications, this is the number of threads. |
--exclusive | The job will not share nodes with other running jobs. You will be charged for the complete nodes even if you asked for less. |
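As a sketch, a single-node job asking for 4 MPI processes, 5 GB of memory per node and a one-hour time limit could combine the resource options like this (all values are only illustrative):
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=5GB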
Additional
Parameter | Function |
--array=<indexes> | Submit a collection of similar jobs, e.g. --array=1-10 (sbatch command only). See the official SLURM documentation. |
--dependency=<state:jobid> | Wait with the start of the job until the specified dependencies have been satisfied, e.g. --dependency=afterok:123456 |
--ntasks-per-core=2 | Enables hyperthreading. Only useful in special circumstances. |
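For instance, an array of ten tasks that should start only after job 123456 has finished successfully could be submitted like this (the job ID is a placeholder):
sbatch --array=1-10 --dependency=afterok:123456 ~/sampleScript.sh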
Job array
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.)
#SBATCH --array 1-200
#SBATCH --array 1-200%5   # %N suffix where N is the number of active tasks
Variables
SLURM_ARRAY_JOB_ID | will be set to the first job ID of the array |
SLURM_ARRAY_TASK_ID | will be set to the job array index value |
SLURM_ARRAY_TASK_COUNT | will be set to the number of tasks in the job array |
SLURM_ARRAY_TASK_MAX | will be set to the highest job array index value |
SLURM_ARRAY_TASK_MIN | will be set to the lowest job array index value |
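Below is a minimal sketch of how these variables can be used inside an array job script; the program name and input file naming scheme are assumptions made for illustration only:
#!/bin/bash -l
#SBATCH --job-name=arrayExample
#SBATCH --array=1-200%5
#SBATCH --output=slurm-%A_%a.output   # %A - SLURM_ARRAY_JOB_ID, %a - SLURM_ARRAY_TASK_ID

# Each task processes its own input file, selected by the array index
echo "Task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT} (array job ${SLURM_ARRAY_JOB_ID})"
./my_program input_${SLURM_ARRAY_TASK_ID}.dat   # my_program and its input files are hypothetical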
Job script example
Before use, please adjust the parameters.
#!/bin/bash -l

# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example
#SBATCH -o slurm-%j.output   # %j - will return SLURM_JOB_ID
#SBATCH -e slurm-%j.error

# Define how many nodes you need. Here, we ask for 1 node.
# Each node has 8 cores.
#SBATCH --nodes=1

# You can further define the number of tasks with --ntasks-per-*
# See "man sbatch" for details, e.g. --ntasks=4 will ask for 4 cpus.
#SBATCH --ntasks=4

# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem=5GB
##SBATCH --mem-per-cpu=1500MB   # this one is not in effect, due to the double hash

# Turn on mail notification. There are many possible self-explaining values:
# NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
# For more values, check "man sbatch"
##SBATCH --mail-type=END,FAIL   # this one is not in effect, due to the double hash

# You may not place any commands before the last SBATCH directive

# Define workdir for this job
WORK_DIRECTORY=/home/${USER}/test
cd ${WORK_DIRECTORY}

# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command worked
time sleep 10
#sleep 10

# Finish the script
exit 0
Submit the script to the job queue with
sbatch ~/sampleScript.sh
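sbatch prints the ID of the submitted job; you can then monitor it with, for example:
squeue -u $USER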
Interactive mode
Get interactive access to a shell on a compute node:
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i

# Request specific node by name
srun --nodelist=node2 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
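Once the interactive shell starts, you can check which node you are on and leave the session when you are done:
hostname   # shows the compute node you are on
exit       # ends the interactive session and releases the allocation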