Slurm
SLURM is the workload manager and job scheduler for tron.ift.uni.wroc.pl.
Basic usage
Command | Function |
sinfo -alN | Show node information |
squeue | Show job queue |
squeue -u <username> | List all current jobs for a user |
squeue -u <username> -t RUNNING | List all running jobs for a user |
squeue -u <username> -t PENDING | List all pending jobs for a user |
scancel <jobid> | Cancel one job |
scancel -u <username> | Cancel all the jobs for a user |
scancel -t PENDING -u <username> | Cancel all the pending jobs for a user |
scancel --name myJobName | Cancel one or more jobs by name |
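For example, to list your running jobs and cancel a specific one (the username jkowalski and job ID 12345 below are placeholders):
squeue -u jkowalski -t RUNNING
scancel 12345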
Slurm batch
The following parameters can be used as command-line options with *sbatch* and *srun*, or in a job script; see also the Job script example below.
Basic settings
Parameter | Function |
--job-name=<name> | Job name to be displayed by, for example, squeue |
--output=<path> | Path to the file where the job (error) output is written |
--mail-type=<type> | Turn on mail notification; type can be one of BEGIN, END, FAIL, REQUEUE or ALL |
--mail-user=<email_address> | Email address to send notifications to |
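As a sketch, a job script header using these basic settings could look like the following (the job name, output path and e-mail address are placeholders):
#SBATCH --job-name=example
#SBATCH --output=slurm-%j.output
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com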
Resources
Parameter | Function |
--time=<d-hh:mm:ss> | Time limit for the job. The job will be killed by SLURM after the time has run out. Format: days-hours:minutes:seconds |
--nodes=<num_nodes> | Number of nodes. Multiple nodes are only useful for jobs with distributed memory (e.g. MPI). |
--mem=<MB> | Memory (RAM) per node. Number followed by a unit prefix, e.g. 16G |
--mem-per-cpu=<MB> | Memory (RAM) per requested CPU core |
--ntasks-per-node=<num_procs> | Number of (MPI) processes per node. More than one is useful only for MPI jobs. The maximum depends on the node (number of cores). |
--cpus-per-task=<num_threads> | CPU cores per task. Use one for MPI jobs. For shared-memory parallel applications, this is the number of threads. |
--exclusive | The job will not share nodes with other running jobs. You will be charged for the complete nodes even if you asked for less. |
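As a sketch, a single-node job asking for 4 MPI processes, 5 GB of memory per node and a one-hour time limit could combine the resource options like this (all values are only illustrative):
#SBATCH --time=0-01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=5GB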
Additional
Parameter | Function |
--array=<indexes> | Submit a collection of similar jobs, e.g. --array=1-10 (sbatch command only). See the official SLURM documentation. |
--dependency=<state:jobid> | Wait with the start of the job until the specified dependencies have been satisfied, e.g. --dependency=afterok:123456 |
--ntasks-per-core=2 | Enables hyperthreading. Only useful in special circumstances. |
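For instance, an array of ten tasks that should start only after job 123456 has finished successfully could be submitted like this (the job ID is a placeholder):
sbatch --array=1-10 --dependency=afterok:123456 ~/sampleScript.sh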
Job array
Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.)
#SBATCH --array 1-200
#SBATCH --array 1-200%5   # %N suffix where N is the number of active tasks
Variables
SLURM_ARRAY_JOB_ID | will be set to the first job ID of the array |
SLURM_ARRAY_TASK_ID | will be set to the job array index value |
SLURM_ARRAY_TASK_COUNT | will be set to the number of tasks in the job array |
SLURM_ARRAY_TASK_MAX | will be set to the highest job array index value |
SLURM_ARRAY_TASK_MIN | will be set to the lowest job array index value |
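Below is a minimal sketch of how these variables can be used inside an array job script; the program name and input file naming scheme are assumptions made for illustration only:
#!/bin/bash -l
#SBATCH --job-name=arrayExample
#SBATCH --array=1-200%5
#SBATCH --output=slurm-%A_%a.output   # %A - SLURM_ARRAY_JOB_ID, %a - SLURM_ARRAY_TASK_ID

# Each task processes its own input file, selected by the array index
echo "Task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT} (array job ${SLURM_ARRAY_JOB_ID})"
./my_program input_${SLURM_ARRAY_TASK_ID}.dat   # my_program and its input files are hypothetical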
Job script example
Before use, please adjust the parameters.
#!/bin/bash -l

# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example
#SBATCH -o slurm-%j.output   # %j - will return SLURM_JOB_ID
#SBATCH -e slurm-%j.error

# Define how many nodes you need. Here, we ask for 1 node.
# Each node has 8 cores.
#SBATCH --nodes=1

# You can further define the number of tasks with --ntasks-per-*
# See "man sbatch" for details, e.g. --ntasks=4 will ask for 4 cpus.
#SBATCH --ntasks=4

# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem=5GB
##SBATCH --mem-per-cpu=1500MB   # this one is not in effect, due to the double hash

# Turn on mail notification. There are many possible self-explaining values:
# NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
# For more values, check "man sbatch"
##SBATCH --mail-type=END,FAIL   # this one is not in effect, due to the double hash

# You may not place any commands before the last SBATCH directive

# Define workdir for this job
WORK_DIRECTORY=/home/${USER}/test
cd ${WORK_DIRECTORY}

# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command worked
time sleep 10
#sleep 10

# Finish the script
exit 0
Submit the script to the job queue with
sbatch ~/sampleScript.sh
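sbatch prints the ID of the submitted job; you can then monitor it with, for example:
squeue -u $USER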
Interactive mode
Get interactive access to a shell on a compute node:
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i

# Request specific node by name
srun --nodelist=node2 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
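Once the interactive shell starts, you can check which node you are on and leave the session when you are done:
hostname   # shows the compute node you are on
exit       # ends the interactive session and releases the allocation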