====== Slurm ======

SLURM is the workload manager and job scheduler for tron.ift.uni.wroc.pl

===== Basic usage =====

|sinfo -alN|Show node information|
|squeue|Show the job queue|
|squeue -u <username>|List all current jobs of a user|
|squeue -u <username> -t RUNNING|List all running jobs of a user|
|squeue -u <username> -t PENDING|List all pending jobs of a user|
|scancel <jobid>|Cancel one job|
|scancel -u <username>|Cancel all jobs of a user|
|scancel -t PENDING -u <username>|Cancel all pending jobs of a user|
|scancel --name myJobName|Cancel one or more jobs by name|

===== Slurm batch =====

The following parameters can be used on the command line with **sbatch** and **srun**, or inside a job script; see also the job script example below.

**Basic settings**

|**Parameter**|**Function**|
|--job-name=|Job name displayed by e.g. squeue|
|--output=|Path to the file where the job (error) output is written|
|--mail-type=|Turn on mail notification; type can be one of BEGIN, END, FAIL, REQUEUE or ALL|
|--mail-user=|Email address to send notifications to|

**Resources**

|**Parameter**|**Function**|
|--time=|Time limit for the job. The job will be killed by SLURM after the time has run out. Format: days-hours:minutes:seconds|
|--nodes=|Number of nodes. Multiple nodes are only useful for jobs with distributed memory (e.g. MPI).|
|--mem=|Memory (RAM) per node. Number followed by a unit prefix, e.g. 16G|
|--mem-per-cpu=|Memory (RAM) per requested CPU core|
|--ntasks-per-node=|Number of (MPI) processes per node. More than one is useful only for MPI jobs. The maximum depends on the node (number of cores).|
|--cpus-per-task=|CPU cores per task. For MPI jobs use one; for multithreaded applications this is the number of threads.|
|--exclusive|The job will not share nodes with other running jobs. You will be charged for the complete nodes even if you asked for less.|

**Additional**

|**Parameter**|**Function**|
|--array=|Submit a collection of similar jobs, e.g. --array=1-10 (sbatch only). See the official SLURM documentation.|
|--dependency=|Wait with the start of the job until the specified dependencies have been satisfied, e.g. --dependency=afterok:123456|
|--ntasks-per-core=2|Enables hyperthreading. Only useful in special circumstances.|

==== Job array ====

Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily; job arrays with millions of tasks can be submitted in milliseconds (subject to configured size limits). All jobs must have the same initial options (e.g. size, time limit, etc.). A usage sketch follows the variables table below.

<code bash>
#SBATCH --array 1-200
#SBATCH --array 1-200%5   # %N suffix, where N is the number of simultaneously active tasks
</code>

**Variables**

|SLURM_ARRAY_JOB_ID|will be set to the first job ID of the array|
|SLURM_ARRAY_TASK_ID|will be set to the job array index value|
|SLURM_ARRAY_TASK_COUNT|will be set to the number of tasks in the job array|
|SLURM_ARRAY_TASK_MAX|will be set to the highest job array index value|
|SLURM_ARRAY_TASK_MIN|will be set to the lowest job array index value|
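A minimal sketch of how these variables are typically used. The program name (./myprogram) and the input files (input_1.dat ... input_200.dat) are hypothetical; adjust names and resources to your own case.

<code bash>
#!/bin/bash -l
#SBATCH --job-name=array-example
#SBATCH --output=slurm-%A_%a.out   # %A - array job ID, %a - array task ID
#SBATCH --ntasks=1
#SBATCH --time=0-01:00:00
#SBATCH --array=1-200%5            # 200 tasks, at most 5 running at the same time

# Each array task processes one input file, selected by its task index.
echo "Task ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT} (array job ${SLURM_ARRAY_JOB_ID})"
./myprogram input_${SLURM_ARRAY_TASK_ID}.dat   # hypothetical program and input files
</code>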
==== Job script example ====

Before use, please adjust the parameters.

<code bash>
#!/bin/bash -l

# Give your job a name, so you can recognize it in the queue overview
#SBATCH --job-name=example

#SBATCH -o slurm-%j.output   # %j will be replaced by SLURM_JOB_ID
#SBATCH -e slurm-%j.error

# Define how many nodes you need. Here, we ask for 1 node.
# Each node has 8 cores.
#SBATCH --nodes=1

# You can further define the number of tasks with --ntasks-per-*
# See "man sbatch" for details, e.g. --ntasks=4 will ask for 4 cpus.
#SBATCH --ntasks=4

# How much memory you need.
# --mem will define memory per node and
# --mem-per-cpu will define memory per CPU/core. Choose one of those.
#SBATCH --mem=5GB
##SBATCH --mem-per-cpu=1500MB   # this one is not in effect, due to the double hash

# Turn on mail notification. There are many possible self-explaining values:
# NONE, BEGIN, END, FAIL, ALL (including all aforementioned)
# For more values, check "man sbatch"
##SBATCH --mail-type=END,FAIL   # this one is not in effect, due to the double hash

# You may not place any commands before the last SBATCH directive

# Define the working directory for this job
WORK_DIRECTORY=/home/${USER}/test
cd ${WORK_DIRECTORY}

# This is where the actual work is done. In this case, the script only waits.
# The time command is optional, but it may give you a hint on how long the
# command took.
time sleep 10
#sleep 10

# Finish the script
exit 0
</code>

Submit the script to the job queue with

<code bash>
sbatch ~/sampleScript.sh
</code>

===== Interactive mode =====

Get interactive access to a shell on a compute node:

<code bash>
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i

# Request a specific node by name
srun --nodelist=node2 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
</code>

Leave the interactive shell with exit (or Ctrl-D) to release the allocation.
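The same srun options can also be used to run a single command on a compute node without opening an interactive shell; a minimal sketch (the time limit here is just an example):

<code bash>
# Run one command on a compute node and return when it finishes
srun --nodes=1 --ntasks-per-node=1 --time=00:05:00 hostname
</code>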