Slurm Cheatsheet

A simple Slurm command cheatsheet.

Operation: Command
Drain a node: scontrol update NodeName=nodelist State=drain Reason="describe reason here"
Job history: sacct -u username -S MMDD --format=JobID,JobName,MaxRSS,Elapsed
Show a partition (queue): scontrol show partition=threaded
Show QOS: sacctmgr show qos
Job details: scontrol show jobid=13703
Change job time limit: scontrol update jobid=2873672 timelimit=35-0:0:0
All jobs since date: sacct -a -X -o User,Account,Submit,Start,End,State -S '2018-07-01'
Completed jobs: sacct -a -X -o User,State -S '2018-07-01' | grep COMPLETED
Queued jobs: sacct -a -X -o User,State -S '2018-07-01' | grep PENDING
Jobs that ran out of time: sacct -a -X -o User,State -S '2018-07-01' | grep TIMEOUT
Maximum queued jobs allowed: sacctmgr list account username withAssoc -p
Maximum queued jobs allowed (globally): scontrol show config | grep -i max
Jobs on a particular node range with their time limit: squeue --nodes dus[21-24]
Same, with more precise output control and for finding jobs gone from the queue: sacct -a -X -s R -N dus[21-24] -o User,Account,Submit,Start,End,Timelimit,NodeList
Create system reservation: scontrol create reservation starttime=2019-02-19T15:00:00 duration=$((60*24)) user=root flags=maint nodes=ALL
Group limits and priority: sacctmgr list account def-username_cpu withAssoc -p
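
The completed, queued and timed-out entries above run the same sacct query and grep for one state at a time. As a rough sketch, the states could instead be tallied per user in a single pass. This assumes sacct is run with -n (no header) and -P (pipe-delimited output) in addition to the flags used above. The helper name count_states and the sample data are invented for illustration and are not part of Slurm.

```shell
#!/bin/sh
# Hypothetical helper: tally job states per user from pipe-delimited
# "User|State" lines on stdin, such as those printed by
#   sacct -a -X -n -P -o User,State -S '2018-07-01'
count_states() {
    awk -F'|' '
        # Keep only the first word of State, so that
        # "CANCELLED by 1000" is counted as CANCELLED.
        { split($2, s, " "); n[$1 "|" s[1]]++ }
        END { for (k in n) print k "|" n[k] }
    ' | LC_ALL=C sort
}

# Invented sample input standing in for real sacct output.
sample='alice|COMPLETED
alice|TIMEOUT
bob|COMPLETED
alice|COMPLETED
bob|CANCELLED by 1000'

printf '%s\n' "$sample" | count_states
```

On a live cluster the sample would be replaced by piping the sacct command from the comment straight into count_states. Each output line has the form user|state|count.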

See also