Slurm Cheatsheet

A simple SLURM command cheatsheet.

| Operation | Command |
| --- | --- |
| Drain a node | `scontrol update NodeName=nodelist State=drain Reason="describe reason here"` |
| Job history for a user | `sacct -u username -S MMDD --format=JobID,JobName,MaxRSS,Elapsed` |
| Show partition (queue) configuration | `scontrol show partition=threaded` |
| Show QOS | `sacctmgr show qos` |
| Job details | `scontrol show jobid=13703` |
| Change job time limit | `scontrol update jobid=2873672 timelimit=35-0:0:0` |
| All jobs since a date | `sacct -a -X -o User,Account,Submit,Start,End,State -S '2018-07-01'` |
| Completed jobs | `sacct -a -X -o User,State -S '2018-07-01' \| grep COMPLETED` |
| Queued jobs | `sacct -a -X -o User,State -S '2018-07-01' \| grep PENDING` |
| Jobs that ran out of time | `sacct -a -X -o User,State -S '2018-07-01' \| grep TIMEOUT` |
| Maximum queued jobs allowed | `sacctmgr list account username withAssoc -p` |
| Maximum queued jobs allowed (globally) | `scontrol show config \| grep -i max` |
| Jobs on a particular node range, with their time limits | `squeue --long --nodes dus[21-24]` |
| Same, with more precise output control; also finds jobs that have left the queue | `sacct -a -X -s R -N dus[21-24] -o User,Account,Submit,Start,End,Timelimit,NodeList` |
| Create a system reservation | `scontrol create reservation starttime=2019-02-19T15:00:00 duration=$((60*24)) user=root flags=maint nodes=ALL` |
| Group limits and priority | `sacctmgr list account def-username_cpu withAssoc -p` |
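
The three `grep` rows above (COMPLETED, PENDING, TIMEOUT) each scan the same `sacct` listing for one state. As a rough sketch, the states could instead be tallied in a single pass. The helper name `count_states` and the sample users are made up for illustration. The sketch assumes the input comes from `sacct` with `-n` (no header) and `-P` (pipe-delimited output) added to the options shown in the table.

```shell
# count_states: hypothetical helper (not a Slurm command).
# Reads "User|State" lines, as produced by `sacct -n -P -o User,State`,
# and prints one "STATE count" line per state, sorted by state name.
count_states() {
    # Keep only the first word of the state so that an entry such as
    # "CANCELLED by 1001" is counted under CANCELLED.
    awk -F'|' 'NF >= 2 && $2 != "" { split($2, s, " "); n[s[1]]++ }
               END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}

# Made-up sample data standing in for real sacct output.
sample='alice|COMPLETED
bob|TIMEOUT
alice|COMPLETED
carol|PENDING
bob|CANCELLED by 1001'

printf '%s\n' "$sample" | count_states
```

On a cluster, the same helper could be fed directly with `sacct -a -X -n -P -o User,State -S '2018-07-01' | count_states`.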

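
The reservation example passes `duration=$((60*24))`. This is shell arithmetic that expands to 1440 before `scontrol` sees it, and a bare number for `duration` is read as minutes, so the reservation lasts one day. A minimal sketch of the same arithmetic for other lengths follows. `days_to_minutes` is a hypothetical helper name, and the command is only printed, not executed, so it can be reviewed first.

```shell
# days_to_minutes: hypothetical helper, converts whole days to minutes.
days_to_minutes() {
    echo $(( $1 * 24 * 60 ))
}

duration=$(days_to_minutes 1)

# Print the command rather than running it (dry run for review).
echo "scontrol create reservation starttime=2019-02-19T15:00:00 duration=${duration} user=root flags=maint nodes=ALL"
```

With one day as input, the printed command carries `duration=1440`, matching the table entry.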