A simple SLURM command cheatsheet.

Operation | Command |
---|---|
Drain a node | scontrol update NodeName=nodelist State=drain Reason="describe reason here" |
Job history | sacct -u username -S MMDD --format=JobID,JobName,MaxRSS,Elapsed |
Show partition (queue) details | scontrol show partition=threaded |
Show QOS | sacctmgr show qos |
Job details | scontrol show jobid=13703 |
Change job time limit | scontrol update jobid=2873672 timelimit=35-0:0:0 |
All jobs since date | sacct -a -X -o User,Account,Submit,Start,End,State -S '2018-07-01' |
Completed jobs | sacct -a -X -o User,State -S '2018-07-01' \| grep COMPLETED |
Queued jobs | sacct -a -X -o User,State -S '2018-07-01' \| grep PENDING |
Jobs that ran out of time | sacct -a -X -o User,State -S '2018-07-01' \| grep TIMEOUT |
Maximum queued jobs allowed | sacctmgr list account username withAssoc -p |
Maximum queued jobs allowed (globally) | scontrol show config \| grep -i max |
Jobs on a particular node range with their time limits | squeue --long --nodelist=dus[21-24] |
Jobs on a node range via sacct (finer output control, also finds jobs that have left the queue) | sacct -a -X -s R -N dus[21-24] -o User,Account,Submit,Start,End,Timelimit,NodeList |
Create system reservation | scontrol create reservation starttime=2019-02-19T15:00:00 duration=$((60*24)) user=root flags=maint nodes=ALL |
Group limits and priority | sacctmgr list account def-username_cpu withAssoc -p |
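
The three `grep` rows in the table each scan the same `sacct` output for a single state. As a rough sketch, the same tally could be done in one pass with `awk`. The helper name `count_states` is hypothetical (it is not part of SLURM), and it assumes pipe-delimited `User|State` lines on stdin, such as `sacct` prints with `-n -P` (no header, parsable output).

```shell
# Hypothetical helper: tally job states from "User|State" lines on stdin.
count_states() {
    awk -F'|' '
        NF >= 2 && $2 != "" {
            split($2, words, " ")   # "CANCELLED by 1234" -> "CANCELLED"
            count[words[1]]++
        }
        END { for (s in count) print s, count[s] }
    ' | LC_ALL=C sort
}

# Intended use on a cluster with accounting enabled:
#   sacct -a -X -n -P -o User,State -S '2018-07-01' | count_states
```

Each output line is a state name followed by the number of jobs in that state, so one call covers the COMPLETED, PENDING, and TIMEOUT counts at once.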