A simple SLURM command cheatsheet.
| Operation | Command | 
|---|---|
| Drain a node | `scontrol update NodeName=nodelist State=drain Reason="describe reason here"` |
| Job history | `sacct -u username -S MMDD --format=JobID,JobName,MaxRSS,Elapsed` |
| Show partition (queue) settings | `scontrol show partition=threaded` |
| Show QOS | `sacctmgr show qos` |
| Job details | `scontrol show jobid=13703` |
| Change job time limit (`days-hours:minutes:seconds`; only an administrator can increase it) | `scontrol update jobid=2873672 timelimit=35-0:0:0` |
| All jobs since date | `sacct -a -X -o User,Account,Submit,Start,End,State -S '2018-07-01'` |
| Completed jobs | `sacct -a -X -o User,State -S '2018-07-01' \| grep COMPLETED` |
| Queued jobs | `sacct -a -X -o User,State -S '2018-07-01' \| grep PENDING` |
| Jobs that ran out of time | `sacct -a -X -o User,State -S '2018-07-01' \| grep TIMEOUT` |
| Maximum queued jobs allowed (per association; see the MaxSubmit column) | `sacctmgr list account username withAssoc -p` |
| Maximum queued jobs allowed (cluster-wide, e.g. MaxJobCount) | `scontrol show config \| grep -i max` |
| Jobs on a node range, with their time limits | `squeue -l --nodes 'dus[21-24]'` |
| Same, via accounting (finer output control; add `-S` to also find jobs that have left the queue) | `sacct -a -X -s R -N 'dus[21-24]' -o User,Account,Submit,Start,End,Timelimit,NodeList` |
| Create a maintenance reservation on all nodes (duration is in minutes: `$((60*24))` = 24 h) | `scontrol create reservation starttime=2019-02-19T15:00:00 duration=$((60*24)) user=root flags=maint nodes=ALL` |
| Group limits and priority | `sacctmgr list account def-username_cpu withAssoc -p` |
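The three `grep` rows for completed, queued and timed-out jobs each make a separate pass over the accounting data. A single pass can count every state at once. The sketch below is a hypothetical helper (`summarize_states` is not a Slurm command); it assumes `sacct` is run with `-n` (no header) and `-P` (pipe-delimited) so that each line looks like `User|State`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): count jobs per state.
# Expects "User|State" lines on stdin, e.g. from:
#   sacct -a -X -n -P -o User,State -S 2018-07-01
summarize_states() {
  # Field 2 is State; keep only its first word so that
  # "CANCELLED by 1001" is counted as CANCELLED. Skip blank lines.
  awk -F'|' 'NF >= 2 { split($2, s, " "); count[s[1]]++ }
             END { for (st in count) print st, count[st] }' | sort
}
```

Usage: `sacct -a -X -n -P -o User,State -S 2018-07-01 | summarize_states` prints one `STATE count` line per state.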
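`Elapsed` and `Timelimit` values use the same `[days-]hours:minutes:seconds` format as the `35-0:0:0` time limit. To sort or compare them in a script, it can help to convert them to seconds. A minimal sketch with a hypothetical `duration_to_seconds` helper, assuming purely numeric input (values such as `UNLIMITED` are not handled):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): convert a duration such as
# "1-02:03:04", "02:03:04" or "03:04" ([DD-][HH:]MM:SS) to seconds.
# Assumes numeric input only; "UNLIMITED" etc. are not handled.
duration_to_seconds() {
  awk -v t="$1" 'BEGIN {
    days = 0
    if (index(t, "-") > 0) {      # optional "DD-" prefix
      split(t, d, "-"); days = d[1]; t = d[2]
    }
    n = split(t, p, ":")          # 1 to 3 colon-separated fields
    secs = 0
    for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
    print days * 86400 + secs
  }'
}
```

For example, `duration_to_seconds 35-0:0:0` prints `3024000`. Combined with `sacct -n -P -o JobID,Elapsed` and a `while IFS='|' read -r id elapsed` loop, it can rank jobs by run time.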