Q: Where is slurm.conf?
A: In /etc/slurm-llnl/slurm.conf (the location used by the Debian/Ubuntu slurm-llnl packages).
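Q: Why can't I run two "srun" commands on the same node at the same time?
A: Pass "--mem-per-cpu=<size in MB>" to each srun. Without an explicit memory request, the first job may be allocated all of the node's memory, so the second one has to wait.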
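As a rough sketch of the answer above, the helper below prints two srun command lines that each request a small amount of memory, so both should fit on one node. The function name two_srun_cmds and the "sleep 60" workload are placeholders I made up for illustration. The node name and the 10 MB value are taken from the examples in these notes.

```shell
# two_srun_cmds NODE MEM_MB: print two background srun commands plus a wait.
# Hypothetical helper (not part of Slurm); "sleep 60" is a stand-in workload.
two_srun_cmds() {
    node=$1
    mem_mb=$2
    for i in 1 2; do
        # Each step asks for one task and an explicit per-CPU memory limit.
        printf 'srun -w %s --ntasks=1 --mem-per-cpu=%s sleep 60 &\n' "$node" "$mem_mb"
    done
    # Wait for both background steps to finish.
    printf 'wait\n'
}

two_srun_cmds minion01 10
```

To actually launch the two steps, the printed lines can be piped to a shell, for example "two_srun_cmds minion01 10 | sh". Whether they really run side by side still depends on the cluster configuration.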
Q: How to get resource usage for a job?
A: Let me give some examples. First run the job:
srun -w minion01 -p minion_superfast --ntasks=1 --nodes=1 --cpus-per-task=1 --mem-per-cpu=10 ping www.google.com
note: --mem-per-cpu=10 means 10 MB
Tasks and Nodes allocated to the job:
pduan@gru ~ % squeue -n ping --Format=numnodes
NODES
1
pduan@gru ~ % squeue -n ping --Format=numtasks
TASKS
1
Number of CPUs used by the job:
Number of CPUs requested by the job or allocated to it if already running. As a job is completing this number will reflect the current number of CPUs allocated. (Valid for jobs only)
pduan@gru ~ % squeue -n ping --format="%C"
CPUS
2
or
pduan@gru ~ % squeue -n ping --Format=numcpus
CPUS
2
Minimum memory requested by the job:
Minimum size of memory (in MB) requested by the job. (Valid for jobs only)
pduan@gru ~ % squeue -n ping --format="%m"
MIN_MEMORY
10M
or
pduan@gru ~ % squeue -n ping --Format=minmemory
MIN_MEMORY
10M
Trackable resource (TRES) usage:
Print the trackable resources allocated to the job.
pduan@gru ~ % squeue -n ping --Format=tres
TRES
cpu=2,mem=20M,node=1
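Note: "--cpus-per-task=<n>" made no difference here. After removing "--cpus-per-task=1" from the job above, the resource usage was the same (still 2 CPUs). A likely reason is that minion01 has 2 threads per core (see the S:C:T output below) and Slurm allocates whole cores, so a single task still gets 2 CPUs.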
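To use the TRES values in a script, the comma-separated string has to be split into key=value pairs. Here is a minimal sketch. tres_get is a hypothetical helper name, not a Slurm command, and the sample string is copied from the squeue output above rather than read from a live cluster.

```shell
# tres_get KEY TRES_STRING: print the value stored under KEY in a
# "key=value,key=value" TRES string. Hypothetical helper, not part of Slurm.
tres_get() {
    key=$1
    # Put each key=value pair on its own line, then print the matching value.
    printf '%s\n' "$2" | tr ',' '\n' | awk -F= -v k="$key" '$1 == k { print $2 }'
}

tres="cpu=2,mem=20M,node=1"   # sample value from the squeue output above
tres_get cpu "$tres"          # prints 2
tres_get mem "$tres"          # prints 20M
```

On a live system the string could come from something like "squeue -h -n ping --Format=tres" (-h suppresses the header). The output may be padded with spaces, so it might need trimming first.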
Q: How to get resource usage for a node?
A: When the above srun job is running, let's use sinfo to get such statistics.
CPUs a node owns:
pduan@gru ~ % sinfo -n minion01 --format=%c
CPUS
40
CPUs a node owns in the format "allocated/idle/other/total":
pduan@gru ~ % sinfo -n minion01 --format=%C
CPUS(A/I/O/T)
2/38/0/40
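The A/I/O/T value is easier to read in a script once it is split on "/". This is a small sketch. split_aiot is a made-up helper name, and the input is the sample value shown above.

```shell
# split_aiot "A/I/O/T": print the four CPU counts as labelled fields.
# Hypothetical helper, not part of Slurm.
split_aiot() {
    printf '%s\n' "$1" |
        awk -F/ '{ printf "allocated=%s idle=%s other=%s total=%s\n", $1, $2, $3, $4 }'
}

split_aiot "2/38/0/40"   # prints: allocated=2 idle=38 other=0 total=40
```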
Size of temporary disk space per node in megabytes:
pduan@gru ~ % sinfo -n minion01 --format=%d
TMP_DISK
0
Free memory of a node:
pduan@gru ~ % sinfo -n minion01 --format=%e
FREE_MEM
1112
Size of memory per node in megabytes:
Slurm imposes a memory limit on each job. By default it is deliberately small: 128 MB per node.
pduan@gru ~ % sinfo -n minion01 --format=%m
MEMORY
128
Memory allocated on the node, in MB:
pduan@gru ~ % sinfo -n minion01 --Format=allocmem
ALLOCMEM
20
Why 20? Because the job was allocated 2 CPUs and we set --mem-per-cpu=10, so 2 x 10 MB = 20 MB.
Sockets, cores per socket, and threads per core (%X, %Y, %Z):
pduan@gru ~ % sinfo -n minion01 --format=%X
SOCKETS
2
pduan@gru ~ % sinfo -n minion01 --format=%Y
CORES
10
pduan@gru ~ % sinfo -n minion01 --format=%Z
THREADS
2
or, combined as S:C:T:
pduan@gru ~ % sinfo -n minion01 --format=%z
S:C:T
2:10:2
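As a sanity check, sockets x cores per socket x threads per core should give the node's CPU count (40 for minion01, as reported by %c earlier). A small sketch follows. cpus_from_sct is a hypothetical helper name, and the input is the sample S:C:T value shown above.

```shell
# cpus_from_sct "S:C:T": multiply sockets, cores per socket and threads per core.
# Hypothetical helper, not part of Slurm.
cpus_from_sct() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * $2 * $3 }'
}

cpus_from_sct "2:10:2"   # prints 40
```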