• Test SLURM


    Q: Where is slurm.conf?

    A: In /etc/slurm-llnl/slurm.conf (the layout used by the Debian/Ubuntu slurm-llnl packages; other distributions typically use /etc/slurm).
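
    To confirm which file the running daemons actually use, scontrol can report it whatever the path is (a quick check; the exact spacing of the output varies by SLURM version):

    pduan@gru ~ % scontrol show config | grep SLURM_CONF
    SLURM_CONF              = /etc/slurm-llnl/slurm.conf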

    Q: Why can't I run 2 "srun" on the same node at the same time?

    A: Most likely because, without an explicit memory request, each job is allocated all of the node's memory, so the second srun has to wait for the first to finish. Request an explicit amount with "--mem-per-cpu=<size in MB>" and the jobs can share the node, as in the sketch below.
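
    For example (a sketch; it assumes node minion01 from below and that this cluster tracks memory as a consumable resource), two jobs that each request only a little memory can run on the same node at the same time:

    pduan@gru ~ % srun -w minion01 --mem-per-cpu=10 sleep 60 &
    pduan@gru ~ % srun -w minion01 --mem-per-cpu=10 sleep 60 &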

    Q: How to get resource usage for a job?

    A: Let me give some examples. First run the job:

    srun -w minion01 -p minion_superfast --ntasks=1 --nodes=1 --cpus-per-task=1 --mem-per-cpu=10 ping www.google.com

    note: --mem-per-cpu=10 means 10 MB; megabytes are the default unit, and a suffix such as K, M, or G can also be given

    Tasks and Nodes allocated to the job:

    pduan@gru ~ % squeue -n ping  --Format=numnodes
    NODES               
    1                   
    pduan@gru ~ % squeue -n ping  --Format=numtasks
    TASKS               
    1  
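
    Both fields can also be requested in a single call, since --Format accepts a comma-separated list:

    pduan@gru ~ % squeue -n ping --Format=numnodes,numtasks
    NODES               TASKS
    1                   1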

    Number of CPUs used by the job:

    Number of CPUs requested by the job or allocated to it if already running. As a job is completing this number will reflect the current number of CPUs allocated. (Valid for jobs only)

    pduan@gru ~ % squeue -n ping  --format="%C"
    CPUS
    2

    or

    pduan@gru ~ % squeue -n ping  --Format=numcpus
    CPUS                
    2  

    Minimum memory requested by the job:

    Minimum size of memory (in MB) requested by the job. (Valid for jobs only)

    pduan@gru ~ % squeue -n ping  --format="%m"
    MIN_MEMORY
    10M

    or

    pduan@gru ~ % squeue -n ping  --Format=minmemory
    MIN_MEMORY          
    10M 

    Trackable resource usage:

    Print the trackable resources allocated to the job.

    pduan@gru ~ % squeue -n ping  --Format=tres   
    TRES                
    cpu=2,mem=20M,node=1

    Note: "--cpus-per-task=<>" appears to make no difference here: removing "--cpus-per-task=1" from the job above yields the same resource usage. The CPUS value of 2 for a 1-CPU request is most likely core-granularity allocation: this node has 2 threads per core (see S:C:T below), and SLURM allocates whole cores, so the job is charged both hardware threads of one core, and hence 2 × 10 MB = 20 MB of memory.
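
    The allocation can be cross-checked with scontrol (a sketch: "squeue -n ping -h -o %i" just extracts the job id of the running ping job; the output is trimmed to the TRES line, whose exact fields vary by SLURM version):

    pduan@gru ~ % scontrol show job $(squeue -n ping -h -o %i) | grep TRES
       TRES=cpu=2,mem=20M,node=1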

    Q: How to get resource usage for a node?

    A: While the srun job above is still running, let's use sinfo to get these statistics.

    CPUs a node owns:

    pduan@gru ~ % sinfo -n minion01 --format=%c
    CPUS
    40

    CPUs a node owns in the format "allocated/idle/other/total":

    pduan@gru ~ % sinfo -n minion01 --format=%C
    CPUS(A/I/O/T)
    2/38/0/40

    Size of temporary disk space per node in megabytes:

    pduan@gru ~ % sinfo -n minion01 --format=%d
    TMP_DISK
    0

    Free memory of a node (FREE_MEM is the free memory reported by the node's OS, which is why it can exceed the SLURM-configured node memory shown below):

    pduan@gru ~ % sinfo -n minion01 --format=%e
    FREE_MEM
    1112

    Size of memory per node in megabytes:

    SLURM imposes a memory limit on each job. On this cluster the per-node memory is deliberately configured small: 128 MB.

    pduan@gru ~ % sinfo -n minion01 --format=%m
    MEMORY
    128
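
    This figure comes from the node's definition in slurm.conf. A hypothetical excerpt consistent with the numbers seen on this cluster (the actual line may differ):

    NodeName=minion01 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=128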

    How much memory has been allocated, in MB?

    pduan@gru ~ % sinfo -n minion01 --Format=allocmem
    ALLOCMEM            
    20

    Why 20? Because the job was allocated 2 CPUs and we set --mem-per-cpu=10 (MB), so 2 × 10 MB = 20 MB.
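
    As with squeue, several sinfo fields can be combined into a single call:

    pduan@gru ~ % sinfo -n minion01 --Format=cpus,cpusstate,memory,allocmem,freemem
    CPUS                CPUS(A/I/O/T)       MEMORY              ALLOCMEM            FREE_MEM
    40                  2/38/0/40           128                 20                  1112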

    Sockets, cores per socket, and threads per core (the %X, %Y, and %Z fields):

    pduan@gru ~ % sinfo -n minion01 --format=%X
    SOCKETS
    2
    pduan@gru ~ % sinfo -n minion01 --format=%Y
    CORES
    10
    pduan@gru ~ % sinfo -n minion01 --format=%Z
    THREADS
    2

    or

    pduan@gru ~ % sinfo -n minion01 --format=%z
    S:C:T
    2:10:2
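
    The same topology can also be read with scontrol (a sketch; grep -Eo pulls out just the three fields, which appear in a version-dependent order):

    pduan@gru ~ % scontrol show node minion01 | grep -Eo '(Sockets|CoresPerSocket|ThreadsPerCore)=[0-9]+'
    CoresPerSocket=10
    Sockets=2
    ThreadsPerCore=2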
  • Original article: https://www.cnblogs.com/chaseblack/p/10274868.html