• Linux threads, processes, and kernel-mode vs. user-mode page tables


    The cost of thread and process switches in Linux:

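    At the OS level, Linux represents both processes and threads with the same descriptor, task_struct. Among its members is a pointer to the task's kernel-mode stack. On 32-bit x86 with the usual 3G/1G split, these structures live in the kernel part of the virtual address space (3-4 GB). The kernel stack is used to save register values such as CS, DS, and SS. OS-level thread and process switches are always performed in kernel mode. Every task, whether a process or a thread, has its own kernel stack, because each one has its own task_struct and the kernel stack is reached through that struct.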

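    Mode switch within a process (user mode to kernel mode): the kernel stack saves the user-mode register values so that, on the next return to user mode, the CPU can find the right instruction and memory addresses again. User mode enters kernel mode through an interrupt. With the legacy int $0x80 syscall mechanism, the CPU locates the interrupt handler as follows.

    The int instruction is a complex multi-step instruction. Here is an explanation of what it does: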

    1.) Extracts the descriptor from the IDT (the IDT address is stored in a special register, IDTR) and checks that CPL <= DPL. CPL is the current privilege level, which can be read from the low two bits of the CS register. DPL is stored in the IDT descriptor. As a consequence, you cannot raise some exceptions (e.g. a page fault) directly from user space with the int instruction. If you try, you get a general protection fault.

    2.) The processor switches to the stack defined in the TSS. The TSS was initialized earlier and already contains the ring-0 values of SS and ESP (SS0 and ESP0), which hold the kernel stack address. So now ESP points to the kernel stack.

    3.) The processor pushes the user-space registers onto the newly switched kernel stack: ss, esp, eflags, cs, eip. They are needed to return to user mode once the syscall has been served.

    4.) Next, the processor sets CS and EIP from the IDT descriptor. This address is the entry point of the handler for that vector.

    5.) Execution is now at the syscall entry point inside the kernel.
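    The five steps above can be sketched as a small simulation. This is only an illustrative model, not how any real CPU or kernel is written: the Gate and Cpu classes, the selector values, and the addresses are all hypothetical, and the kernel stack is modelled as a plain Python list.

```python
from dataclasses import dataclass, field


class GeneralProtectionFault(Exception):
    """Stands in for the fault raised when CPL > DPL of the gate."""


@dataclass
class Gate:
    dpl: int          # privilege level allowed to invoke this gate via `int`
    handler_cs: int   # code segment selector of the handler
    handler_eip: int  # entry point of the handler


@dataclass
class Cpu:
    cs: int
    eip: int
    ss: int
    esp: int
    eflags: int
    kernel_stack: list = field(default_factory=list)  # simplified stack model

    @property
    def cpl(self):
        # CPL is the low two bits of CS
        return self.cs & 0b11


def software_int(cpu, idt, vector, tss_ss0, tss_esp0):
    gate = idt[vector]                           # step 1: fetch the descriptor
    if cpu.cpl > gate.dpl:                       # step 1: require CPL <= DPL
        raise GeneralProtectionFault(vector)
    user_frame = [cpu.ss, cpu.esp, cpu.eflags, cpu.cs, cpu.eip]
    cpu.ss, cpu.esp = tss_ss0, tss_esp0          # step 2: switch to the TSS stack
    cpu.kernel_stack.extend(user_frame)          # step 3: push ss, esp, eflags, cs, eip
    cpu.esp -= 4 * len(user_frame)               # five 32-bit pushes
    cpu.cs, cpu.eip = gate.handler_cs, gate.handler_eip  # step 4: jump to handler
    return cpu                                   # step 5: now running kernel code


# Hypothetical values chosen only for the demonstration.
idt = {
    0x80: Gate(dpl=3, handler_cs=0x60, handler_eip=0xC0100000),  # callable from ring 3
    14: Gate(dpl=0, handler_cs=0x60, handler_eip=0xC0200000),    # page fault vector
}
user = Cpu(cs=0x73, eip=0x08048000, ss=0x7B, esp=0xBFFFF000, eflags=0x202)
software_int(user, idt, 0x80, tss_ss0=0x68, tss_esp0=0xC1000000)
```

    In this model, calling software_int with vector 14 from a ring-3 Cpu raises GeneralProtectionFault, which mirrors the remark in step 1 about not being able to raise a page fault with int.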

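    That covers user to kernel. What about a thread or process switch? A system call such as sched_yield goes on to pick another thread to run, pops the new thread's saved context from its kernel stack into the registers, which puts the CPU in the new thread's kernel-mode context, and then returns to user mode. That completes the switch.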

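    So what is the difference? A process switch also switches the virtual address space. In essence it is a CR3 switch (the address-space switch, done in switch_mm) plus a register switch (EIP, ESP, and so on, done in switch_to). The kernel-mode part of the page tables is identical for every thread and is shared; only the user-mode part differs between processes. Threads of the same process share the user-mode part as well, so switching between them needs no CR3 reload. This is the main difference: the page tables, and the performance cost of the TLB invalidation that a page-table switch brings. The TLB caches recently used page-table entries. The page tables themselves live in physical memory, so a TLB hit avoids the memory accesses of a page-table walk.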
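    The difference can be illustrated with a toy scheduler model. This is a minimal sketch under simplifying assumptions, not the kernel's code: Task, Cpu, and context_switch are hypothetical names, mm is just an integer standing in for the page-table root, and the CR3 reload (the switch_mm part) is skipped when the next task shares the current address space.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    mm: int      # identifies the address space (stands in for the page-table root)
    regs: dict   # saved register context, e.g. {"eip": ..., "esp": ...}


class Cpu:
    def __init__(self):
        self.cr3 = None      # currently loaded address space
        self.regs = {}       # live register values
        self.cr3_loads = 0   # each load would invalidate non-global TLB entries


def context_switch(cpu, prev, nxt):
    if prev is not None:
        prev.regs = dict(cpu.regs)   # save the outgoing task's registers
    if cpu.cr3 != nxt.mm:            # address-space switch, only if it differs
        cpu.cr3 = nxt.mm
        cpu.cr3_loads += 1
    cpu.regs = dict(nxt.regs)        # register switch, always needed


# Two threads of one process (same mm) and one thread of another process.
a1 = Task("a1", mm=1, regs={"eip": 0x1000, "esp": 0x2000})
a2 = Task("a2", mm=1, regs={"eip": 0x3000, "esp": 0x4000})
b = Task("b", mm=2, regs={"eip": 0x5000, "esp": 0x6000})

cpu = Cpu()
context_switch(cpu, None, a1)  # first load of address space 1
context_switch(cpu, a1, a2)    # same process: registers only
context_switch(cpu, a2, b)     # different process: CR3 is reloaded
```

    In this model the thread-to-thread switch costs only the register copy, while the switch to another process also counts a CR3 load, which is where the TLB cost comes from.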

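    Why is the user-level thread stack limited to 8 MB by default? Many languages support multithreading, for example C++ with pthreads, and every thread's stack lives in the process's address space (with glibc, the stacks of threads other than the main thread are typically allocated with mmap, and the default size follows the stack resource limit, commonly 8 MB). The stacks of different threads must not overlap, otherwise the threads would corrupt each other's stacks and crash. If stack addresses and sizes were not fixed in advance and stacks could grow without bound, they would eventually overlap. Reserving too much per stack, on the other hand, reduces the number of threads that can be created. A user-mode thread switch is in essence just a register switch, which makes it very lightweight.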
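    The trade-off between stack size and thread count is simple arithmetic. The sketch below assumes a 3 GiB user address space (the 32-bit layout discussed earlier) and ignores everything else that also needs address space, so the numbers are upper bounds only. The function names are hypothetical; resource.getrlimit is the standard Python call for reading the stack limit on Unix-like systems.

```python
import resource

MiB = 1024 ** 2
GiB = 1024 ** 3


def max_thread_stacks(address_space_bytes, stack_bytes):
    """Upper bound on non-overlapping stacks of a fixed size."""
    return address_space_bytes // stack_bytes


def current_stack_limit():
    """Soft stack limit of this process in bytes (may be RLIM_INFINITY)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return soft


# Assumed 3 GiB of user address space, as on 32-bit x86 Linux.
with_8mib = max_thread_stacks(3 * GiB, 8 * MiB)   # 384
with_1mib = max_thread_stacks(3 * GiB, 1 * MiB)   # 3072
```

    With 8 MiB per stack at most 384 stacks fit into 3 GiB, while 1 MiB stacks allow up to 3072, which is why a fixed and moderate default matters on 32-bit systems.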

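    CPU privilege levels: ring 0 to ring 3. The CS segment selector is simply the value of the CS register. It contains an index, a table-indicator bit, and a 2-bit privilege field, which for CS is the CPL. The index locates one entry in the segment descriptor table. A segment descriptor contains the segment base address and the DPL, that is, where the segment starts in the linear address space and the privilege level required to access it. Note that under segmentation CS, DS, and SS are treated as separate segments. Modern operating systems have effectively abandoned segmentation by using a flat model; Intel keeps it only for compatibility. The kernel-mode CS, SS, and DS segments all have their DPL set to 0, which means user-mode instructions cannot use them. This is protected mode. So why is RPL needed?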

    RPL – Requested Privilege Level

    These are the low two bits of a segment selector, such as the values loaded into the DS, ES, SS, FS, and GS registers. The RPL field is used to override the CPL with a weaker privilege level when higher-privileged code is servicing requests from lower-privileged processes.
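    The selector layout can be made concrete with a few bit operations. This is a minimal sketch of the x86 selector format (index in bits 3-15, table indicator in bit 2, RPL in bits 0-1); the function names and the example values 0x73 and 0x60 are chosen for illustration only.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": selector >> 3,       # entry number in the descriptor table
        "ti": (selector >> 2) & 1,    # 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,       # requested privilege level (CPL when in CS)
    }


def descriptor_offset(selector):
    """Byte offset of the descriptor inside its table (8 bytes per entry)."""
    return (selector >> 3) * 8


user_like = decode_selector(0x73)    # index 14, GDT, privilege 3
kernel_like = decode_selector(0x60)  # index 12, GDT, privilege 0
```

    Because the privilege field occupies the low two bits, a selector value ending in binary 11 requests ring 3 and one ending in 00 requests ring 0.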

    Assume a higher-privileged device driver that supports a mechanism where it can copy data from disk directly into the data segments of lower-privileged processes. A lower-privileged process must pass its data-segment details (selector, address, and size of the data to copy) to the device driver so that the driver can copy the data into the appropriate location.

    Since the device driver is higher-privileged, a lower-privileged process can trick the driver into copying data into high-privileged data segments simply by passing a wrong selector value. This kind of exploit is called privilege escalation.

    How does RPL help to solve the privilege escalation problem?

    Continuing the above example, whenever the device driver loads the destination segment, it sets the RPL of the destination selector to match the requesting (lower-privileged) process. Since the protection rules for data segments check both CPL <= DPL and RPL <= DPL, the higher-privileged driver gets a protection fault on the RPL <= DPL check if the selector refers to a segment the requester could not access itself.

    The point to note is that higher-privileged code, when it is providing services to lower-privileged processes, should temporarily reduce its privilege to the requester's privilege level.
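    The data-segment rule and the RPL adjustment can be written down in a few lines. This is a sketch of the check as described above (both CPL <= DPL and RPL <= DPL, which is the same as max(CPL, RPL) <= DPL); the function names and selector values are hypothetical, and with_requester_rpl is similar in spirit to what the x86 ARPL instruction does.

```python
def can_load_data_segment(cpl, rpl, dpl):
    """Data segment access is allowed only if CPL <= DPL and RPL <= DPL."""
    return max(cpl, rpl) <= dpl


def with_requester_rpl(selector, requester_cpl):
    """Weaken a selector so its RPL is at least the requester's privilege number."""
    rpl = max(selector & 0b11, requester_cpl)
    return (selector & ~0b11) | rpl


# A ring-3 process hands a ring-0 driver a selector for a DPL 0 segment.
passed_selector = 0x18                                # index 3, RPL 0 (as forged)
adjusted = with_requester_rpl(passed_selector, 3)     # RPL becomes 3

unchecked = can_load_data_segment(cpl=0, rpl=passed_selector & 0b11, dpl=0)
checked = can_load_data_segment(cpl=0, rpl=adjusted & 0b11, dpl=0)
```

    Without the adjustment the driver's own CPL of 0 would pass the check; with the RPL raised to 3 the load of the DPL 0 segment is refused, while a load of the requester's own DPL 3 segment still succeeds.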

    The CPU's privilege levels protect memory: if user-mode code accesses a protected memory address, the access faults and the process receives a segmentation fault.

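    As for two-level page tables, their basic purpose is to avoid needing one large contiguous page table per process: otherwise every 32-bit process would need a 4 MB page table (with 4 KB pages). Physical page frames are 4 KB, so how does a virtual (linear) address find its physical address? With a direct single-level mapping, one page-table entry maps one page frame, and 4 GB / 4 KB = 1M, so 1M page-table entries are needed. How many bytes does each entry need? Numbering 1M frames takes 20 bits, so an entry needs at least 20 bits (3 bytes when rounded up), and in practice 4 bytes are used. So without a page directory, each process's page table would occupy 4 MB of physical memory. A 4 KB page frame spans 2^12 addresses, so in a 32-bit address the low 12 bits are the offset within the page and only the upper 20 bits need to be translated.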
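    The arithmetic above can be checked directly. The sketch below first reproduces the single-level numbers, then shows the classic 10/10/12 split used by 32-bit x86 two-level paging. The split and the function names are assumptions added for illustration; the text itself only gives the 20/12 division.

```python
PAGE_SIZE = 4 * 1024          # 4 KB pages
ADDRESS_SPACE = 2 ** 32       # 32-bit virtual address space
PTE_SIZE = 4                  # bytes per page-table entry

entries = ADDRESS_SPACE // PAGE_SIZE            # 1M entries
frame_bits = entries.bit_length() - 1           # 20 bits to number the frames
offset_bits = PAGE_SIZE.bit_length() - 1        # 12 bits of in-page offset
flat_table_bytes = entries * PTE_SIZE           # 4 MB for a single-level table


def split_two_level(vaddr):
    """Split a 32-bit address into (directory index, table index, offset)."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def two_level_bytes(page_tables_in_use):
    """One 4 KB page directory plus one 4 KB page table per used 4 MB region."""
    return PAGE_SIZE + page_tables_in_use * PAGE_SIZE


directory_index, table_index, offset = split_two_level(0xC0101234)
small_process = two_level_bytes(3)   # e.g. code, heap, and stack regions
```

    A process that touches only three 4 MB regions needs 16 KB of paging structures in this model instead of the 4 MB a flat table would take, and none of the pieces has to be larger than one page.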

    https://blog.csdn.net/displayMessage/article/details/80905810

  • Original article: https://www.cnblogs.com/kkshaq/p/10832873.html