Demystifying Process Switching

Student ID: SA12**6112

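The previous post analyzed the main things the kernel does when a process switches from user mode to kernel mode. This post looks at what the kernel does during a process switch.

In kernel mode, a process switch takes two main steps:

1: Switch the page global directory

2: Switch the kernel stack and the hardware context

Here prev points to the descriptor of the process being replaced, and next points to the descriptor of the process being activated.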
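The two steps can be pictured with a small user-space model. This is only a sketch under stated assumptions: cpu_model, task_model, mm_model and the *_model functions are hypothetical names invented for illustration, not kernel structures or functions, and the "registers" are plain struct fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel structures. */
struct mm_model {
    unsigned long pgd;          /* address of the page global directory */
};

struct task_model {
    struct mm_model *mm;        /* address space of the task */
    unsigned long sp;           /* saved kernel stack pointer */
};

struct cpu_model {
    unsigned long cr3;          /* models the page-table base register */
    unsigned long esp;          /* models the stack pointer register */
    struct task_model *current; /* task currently running on this CPU */
};

/* Step 1: switch the page global directory. */
void switch_pgd_model(struct cpu_model *cpu, const struct task_model *next)
{
    cpu->cr3 = next->mm->pgd;
}

/* Step 2: switch the kernel stack (the hardware context is reduced to esp here). */
void switch_stack_model(struct cpu_model *cpu, struct task_model *prev,
                        struct task_model *next)
{
    prev->sp = cpu->esp;        /* remember where prev left off */
    cpu->esp = next->sp;        /* resume on next's kernel stack */
    cpu->current = next;
}

/* Both steps in order, with prev being replaced and next being activated. */
void context_switch_model(struct cpu_model *cpu, struct task_model *prev,
                          struct task_model *next)
{
    assert(prev != NULL && next != NULL && next->mm != NULL);
    switch_pgd_model(cpu, next);
    switch_stack_model(cpu, prev, next);
}
```

Calling context_switch_model(&cpu, &a, &b) leaves cpu.cr3 pointing at b's page directory and cpu.esp at b's saved stack pointer, while a.sp records where a stopped.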

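The rest of this post analyzes the second step.

The second step is implemented mainly by the switch_to macro:

In the 3.3 kernel on x86, it is defined at line 48 of /arch/x86/include/asm/system.h: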

#define switch_to(prev, next, last)                                     \
do {                                                                    \
        /*                                                              \
         * Context-switching clobbers all registers, so we clobber      \
         * them explicitly, via unused output variables.                \
         * (EAX and EBP is not listed because EBP is saved/restored     \
         * explicitly for wchan access and EAX is the return value of   \
         * __switch_to())                                               \
         */                                                             \
        unsigned long ebx, ecx, edx, esi, edi;                          \
                                                                        \
        asm volatile("pushfl\n\t"               /* save    flags */     \
                     "pushl %%ebp\n\t"          /* save    EBP   */     \
                     "movl %%esp,%[prev_sp]\n\t"        /* save    ESP   */ \
                     "movl %[next_sp],%%esp\n\t"        /* restore ESP   */ \
                     "movl $1f,%[prev_ip]\n\t"  /* save    EIP   */     \
                     "pushl %[next_ip]\n\t"     /* restore EIP   */     \
                     __switch_canary                                    \
                     "jmp __switch_to\n"        /* regparm call  */     \
                     "1:\t"                                             \
                     "popl %%ebp\n\t"           /* restore EBP   */     \
                     "popfl\n"                  /* restore flags */     \
                                                                        \
                     /* output parameters */                            \
                     : [prev_sp] "=m" (prev->thread.sp),                \
                       [prev_ip] "=m" (prev->thread.ip),                \
                       "=a" (last),                                     \
                                                                        \
                       /* clobbered output registers: */                \
                       "=b" (ebx), "=c" (ecx), "=d" (edx),              \
                       "=S" (esi), "=D" (edi)                           \
                                                                        \
                       __switch_canary_oparam                           \
                                                                        \
                       /* input parameters: */                          \
                     : [next_sp]  "m" (next->thread.sp),                \
                       [next_ip]  "m" (next->thread.ip),                \
                                                                        \
                       /* regparm parameters for __switch_to(): */      \
                       [prev]     "a" (prev),                           \
                       [next]     "d" (next)                            \
                                                                        \
                       __switch_canary_iparam                           \
                                                                        \
                     : /* reloaded segment registers */                 \
                        "memory");                                      \
} while (0)

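I: As the code above shows, switching the kernel stack mainly involves:

1: Push the eflags and ebp registers onto prev's kernel stack.

2: Save esp into prev->thread.sp, and save the address of label 1 (the point where prev will later resume) into prev->thread.ip.

3: Load next->thread.sp into esp, and push next->thread.ip onto next's kernel stack. Because __switch_to is entered with jmp rather than call, its ret pops this value into eip.

At this point the kernel stack has been switched.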
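The three steps above can be traced with a toy model that mirrors the assembly one instruction at a time. It is a sketch for illustration only: ktask, kcpu, park_task and switch_to_model are hypothetical names, the stack is a small array indexed by a fake esp, and RESUME_LABEL stands in for the address of label 1.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_WORDS 32
#define RESUME_LABEL 1u     /* stands for the address of label "1:" */

struct ktask {
    uint32_t stack[KSTACK_WORDS]; /* model kernel stack, grows downward */
    uint32_t sp;                  /* models thread.sp (index into stack[]) */
    uint32_t ip;                  /* models thread.ip */
};

struct kcpu {
    uint32_t eflags, ebp, esp, eip;
    struct ktask *stack_owner;    /* task whose stack esp currently indexes */
};

static void push(struct kcpu *c, uint32_t v)
{
    assert(c->esp > 0);
    c->stack_owner->stack[--c->esp] = v;
}

static uint32_t pop(struct kcpu *c)
{
    assert(c->esp < KSTACK_WORDS);
    return c->stack_owner->stack[c->esp++];
}

/* Put a task in the state it would have after being switched out once. */
void park_task(struct ktask *t, uint32_t eflags, uint32_t ebp)
{
    t->sp = KSTACK_WORDS;
    t->stack[--t->sp] = eflags;
    t->stack[--t->sp] = ebp;
    t->ip = RESUME_LABEL;
}

void switch_to_model(struct kcpu *c, struct ktask *prev, struct ktask *next)
{
    push(c, c->eflags);       /* pushfl */
    push(c, c->ebp);          /* pushl %ebp */
    prev->sp = c->esp;        /* movl %esp, prev->thread.sp */
    c->esp = next->sp;        /* movl next->thread.sp, %esp */
    c->stack_owner = next;    /* from here on we are on next's stack */
    prev->ip = RESUME_LABEL;  /* movl $1f, prev->thread.ip */
    push(c, next->ip);        /* pushl next->thread.ip */
    c->eip = pop(c);          /* jmp __switch_to; its ret pops into eip */
    c->ebp = pop(c);          /* label 1: popl %ebp */
    c->eflags = pop(c);       /* popfl */
}
```

After switch_to_model(&cpu, &a, &b), the CPU holds the ebp and eflags values that b saved earlier, and a's own values sit on a's stack waiting for the next switch back.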

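II: After the kernel stack is switched, the TSS must be updated accordingly:

On Linux, all processes running on the same CPU share a single TSS, so its contents have to change whenever the running process changes.

Linux uses the TSS mainly in two ways:

(1) Whenever a process traps from user mode into kernel mode, the kernel stack pointer is fetched from the TSS.

(2) When user-mode code reads or writes I/O ports, the I/O permission bitmap in the TSS is consulted.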
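Both uses can be illustrated with a simplified model of the TSS. This is a sketch, not the kernel's tss_struct: tss_model and its helper functions are hypothetical names. The one hardware fact it relies on is that in the x86 I/O permission bitmap a cleared bit allows access to the port and a set bit denies it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IO_PORTS        65536
#define IO_BITMAP_BYTES (IO_PORTS / 8)

/* Hypothetical, simplified TSS: only the two fields discussed above. */
struct tss_model {
    uint32_t sp0;                        /* kernel stack pointer for ring 0 */
    uint8_t  io_bitmap[IO_BITMAP_BYTES]; /* one bit per I/O port */
};

/* Use (1): the stack pointer used on entry to kernel mode comes from the TSS. */
uint32_t kernel_entry_stack(const struct tss_model *t)
{
    return t->sp0;
}

/* Deny every port: on x86 a set bit means "access not allowed". */
void tss_deny_all(struct tss_model *t)
{
    memset(t->io_bitmap, 0xff, sizeof t->io_bitmap);
}

/* Allow one port by clearing its bit. */
void tss_allow_port(struct tss_model *t, uint16_t port)
{
    t->io_bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
}

/* Use (2): a user-mode port access is permitted only if the bit is clear. */
bool tss_port_allowed(const struct tss_model *t, uint16_t port)
{
    return (t->io_bitmap[port / 8] & (1u << (port % 8))) == 0;
}
```

For example, after tss_deny_all() followed by tss_allow_port(&t, 0x3f8), only port 0x3f8 passes tss_port_allowed().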

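So a process switch must also update esp0 and the I/O permission bitmap in the TSS. This is done mainly in the __switch_to function:

In the 3.3 kernel on x86, it is defined at line 296 of /arch/x86/kernel/process_32.c: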

__notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
        struct thread_struct *prev = &prev_p->thread,
                                 *next = &next_p->thread;
        int cpu = smp_processor_id();
        struct tss_struct *tss = &per_cpu(init_tss, cpu);
        fpu_switch_t fpu;

        /* never put a printk in __switch_to... printk() calls wake_up*() indirectly */

        fpu = switch_fpu_prepare(prev_p, next_p, cpu);

        /*
         * Reload esp0.
         */
        load_sp0(tss, next);

        /*
         * Save away %gs. No need to save %fs, as it was saved on the
         * stack on entry.  No need to save %es and %ds, as those are
         * always kernel segments while inside the kernel.  Doing this
         * before setting the new TLS descriptors avoids the situation
         * where we temporarily have non-reloadable segments in %fs
         * and %gs.  This could be an issue if the NMI handler ever
         * used %fs or %gs (it does not today), or if the kernel is
         * running inside of a hypervisor layer.
         */
        lazy_save_gs(prev->gs);

        /*
         * Load the per-thread Thread-Local Storage descriptor.
         */
        load_TLS(next, cpu);

        /*
         * Restore IOPL if needed.  In normal use, the flags restore
         * in the switch assembly will handle this.  But if the kernel
         * is running virtualized at a non-zero CPL, the popf will
         * not restore flags, so it must be done in a separate step.
         */
        if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
                set_iopl_mask(next->iopl);

        /*
         * Now maybe handle debug registers and/or IO bitmaps
         */
        if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
                     task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
                __switch_to_xtra(prev_p, next_p, tss);

        /*
         * Leave lazy mode, flushing any hypercalls made here.
         * This must be done before restoring TLS segments so
         * the GDT and LDT are properly updated, and must be
         * done before math_state_restore, so the TS bit is up
         * to date.
         */
        arch_end_context_switch(next_p);

        /*
         * Restore %gs if needed (which is common)
         */
        if (prev->gs | next->gs)
                lazy_load_gs(next->gs);

        switch_fpu_finish(next_p, fpu);

        percpu_write(current_task, next_p);

        return prev_p;
}

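As the code above shows, the TSS is updated mainly by:

1: load_sp0(tss, next); reads sp0 from the thread field of the next process and uses it to update sp0 in the TSS.

2: __switch_to_xtra(prev_p, next_p, tss); updates the I/O permission bitmap when necessary.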
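The two updates can be modeled in a few lines. This is a simplified sketch, not the kernel's implementation: thread_sketch, tss_sketch and the *_sketch functions are hypothetical names, the bitmap is shortened to 32 bytes, and the has_io_bitmap field is an assumed stand-in for the per-thread flag that tells the kernel a task uses an I/O bitmap. The "copy next's bitmap, or deny all ports if only prev had one" policy is my reading of what "when necessary" means here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_BYTES 32     /* shortened for the sketch */

struct thread_sketch {
    uint32_t sp0;                     /* top of this task's kernel stack */
    bool     has_io_bitmap;           /* assumed stand-in for a per-thread flag */
    uint8_t  io_bitmap[BITMAP_BYTES]; /* the task's private bitmap, if any */
};

struct tss_sketch {
    uint32_t sp0;
    uint8_t  io_bitmap[BITMAP_BYTES];
};

/* Update 1: copy the next task's sp0 into the shared per-CPU TSS. */
void load_sp0_sketch(struct tss_sketch *tss, const struct thread_sketch *next)
{
    tss->sp0 = next->sp0;
}

/* Update 2: refresh the I/O bitmap only when one of the tasks uses it. */
void switch_io_bitmap_sketch(struct tss_sketch *tss,
                             const struct thread_sketch *prev,
                             const struct thread_sketch *next)
{
    if (next->has_io_bitmap)
        memcpy(tss->io_bitmap, next->io_bitmap, BITMAP_BYTES);
    else if (prev->has_io_bitmap)
        memset(tss->io_bitmap, 0xff, BITMAP_BYTES); /* deny all ports */
}

void switch_tss_sketch(struct tss_sketch *tss,
                       const struct thread_sketch *prev,
                       const struct thread_sketch *next)
{
    load_sp0_sketch(tss, next);
    if (prev->has_io_bitmap || next->has_io_bitmap)
        switch_io_bitmap_sketch(tss, prev, next);
}
```

The common case, where neither task uses an I/O bitmap, touches only sp0, which is why the kernel code above wraps the extra work in unlikely().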

Original article: https://www.cnblogs.com/justcxtoworld/p/3157621.html