softirq raise/handling
Calling raise_softirq_irqoff(HRTIMER_SOFTIRQ) raises a softirq. The function sets the corresponding bit in the per-CPU variable local_softirq_pending_ref (an int), and if the caller is not in interrupt context (softirq, hardirq, or NMI), it wakes the ksoftirqd/* thread.
4.19/kernel/softirq.c
void __raise_softirq_irqoff(unsigned int nr)
{
	trace_softirq_raise(nr);
	or_softirq_pending(1UL << nr);
}
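The bit-setting step can be tried out in user space. The following is only a minimal sketch: raise_model_pending, raise_model_raise() and raise_model_is_pending() are hypothetical names invented for this illustration, a single variable stands in for one CPU's pending word, and the softirq numbers passed in are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's pending word (not the kernel variable). */
static uint32_t raise_model_pending;

/* Models or_softirq_pending(1UL << nr): set bit nr and leave the others alone. */
static void raise_model_raise(unsigned int nr)
{
	raise_model_pending |= (uint32_t)1 << nr;
}

/* Returns nonzero if softirq number nr is currently marked pending. */
static int raise_model_is_pending(unsigned int nr)
{
	return (int)((raise_model_pending >> nr) & 1u);
}
```

Raising the same number twice leaves the word unchanged, which is why a softirq raised several times before it is processed still runs only once per pass.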
Once woken, the ksoftirqd/* thread runs run_ksoftirqd(); the part to focus on is __do_softirq().
static void run_ksoftirqd(unsigned int cpu)
{
	local_irq_disable();
	if (local_softirq_pending()) {
		/*
		 * We can safely run softirq on inline stack, as we are not deep
		 * in the task stack here.
		 */
		__do_softirq();
		local_irq_enable();
		cond_resched();
		return;
	}
	local_irq_enable();
}
asmlinkage __visible void __softirq_entry __do_softirq(void)
{
	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
	unsigned long old_flags = current->flags;
	int max_restart = MAX_SOFTIRQ_RESTART;
	struct softirq_action *h;
	bool in_hardirq;
	__u32 pending;
	int softirq_bit;

	/*
	 * Mask out PF_MEMALLOC s current task context is borrowed for the
	 * softirq. A softirq handled such as network RX might set PF_MEMALLOC
	 * again if the socket is related to swap
	 */
	current->flags &= ~PF_MEMALLOC;

	pending = local_softirq_pending();
	account_irq_enter_time(current);

	__local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
	in_hardirq = lockdep_softirq_start();

restart:
	/* Reset the pending bitmask before enabling irqs */
	set_softirq_pending(0);

	local_irq_enable();

	h = softirq_vec;

	while ((softirq_bit = ffs(pending))) {
		unsigned int vec_nr;
		int prev_count;

		h += softirq_bit - 1;

		vec_nr = h - softirq_vec;
		prev_count = preempt_count();

		kstat_incr_softirqs_this_cpu(vec_nr);

		trace_softirq_entry(vec_nr);
		h->action(h);
		trace_softirq_exit(vec_nr);
		if (unlikely(prev_count != preempt_count())) {
			pr_err("huh, entered softirq %u %s %p with preempt_count %08x, exited with %08x?\n",
			       vec_nr, softirq_to_name[vec_nr], h->action,
			       prev_count, preempt_count());
			preempt_count_set(prev_count);
		}
		h++;
		pending >>= softirq_bit;
	}

	rcu_bh_qs();
	local_irq_disable();

	pending = local_softirq_pending();
	if (pending) {
		if (time_before(jiffies, end) && !need_resched() &&
		    --max_restart)
			goto restart;

		wakeup_softirqd();
	}

	lockdep_softirq_end(in_hardirq);
	account_irq_exit_time(current);
	__local_bh_enable(SOFTIRQ_OFFSET);
	WARN_ON_ONCE(in_interrupt());
	current_restore_flags(old_flags, PF_MEMALLOC);
}
__do_softirq() first disables bh (bottom half) via __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET), which adds 1 to the softirq field of the preempt_count in thread_info.
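It reads the current CPU's per-CPU variable local_softirq_pending_ref and uses ffs to find the lowest set bit, which identifies the softirq that was raised. That index selects an element of the softirq_vec array, and the element's action function is called.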
softirq_vec is an array of struct softirq_action. When a softirq is opened (open_softirq), its index is used as the array index and its action function is stored in that element.
The while loop handles, one after another, every softirq whose bit is set in local_softirq_pending_ref.
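The registration and dispatch steps described above can be sketched as a small user-space model. This is an illustration under assumptions, not kernel code: every disp_* name is hypothetical, the vector size of 10 is chosen for the example, and the recording handler exists only to make the dispatch order observable. The loop body follows the same pointer arithmetic as __do_softirq().

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>	/* ffs() */

#define DISP_NR_SOFTIRQS 10	/* illustrative size for this model */

/* Same shape as struct softirq_action: one handler per softirq number. */
struct disp_action {
	void (*action)(struct disp_action *self);
};

static struct disp_action disp_vec[DISP_NR_SOFTIRQS];
static uint32_t disp_pending;			/* one CPU's pending word */
static int disp_order[DISP_NR_SOFTIRQS];	/* indices in the order handled */
static int disp_handled;

/* Models open_softirq(): the softirq number is the array index. */
static void disp_open(unsigned int nr, void (*action)(struct disp_action *))
{
	disp_vec[nr].action = action;
}

/* Example handler: records which vector entry it was called for. */
static void disp_record(struct disp_action *self)
{
	if (disp_handled < DISP_NR_SOFTIRQS)
		disp_order[disp_handled++] = (int)(self - disp_vec);
}

/* Models the while loop: walk the set bits from lowest to highest. */
static void disp_run_pending(void)
{
	uint32_t pending = disp_pending;
	struct disp_action *h = disp_vec;
	int bit;

	disp_pending = 0;		/* reset before running handlers */
	while ((bit = ffs((int)pending))) {
		h += bit - 1;		/* skip to the entry for this bit */
		h->action(h);
		h++;			/* step past the entry just handled */
		pending >>= bit;	/* drop the bits already examined */
	}
}
```

Because ffs returns the lowest set bit first, lower-numbered softirqs are always dispatched before higher-numbered ones within a pass. The model assumes every pending bit has a registered handler.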
After all softirqs have been handled, __local_bh_enable() re-enables bh, that is, it subtracts 1 from the softirq field of the preempt_count in thread_info.
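To make "adding 1 to the softirq field" concrete, here is a minimal sketch of the arithmetic. It assumes the 4.19 layout from include/linux/preempt.h, where the softirq count occupies bits 8 to 15 of preempt_count; the pc_* names are hypothetical and a plain variable stands in for the real counter.

```c
#include <assert.h>

/* Assumed layout: softirq count lives in bits 8..15 of preempt_count. */
#define PC_SOFTIRQ_SHIFT	8
#define PC_SOFTIRQ_OFFSET	(1UL << PC_SOFTIRQ_SHIFT)
#define PC_SOFTIRQ_MASK		(0xffUL << PC_SOFTIRQ_SHIFT)

static unsigned long pc_count;	/* stand-in for preempt_count */

/* Models __local_bh_disable_ip(ip, cnt): add cnt to the counter. */
static void pc_bh_disable(unsigned long cnt)
{
	pc_count += cnt;
}

/* Models __local_bh_enable(cnt): subtract cnt again. */
static void pc_bh_enable(unsigned long cnt)
{
	pc_count -= cnt;
}

/* Extracts the softirq field as a small integer. */
static unsigned long pc_softirq_field(void)
{
	return (pc_count & PC_SOFTIRQ_MASK) >> PC_SOFTIRQ_SHIFT;
}
```

Calling pc_bh_disable(PC_SOFTIRQ_OFFSET) raises the raw counter by 0x100, which reads back as 1 in the softirq field; the matching pc_bh_enable() returns it to 0.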
There is one ksoftirqd/* kernel thread per CPU core. Raising a softirq sets the bit in the current CPU's local_softirq_pending_ref and then wakes that same CPU's ksoftirqd thread, so a softirq is handled by the CPU that raised it:
root 9 2 0 18:59:59 ? 00:00:00 [ksoftirqd/0]
root 17 2 0 18:59:59 ? 00:00:00 [ksoftirqd/1]
root 22 2 0 18:59:59 ? 00:00:00 [ksoftirqd/2]
root 27 2 0 18:59:59 ? 00:00:00 [ksoftirqd/3]
local_bh_disable()/local_bh_enable()
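These two functions disable and enable softirq processing. While softirqs are disabled, if a hardirq arrives and its handler raises a softirq, the ksoftirqd/* thread is not woken to handle it. raise_softirq_irqoff() checks whether the caller is in interrupt context; in_interrupt() covers hardirq, softirq, and NMI, and it also reports true while softirqs are disabled, so the wakeup is skipped. The pending softirq is instead picked up later, for example when local_bh_enable() runs. The check is in raise_softirq_irqoff():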
kernel/softirq.c
inline void raise_softirq_irqoff(unsigned int nr)
{
	__raise_softirq_irqoff(nr);

	/*
	 * If we're in an interrupt or softirq, we're done
	 * (this also catches softirq-disabled code). We will
	 * actually run the softirq once we return from
	 * the irq or softirq.
	 *
	 * Otherwise we wake up ksoftirqd to make sure we
	 * schedule the softirq soon.
	 */
	if (!in_interrupt())
		wakeup_softirqd();
}
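The wake-or-skip decision can be modelled in a few lines. This sketch assumes the 4.19 preempt_count layout (softirq bits 8 to 15, hardirq bits 16 to 19, NMI bit 20) and that local_bh_disable() adds SOFTIRQ_DISABLE_OFFSET, which is twice SOFTIRQ_OFFSET; the ctx_* names are hypothetical and the counter is passed in as a plain argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4.19 preempt_count masks (see include/linux/preempt.h). */
#define CTX_SOFTIRQ_MASK		0x0000ff00UL
#define CTX_HARDIRQ_MASK		0x000f0000UL
#define CTX_NMI_MASK			0x00100000UL
/* Assumed amount added by local_bh_disable(): 2 * SOFTIRQ_OFFSET. */
#define CTX_SOFTIRQ_DISABLE_OFFSET	0x00000200UL

/* Models in_interrupt(): any softirq, hardirq or NMI bits set. */
static bool ctx_in_interrupt(unsigned long preempt_count)
{
	return (preempt_count &
		(CTX_SOFTIRQ_MASK | CTX_HARDIRQ_MASK | CTX_NMI_MASK)) != 0;
}

/* Models the tail of raise_softirq_irqoff(): wake only outside irq context. */
static bool ctx_raise_wakes_ksoftirqd(unsigned long preempt_count)
{
	return !ctx_in_interrupt(preempt_count);
}
```

With a counter of 0 the model wakes ksoftirqd; with CTX_SOFTIRQ_DISABLE_OFFSET added (softirqs disabled from task context) it does not, even though no softirq is actually executing. A counter of 1, meaning preemption is disabled but nothing else, still counts as outside interrupt context.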
good blog:
https://zhuanlan.zhihu.com/p/80680484