Read-Copy Update (RCU)
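This is from ULK (Understanding the Linux Kernel). The idea struck me as very clever, so I have kept the original English text as it is.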
Read-copy update (RCU) is yet another synchronization technique designed to protect data
structures that are mostly accessed for reading by several CPUs. RCU allows many readers and
many writers to proceed concurrently (an improvement over seqlocks, which allow only one writer
to proceed). Moreover, RCU is lock-free, that is, it uses no lock or counter shared by all CPUs; this
is a great advantage over read/write spin locks and seqlocks, which have a high overhead due to
cache line-snooping and invalidation.
How does RCU obtain the surprising result of synchronizing several CPUs without shared data
structures? The key idea consists of limiting the scope of RCU as follows:
1. Only data structures that are dynamically allocated and referenced by means of pointers can
be protected by RCU.
2. No kernel control path can sleep inside a critical region protected by RCU.
When a kernel control path wants to read an RCU-protected data structure, it executes the
rcu_read_lock( ) macro, which is equivalent to preempt_disable( ). Next, the reader
dereferences the pointer to the data structure and starts reading it. As stated above, the reader
cannot sleep until it finishes reading the data structure; the end of the critical region is marked by
the rcu_read_unlock( ) macro, which is equivalent to preempt_enable( ).
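The read side described above can be sketched in userspace C11. This is only an illustration, not kernel code: sim_rcu_read_lock( ), sim_rcu_read_unlock( ), struct settings and read_mtu_plus_ttl( ) are hypothetical names, and a plain counter stands in for the effect of preempt_disable( ) and preempt_enable( ).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical RCU-protected structure, reached only through a pointer. */
struct settings {
    int mtu;
    int ttl;
};

static struct settings initial_settings = { 1500, 64 };
static _Atomic(struct settings *) current_settings = &initial_settings;

/* Models the preemption counter; in a real kernel this is per-task state. */
static int read_depth;

/* Stand-ins for rcu_read_lock( ) and rcu_read_unlock( ): no lock and no
 * counter shared between CPUs is touched. */
static void sim_rcu_read_lock(void)   { read_depth++; }
static void sim_rcu_read_unlock(void) { read_depth--; }

/* Reader: fetch the pointer once, then read only through that pointer. */
int read_mtu_plus_ttl(void)
{
    int sum;

    sim_rcu_read_lock();
    struct settings *s =
        atomic_load_explicit(&current_settings, memory_order_acquire);
    sum = s->mtu + s->ttl;      /* must not sleep inside this region */
    sim_rcu_read_unlock();
    return sum;
}
```

The reader fetches the pointer exactly once, so it keeps seeing one consistent copy even if a writer swaps the pointer in the middle of the critical region.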
Because the reader does very little to prevent race conditions, we could expect that the writer has
to work a bit more. In fact, when a writer wants to update the data structure, it dereferences the
pointer and makes a copy of the whole data structure. Next, the writer modifies the copy. Once
finished, the writer changes the pointer to the data structure so as to make it point to the
updated copy. Because changing the value of the pointer is an atomic operation, each reader or
writer sees either the old copy or the new one: no corruption in the data structure may occur.
However, a memory barrier is required to ensure that the updated pointer is seen by the other
CPUs only after the data structure has been modified. Such a memory barrier is implicitly
introduced if a spin lock is coupled with RCU to forbid the concurrent execution of writers.
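A minimal userspace sketch of the copy-then-publish step, assuming a single writer (in the kernel, a spin lock would serialize writers, as noted above). The names publish_initial( ), update_mtu( ) and struct settings are hypothetical; the C11 release store plays the role of the memory barrier that orders the modification of the copy before the pointer change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical RCU-protected structure. */
struct settings {
    int mtu;
    int ttl;
};

static _Atomic(struct settings *) current_settings;

/* Allocate and publish the first version; returns 0 on success. */
int publish_initial(int mtu, int ttl)
{
    struct settings *s = malloc(sizeof *s);

    if (!s)
        return -1;
    s->mtu = mtu;
    s->ttl = ttl;
    atomic_store_explicit(&current_settings, s, memory_order_release);
    return 0;
}

/* Writer: copy the whole structure, modify the copy, then swap the pointer.
 * Returns the old copy, which must NOT be freed right away because readers
 * may still be using it. Returns NULL if allocation fails. */
struct settings *update_mtu(int new_mtu)
{
    struct settings *old =
        atomic_load_explicit(&current_settings, memory_order_relaxed);
    struct settings *copy = malloc(sizeof *copy);

    if (!copy)
        return NULL;
    *copy = *old;               /* copy the whole data structure */
    copy->mtu = new_mtu;        /* modify only the private copy */

    /* Release ordering: the stores to *copy become visible before the
     * new pointer value does. */
    atomic_store_explicit(&current_settings, copy, memory_order_release);
    return old;
}
```

A reader that loaded the pointer before the store keeps working on the old copy; one that loads it afterwards sees the fully initialized new copy.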
The real problem with the RCU technique, however, is that the old copy of the data structure
cannot be freed right away when the writer updates the pointer. In fact, the readers that were
accessing the data structure when the writer started its update could still be reading the old copy.
The old copy can be freed only after all (potential) readers on the CPUs have executed the
rcu_read_unlock( ) macro. The kernel requires every potential reader to execute that macro
before:
- The CPU performs a process switch (see restriction 2 earlier).
- The CPU starts executing in User Mode.
- The CPU executes the idle loop (see the section "Kernel Threads" in Chapter 3).
In each of these cases, we say that the CPU has gone through a quiescent state.
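The rule "the old copy may be freed once every CPU has gone through a quiescent state" can be modelled with a toy userspace sketch. Everything here is an assumption made for illustration (SIM_NR_CPUS, the flag array, and the three function names); the kernel's real bookkeeping is more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4   /* hypothetical CPU count for the model */

/* One flag per CPU: has it passed a quiescent state since the update? */
static bool passed_quiescent[SIM_NR_CPUS];

/* Called when a writer has replaced the pointer: every CPU is now a
 * potential reader of the old copy. */
void start_grace_period(void)
{
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        passed_quiescent[cpu] = false;
}

/* Called when a CPU performs a process switch, enters User Mode, or runs
 * the idle loop. */
void note_quiescent_state(int cpu)
{
    passed_quiescent[cpu] = true;
}

/* True once no CPU can still hold a reference to the old copy. */
bool grace_period_elapsed(void)
{
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++)
        if (!passed_quiescent[cpu])
            return false;
    return true;
}
```

Because a reader cannot sleep inside the critical region, a CPU that has passed any of these three points cannot still be in the middle of reading the old copy.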
The call_rcu( ) function is invoked by the writer to get rid of the old copy of the data structure.
It receives as its parameters the address of an rcu_head descriptor (usually embedded inside the
data structure to be freed) and the address of a callback function to be invoked when all CPUs
have gone through a quiescent state. Once executed, the callback function usually frees the old
copy of the data structure.
The call_rcu( ) function stores in the rcu_head descriptor the address of the callback and its
parameter, then inserts the descriptor in a per-CPU list of callbacks. Periodically, once every tick
(see the section "Updating Local CPU Statistics" in Chapter 6), the kernel checks whether the local
CPU has gone through a quiescent state. When all CPUs have gone through a quiescent state, a
local tasklet, whose descriptor is stored in the rcu_tasklet per-CPU variable, executes all callbacks
in the list.
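The deferred-free mechanism can be sketched in userspace as a single callback list. This is a simulation under stated assumptions, not the kernel implementation: struct sim_rcu_head, sim_call_rcu( ), sim_run_callbacks( ) and free_settings_cb( ) are hypothetical names, one global list stands in for the per-CPU lists, and the caller is assumed to invoke sim_run_callbacks( ) only after all CPUs have gone through a quiescent state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's rcu_head descriptor. */
struct sim_rcu_head {
    struct sim_rcu_head *next;
    void (*func)(struct sim_rcu_head *head);
};

/* The descriptor is embedded inside the structure to be freed. */
struct settings {
    int mtu;
    struct sim_rcu_head rcu;
};

static struct sim_rcu_head *pending_head;               /* models one CPU's list */
static struct sim_rcu_head **pending_tail = &pending_head;
static int freed_count;

/* Record the callback in the descriptor and append it to the list. */
void sim_call_rcu(struct sim_rcu_head *head,
                  void (*func)(struct sim_rcu_head *head))
{
    head->func = func;
    head->next = NULL;
    *pending_tail = head;
    pending_tail = &head->next;
}

/* Callback: recover the enclosing structure and free the old copy. */
void free_settings_cb(struct sim_rcu_head *head)
{
    struct settings *s =
        (struct settings *)((char *)head - offsetof(struct settings, rcu));

    free(s);
    freed_count++;
}

/* Run every queued callback; returns how many were executed. */
int sim_run_callbacks(void)
{
    struct sim_rcu_head *head = pending_head;
    int ran = 0;

    pending_head = NULL;
    pending_tail = &pending_head;
    while (head) {
        struct sim_rcu_head *next = head->next;  /* read before it is freed */

        head->func(head);
        head = next;
        ran++;
    }
    return ran;
}
```

Embedding the descriptor in the object means queuing a deferred free needs no extra allocation, which matters because the writer may be running in a context where allocation could fail.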
RCU is a new addition in Linux 2.6; it is used in the networking layer and in the Virtual Filesystem.