Kernel Synchronization
- Synchronization conditions
algorithm 2
repeat
flag[i]:=true;
while flag[j] do no-op;
critical section
flag[i]:=false;
remainder section
until false
algorithm 1
repeat
while turn <> i do no-op
critical section
turn:=j
remainder section
until false
- Locks
- Atomic operations
- Peterson's/tie-breaker algorithm
- Eisenberg and McGuire Algorithm
- Test-and-Set
- Swap
- Spinlock
- Semaphore
- Mutexes
- Priority Inversion
- Completion Variables
- Reader/Writer Locks
- Big-Reader Locks(brlocks)
- Big Kernel Lock(BKL)
- Read-Copy-Update
- <asm-xxx/atomic.h>
- /path to kernel source/include/asm-xxx/atomic.h
- struct atomic_t
- special lock instructions are used to prevent the other processors in the system from working until the current processor has completed the next action.
- The required instruction is actually called lock on IA-32 systems.
- If the kernel was compiled without SMP support, the operations are implemented in the same way as for normal variables (only atomic_t encapsulation is observed.)
- struct local_t
- per-cpu counters
function Test-and-Set(var target:boolean):boolean;
begin
Test-and-Set:=target;
target:=true;
end
repeat
while Test-and-Set(lock) do no-op;
critical section
lock:=false;
remainder section
until false;
- Operating Systems Study Guide - 3.2.3. Test and Set Instruction
procedure Swap (var a,b:boolean);
var temp:boolean;
begin
temp:=a;
a:=b;
b:=temp;
end
repeat
key:=true;
repeat
Swap(lock,key);
until key=false;
critical section
lock:=false;
remainder section
until false;
- linux/Documentation/spinlocks.txt
- busy waiting
- While the kernel is waiting for a spinlock to be released, it repeatedly checks whether it can acquire the lock without going to sleep in the meantime.
- busy waiting
LOCKED EQU 0 ; define value indicating locked state
LDR r1, <addr> ; load semaphore address
LDR r0, =LOCKED ; preload "locked" value
spin_lock
SWP r0, r0, [r1] ; swap register value with semaphore
CMP r0, #LOCKED ; if semaphore was locked already
BEQ spin_lock ; retry
1: lock; decb %0 # atomically decrement
jns 3f # if clear sign bit (>=0) jump forward to 3
2: rep; nop # wait
cmpb $0, %0 # spin – compare to 0
jle 2b # go back to 2 if <= 0 (locked)
jmp 1b # unlocked; go back to 1 to try again
3: # we have acquired the lock …
- Deadlock: an interrupt tries to acquire a spinlock that is already locked. This is OK if the interrupt happens on another CPU, but it is not OK if the interrupt happens on the same CPU that already holds the lock, because the lock will obviously never be released. If you know that the spinlocks are never used in interrupt handlers, you can use the non-irq versions.
spin_lock(&lock);
...
<- interrupt comes in:
spin_lock(&lock);
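The standard cure for this self-deadlock is to disable interrupts on the local CPU for as long as the lock is held. A kernel-style fragment (illustrative only, not runnable standalone; `xxx_lock` and the function name are placeholders):

```c
/* Sketch: the interrupt handler can never preempt this CPU while the
 * lock is held, so it can no longer deadlock against us. */
static spinlock_t xxx_lock;     /* assume spin_lock_init() was called */

void process_context_path(void)
{
        unsigned long flags;

        spin_lock_irqsave(&xxx_lock, flags);   /* IRQs off on this CPU */
        /* ... critical section ... */
        spin_unlock_irqrestore(&xxx_lock, flags);
        /* an interrupt arriving now takes the lock safely */
}
```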
- Static Initialization
old version:
static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
new version:
static DEFINE_SPINLOCK(xxx_lock);
or
struct foo bar = {
.lock = __SPIN_LOCK_UNLOCKED(bar.lock),
};
- Dynamic Initialization
spinlock_t xxx_lock;
spin_lock_init(&xxx_lock);
- no spin_lock required
- locking in the same BH
- a bottom half is never run on two CPUs at once
- locking between different BHs
- only one bottom half ever runs at a time
- Locking in the same tasklet
- a tasklet is never run on two CPUs at once
- locking in the same BH
- spin_lock/spin_unlock
- locking between different tasklets
- another tasklet (or bottom half, such as a timer) can run on the other CPUs
- locking in the same Softirq(use a per-CPU array for better performance)
- the same softirq can run on the other CPUs
- locking between different Softirqs
- UP
- /path to kernel source/include/linux/spinlock_up.h
- For kernels compiled without CONFIG_SMP, spinlocks do not exist at all.
- SMP
- locking between different tasklets
- also disables the interrupts on the local CPU
- used when you acquire a lock in an interrupt handler
- locking between an interrupt handler and Softirqs/tasklets/BHs
- The softirq processing can be interrupted by a hardware interrupt.(the reason why we disable the interrupts on the local CPU)
- The critical region could be entered by a hardware interrupt on another CPU(the reason we need spin_lock)
- spin_lock_irqsave/spin_unlock_irqrestore
- save_flags(flags);cli()/restore_flags(flags);
- spin_lock_irq/spin_unlock_irq
- disables and re-enables interrupts unconditionally, in the same manner as cli() and sti().
- This code is only safe when you know that interrupts were not already disabled before the acquisition of the lock.
- also disables softIRQs(This stops bottom halves from being run on the current CPU. If you're already in a bottom half, this does nothing.)
- locking between user context and BHs
- locking between user context and tasklets/Softirqs
- spin_lock_bh(It disables bottom halves on that CPU, then grabs the lock)
- local_bh_disable();
- preempt_disable();
- spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
- spin_unlock_bh
- Spinlock recursion
s: semaphore
wait(s)
while s≤0 do no-op;
s:=s-1;
signal(s)
s:=s+1;
repeat
wait(mutex);
critical section
signal(mutex);
remainder section
until false;
- Semaphores in Linux are sleeping locks.
- If you have a data structure which is only ever accessed from user context, then you can use a simple semaphore (linux/asm/semaphore.h) to protect it. If you can't get a semaphore, your task will put itself on the queue, and be woken up when the semaphore is released. This means the CPU will do something else while you are waiting, but there are many cases when you simply can't sleep, and so have to use a spinlock instead.
- In a slightly modified implementation, it would be possible for a semaphore's value to be less than zero. When a process executes wait(), the semaphore count is automatically decremented. The magnitude of the negative value would determine how many processes were waiting on the semaphore.
- sema_init
- down
- the process is placed in the TASK_UNINTERRUPTIBLE state and cannot receive signals while waiting to enter the critical region.
- down_interruptible
- places the task in the TASK_INTERRUPTIBLE state if the semaphore could not be acquired. As a result, the process can be woken by signals while it is sleeping.
- down_trylock
- If it fails, the process does not go to sleep to wait for the semaphore but continues execution normally.
- up
- down_trylock() and up() can be called from interrupt context.
- Mutex has ownership.
- Mutex vs. Semaphores Part 1: Semaphores, Part 2: The Mutex, Part 3 (final part): Mutual Exclusion Problems
Problem of Semaphores | With a plain semaphore | With a mutex (ownership)
---|---|---
Accidental release | A semaphore that was never correctly acquired can still be released. | A check is made on release and an error is raised if the current task is not the owner.
Recursive deadlock | A task deadlocks if it tries to lock a semaphore it has already locked. | Due to ownership, a mutex can support relocking by the owning task, as long as it is released the same number of times.
Deadlock through task death | A task that is holding a semaphore dies or is terminated. | The OS can detect whether the dead task currently owns a mutex and signal waiting tasks of this condition.
Priority inversion | Low acquires a lock; high waits for low while low is blocked by a long-running medium. | With ownership, this problem can be addressed using the Priority Inheritance Protocol or the Priority Ceiling Protocol.
Semaphore as a signal | P and V calls are not used as a pair in the same task. | Mutex lock/unlock calls should always be paired.
- init_MUTEX/init_MUTEX_LOCKED
- DECLARE_MUTEX/DECLARE_MUTEX_LOCKED
- mutex_unlock
- Note that mutex_unlock() cannot be called from interrupt context.
- Priority Inheritance Protocols: An Approach to Real-Time Synchronization
- Introduction to Priority Inversion
- Priority inheritance
- This technique mandates that a lower-priority task inherit the priority of any higher-priority task pending on a resource they share.
- This priority change should take place as soon as the high-priority task begins to pend; it should end when the resource is released.
- Priority ceilings
- The priority assigned to the resource is the priority of its highest-priority user, plus one.
- One task waits on the completion variable while another task performs some work. When the other task has completed the work, it uses the completion variable to wake up any waiting tasks. If this sounds like a semaphore, you are right; the idea is much the same. In fact, completion variables merely provide a simple solution to a problem whose answer is otherwise semaphores.
- The basic summary is that we had this (fairly common) way of waiting for certain events by having a locked semaphore on the stack of the waiter, and then having the waiter do a "down()" which caused it to block until the thing it was waiting for did an "up()".
- This works fairly well, but it has a really small (and quite unlikely) race on SMP, that is not so much a race of the idea itself, as of the implementation of the semaphores. We could have fixed the semaphores, but there were a few reasons not to:
- the semaphores are optimized (on purpose) for the non-contention case. The "wait for completion" usage has the opposite default case
- the semaphores are quite involved and architecture-specific, exactly due to this optimization. Trying to change them is painful as hell. So instead, I introduced the notion of "wait for completion":
- rwlock_t
- read_lock/read_unlock
- write_lock/write_unlock
- struct rw_semaphore
- init_rwsem
- down_read/up_read
- down_write/up_write
- br_read_lock/br_read_unlock
- lock_kernel()/unlock_kernel()
- Read Copy Update HOWTO
- The mechanism keeps track of all users of the pointer to the shared data structure. When the structure is supposed to change, a copy (or a new instance that is filled in appropriately, this does not make any difference) is first created and the change is performed there. After all previous readers have finished their reading work on the old copy, the pointer can be replaced by a pointer to the new, modified copy.