diff --git a/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml b/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml
deleted file mode 100644
index c6c78e0feb..0000000000
--- a/en_US.ISO8859-1/books/arch-handbook/smp/chapter.sgml
+++ /dev/null
@@ -1,957 +0,0 @@
-<!DOCTYPE article PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
-<!ENTITY % man PUBLIC "-//FreeBSD//ENTITIES DocBook Manual Page Entities//EN">
-%man;
-
-<!ENTITY % authors PUBLIC "-//FreeBSD//ENTITIES DocBook Author Entities//EN">
-%authors;
-<!ENTITY % misc PUBLIC "-//FreeBSD//ENTITIES DocBook Miscellaneous FreeBSD Entities//EN">
-%misc;
-
-<!--ENTITY % mailing-lists PUBLIC "-//FreeBSD//ENTITIES DocBook Mailing List Entities//EN"-->
-<!--
-%mailing-lists;
--->
-
-]>
-
-<article>
- <articleinfo>
- <title>SMPng Design Document</title>
-
- <authorgroup>
- <author>
- <firstname>John</firstname>
- <surname>Baldwin</surname>
- </author>
- <author>
- <firstname>Robert</firstname>
- <surname>Watson</surname>
- </author>
- </authorgroup>
-
- <pubdate>$FreeBSD$</pubdate>
-
- <copyright>
- <year>2002</year>
- <year>2003</year>
- <holder>John Baldwin</holder>
- <holder>Robert Watson</holder>
- </copyright>
-
- <abstract>
- <para>This document presents the current design and implementation of
- the SMPng Architecture. First, the basic primitives and tools are
- introduced. Next, a general architecture for the FreeBSD kernel's
- synchronization and execution model is laid out. Then, locking
- strategies for specific subsystems are discussed, documenting the
- approaches taken to introduce fine-grained synchronization and
- parallelism for each subsystem. Finally, detailed implementation
- notes are provided to motivate design choices, and make the reader
- aware of important implications involving the use of specific
- primitives. </para>
- </abstract>
- </articleinfo>
-
- <sect1>
- <title>Introduction</title>
-
- <para>This document is a work-in-progress, and will be updated to
- reflect on-going design and implementation activities associated
- with the SMPng Project. Many sections currently exist only in
- outline form, but will be fleshed out as work proceeds. Updates or
- suggestions regarding the document may be directed to the document
- editors.</para>
-
- <para>The goal of SMPng is to allow concurrency in the kernel.
- The kernel is basically one rather large and complex program. To
- make the kernel multi-threaded we use some of the same tools used
- to make other programs multi-threaded. These include mutexes,
- shared/exclusive locks, semaphores, and condition variables. For
- the definitions of these and other SMP-related terms, please see
- the <xref linkend="glossary"> section of this article.</para>
- </sect1>
-
- <sect1>
- <title>Basic Tools and Locking Fundamentals</title>
-
- <sect2>
- <title>Atomic Instructions and Memory Barriers</title>
-
- <para>There are several existing treatments of memory barriers
- and atomic instructions, so this section will not include a
- lot of detail. To put it simply, one cannot safely read a
- variable without a lock if a lock is used to protect writes
- to that variable. This becomes obvious when you consider that
- memory barriers simply determine relative order of memory
- operations; they do not make any guarantee about timing of
- memory operations. That is, a memory barrier does not force
- the contents of a CPU's local cache or store buffer to flush.
- Instead, the memory barrier at lock release simply ensures
- that all writes to the protected data will be visible to other
- CPUs or devices if the write to release the lock is visible.
- The CPU is free to keep that data in its cache or store buffer
- as long as it wants. However, if another CPU performs an
- atomic instruction on the same datum, the first CPU must
- guarantee that the updated value is made visible to the second
- CPU along with any other operations that memory barriers may
- require.</para>
-
- <para>For example, assuming a simple model where data is
- considered visible when it is in main memory (or a global
- cache), when an atomic instruction is triggered on one CPU,
- other CPUs' store buffers and caches must flush any writes to
- that same cache line along with any pending operations behind
- a memory barrier.</para>
-
- <para>This requires one to take special care when using an item
- protected by atomic instructions. For example, in the sleep
- mutex implementation, we have to use an
- <function>atomic_cmpset</function> rather than an
- <function>atomic_set</function> to turn on the
- <constant>MTX_CONTESTED</constant> bit. The reason is that we
- read the value of <structfield>mtx_lock</structfield> into a
- variable and then make a decision based on that read.
- However, the value we read may be stale, or it may change
- while we are making our decision. Thus, when the
- <function>atomic_set</function> executes, it may end up
- setting the bit on a value other than the one we based the
- decision on. Therefore, we have to use an
- <function>atomic_cmpset</function> to set the value only if
- the value we made the decision on is still up-to-date and
- valid.</para>
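-
- <para>The retry pattern described above can be sketched in a few
- lines of C. The fragment below is illustrative only: it uses
- standard C11 atomics rather than the kernel's own atomic
- operations, and the flag value and function name are assumptions
- made for the example.</para>
-
- <programlisting><![CDATA[
-#include <stdatomic.h>
-#include <stdint.h>
-
-#define MTX_CONTESTED   0x02            /* illustrative flag value */
-
-static _Atomic uintptr_t mtx_lock;
-
-static void
-mark_contested(void)
-{
-        uintptr_t v;
-
-        for (;;) {
-                v = atomic_load(&mtx_lock);     /* snapshot; may go stale */
-                /* Succeed only if mtx_lock still equals our snapshot. */
-                if (atomic_compare_exchange_strong(&mtx_lock, &v,
-                    v | MTX_CONTESTED))
-                        break;
-                /* The value changed underneath us; decide again. */
-        }
-}
-]]></programlisting>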
-
- <para>Finally, atomic instructions only allow one item to be
- updated or read. If one needs to atomically update several
- items, then a lock must be used instead. For example, if two
- counters must be read and have values that are consistent
- relative to each other, then those counters must be protected
- by a lock rather than by separate atomic instructions.</para>
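-
- <para>The sketch below illustrates this with a pair of
- hypothetical statistics counters protected by a single mutex; the
- mutex is assumed to have been initialized elsewhere with
- <function>mtx_init</function>.</para>
-
- <programlisting><![CDATA[
-#include <sys/param.h>
-#include <sys/lock.h>
-#include <sys/mutex.h>
-
-static struct mtx stats_mtx;            /* initialized with mtx_init() */
-static u_long packets_in, bytes_in;
-
-static void
-stats_update(u_long nbytes)
-{
-        mtx_lock(&stats_mtx);
-        packets_in++;                   /* updated together under the lock */
-        bytes_in += nbytes;
-        mtx_unlock(&stats_mtx);
-}
-
-static void
-stats_snapshot(u_long *packets, u_long *bytes)
-{
-        mtx_lock(&stats_mtx);
-        *packets = packets_in;          /* a consistent pair of values */
-        *bytes = bytes_in;
-        mtx_unlock(&stats_mtx);
-}
-]]></programlisting>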
- </sect2>
-
- <sect2>
- <title>Read Locks versus Write Locks</title>
-
- <para>Read locks do not need to be as strong as write locks.
- Both types of locks need to ensure that the data they are
- accessing is not stale. However, only write access requires
- exclusive access. Multiple threads can safely read a value.
- Using different types of locks for reads and writes can be
- implemented in a number of ways.</para>
-
- <para>First, sx locks can be used in this manner by using an
- exclusive lock when writing and a shared lock when reading.
- This method is quite straightforward.</para>
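-
- <para>A minimal sketch of this first method using the sx lock
- interface follows; the protected variable is hypothetical.</para>
-
- <programlisting><![CDATA[
-#include <sys/param.h>
-#include <sys/lock.h>
-#include <sys/sx.h>
-
-static struct sx foo_lock;
-static int foo_value;
-
-static void
-foo_init(void)
-{
-        sx_init(&foo_lock, "foo value lock");
-}
-
-static int
-foo_read(void)
-{
-        int v;
-
-        sx_slock(&foo_lock);            /* shared: many readers at once */
-        v = foo_value;
-        sx_sunlock(&foo_lock);
-        return (v);
-}
-
-static void
-foo_write(int v)
-{
-        sx_xlock(&foo_lock);            /* exclusive: a single writer */
-        foo_value = v;
-        sx_xunlock(&foo_lock);
-}
-]]></programlisting>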
-
- <para>A second method is a bit more obscure. You can protect a
- datum with multiple locks. Then for reading that data you
- simply need to have a read lock of one of the locks. However,
- to write to the data, you need to have a write lock of all of
- the locks. This can make writing rather expensive but can be
- useful when data is accessed in various ways. For example,
- the parent process pointer is protected by both the
- <varname>proctree_lock</varname> sx lock and the per-process
- mutex. Sometimes the proc lock is easier to use, since we are
- just checking the parent of a process that we already have
- locked. However, other code, such as
- <function>inferior</function>, needs to walk the tree of
- processes via parent pointers; locking each process along the
- way would be prohibitively expensive, and it would be hard to
- guarantee that the condition being checked remains valid for
- both the check and the actions taken as a result of the
- check.</para>
- </sect2>
-
- <sect2>
- <title>Locking Conditions and Results</title>
-
- <para>If you need a lock to check the state of a variable so
- that you can take an action based on the state you read, you
- cannot just hold the lock while reading the variable and then
- drop the lock before you act on the value you read. Once you
- drop the lock, the variable can change, rendering your decision
- invalid. Thus, you must hold the lock both while reading the
- variable and while performing the action as a result of the
- test.</para>
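-
- <para>The sketch below contrasts the two approaches; all names are
- invented for the example, and the started routine is assumed not
- to sleep while the mutex is held.</para>
-
- <programlisting><![CDATA[
-#define FOO_IDLE        0
-#define FOO_BUSY        1
-
-struct foo {
-        struct mtx      foo_mtx;
-        int             foo_state;      /* FOO_IDLE or FOO_BUSY */
-};
-
-void foo_start(struct foo *foo);        /* hypothetical action */
-
-/* Correct: the state cannot change between the test and the action. */
-static void
-foo_maybe_start(struct foo *foo)
-{
-        mtx_lock(&foo->foo_mtx);
-        if (foo->foo_state == FOO_IDLE) {
-                foo->foo_state = FOO_BUSY;
-                foo_start(foo);
-        }
-        mtx_unlock(&foo->foo_mtx);
-}
-
-/*
- * Incorrect: another thread may change foo_state in the window
- * between dropping the lock and acting on the stale result.
- */
-static void
-foo_maybe_start_racy(struct foo *foo)
-{
-        int idle;
-
-        mtx_lock(&foo->foo_mtx);
-        idle = (foo->foo_state == FOO_IDLE);
-        mtx_unlock(&foo->foo_mtx);
-        if (idle)
-                foo_start(foo);         /* foo may no longer be idle */
-}
-]]></programlisting>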
- </sect2>
- </sect1>
-
- <sect1>
- <title>General Architecture and Design</title>
-
- <sect2>
- <title>Interrupt Handling</title>
-
- <para>Following the pattern of several other multi-threaded Unix
- kernels, FreeBSD deals with interrupt handlers by giving them
- their own thread context. Providing a context for interrupt
- handlers allows them to block on locks. To help avoid
- latency, however, interrupt threads run at real-time kernel
- priority. Thus, interrupt handlers should not execute for very
- long to avoid starving other kernel threads. In addition,
- since multiple handlers may share an interrupt thread,
- interrupt handlers should not sleep or use a sleepable lock to
- avoid starving another interrupt handler.</para>
-
- <para>The interrupt threads currently in FreeBSD are referred to
- as heavyweight interrupt threads. They are called this
- because switching to an interrupt thread involves a full
- context switch. In the initial implementation, the kernel was
- not preemptive and thus interrupts that interrupted a kernel
- thread would have to wait until the kernel thread blocked or
- returned to userland before they would have an opportunity to
- run.</para>
-
- <para>To deal with the latency problems, the kernel in FreeBSD
- has been made preemptive. Currently, we only preempt a kernel
- thread when we release a sleep mutex or when an interrupt
- comes in. However, the plan is to make the FreeBSD kernel
- fully preemptive as described below.</para>
-
- <para>Not all interrupt handlers execute in a thread context.
- Instead, some handlers execute directly in primary interrupt
- context. These interrupt handlers are currently misnamed
- <quote>fast</quote> interrupt handlers since the
- <constant>INTR_FAST</constant> flag used in earlier versions
- of the kernel is used to mark these handlers. The only
- interrupts which currently use these types of interrupt
- handlers are clock interrupts and serial I/O device
- interrupts. Since these handlers do not have their own
- context, they may not acquire blocking locks and thus may only
- use spin mutexes.</para>
-
- <para>Finally, there is one optional optimization that can be
- added in MD code called lightweight context switches. Since
- an interrupt thread executes in a kernel context, it can
- borrow the vmspace of any process. Thus, in a lightweight
- context switch, the switch to the interrupt thread does not
- switch vmspaces but borrows the vmspace of the interrupted
- thread. In order to ensure that the vmspace of the
- interrupted thread does not disappear out from under us, the
- interrupted thread is not allowed to execute until the
- interrupt thread is no longer borrowing its vmspace. This can
- happen when the interrupt thread either blocks or finishes.
- If an interrupt thread blocks, then it will use its own
- context when it is made runnable again. Thus, it can release
- the interrupted thread.</para>
-
- <para>The drawback of this optimization is that it is very
- machine specific and complex, and thus only worth the effort if
- there is a large performance improvement. At this point it is
- probably too early to tell, and in fact it will probably hurt
- performance, as almost all interrupt handlers will immediately
- block on Giant and require a thread fix-up when they block.
- Also, an alternative method of interrupt handling has been
- proposed by Mike Smith that works like so:</para>
-
- <orderedlist>
- <listitem>
- <para>Each interrupt handler has two parts: a predicate
- which runs in primary interrupt context and a handler
- which runs in its own thread context.</para>
- </listitem>
-
- <listitem>
- <para>If an interrupt handler has a predicate, then when an
- interrupt is triggered, the predicate is run. If the
- predicate returns true then the interrupt is assumed to be
- fully handled and the kernel returns from the interrupt.
- If the predicate returns false or there is no predicate,
- then the threaded handler is scheduled to run.</para>
- </listitem>
- </orderedlist>
-
- <para>Fitting lightweight context switches into this scheme
- might prove rather complicated. Since we may want to change
- to this scheme at some point in the future, it is probably
- best to defer work on lightweight context switches until we
- have settled on the final interrupt handling architecture and
- determined how lightweight context switches might or might
- not fit into it.</para>
- </sect2>
-
- <sect2>
- <title>Kernel Preemption and Critical Sections</title>
-
- <sect3>
- <title>Kernel Preemption in a Nutshell</title>
-
- <para>Kernel preemption is fairly simple. The basic idea is
- that a CPU should always be doing the highest priority work
- available. Well, that is the ideal at least. There are a
- couple of cases where the expense of achieving the ideal is
- not worth being perfect.</para>
-
- <para>Implementing full kernel preemption is very
- straightforward: when you schedule a thread to be executed
- by putting it on a runqueue, you check to see if its
- priority is higher than that of the currently executing thread.
- If so, you initiate a context switch to that thread.</para>
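-
- <para>Conceptually the check looks like the sketch below. The
- helper functions are placeholders rather than the real scheduler
- interface; recall that in FreeBSD a lower numeric priority value
- means a higher priority.</para>
-
- <programlisting><![CDATA[
-/* Conceptual sketch only; not the actual scheduler code. */
-static void
-make_runnable(struct thread *td)
-{
-        runqueue_insert(td);                    /* placeholder helper */
-        /* A lower numeric td_priority is a higher priority. */
-        if (td->td_priority < curthread->td_priority)
-                switch_to(td);                  /* immediate preemption */
-}
-]]></programlisting>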
-
- <para>While locks can protect most data in the case of a
- preemption, not all of the kernel is preemption safe. For
- example, if a thread holding a spin mutex is preempted and the
- new thread attempts to grab the same spin mutex, the new
- thread may spin forever, as the interrupted thread may never
- get a chance to execute. Also, some code, such as the code
- to assign an address space number for a process during
- exec() on the Alpha, must not be preempted as it supports
- the actual context switch code. Preemption is disabled for
- these code sections by using a critical section.</para>
- </sect3>
-
- <sect3>
- <title>Critical Sections</title>
-
- <para>The responsibility of the critical section API is to
- prevent context switches inside of a critical section. With
- a fully preemptive kernel, every
- <function>setrunqueue</function> of a thread other than the
- current thread is a preemption point. One implementation is
- for <function>critical_enter</function> to set a per-thread
- flag that is cleared by its counterpart. If
- <function>setrunqueue</function> is called with this flag
- set, it does not preempt regardless of the priority of the new
- thread relative to the current thread. However, since
- critical sections are used in spin mutexes to prevent
- context switches and multiple spin mutexes can be acquired,
- the critical section API must support nesting. For this
- reason the current implementation uses a nesting count
- instead of a single per-thread flag.</para>
-
- <para>In order to minimize latency, preemptions inside of a
- critical section are deferred rather than dropped. If a
- thread is made runnable that would normally be preempted to
- outside of a critical section, then a per-thread flag is set
- to indicate that there is a pending preemption. When the
- outermost critical section is exited, the flag is checked.
- If the flag is set, then the current thread is preempted to
- allow the higher priority thread to run.</para>
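-
- <para>A sketch of the nesting count and deferred preemption flag
- follows. The field names <structfield>td_critnest</structfield>
- and <structfield>td_owepreempt</structfield> and the switch helper
- are assumptions for illustration and need not match the actual
- implementation.</para>
-
- <programlisting><![CDATA[
-void
-critical_enter(void)
-{
-        curthread->td_critnest++;               /* a count, not a flag */
-}
-
-void
-critical_exit(void)
-{
-        struct thread *td;
-
-        td = curthread;
-        if (--td->td_critnest == 0 && td->td_owepreempt) {
-                td->td_owepreempt = 0;
-                /* Perform the preemption that was deferred above. */
-                deferred_preempt();             /* placeholder for the switch */
-        }
-}
-]]></programlisting>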
-
- <para>Interrupts pose a problem with regards to spin mutexes.
- If a low-level interrupt handler needs a lock, it must not
- interrupt any code holding that lock, in order to avoid possible
- data structure corruption. Currently, this
- mechanism is piggybacked onto the critical section API by means
- of the <function>cpu_critical_enter</function> and
- <function>cpu_critical_exit</function> functions. Currently
- this API disables and re-enables interrupts on all of
- FreeBSD's current platforms. This approach may not be
- purely optimal, but it is simple to understand and simple to
- get right. Theoretically, this second API need only be used
- for spin mutexes that are used in primary interrupt context.
- However, to make the code simpler, it is used for all spin
- mutexes and even all critical sections. It may be desirable
- to split out the MD API from the MI API and only use it in
- conjunction with the MI API in the spin mutex
- implementation. If this approach is taken, then the MD API
- likely would need a rename to show that it is a separate API
- now.</para>
- </sect3>
-
- <sect3>
- <title>Design Tradeoffs</title>
-
- <para>As mentioned earlier, a couple of trade-offs have been
- made, sacrificing perfect preemption in cases where it may not
- provide the best performance.</para>
-
- <para>The first trade-off is that the preemption code does not
- take other CPUs into account. Suppose we have two CPUs, A
- and B, with the priority of A's thread as 4 and the priority
- of B's thread as 2. If CPU B makes a thread with priority 1
- runnable, then in theory, we want CPU A to switch to the new
- thread so that we will be running the two highest priority
- runnable threads. However, the cost of determining which
- CPU to preempt, signaling that CPU via an IPI, and performing
- the synchronization that would be required would be enormous.
- Thus, the current code
- would instead force CPU B to switch to the higher priority
- thread. Note that this still puts the system in a better
- position as CPU B is executing a thread of priority 1 rather
- than a thread of priority 2.</para>
-
- <para>The second trade-off limits immediate kernel preemption
- to real-time priority kernel threads. In the simple case of
- preemption defined above, a thread is always preempted
- immediately (or as soon as a critical section is exited) if
- a higher priority thread is made runnable. However, many
- threads executing in the kernel only execute in a kernel
- context for a short time before either blocking or returning
- to userland. Thus, if the kernel preempts these threads to
- run another non-realtime kernel thread, the kernel may
- switch out the executing thread just before it is about to
- sleep or return to userland. The cache on the CPU must then
- adjust to the new thread. When the kernel resumes the
- preempted thread, it must refill all the cache information
- that was lost.
- In addition, two extra context switches are performed that
- could be avoided if the kernel deferred the preemption until
- the first thread blocked or returned to userland. Thus, by
- default, the preemption code will only preempt immediately
- if the higher priority thread is a real-time priority
- thread.</para>
-
- <para>Turning on full kernel preemption for all kernel threads
- has value as a debugging aid since it exposes more race
- conditions. It is especially useful on UP systems where many
- races are hard to simulate otherwise. Thus, there will be a
- kernel option to enable preemption for all kernel threads
- that can be used for debugging purposes.</para>
- </sect3>
- </sect2>
-
- <sect2>
- <title>Thread Migration</title>
-
- <para>Simply put, a thread migrates when it moves from one CPU
- to another. In a non-preemptive kernel this can only happen
- at well-defined points such as when calling
- <function>tsleep</function> or returning to userland.
- However, in the preemptive kernel, an interrupt can force a
- preemption and possible migration at any time. This can have
- negative effects on per-CPU data since, with the exception of
- <varname>curthread</varname> and <varname>curpcb</varname>, the
- data can change whenever you migrate. Since you can
- potentially migrate at any time this renders per-CPU data
- rather useless. Thus it is desirable to be able to disable
- migration for sections of code that need per-CPU data to be
- stable.</para>
-
- <para>Critical sections currently prevent migration since they
- do not allow context switches. However, this may be too strong
- of a requirement to enforce in some cases since a critical
- section also effectively blocks interrupt threads on the
- current processor. As a result, it may be desirable to
- provide an API whereby code may indicate that if the current
- thread is preempted it should not migrate to another
- CPU.</para>
-
- <para>One possible implementation is to use a per-thread nesting
- count <varname>td_pinnest</varname> along with a
- <varname>td_pincpu</varname> which is updated to the current
- CPU on each context switch. Each CPU has its own run queue
- that holds threads pinned to that CPU. A thread is pinned
- when its nesting count is greater than zero and a thread
- starts off unpinned with a nesting count of zero. When a
- thread is put on a runqueue, we check to see if it is pinned.
- If so, we put it on the per-CPU runqueue, otherwise we put it
- on the global runqueue. When
- <function>choosethread</function> is called to retrieve the
- next thread, it could either always prefer bound threads to
- unbound threads or use some sort of bias when comparing
- priorities. If the nesting count is only ever written to by
- the thread itself and is only read by other threads when the
- owning thread is not executing but while holding the
- <varname>sched_lock</varname>, then
- <varname>td_pinnest</varname> will not need any other locks.
- The <function>migrate_disable</function> function would
- increment the nesting count and
- <function>migrate_enable</function> would decrement the
- nesting count. Due to the locking requirements specified
- above, they will only operate on the current thread and thus
- would not need to handle the case of making a thread
- migrateable that currently resides on a per-CPU run
- queue.</para>
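-
- <para>Under those locking rules the two functions reduce to a
- sketch like the following; the fields are the proposed ones from
- above and do not exist in the tree.</para>
-
- <programlisting><![CDATA[
-void
-migrate_disable(void)
-{
-        curthread->td_pinnest++;        /* written only by the owning thread */
-}
-
-void
-migrate_enable(void)
-{
-        KASSERT(curthread->td_pinnest > 0, ("unbalanced migrate_enable"));
-        curthread->td_pinnest--;
-}
-]]></programlisting>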
-
- <para>It is still debatable if this API is needed or if the
- critical section API is sufficient by itself. Many of the
- places that need to prevent migration also need to prevent
- preemption, and in those places a critical section
- must be used regardless.</para>
- </sect2>
-
- <sect2>
- <title>Callouts</title>
-
- <para>The <function>timeout()</function> kernel facility permits
- kernel services to register functions for execution as part
- of the <function>softclock()</function> software interrupt.
- Events are scheduled based on a desired number of clock
- ticks, and callbacks to the consumer-provided function
- will occur at approximately the right time.</para>
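-
- <para>A minimal consumer of this facility looks roughly like the
- sketch below; the callback and the argument it receives are
- invented for the example.</para>
-
- <programlisting><![CDATA[
-#include <sys/param.h>
-#include <sys/systm.h>
-#include <sys/kernel.h>
-#include <sys/callout.h>
-
-static struct callout_handle foo_handle;
-
-static void
-foo_expire(void *arg)
-{
-        /* Runs from the softclock() software interrupt. */
-}
-
-static void
-foo_schedule(void *arg)
-{
-        /* Fire approximately one second (hz ticks) from now. */
-        foo_handle = timeout(foo_expire, arg, hz);
-}
-
-static void
-foo_cancel(void *arg)
-{
-        untimeout(foo_expire, arg, foo_handle);
-}
-]]></programlisting>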
-
- <para>The global list of pending timeout events is protected
- by a global spin mutex, <varname>callout_lock</varname>;
- all access to the timeout list must be performed with this
- mutex held. When <function>softclock()</function> is
- woken up, it scans the list of pending timeouts for those
- that should fire. In order to avoid lock order reversal,
- the <function>softclock</function> thread will release the
- <varname>callout_lock</varname> mutex when invoking the
- provided <function>timeout()</function> callback function.
- If the <constant>CALLOUT_MPSAFE</constant> flag was not set
- during registration, then Giant will be grabbed before
- invoking the callout, and then released afterwards. The
- <varname>callout_lock</varname> mutex will be re-grabbed
- before proceeding. The <function>softclock()</function>
- code is careful to leave the list in a consistent state
- while releasing the mutex. If <constant>DIAGNOSTIC</constant>
- is enabled, then the time taken to execute each function is
- measured, and a warning generated if it exceeds a
- threshold.</para>
- </sect2>
- </sect1>
-
- <sect1>
- <title>Specific Locking Strategies</title>
-
- <sect2>
- <title>Credentials</title>
-
- <para><structname>struct ucred</structname> is the kernel's
- internal credential structure, and is generally used as the
- basis for process-driven access control within the kernel.
- BSD-derived systems use a <quote>copy-on-write</quote> model for credential
- data: multiple references may exist for a credential structure,
- and when a change needs to be made, the structure is duplicated,
- modified, and then the reference replaced. Due to widespread
- caching of the credential to implement access control on open,
- this results in substantial memory savings. With a move to
- fine-grained SMP, this model also saves substantially on
- locking operations by requiring that modification only occur
- on an unshared credential, avoiding the need for explicit
- synchronization when consuming a known-shared
- credential.</para>
-
- <para>Credential structures with a single reference are
- considered mutable; shared credential structures must not be
- modified, or a race condition is risked. A mutex,
- <structfield>cr_mtxp</structfield>, protects the reference
- count of <structname>struct ucred</structname> so as to
- maintain consistency. Any use of the structure requires a
- valid reference for the duration of the use, or the structure
- may be released out from under the illegitimate
- consumer.</para>
-
- <para>The <structname>struct ucred</structname> mutex is a leaf
- mutex, and for performance reasons, is implemented via a mutex
- pool.</para>
-
- <para>Usually, credentials are used in a read-only manner for access
- control decisions, and in this case <structfield>td_ucred</structfield>
- is generally preferred because it requires no locking. When a
- process' credential is updated, the <literal>proc</literal> lock
- must be held across the check and update operations to avoid
- races. The process credential <structfield>p_ucred</structfield>
- must be used for check and update operations to prevent
- time-of-check, time-of-use races.</para>
-
- <para>If system call invocations will perform access control after
- an update to the process credential, the value of
- <structfield>td_ucred</structfield> must also be refreshed to
- the current process value. This will prevent use of a stale
- credential following a change. The kernel automatically
- refreshes the <structfield>td_ucred</structfield> pointer in
- the thread structure from the process
- <structfield>p_ucred</structfield> whenever a process enters
- the kernel, permitting use of a fresh credential for kernel
- access control.</para>
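-
- <para>Putting these rules together, a credential update follows a
- pattern roughly like the sketch below (error handling is omitted
- and the modification itself is left as a placeholder).</para>
-
- <programlisting><![CDATA[
-#include <sys/param.h>
-#include <sys/proc.h>
-#include <sys/ucred.h>
-
-static void
-change_credential(struct proc *p)
-{
-        struct ucred *newcred, *oldcred;
-
-        newcred = crget();              /* a fresh, unshared credential */
-        PROC_LOCK(p);
-        oldcred = p->p_ucred;
-        crcopy(newcred, oldcred);       /* duplicate under the proc lock */
-        /* ... check and modify newcred here ... */
-        p->p_ucred = newcred;           /* install the replacement */
-        PROC_UNLOCK(p);
-        crfree(oldcred);                /* drop the reference to the old one */
-}
-]]></programlisting>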
- </sect2>
-
- <sect2>
- <title>File Descriptors and File Descriptor Tables</title>
-
- <para>Details to follow.</para>
- </sect2>
-
- <sect2>
- <title>Jail Structures</title>
-
- <para><structname>struct prison</structname> stores
- administrative details pertinent to the maintenance of jails
- created using the &man.jail.2; API. This includes the
- per-jail hostname, IP address, and related settings. This
- structure is reference-counted since pointers to instances of
- the structure are shared by many credential structures. A
- single mutex, <structfield>pr_mtx</structfield>, protects read
- and write access to the reference count and all mutable
- variables inside the struct prison. Some variables are set only
- when the jail is created, and a valid reference to the
- <structname>struct prison</structname> is sufficient to read
- these values. The precise locking of each entry is documented
- via comments in <filename>sys/jail.h</filename>.</para>
- </sect2>
-
- <sect2>
- <title>MAC Framework</title>
-
- <para>The TrustedBSD MAC Framework maintains data in a variety
- of kernel objects, in the form of <structname>struct
- label</structname>. In general, labels in kernel objects
- are protected by the same lock as the remainder of the kernel
- object. For example, the <structfield>v_label</structfield>
- label in <structname>struct vnode</structname> is protected
- by the vnode lock on the vnode.</para>
-
- <para>In addition to labels maintained in standard kernel objects,
- the MAC Framework also maintains a list of registered and
- active policies. The policy list is protected by a global
- mutex (<varname>mac_policy_list_lock</varname>) and a busy
- count (also protected by the mutex). Since many access
- control checks may occur in parallel, entry to the framework
- for a read-only access to the policy list requires holding the
- mutex while incrementing (and later decrementing) the busy
- count. The mutex need not be held for the duration of the
- MAC entry operation; some operations, such as label operations
- on file system objects, are long-lived. To modify the policy
- list, such as during policy registration and de-registration,
- the mutex must be held and the reference count must be zero,
- to prevent modification of the list while it is in use.</para>
-
- <para>A condition variable,
- <varname>mac_policy_list_not_busy</varname>, is available to
- threads that need to wait for the list to become unbusy, but
- this condition variable must only be waited on if the caller is
- holding no other locks, or a lock order violation may be
- possible. The busy count, in effect, acts as a form of
- shared/exclusive lock over access to the framework: the difference
- is that, unlike with an sx lock, consumers waiting for the list
- to become unbusy may be starved, rather than permitting lock
- order problems with regards to the busy count and other locks
- that may be held on entry to (or inside) the MAC Framework.</para>
- </sect2>
-
- <sect2>
- <title>Modules</title>
-
- <para>For the module subsystem there exists a single lock that is
- used to protect the shared data. This lock is a shared/exclusive
- (SX) lock and has a good chance of needing to be acquired (shared
- or exclusively), so a few macros have been
- added to make access to the lock easier. These macros can be
- found in <filename>sys/module.h</filename> and are quite basic
- in terms of usage. The main structures protected under this lock
- are the <structname>module_t</structname> structures (when shared)
- and the global <structname>modulelist_t</structname> structure,
- modules. One should review the related source code in
- <filename>kern/kern_module.c</filename> to further understand the
- locking strategy.</para>
- </sect2>
-
- <sect2>
- <title>Newbus Device Tree</title>
-
- <para>The newbus system will have one sx lock. Readers will
- hold a shared (read) lock (&man.sx.slock.9;) and writers will hold
- an exclusive (write) lock (&man.sx.xlock.9;). Internal functions
- will not do locking at all. Externally visible ones will lock as
- needed.
- Items for which it does not matter whether the race is won or
- lost will not be locked, since they tend to be read all over the
- place (e.g. &man.device.get.softc.9;). There will be relatively few
- changes to the newbus data structures, so a single lock should
- be sufficient and not impose a performance penalty.</para>
- </sect2>
-
- <sect2>
- <title>Pipes</title>
-
- <para>...</para>
- </sect2>
-
- <sect2>
- <title>Processes and Threads</title>
-
- <para>- process hierarchy</para>
- <para>- proc locks, references</para>
- <para>- thread-specific copies of proc entries to freeze during system
- calls, including td_ucred</para>
- <para>- inter-process operations</para>
- <para>- process groups and sessions</para>
- </sect2>
-
- <sect2>
- <title>Scheduler</title>
-
- <para>Lots of references to <varname>sched_lock</varname> and notes
- pointing at specific primitives and related magic elsewhere in the
- document.</para>
- </sect2>
-
- <sect2>
- <title>Select and Poll</title>
-
- <para>The select() and poll() functions permit threads to block
- waiting on events on file descriptors--most frequently, whether
- or not the file descriptors are readable or writable.</para>
-
- <para>...</para>
- </sect2>
-
- <sect2>
- <title>SIGIO</title>
-
- <para>The SIGIO service permits processes to request the delivery
- of a SIGIO signal to a process or process group when the read/write
- status of specified file descriptors changes. At most one process or
- process group is permitted to register for SIGIO from any given
- kernel object, and that process or group is referred to as
- the owner. Each object supporting SIGIO registration contains a
- pointer field that is NULL if the object is not registered, or
- points to a <structname>struct sigio</structname> describing
- the registration. This field is protected by a global mutex,
- <varname>sigio_lock</varname>. Callers to SIGIO maintenance
- functions must pass in this field <quote>by reference</quote> so that local
- register copies of the field are not made when unprotected by
- the lock.</para>
-
- <para>One <structname>struct sigio</structname> is allocated for
- each registered object associated with any process or process
- group, and contains back-pointers to the object, owner, signal
- information, a credential, and the general disposition of the
- registration. Each process or process group contains a list of
- registered <structname>struct sigio</structname> structures,
- <structfield>p_sigiolst</structfield> for processes, and
- <structfield>pg_sigiolst</structfield> for process groups.
- These lists are protected by the process or process group
- locks respectively. Most fields in each <structname>struct
- sigio</structname> are constant for the duration of the
- registration, with the exception of the
- <structfield>sio_pgsigio</structfield> field which links the
- <structname>struct sigio</structname> into the process or
- process group list. Developers implementing new kernel
- objects supporting SIGIO will, in general, want to avoid
- holding structure locks while invoking SIGIO supporting
- functions, such as <function>fsetown()</function>
- or <function>funsetown()</function>, to avoid
- defining a lock order between structure locks and the global
- SIGIO lock. This is generally possible through use of an
- elevated reference count on the structure, such as reliance
- on a file descriptor reference to a pipe during a pipe
- operation.</para>
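-
- <para>For instance, a kernel object that supports SIGIO might wire
- up ownership roughly as in the sketch below; the softc layout and
- helper names are hypothetical.</para>
-
- <programlisting><![CDATA[
-#include <sys/param.h>
-#include <sys/sigio.h>
-
-struct foo_softc {
-        struct sigio    *sc_sigio;      /* protected by the global SIGIO lock */
-};
-
-/* FIOSETOWN handler: register the new owner. */
-static int
-foo_setown(struct foo_softc *sc, int owner)
-{
-        /* The field is passed by reference, as described above. */
-        return (fsetown(owner, &sc->sc_sigio));
-}
-
-/* Called on close or detach: drop any registration. */
-static void
-foo_clearown(struct foo_softc *sc)
-{
-        funsetown(&sc->sc_sigio);
-}
-]]></programlisting>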
- </sect2>
-
- <sect2>
- <title>Sysctl</title>
-
- <para>The <function>sysctl()</function> MIB service is invoked
- from both within the kernel and from userland applications
- using a system call. At least two issues are raised in locking:
- first, the protection of the structures maintaining the
- namespace, and second, interactions with kernel variables and
- functions that are accessed by the sysctl interface. Since
- sysctl permits the direct export (and modification) of
- kernel statistics and configuration parameters, the sysctl
- mechanism must become aware of appropriate locking semantics
- for those variables. Currently, sysctl makes use of a
- single global sx lock to serialize use of sysctl(); however, it
- is assumed to operate under Giant and other protections are not
- provided. The remainder of this section speculates on locking
- and semantic changes to sysctl.</para>
-
- <para>- Need to change the order of operations for sysctls that
- update values from <quote>read old; copyin and copyout; write
- new</quote> to <quote>copyin; lock; read old and write new;
- unlock; copyout</quote>. Normal
- sysctls that just copyout the old value and set a new value
- that they copyin may still be able to follow the old model.
- However, it may be cleaner to use the second model for all of
- the sysctl handlers to avoid lock operations.</para>
-
- <para>- To allow for the common case, a sysctl could embed a
- pointer to a mutex in the SYSCTL_FOO macros and in the struct.
- This would work for most sysctls. For values protected by sx
- locks, spin mutexes, or other locking strategies besides a
- single sleep mutex, SYSCTL_PROC nodes could be used to get the
- locking right.</para>
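-
- <para>For instance, a <literal>SYSCTL_PROC</literal> handler can
- take the protecting mutex around the read and the write of the
- kernel variable, roughly as in the sketch below; the variable and
- mutex are hypothetical.</para>
-
- <programlisting><![CDATA[
-#include <sys/param.h>
-#include <sys/kernel.h>
-#include <sys/lock.h>
-#include <sys/mutex.h>
-#include <sys/sysctl.h>
-
-static struct mtx foo_mtx;
-static int foo_value;
-
-static int
-sysctl_foo_value(SYSCTL_HANDLER_ARGS)
-{
-        int error, val;
-
-        mtx_lock(&foo_mtx);
-        val = foo_value;                /* read the old value under the lock */
-        mtx_unlock(&foo_mtx);
-        error = sysctl_handle_int(oidp, &val, 0, req);
-        if (error != 0 || req->newptr == NULL)
-                return (error);
-        mtx_lock(&foo_mtx);
-        foo_value = val;                /* write the new value under the lock */
-        mtx_unlock(&foo_mtx);
-        return (0);
-}
-SYSCTL_PROC(_kern, OID_AUTO, foo_value, CTLTYPE_INT | CTLFLAG_RW,
-    NULL, 0, sysctl_foo_value, "I", "Example value protected by foo_mtx");
-]]></programlisting>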
- </sect2>
-
- <sect2>
- <title>Taskqueue</title>
-
- <para>The taskqueue interface has two basic locks associated
- with it in order to protect the related shared data. The
- <varname>taskqueue_queues_mutex</varname> is meant to serve as a
- lock to protect the <varname>taskqueue_queues</varname> TAILQ.
- The other mutex lock associated with this system is the one in the
- <structname>struct taskqueue</structname> data structure. The
- use of the synchronization primitive here is to protect the
- integrity of the data in the <structname>struct
- taskqueue</structname>. It should be noted that there are no
- separate macros to assist the user in locking down his/her own work
- since these locks are most likely not going to be used outside of
- <filename>kern/subr_taskqueue.c</filename>.</para>
- </sect2>
- </sect1>
-
- <sect1>
- <title>Implementation Notes</title>
-
- <sect2>
- <title>Details of the Mutex Implementation</title>
-
- <para>- Should we require mutexes to be owned for mtx_destroy()
- since we cannot safely assert that they are unowned by anyone
- else otherwise?</para>
-
- <sect3>
- <title>Spin Mutexes</title>
-
- <para>- Use a critical section...</para>
- </sect3>
-
- <sect3>
- <title>Sleep Mutexes</title>
-
- <para>- Describe the races with contested mutexes</para>
-
- <para>- Why it is safe to read mtx_lock of a contested mutex
- when holding sched_lock.</para>
-
- <para>- Priority propagation</para>
- </sect3>
- </sect2>
-
- <sect2>
- <title>Witness</title>
-
- <para>- What does it do</para>
-
- <para>- How does it work</para>
- </sect2>
- </sect1>
-
- <sect1>
- <title>Miscellaneous Topics</title>
-
- <sect2>
- <title>Interrupt Source and ICU Abstractions</title>
-
- <para>- struct isrc</para>
-
- <para>- pic drivers</para>
- </sect2>
-
- <sect2>
- <title>Other Random Questions/Topics</title>
-
- <para>Should we pass an interlock into
- <function>sema_wait</function>?</para>
-
- <para>- Generic turnstiles for sleep mutexes and sx locks.</para>
-
- <para>- Should we have non-sleepable sx locks?</para>
- </sect2>
- </sect1>
-
- <glossary id="glossary">
- <title>Glossary</title>
-
- <glossentry id="atomic">
- <glossterm>atomic</glossterm>
- <glossdef>
- <para>An operation is atomic if all of its effects are visible
- to other CPUs together when the proper access protocol is
- followed. The degenerate case is an atomic instruction
- provided directly by the machine architecture. At a higher
- level, if several members of a structure are protected by a
- lock, then a set of operations are atomic if they are all
- performed while holding the lock without releasing the lock
- in between any of the operations.</para>
-
- <glossseealso>operation</glossseealso>
- </glossdef>
- </glossentry>
-
- <glossentry id="block">
- <glossterm>block</glossterm>
- <glossdef>
- <para>A thread is blocked when it is waiting on a lock,
- resource, or condition. Unfortunately this term is a bit
- overloaded as a result.</para>
-
- <glossseealso>sleep</glossseealso>
- </glossdef>
- </glossentry>
-
- <glossentry id="critical-section">
- <glossterm>critical section</glossterm>
- <glossdef>
- <para>A section of code that is not allowed to be preempted.
- A critical section is entered and exited using the
- &man.critical.enter.9; API.</para>
- </glossdef>
- </glossentry>
-
- <glossentry id="MD">
- <glossterm>MD</glossterm>
- <glossdef>
- <para>Machine dependent.</para>
-
- <glossseealso>MI</glossseealso>
- </glossdef>
- </glossentry>
-
- <glossentry id="memory-operation">
- <glossterm>memory operation</glossterm>
- <glossdef>
- <para>A memory operation reads and/or writes to a memory
- location.</para>
- </glossdef>
- </glossentry>
-
- <glossentry id="MI">
- <glossterm>MI</glossterm>
- <glossdef>
- <para>Machine independent.</para>
-
- <glossseealso>MD</glossseealso>
- </glossdef>
- </glossentry>
-
- <glossentry id="operation">
- <glossterm>operation</glossterm>
- <glosssee>memory operation</glosssee>
- </glossentry>
-
- <glossentry id="primary-interrupt-context">
- <glossterm>primary interrupt context</glossterm>
- <glossdef>
- <para>Primary interrupt context refers to the code that runs
- when an interrupt occurs. This code can either run an
- interrupt handler directly or schedule an asynchronous
- interrupt thread to execute the interrupt handlers for a
- given interrupt source.</para>
- </glossdef>
- </glossentry>
-
- <glossentry>
- <glossterm>realtime kernel thread</glossterm>
- <glossdef>
- <para>A high priority kernel thread. Currently, the only
- realtime priority kernel threads are interrupt threads.</para>
-
- <glossseealso>thread</glossseealso>
- </glossdef>
- </glossentry>
-
- <glossentry id="sleep">
- <glossterm>sleep</glossterm>
- <glossdef>
- <para>A thread is asleep when it is blocked on a condition
- variable or a sleep queue via <function>msleep</function> or
- <function>tsleep</function>.</para>
-
- <glossseealso>block</glossseealso>
- </glossdef>
- </glossentry>
-
- <glossentry id="sleepable-lock">
- <glossterm>sleepable lock</glossterm>
- <glossdef>
- <para>A sleepable lock is a lock that can be held by a thread
- which is asleep. Lockmgr locks and sx locks are currently
- the only sleepable locks in FreeBSD. Eventually, some sx
- locks such as the allproc and proctree locks may become
- non-sleepable locks.</para>
-
- <glossseealso>sleep</glossseealso>
- </glossdef>
- </glossentry>
-
- <glossentry id="thread">
- <glossterm>thread</glossterm>
- <glossdef>
- <para>A kernel thread represented by a struct thread. Threads own
- locks and hold a single execution context.</para>
- </glossdef>
- </glossentry>
- </glossary>
-</article>