Diffstat (limited to 'share/man/man9')
-rw-r--r--  share/man/man9/VFS.9                   |  3
-rw-r--r--  share/man/man9/atomic.9                | 54
-rw-r--r--  share/man/man9/buf.9                   | 97
-rw-r--r--  share/man/man9/bus_alloc_resource.9    | 24
-rw-r--r--  share/man/man9/bus_attach_children.9   |  2
-rw-r--r--  share/man/man9/callout.9               |  4
-rw-r--r--  share/man/man9/insmntque.9             |  6
-rw-r--r--  share/man/man9/make_dev.9              | 12
8 files changed, 140 insertions, 62 deletions
diff --git a/share/man/man9/VFS.9 b/share/man/man9/VFS.9
index a1d0a19bec13..6ea6570bbf6e 100644
--- a/share/man/man9/VFS.9
+++ b/share/man/man9/VFS.9
@@ -26,7 +26,7 @@
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
-.Dd February 9, 2010
+.Dd November 3, 2025
.Dt VFS 9
.Os
.Sh NAME
@@ -42,6 +42,7 @@ function from
rather than implementing empty functions or casting to
.Fa eopnotsupp .
.Sh SEE ALSO
+.Xr dtrace_vfs 4 ,
.Xr VFS_CHECKEXP 9 ,
.Xr VFS_FHTOVP 9 ,
.Xr VFS_MOUNT 9 ,
diff --git a/share/man/man9/atomic.9 b/share/man/man9/atomic.9
index c9133c6311a5..b027a0ff0bca 100644
--- a/share/man/man9/atomic.9
+++ b/share/man/man9/atomic.9
@@ -182,35 +182,42 @@ This variant is the default.
The second variant has acquire semantics, and the third variant has release
semantics.
.Pp
-When an atomic operation has acquire semantics, the operation must have
+An atomic operation can only have
+.Em acquire
+semantics if it performs a load
+from memory.
+When an atomic operation has acquire semantics, a load performed as
+part of the operation must have
completed before any subsequent load or store (by program order) is
performed.
Conversely, acquire semantics do not require that prior loads or stores have
-completed before the atomic operation is performed.
-An atomic operation can only have acquire semantics if it performs a load
-from memory.
+completed before a load from the atomic operation is performed.
To denote acquire semantics, the suffix
.Dq Li _acq
is inserted into the function name immediately prior to the
.Dq Li _ Ns Aq Fa type
suffix.
-For example, to subtract two integers ensuring that the subtraction is
+For example, to subtract two integers ensuring that the load of
+the value from memory is
completed before any subsequent loads and stores are performed, use
.Fn atomic_subtract_acq_int .
.Pp
+An atomic operation can only have
+.Em release
+semantics if it performs a store to memory.
When an atomic operation has release semantics, all prior loads or stores
-(by program order) must have completed before the operation is performed.
-Conversely, release semantics do not require that the atomic operation must
+(by program order) must have completed before a store executed as part of
+the operation is performed.
+Conversely, release semantics do not require that a store from the atomic
+operation must
have completed before any subsequent load or store is performed.
-An atomic operation can only have release semantics if it performs a store
-to memory.
To denote release semantics, the suffix
.Dq Li _rel
is inserted into the function name immediately prior to the
.Dq Li _ Ns Aq Fa type
suffix.
For example, to add two long integers ensuring that all prior loads and
-stores are completed before the addition is performed, use
+stores are completed before the store of the result is performed, use
.Fn atomic_add_rel_long .
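+.Pp
+As a purely illustrative sketch (the variables
+.Va data ,
+.Va flag ,
+and
+.Va v
+are hypothetical), a release store may be paired with an acquire load
+to publish data from one thread to another:
+.Bd -literal
+/* Producer: fill in data, then set the flag with release semantics. */
+data = 42;
+atomic_store_rel_int(&flag, 1);
+
+/* Consumer: wait for the flag with acquire semantics, then read data. */
+while (atomic_load_acq_int(&flag) == 0)
+	cpu_spinwait();
+v = data;	/* the store of 42 above is guaranteed to be visible */
+.Ed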
.Pp
When a release operation by one thread
@@ -235,6 +242,33 @@ section.
However, they will not prevent the compiler or processor from moving loads
or stores into the critical section, which does not violate the semantics of
a mutex.
+.Ss Architecture-Dependent Caveats for Compare-and-Swap
+The
+.Fn atomic_[f]cmpset_<type>
+operations, specifically those without explicitly specified memory
+ordering, are defined as relaxed.
+Consequently, a thread's accesses to memory locations different from
+that of the atomic operation can be reordered in relation to the
+atomic operation.
+.Pp
+However, the implementations on the
+.Sy amd64
+and
+.Sy i386
+architectures provide sequentially consistent semantics.
+In particular, the reordering mentioned above cannot occur.
+.Pp
+On the
+.Sy arm64/aarch64
+architecture, the operation may include either acquire
+semantics on the constituent load or release semantics
+on the constituent store.
+This means that accesses to other locations in program order
+before the atomic operation might be observed as executed after
+the load that is part of the atomic operation (but not after the
+store from the operation, due to the release semantics).
+Similarly, accesses after the atomic operation might be observed
+as executed before the store.
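+.Pp
+As a purely illustrative sketch (the
+.Va lk
+variable and
+.Fn critical_work
+are hypothetical), code that depends on ordering should request it
+explicitly through the
+.Dq Li _acq
+and
+.Dq Li _rel
+variants rather than rely on such architecture-specific behavior:
+.Bd -literal
+/* Take the flag; the acquire variant keeps later accesses after the load. */
+while (atomic_cmpset_acq_int(&lk, 0, 1) == 0)
+	cpu_spinwait();
+critical_work();
+/* Clear the flag; prior accesses complete before the releasing store. */
+atomic_store_rel_int(&lk, 0);
+.Ed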
.Ss Thread Fence Operations
Alternatively, a programmer can use atomic thread fence operations to
constrain the reordering of accesses.
diff --git a/share/man/man9/buf.9 b/share/man/man9/buf.9
index ecd4a1487735..ff9a1d0d46e0 100644
--- a/share/man/man9/buf.9
+++ b/share/man/man9/buf.9
@@ -36,44 +36,70 @@ The kernel implements a KVM abstraction of the buffer cache which allows it
to map potentially disparate vm_page's into contiguous KVM for use by
(mainly file system) devices and device I/O.
This abstraction supports
-block sizes from DEV_BSIZE (usually 512) to upwards of several pages or more.
+block sizes from
+.Dv DEV_BSIZE
+(usually 512) to upwards of several pages or more.
It also supports a relatively primitive byte-granular valid range and dirty
range currently hardcoded for use by NFS.
The code implementing the
VM Buffer abstraction is mostly concentrated in
-.Pa /usr/src/sys/kern/vfs_bio.c .
+.Pa sys/kern/vfs_bio.c
+in the
+.Fx
+source tree.
.Pp
One of the most important things to remember when dealing with buffer pointers
-(struct buf) is that the underlying pages are mapped directly from the buffer
+.Pq Vt struct buf
+is that the underlying pages are mapped directly from the buffer
cache.
No data copying occurs in the scheme proper, though some file systems
such as UFS do have to copy a little when dealing with file fragments.
The second most important thing to remember is that due to the underlying page
-mapping, the b_data base pointer in a buf is always *page* aligned, not
-*block* aligned.
-When you have a VM buffer representing some b_offset and
-b_size, the actual start of the buffer is (b_data + (b_offset & PAGE_MASK))
-and not just b_data.
+mapping, the
+.Va b_data
+base pointer in a buf is always
+.Em page Ns -aligned ,
+not
+.Em block Ns -aligned .
+When you have a VM buffer representing some
+.Va b_offset
+and
+.Va b_size ,
+the actual start of the buffer is
+.Ql b_data + (b_offset & PAGE_MASK)
+and not just
+.Ql b_data .
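+For example, a hypothetical helper (not an existing kernel interface)
+computing the usable start of a buffer's data might look like:
+.Bd -literal
+/* Return the start of the buffer's data, adjusted for page alignment. */
+static __inline caddr_t
+buf_data_start(struct buf *bp)
+{
+
+	return (bp->b_data + (bp->b_offset & PAGE_MASK));
+}
+.Ed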
Finally, the VM system's core buffer cache supports
-valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks.
+valid and dirty bits
+.Pq Va m->valid , m->dirty
+for pages in
+.Dv DEV_BSIZE
+chunks.
Thus
a platform with a hardware page size of 4096 bytes has 8 valid and 8 dirty
bits.
These bits are generally set and cleared in groups based on the device
block size of the device backing the page.
A complete page's worth is often
-referred to using the VM_PAGE_BITS_ALL bitmask (i.e., 0xFF if the hardware page
+referred to using the
+.Dv VM_PAGE_BITS_ALL
+bitmask (i.e., 0xFF if the hardware page
size is 4096).
.Pp
VM buffers also keep track of a byte-granular dirty range and valid range.
This feature is normally only used by the NFS subsystem.
I am not sure why it
-is used at all, actually, since we have DEV_BSIZE valid/dirty granularity
+is used at all, actually, since we have
+.Dv DEV_BSIZE
+valid/dirty granularity
within the VM buffer.
-If a buffer dirty operation creates a 'hole',
+If a buffer dirty operation creates a
+.Dq hole ,
the dirty range will extend to cover the hole.
If a buffer validation
-operation creates a 'hole' the byte-granular valid range is left alone and
+operation creates a
+.Dq hole ,
+the byte-granular valid range is left alone and
will not take into account the new extension.
Thus the whole byte-granular
abstraction is considered a bad hack and it would be nice if we could get rid
@@ -81,16 +107,24 @@ of it completely.
.Pp
A VM buffer is capable of mapping the underlying VM cache pages into KVM in
order to allow the kernel to directly manipulate the data associated with
-the (vnode,b_offset,b_size).
+the
+.Pq Va vnode , b_offset , b_size .
The kernel typically unmaps VM buffers the moment
-they are no longer needed but often keeps the 'struct buf' structure
-instantiated and even bp->b_pages array instantiated despite having unmapped
+they are no longer needed but often keeps the
+.Vt struct buf
+structure
+instantiated and even
+.Va bp->b_pages
+array instantiated despite having unmapped
them from KVM.
If a page making up a VM buffer is about to undergo I/O, the
-system typically unmaps it from KVM and replaces the page in the b_pages[]
+system typically unmaps it from KVM and replaces the page in the
+.Va b_pages[]
array with a place-marker called bogus_page.
The place-marker forces any kernel
-subsystems referencing the associated struct buf to re-lookup the associated
+subsystems referencing the associated
+.Vt struct buf
+to re-lookup the associated
page.
I believe the place-marker hack is used to allow sophisticated devices
such as file system devices to remap underlying pages in order to deal with,
@@ -107,18 +141,29 @@ you wind up with pages marked clean that are actually still dirty.
If not
treated carefully, these pages could be thrown away!
Indeed, a number of
-serious bugs related to this hack were not fixed until the 2.2.8/3.0 release.
-The kernel uses an instantiated VM buffer (i.e., struct buf) to place-mark pages
+serious bugs related to this hack were not fixed until the
+.Fx 2.2.8 /
+.Fx 3.0
+release.
+The kernel uses an instantiated VM buffer (i.e.,
+.Vt struct buf )
+to place-mark pages
in this special state.
-The buffer is typically flagged B_DELWRI.
+The buffer is typically flagged
+.Dv B_DELWRI .
When a
-device no longer needs a buffer it typically flags it as B_RELBUF.
+device no longer needs a buffer, it typically flags it as
+.Dv B_RELBUF .
Due to
-the underlying pages being marked clean, the B_DELWRI|B_RELBUF combination must
+the underlying pages being marked clean, the
+.Ql B_DELWRI|B_RELBUF
+combination must
be interpreted to mean that the buffer is still actually dirty and must be
written to its backing store before it can actually be released.
In the case
-where B_DELWRI is not set, the underlying dirty pages are still properly
+where
+.Dv B_DELWRI
+is not set, the underlying dirty pages are still properly
marked as dirty and the buffer can be completely freed without losing that
clean/dirty state information.
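+As a purely illustrative sketch (not code taken from the kernel), code
+releasing such a buffer would have to distinguish the two cases:
+.Bd -literal
+if ((bp->b_flags & (B_DELWRI | B_RELBUF)) == (B_DELWRI | B_RELBUF)) {
+	/* Still effectively dirty: write to backing store first. */
+	bwrite(bp);
+} else {
+	/* Clean/dirty state is intact; the buffer can be released. */
+	brelse(bp);
+}
+.Ed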
(XXX do we have to check other flags in
@@ -128,7 +173,9 @@ The kernel reserves a portion of its KVM space to hold VM Buffer's data
maps.
Even though this is virtual space (since the buffers are mapped
from the buffer cache), we cannot make it arbitrarily large because
-instantiated VM Buffers (struct buf's) prevent their underlying pages in the
+instantiated VM Buffers
+.Pq Vt struct buf Ap s
+prevent their underlying pages in the
buffer cache from being freed.
This can complicate the life of the paging
system.
diff --git a/share/man/man9/bus_alloc_resource.9 b/share/man/man9/bus_alloc_resource.9
index 84a4c9c530c9..5d309229a34e 100644
--- a/share/man/man9/bus_alloc_resource.9
+++ b/share/man/man9/bus_alloc_resource.9
@@ -26,7 +26,7 @@
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
.\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
-.Dd May 20, 2016
+.Dd October 30, 2025
.Dt BUS_ALLOC_RESOURCE 9
.Os
.Sh NAME
@@ -43,14 +43,14 @@
.In machine/resource.h
.Ft struct resource *
.Fo bus_alloc_resource
-.Fa "device_t dev" "int type" "int *rid" "rman_res_t start" "rman_res_t end"
+.Fa "device_t dev" "int type" "int rid" "rman_res_t start" "rman_res_t end"
.Fa "rman_res_t count" "u_int flags"
.Fc
.Ft struct resource *
-.Fn bus_alloc_resource_any "device_t dev" "int type" "int *rid" "u_int flags"
+.Fn bus_alloc_resource_any "device_t dev" "int type" "int rid" "u_int flags"
.Ft struct resource *
.Fo bus_alloc_resource_anywhere
-.Fa "device_t dev" "int type" "int *rid" "rman_res_t count" "u_int flags"
+.Fa "device_t dev" "int type" "int rid" "rman_res_t count" "u_int flags"
.Fc
.Sh DESCRIPTION
This is an easy interface to the resource-management functions.
@@ -106,15 +106,13 @@ for I/O memory
.El
.It
.Fa rid
-points to a bus specific handle that identifies the resource being allocated.
+is a bus-specific handle that identifies the resource being allocated.
For ISA this is an index into an array of resources that have been setup
for this device by either the PnP mechanism, or via the hints mechanism.
For PCCARD, this is an index into the array of resources described by the PC Card's
CIS entry.
For PCI, the offset into PCI config space which has the BAR to use to access
the resource.
-The bus methods are free to change the RIDs that they are given as a parameter.
-You must not depend on the value you gave it earlier.
.It
.Fa start
and
@@ -175,20 +173,12 @@ A pointer to
is returned on success, a null pointer otherwise.
.Sh EXAMPLES
This is some example code that allocates a 32 byte I/O port range and an IRQ.
-The values of
-.Va portid
-and
-.Va irqid
-should be saved in the softc of the device after these calls.
.Bd -literal
struct resource *portres, *irqres;
- int portid, irqid;
- portid = 0;
- irqid = 0;
- portres = bus_alloc_resource(dev, SYS_RES_IOPORT, &portid,
+ portres = bus_alloc_resource(dev, SYS_RES_IOPORT, 0,
0ul, ~0ul, 32, RF_ACTIVE);
- irqres = bus_alloc_resource_any(dev, SYS_RES_IRQ, &irqid,
+ irqres = bus_alloc_resource_any(dev, SYS_RES_IRQ, 0,
RF_ACTIVE | RF_SHAREABLE);
.Ed
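+.Pp
+Allocation failures are indicated by a null pointer, so (as a sketch
+only, with hypothetical error handling) the example would normally be
+followed by checks such as:
+.Bd -literal
+	if (portres == NULL || irqres == NULL) {
+		if (portres != NULL)
+			bus_release_resource(dev, SYS_RES_IOPORT, 0, portres);
+		if (irqres != NULL)
+			bus_release_resource(dev, SYS_RES_IRQ, 0, irqres);
+		return (ENXIO);
+	}
+.Ed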
.Sh SEE ALSO
diff --git a/share/man/man9/bus_attach_children.9 b/share/man/man9/bus_attach_children.9
index 5e3ca4c5e906..81a24a428d8e 100644
--- a/share/man/man9/bus_attach_children.9
+++ b/share/man/man9/bus_attach_children.9
@@ -105,7 +105,7 @@ Detached devices are not deleted.
.Pp
.Fn bus_detach_children
is typically called at the start of a bus driver's
-.Xr DEVICE_ATTACH 9
+.Xr DEVICE_DETACH 9
method to give child devices a chance to veto the detach request.
It is usually paired with a later call to
.Fn device_delete_children 9
diff --git a/share/man/man9/callout.9 b/share/man/man9/callout.9
index 0e59ef8ab2b1..637049ec1ef5 100644
--- a/share/man/man9/callout.9
+++ b/share/man/man9/callout.9
@@ -27,7 +27,7 @@
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
-.Dd January 22, 2024
+.Dd November 4, 2025
.Dt CALLOUT 9
.Os
.Sh NAME
@@ -789,6 +789,8 @@ and
functions return a value of one if the callout was still pending when it was
called, a zero if the callout could not be stopped and a negative one is it
was either not running or has already completed.
+.Sh SEE ALSO
+.Xr dtrace_callout_execute 4
.Sh HISTORY
.Fx
initially used the long standing
diff --git a/share/man/man9/insmntque.9 b/share/man/man9/insmntque.9
index 869d8767632b..33ba697b10b9 100644
--- a/share/man/man9/insmntque.9
+++ b/share/man/man9/insmntque.9
@@ -24,7 +24,7 @@
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
.\" DAMAGE.
.\"
-.Dd January 29, 2022
+.Dd October 24, 2025
.Dt INSMNTQUE 9
.Os
.Sh NAME
@@ -56,7 +56,7 @@ The vnode must be exclusively locked.
.Pp
On failure,
.Fn insmntque
-resets vnode' operation vector to the vector of
+resets the vnode's operations vector to the vector of
.Xr deadfs 9 ,
clears
.Va v_data ,
@@ -71,7 +71,7 @@ failure is needed, the
function may be used instead.
It does not do any cleanup following a failure, leaving all
the work to the caller.
-In particular, the operation vector
+In particular, the operations vector
.Va v_op
and
.Va v_data
diff --git a/share/man/man9/make_dev.9 b/share/man/man9/make_dev.9
index de56f350faa5..9f2c36fb39a4 100644
--- a/share/man/man9/make_dev.9
+++ b/share/man/man9/make_dev.9
@@ -25,7 +25,7 @@
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
-.Dd January 19, 2025
+.Dd November 4, 2025
.Dt MAKE_DEV 9
.Os
.Sh NAME
@@ -387,14 +387,18 @@ function is the same as:
destroy_dev_sched_cb(cdev, NULL, NULL);
.Ed
.Pp
-The
+Neither the
.Fn d_close
-driver method cannot call
+driver method nor a
+.Xr devfs_cdevpriv 9
+.Fa dtr
+method can call
.Fn destroy_dev
directly.
Doing so causes deadlock when
.Fn destroy_dev
-waits for all threads to leave the driver methods.
+waits for all threads to leave the driver methods and finish executing any
+per-open destructors.
Also, because
.Fn destroy_dev
sleeps, no non-sleepable locks may be held over the call.
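+.Pp
+As an illustrative sketch only (the
+.Fn foo_cdevpriv_dtr
+callback and its softc are hypothetical), such a per-open destructor
+would defer the destruction instead:
+.Bd -literal
+static void
+foo_cdevpriv_dtr(void *arg)
+{
+	struct foo_softc *sc = arg;
+
+	/* Never call destroy_dev() here; schedule it instead. */
+	destroy_dev_sched(sc->sc_cdev);
+}
+.Ed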