| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Add an implementation of inotify_init(), inotify_add_watch(),
inotify_rm_watch(), source-compatible with Linux. This provides
functionality similar to kevent(2)'s EVFILT_VNODE, i.e., it lets
applications monitor filesystem files for accesses. Compared to
inotify, however, EVFILT_VNODE has the limitation of requiring the
application to open the file to be monitored. This means that activity
on a newly created file cannot be monitored reliably, and that a file
descriptor per file in the hierarchy is required.
inotify on the other hand allows a directory and its entries to be
monitored at once. It introduces a new file descriptor type to which
"watches" can be attached; a watch is a pseudo-file descriptor
associated with a file or directory and a set of events to watch for.
When a watched vnode is accessed, a description of the event is queued
to the inotify descriptor, readable with read(2). Events for files in a
watched directory include the file name.
A watched vnode has its usecount bumped, so name cache entries
originating from a watched directory are not evicted. Name cache
entries are used to populate inotify events for files with a link in a
watched directory. In particular, if a file is accessed with, say,
read(2), an IN_ACCESS event will be generated for any watched hard link
of the file.
The inotify_add_watch_at() variant is included so that this
functionality is available in capability mode; plain inotify_add_watch()
is disallowed in capability mode.
When a file in a nullfs mount is watched, the watch is attached to the
lower vnode, such that accesses via either layer generate inotify
events.
Many thanks to Gleb Popov for testing this patch and finding lots of
bugs.
PR: 258010, 215011
Reviewed by: kib
Tested by: arrowd
MFC after: 3 months
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D50315
| |
Suppose a process has its cwd pointing to a nullfs directory, where the
lower directory is also visible in the jail's filesystem namespace.
Suppose that the lower directory vnode is moved out from under the
nullfs mount. The nullfs vnode still shadows the lower vnode, and
dotdot lookups relative to that directory will instantiate new nullfs
vnodes outside of the nullfs mountpoint, effectively shadowing the lower
filesystem.
This phenomenon can be abused to escape a chroot, since the nullfs
vnodes instantiated by these dotdot lookups defeat the root vnode check
in vfs_lookup(), which uses vnode pointer equality to test for the
process root.
Fix this by extending nullfs and unionfs to perform the same check,
exploiting the fact that the passed componentname is embedded in a
nameidata structure to avoid changing the VOP_LOOKUP interface. That
is, add a flag to indicate that containerof can be used to get the full
nameidata structure, and perform the root vnode check on the lower vnode
when performing a dotdot lookup.
PR: 262180
Reviewed by: olce, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D50418
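The containerof idea in the last paragraph can be sketched in userland as follows. This is only an illustration under stated assumptions: the structures are reduced stand-ins, and the names demo_containerof, DEMO_ISDOTDOT, DEMO_IN_NAMEI and demo_dotdot_hits_root() are hypothetical, not the identifiers used by the actual change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical stand-ins for the kernel types. */
struct vnode {
	int id;
};

struct componentname {
	unsigned cn_flags;
};
#define	DEMO_ISDOTDOT	0x1	/* the lookup is for ".." */
#define	DEMO_IN_NAMEI	0x2	/* cnp is embedded in a struct nameidata */

struct nameidata {
	struct vnode *ni_rootdir;		/* root directory of the process */
	struct componentname ni_cnd;	/* what VOP_LOOKUP receives */
};

/* Recover a pointer to the enclosing structure from a member pointer. */
#define	demo_containerof(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Return true if a ".." lookup starting at the lower vnode `lvp` must
 * be refused because `lvp` is the root directory of the caller.
 */
bool
demo_dotdot_hits_root(struct componentname *cnp, struct vnode *lvp)
{
	struct nameidata *ndp;

	assert(cnp != NULL);
	if ((cnp->cn_flags & DEMO_ISDOTDOT) == 0)
		return (false);
	/* Without the flag there is no container to recover. */
	if ((cnp->cn_flags & DEMO_IN_NAMEI) == 0)
		return (false);
	ndp = demo_containerof(cnp, struct nameidata, ni_cnd);
	return (ndp->ni_rootdir == lvp);
}
```

The flag matters because a componentname that is not embedded in a nameidata would make the pointer arithmetic read unrelated memory.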
| |
Reviewed by: olce
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D50390
| |
Flags should not propagate from the lower fs. Behavior for the upper fs
is determined by flags from its mount point structure. When the lower fs
acts according to its own mount configuration, the result is reported up
as VOP errors.
PR: 283425
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D48150
| |
Instead of casting a vop_F_args object to vop_generic_args, use a
vop_F_args.a_gen field when calling null_bypass(). This way we don't
hardcode the vop_generic_args data type in the callers of null_bypass().
Before this change, there were 3 null_bypass() calls using
a vop_F_args.a_gen field and 5 null_bypass() calls using a cast to
vop_generic_args. This change makes all null_bypass() calls consistent
and easier to maintain.
Pointed out by: jrtc27
Reviewed by: kib, oshogbo
Accepted by: oshogbo (mentor)
Differential Revision: https://reviews.freebsd.org/D37359
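The difference between the two calling styles can be shown with a small mock. The structures below are simplified assumptions (the real vop_generic_args carries a descriptor pointer), and demo_bypass() merely stands in for null_bypass().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical versions of the kernel argument structures. */
struct vop_generic_args {
	int a_desc;	/* stands in for the VOP descriptor pointer */
};

struct vop_demo_args {
	struct vop_generic_args a_gen;	/* generic header, always first */
	int a_value;
};

/* Stand-in for null_bypass(): it only looks at the generic header. */
int
demo_bypass(struct vop_generic_args *ap)
{
	assert(ap != NULL);
	return (ap->a_desc);
}

/* Old style: the caller names the generic type and relies on a cast. */
int
demo_call_with_cast(struct vop_demo_args *ap)
{
	return (demo_bypass((struct vop_generic_args *)ap));
}

/* New style: pass the address of the embedded field, no cast needed. */
int
demo_call_with_field(struct vop_demo_args *ap)
{
	return (demo_bypass(&ap->a_gen));
}
```

Both calls reach the same header, but only the second keeps working unchanged if the header's type or position is ever revised.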
| |
MFC after: 3 days
Reviewed by: allanjude, kib
Differential Revision: https://reviews.freebsd.org/D44773
| |
| |
to allow overwrite global default if needed.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
| |
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
| |
Differential revision: https://reviews.freebsd.org/D44217
MFC after: 1 week
| |
There must be no callers of VOP_COPY_FILE_RANGE() other than
vn_copy_file_range(), which does enough to find the write vnodes on
which to call the VOP.
Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603
| |
It is similar to VOP_GETWRITEMOUNT(): for a given vnode vp, it should
return the lower vnode which would actually handle a write to vp.
The flags allow the caller to specify FREAD or FWRITE, for the benefit
of a possible unionfs implementation.
Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603
| |
Remove ancient SCCS tags from the tree using automated scripting, with
two minor fixups to keep things compiling. All the common forms in the
tree were removed with a perl script.
Sponsored by: Netflix
| |
The MNT_IGNORE flag can be used to mark certain filesystem mounts so
that utilities such as df(1) and mount(8) can filter out those mounts by
default. This can be used, for instance, to reduce the noise from
running container workloads inside jails which often have at least three
and sometimes as many as ten mounts per container.
The flag is supplied by the nmount(2) system call and is recorded so
that it can be reported by statfs(2). Unfortunately several filesystems
override the default behaviour and mask out the flag, defeating its
purpose. This change preserves the MNT_IGNORE flag for those filesystems
so that it can be reported correctly.
MFC after: 1 week
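A rough model of the masking problem, written as plain C. The flag values and the DEMO_KEEP_* masks are invented for the example, and the way a real filesystem combines its own flags with those of the lower layer is an assumption; only the idea that MNT_IGNORE must be part of the preserved set comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only; the real MNT_* constants differ. */
#define	DEMO_MNT_RDONLY	0x1ULL
#define	DEMO_MNT_NOEXEC	0x2ULL
#define	DEMO_MNT_IGNORE	0x4ULL

/* Flags a hypothetical stacked filesystem keeps from its own mount. */
#define	DEMO_KEEP_OLD	(DEMO_MNT_RDONLY | DEMO_MNT_NOEXEC)
#define	DEMO_KEEP_NEW	(DEMO_KEEP_OLD | DEMO_MNT_IGNORE)

/*
 * Build the flags reported by statfs: the selected flags of this mount
 * combined with whatever the lower filesystem reported.
 */
uint64_t
demo_statfs_flags(uint64_t mount_flags, uint64_t lower_flags, uint64_t keep)
{
	assert((keep & ~DEMO_KEEP_NEW) == 0);
	return ((mount_flags & keep) | lower_flags);
}
```

With DEMO_KEEP_OLD the ignore bit set at mount time is silently dropped; with DEMO_KEEP_NEW it survives and a df(1)-like consumer can filter on it.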
| |
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
| |
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
| |
Reported by: clang 15
Sponsored by: Rubicon Communications, LLC ("Netgate")
| |
To quote from a comment above vput_final:
<quote>
* XXX Some filesystems pass in an exclusively locked vnode and strongly depend
* on the lock being held all the way until VOP_INACTIVE. This in particular
* happens with UFS which adds half-constructed vnodes to the hash, where they
* can be found by other code.
</quote>
As is, there is no mechanism which allows filesystems to denote that a
vnode is fully initialized; consequently, problems like the above are
only found the hard way(tm).
Add rudimentary support for state transitions, which in particular makes
it possible to assert that the vnode is not legally unlocked until its
fate is decided (either construction finishes or vgone is called to
abort it).
The new field lands in a 1-byte hole, thus it does not grow the struct.
Bump __FreeBSD_version to 1400077
Reviewed by: kib (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37759
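The kind of check that a state field enables can be sketched as below. The state names and the demo_* helpers are hypothetical, since the message does not list the real ones, and the sketch returns false where a kernel would trip an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state names for the illustration. */
enum demo_vstate {
	DEMO_UNINITIALIZED,	/* still being constructed */
	DEMO_CONSTRUCTED,	/* construction finished */
	DEMO_DEAD		/* construction aborted or vnode reclaimed */
};

struct demo_vnode {
	enum demo_vstate state;
	bool locked;
};

/* A vnode may be unlocked only once its fate has been decided. */
bool
demo_may_unlock(const struct demo_vnode *vp)
{
	return (vp->state != DEMO_UNINITIALIZED);
}

/* Returns false, and changes nothing, if the unlock would be illegal. */
bool
demo_unlock(struct demo_vnode *vp)
{
	assert(vp->locked);
	if (!demo_may_unlock(vp))
		return (false);
	vp->locked = false;
	return (true);
}
```

A filesystem that publishes a half-constructed vnode and then drops the lock would be caught at the unlock, rather than much later by whoever finds the vnode.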
| |
The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.
This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.
Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo
Reviewed by: mjg, kib
Tested by: pho
| |
When the lower filesystem directory hierarchy is the same as the nullfs
mount point (admittedly not likely to be a useful situation in
practice), nullfs is subject to the exact deadlock between the busy
count drain and the covered vnode lock that VV_CROSSLOCK is intended
to address.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D37458
| |
- s/examing/examining/
MFC after: 3 days
| |
null_nodeget() needs valid mount point data, otherwise we might
race and dereference NULL.
Using MBF_NOWAIT makes non-forced unmount non-transparent for
vn_fullpath() over nullfs, but we make no guarantee that fullpath
calculation succeeds anyway.
Reported and tested by: pho
Reviewed by: jah
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35477
| |
| |
Use the hash lock to serialize instead.
This enables shared-locked ".." lookups.
Reviewed by: markj
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D34466
| |
Also remove once-used functions to clean up after failed insmntque1(),
which were destructor callbacks in previous life.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D34071
| |
I was somehow convinced that insmntque calls insmntque1 with a NULL
destructor. Unfortunately this worked well enough to not immediately
blow up in simple testing.
Keep not using the destructor in previously patched filesystems though
as it avoids unnecessary casts.
Noted by: kib
Reported by: pho
| |
| |
It adds nothing of value over insmntque.
| |
See b4a58fbf640409a1 ("vfs: remove cn_thread")
Bump __FreeBSD_version to 1400043.
| |
fdvp and fvp vnodes are not locked, and race with reclaim cannot be handled
by the generic bypass routine.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
| |
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
| |
The caller of VOP_LOOKUP() passes dvp locked and expects it locked on
return. A relock of the lower vnode could in any case leave the upper
vnode reclaimed and unlocked.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
| |
The upper vnode reference to the lower vnode is the only reference that
keeps our pointer to the lower vnode alive. If lower vnode is relocked
during the VOP call, upper vnode might become unlocked and reclaimed,
which invalidates our reference.
Add a transient vhold around VOP call.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
| |
The advlock VOP takes the vnode unlocked, which makes the normal bypass
function racy. As with null_pgcache_read(), the nullfs implementation
needs to take the interlock and reference the lower vnode under it.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
| |
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31310
| |
In certain emergency cases such as media failure or removal, UFS will
initiate a forced unmount in order to prevent dirty buffers from
accumulating against the no-longer-usable filesystem. The presence
of a stacked filesystem such as nullfs or unionfs above the UFS mount
will prevent this forced unmount from succeeding.
This change addresses the situation by allowing stacked filesystems to
be recursively unmounted on a taskqueue thread when the MNT_RECURSE
flag is specified to dounmount(). This call will block until all upper
mounts have been removed unless the caller specifies the MNT_DEFERRED
flag to indicate the base filesystem should also be unmounted from the
taskqueue.
To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs
have been combined with the existing 'mnt_uppers' list used by nullfs
and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper().
The format of the mnt_uppers list has also been changed to accommodate
filesystems such as unionfs in which a given mount may be stacked atop
more than one lower mount. Additionally, management of lower FS
reclaim/unlink notifications has been split into a separate list
managed by a separate set of KPIs, as registration of an upper FS no
longer implies interest in these notifications.
Reviewed by: kib, mckusick
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D31016
| |
This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount. Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.
Introduce two new functions, vfs_pin_from_vp() and vfs_unpin(),
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.
vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.
Adopt these new functions in both nullfs and unionfs.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30401
| |
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.
Also, add stdbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h. Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30556
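The in/out parameter protocol can be modelled with a tiny mock. Everything here is hypothetical (the demo_mount busy counter, the command numbers, demo_quotactl() and demo_caller()); it only shows how the caller learns whether it still owns the busy reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative command numbers, not the real Q_* values. */
#define	DEMO_Q_QUOTAON	1
#define	DEMO_Q_GETQUOTA	2

struct demo_mount {
	int busy;	/* stands in for the mount busy count */
};

/*
 * Filesystem side: entered with the mount busied and *mp_busy true.
 * If the implementation has to unbusy the mount, it says so by
 * clearing *mp_busy.
 */
int
demo_quotactl(struct demo_mount *mp, int cmd, bool *mp_busy)
{
	assert(*mp_busy && mp->busy > 0);
	if (cmd == DEMO_Q_QUOTAON) {
		mp->busy--;
		*mp_busy = false;
	}
	return (0);
}

/* Caller side: unbusy only if the implementation did not already. */
int
demo_caller(struct demo_mount *mp, int cmd)
{
	bool mp_busy = true;
	int error;

	mp->busy++;
	error = demo_quotactl(mp, cmd, &mp_busy);
	if (mp_busy)
		mp->busy--;
	return (error);
}
```

Either way the busy count returns to its starting value, without each implementation having to unbusy unconditionally for particular commands.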
| |
Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason. Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.
| |
Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.
Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30218
| |
Otherwise pages are cleaned some time later when the lower fs decides
that it is time to do it. This mostly manifests itself as delayed
mtime update, e.g. breaking make-like programs.
Reported by: mav
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
| |
The VFS convention is that VOP_LOOKUP() methods do not need to handle
ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all). Nullfs
bypasses VOP_LOOKUP() to the lower filesystem, and there, due to user
actions, it is possible to get into a situation where
- upper vnode does not have VV_ROOT set
- lower vnode is root
- ISDOTDOT is requested
The user just needs to nullfs-mount a non-root directory of some
filesystem, and then move some directory from under the mount to outside
of it, using the lower filesystem.
In this case, nullfs cannot do much, but we still should and can ensure
that internal kernel structures are consistent. Avoid forwarding the
ISDOTDOT lookup when VV_ROOT is set on the lower dvp, and return a
somewhat arbitrary ENOENT.
PR: 253593
Reported by: Gregor Koscak <elogin41@gmail.com>
Test by: Patrick Sullivan <sulli00777@gmail.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
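The guard described in the last paragraph amounts to one early return. The sketch below is a userland mock with invented names (demo_vnode, DEMO_VV_ROOT, DEMO_ISDOTDOT, demo_lower_lookup(), demo_null_lookup()); it is not the nullfs code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values, not the kernel's. */
#define	DEMO_VV_ROOT	0x1	/* vnode is the root of its filesystem */
#define	DEMO_ISDOTDOT	0x1	/* the lookup is for ".." */

struct demo_vnode {
	unsigned v_vflag;
};

/* Counts how often the lower filesystem was actually asked. */
static int demo_lower_calls;

/* Stand-in for the lower filesystem's VOP_LOOKUP(). */
static int
demo_lower_lookup(const struct demo_vnode *ldvp, unsigned cn_flags)
{
	(void)ldvp;
	(void)cn_flags;
	demo_lower_calls++;
	return (0);
}

/*
 * Forward a lookup to the lower layer, except for ".." on a lower
 * directory that is a filesystem root, which the lower VOP_LOOKUP()
 * is not required to handle.
 */
int
demo_null_lookup(const struct demo_vnode *ldvp, unsigned cn_flags)
{
	assert(ldvp != NULL);
	if ((cn_flags & DEMO_ISDOTDOT) != 0 &&
	    (ldvp->v_vflag & DEMO_VV_ROOT) != 0)
		return (ENOENT);
	return (demo_lower_lookup(ldvp, cn_flags));
}
```

The lower filesystem is never called for the combination it is allowed to ignore, so it cannot be driven into an inconsistent state by it.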
| |
We might own the last use reference, and then vrele() at the end would
need to take the dvp vnode lock to inactivate, which causes deadlock
with vp. We cannot vrele() dvp from start since this might unlock ldvp.
Handle it by holding the vnode and dropping the use ref after the lowerfs
VOP_VPUT_PAIR() has ended. This effectively requires unlocking the vp
vnode after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to
true unconditionally. This opens more opportunities for vp to be
reclaimed; if lvp is still alive we reinstantiate vp with null_nodeget().
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D29178
| |
Generic bypass cannot understand the rules of liveness for the VOP.
Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
| |
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D27793
| |
Normal bypass expects a locked vnode, which is not true for
VOP_READ_PGCACHE(). Ensure liveness of the lower vnode by taking the
upper vnode interlock, which is also taken by null_reclaim() when
setting v_data to NULL.
Reported and tested by: pho
Reviewed by: markj, mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27327
Notes:
svn path=/head/; revision=368077
| |
instead of mount_nullfs(8).
Obviously you'd need to force mount(8) to not call
mount_nullfs(8) to make use of it.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26934
Notes:
svn path=/head/; revision=367137
| |
Notes:
svn path=/head/; revision=366869
| |
If the lower VOP relocked the lower vnode, it is possible that the nullfs
vnode was reclaimed in the meantime. In this case the nullfs vnode no
longer shares its lock with the lower vnode, which breaks the locking
protocol.
Check for the condition and acquire the nullfs vnode lock if detected.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Notes:
svn path=/head/; revision=366849
| |
Notes:
svn path=/head/; revision=365070