path: root/sys/fs/nullfs
Commit message | Author | Age | Files | Lines
* vfs: Initial revision of inotify | Mark Johnston | 2025-07-04 | 2 files | -1/+32

  Add an implementation of inotify_init(), inotify_add_watch(), inotify_rm_watch(), source-compatible with Linux. This provides functionality similar to kevent(2)'s EVFILT_VNODE, i.e., it lets applications monitor filesystem files for accesses. Compared to inotify, however, EVFILT_VNODE has the limitation of requiring the application to open the file to be monitored. This means that activity on a newly created file cannot be monitored reliably, and that a file descriptor per file in the hierarchy is required.

  inotify on the other hand allows a directory and its entries to be monitored at once. It introduces a new file descriptor type to which "watches" can be attached; a watch is a pseudo-file descriptor associated with a file or directory and a set of events to watch for. When a watched vnode is accessed, a description of the event is queued to the inotify descriptor, readable with read(2). Events for files in a watched directory include the file name.

  A watched vnode has its usecount bumped, so name cache entries originating from a watched directory are not evicted. Name cache entries are used to populate inotify events for files with a link in a watched directory. In particular, if a file is accessed with, say, read(2), an IN_ACCESS event will be generated for any watched hard link of the file.

  The inotify_add_watch_at() variant is included so that this functionality is available in capability mode; plain inotify_add_watch() is disallowed in capability mode.

  When a file in a nullfs mount is watched, the watch is attached to the lower vnode, such that accesses via either layer generate inotify events.

  Many thanks to Gleb Popov for testing this patch and finding lots of bugs.

  PR: 258010, 215011
  Reviewed by: kib
  Tested by: arrowd
  MFC after: 3 months
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D50315
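The watch-then-read flow described in the inotify commit message can be sketched from userland as follows. This is a minimal illustration, not code from the commit: the helper name `demo_sees_create`, the temporary directory template, and the buffer size are assumptions made for the example. It uses only the Linux-compatible calls the message names (`inotify_init()`, `inotify_add_watch()`, `inotify_rm_watch()`) plus read(2).

```c
/* glibc only: expose mkdtemp() and PATH_MAX under strict -std= modes. */
#define _DEFAULT_SOURCE 1

#include <sys/inotify.h>

#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: watch a fresh temporary directory, create a file
 * called `name` inside it, and report whether the first queued event is
 * an IN_CREATE carrying that name.
 * Returns 1 if so, 0 if not, -1 on error.
 */
int
demo_sees_create(const char *name)
{
	char dir[] = "/tmp/inotify-demo.XXXXXX";
	char path[PATH_MAX];
	/* Events are variable-length records; keep the buffer aligned. */
	char buf[4096] __attribute__((aligned(8)));
	const struct inotify_event *ev;
	ssize_t n;
	int fd, wd, filefd, ret = -1;

	assert(name != NULL);
	if (mkdtemp(dir) == NULL)
		return (-1);
	snprintf(path, sizeof(path), "%s/%s", dir, name);

	/* The inotify descriptor is the queue that events are read from. */
	fd = inotify_init();
	if (fd == -1)
		goto out_dir;

	/* One watch on the directory covers all of its entries. */
	wd = inotify_add_watch(fd, dir, IN_CREATE);
	if (wd == -1)
		goto out_fd;

	/* Create the file only after the watch exists. */
	filefd = open(path, O_CREAT | O_WRONLY, 0600);
	if (filefd == -1)
		goto out_watch;
	close(filefd);

	/* The event is already queued, so this read does not block. */
	n = read(fd, buf, sizeof(buf));
	if (n >= (ssize_t)sizeof(*ev)) {
		ev = (const struct inotify_event *)buf;
		ret = (ev->mask & IN_CREATE) != 0 && ev->len > 0 &&
		    strcmp(ev->name, name) == 0;
	}
	unlink(path);
out_watch:
	inotify_rm_watch(fd, wd);
out_fd:
	close(fd);
out_dir:
	rmdir(dir);
	return (ret);
}
```

A caller would invoke `demo_sees_create("hello.txt")` and treat a return of 1 as confirmation that the directory watch reported the new file by name, which is the case EVFILT_VNODE cannot cover without opening each file.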
* namei: Make stackable filesystems check harder for jail roots | Mark Johnston | 2025-05-23 | 1 file | -10/+18

  Suppose a process has its cwd pointing to a nullfs directory, where the lower directory is also visible in the jail's filesystem namespace. Suppose that the lower directory vnode is moved out from under the nullfs mount. The nullfs vnode still shadows the lower vnode, and dotdot lookups relative to that directory will instantiate new nullfs vnodes outside of the nullfs mountpoint, effectively shadowing the lower filesystem.

  This phenomenon can be abused to escape a chroot, since the nullfs vnodes instantiated by these dotdot lookups defeat the root vnode check in vfs_lookup(), which uses vnode pointer equality to test for the process root.

  Fix this by extending nullfs and unionfs to perform the same check, exploiting the fact that the passed componentname is embedded in a nameidata structure to avoid changing the VOP_LOOKUP interface. That is, add a flag to indicate that containerof can be used to get the full nameidata structure, and perform the root vnode check on the lower vnode when performing a dotdot lookup.

  PR: 262180
  Reviewed by: olce, kib
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D50418
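The containerof technique mentioned in the jail-root commit can be illustrated with a small userland sketch. Every name here (`demo_vnode`, `demo_componentname`, `demo_nameidata`, `DEMO_ISDOTDOT`, `DEMO_EMBEDDED`, `demo_dotdot_hits_root`) is hypothetical and stands in for the kernel structures; the actual flag name and field layout in the commit are not shown in this log, so treat this as an assumption-laden model of the idea only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define DEMO_CONTAINEROF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_ISDOTDOT	0x1u	/* lookup of ".." */
#define DEMO_EMBEDDED	0x2u	/* hypothetical: cnp lives inside a nameidata */

struct demo_vnode {
	int id;
};

struct demo_componentname {
	uint64_t cn_flags;
	const char *cn_nameptr;
};

struct demo_nameidata {
	struct demo_vnode *ni_rootdir;		/* process root */
	struct demo_componentname ni_cnd;	/* embedded component name */
};

/*
 * Return 1 when a dotdot lookup on lower_dvp must stop because the lower
 * vnode is the process root, 0 otherwise.  The lookup entry point only
 * receives the componentname, so the root is reached via containerof,
 * and only when the flag says the embedding is guaranteed.
 */
int
demo_dotdot_hits_root(struct demo_componentname *cnp,
    struct demo_vnode *lower_dvp)
{
	struct demo_nameidata *ndp;

	assert(cnp != NULL);
	if ((cnp->cn_flags & DEMO_ISDOTDOT) == 0)
		return (0);
	if ((cnp->cn_flags & DEMO_EMBEDDED) == 0)
		return (0);	/* cannot safely recover the nameidata */
	ndp = DEMO_CONTAINEROF(cnp, struct demo_nameidata, ni_cnd);
	return (ndp->ni_rootdir == lower_dvp);
}
```

The flag check matters because a componentname that was allocated on its own has no enclosing structure, and applying the macro to it would read unrelated memory. Comparing against the lower vnode, not the upper one, is what restores the pointer-equality root check that the freshly instantiated nullfs vnodes would otherwise defeat.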
* nullfs lookup: cn_flags is 64bit | Konstantin Belousov | 2025-05-18 | 1 file | -3/+3

  Reviewed by: olce
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D50390
* nullfs: stop lying about mount flags in statfs(2) | Konstantin Belousov | 2024-12-20 | 1 file | -5/+0

  Flags should not propagate from the lower fs. Behavior for the upper fs is determined by flags from its mount point structure. When lower fs acts according to its mount configuration, it is reported up as VOP errors.

  PR: 283425
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D48150
* nullfs: Use an a_gen field to cast to vop_generic_args | Konrad Witaszczyk | 2024-07-26 | 1 file | -5/+5

  Instead of casting a vop_F_args object to vop_generic_args, use a vop_F_args.a_gen field when calling null_bypass(). This way we don't hardcode the vop_generic_args data type in the callers of null_bypass().

  Before this change, there were 3 null_bypass() calls using a vop_F_args.a_gen field and 5 null_bypass() calls using a cast to vop_generic_args. This change makes all null_bypass() calls consistent and easier to maintain.

  Pointed out by: jrtc27
  Reviewed by: kib, oshogbo
  Accepted by: oshogbo (mentor)
  Differential Revision: https://reviews.freebsd.org/D37359
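The difference between casting and using the embedded field can be shown with a simplified model. The types and functions below (`demo_generic_args`, `demo_open_args`, `demo_bypass`, `demo_open_cast`, `demo_open_field`) are hypothetical stand-ins, not the kernel's generated vop_F_args definitions, and the sketch assumes only what the commit message states: that each per-operation argument structure carries an `a_gen` member of the generic type.

```c
#include <assert.h>
#include <stddef.h>

/* Generic header shared by every per-operation argument structure. */
struct demo_generic_args {
	int a_op;	/* which operation this is */
};

/* One per-operation structure; the generic part is a named first member. */
struct demo_open_args {
	struct demo_generic_args a_gen;
	int a_mode;
};

/* The old cast only worked because a_gen sits at offset 0. */
_Static_assert(offsetof(struct demo_open_args, a_gen) == 0,
    "a_gen must be the first member for the cast form to be valid");

/* Generic pass-through that only understands the common header. */
int
demo_bypass(struct demo_generic_args *ap)
{
	assert(ap != NULL);
	return (ap->a_op);
}

/* Before: the caller hardcodes the generic type name in a cast. */
int
demo_open_cast(struct demo_open_args *ap)
{
	return (demo_bypass((struct demo_generic_args *)ap));
}

/* After: the caller names the field and the compiler checks the type. */
int
demo_open_field(struct demo_open_args *ap)
{
	return (demo_bypass(&ap->a_gen));
}
```

Both forms pass the same address, so behavior is unchanged. The field form has the advantage that it keeps compiling correctly if the generic type is renamed or the member moves, whereas the cast silently accepts any pointer.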
* nullfs: Show correct exported flag. | Dag-Erling Smørgrav | 2024-04-13 | 1 file | -3/+4

  MFC after: 3 days
  Reviewed by: allanjude, kib
  Differential Revision: https://reviews.freebsd.org/D44773
* nullfs_mount(): fix whitespace | Konstantin Belousov | 2024-03-08 | 1 file | -1/+1
* nullfs: add -o cache | Konstantin Belousov | 2024-03-08 | 1 file | -6/+9

  to allow overriding the global default if needed.

  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* nullfs_mount(): remove unneeded cast | Konstantin Belousov | 2024-03-08 | 1 file | -2/+2

  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* nullfs: Add the vfs.nullfs.cache_nodes sysctl to control nocache default | Seigo Tanimura | 2024-03-07 | 1 file | -1/+10

  Differential revision: https://reviews.freebsd.org/D44217
  MFC after: 1 week
* nullfs: do not allow bypass on copy_file_range() | Konstantin Belousov | 2023-11-28 | 1 file | -0/+1

  There must be no callers of VOP_COPY_FILE_RANGE() except vn_copy_file_range(), which does enough to find the write-vnodes where to call the VOP.

  Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D42603
* VFS: add VOP_GETLOWVNODE() | Konstantin Belousov | 2023-11-28 | 1 file | -0/+18

  It is similar to VOP_GETWRITEMOUNT(), and for a given vnode vp should return the lower vnode which would actually handle a write to vp. Flags allow specifying FREAD or FWRITE for the benefit of a possible unionfs implementation.

  Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D42603
* sys: Remove ancient SCCS tags. | Warner Losh | 2023-11-27 | 4 files | -12/+0

  Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script.

  Sponsored by: Netflix
* Fix MNT_IGNORE for devfs, fdescfs and nullfs | Doug Rabson | 2023-08-26 | 1 file | -1/+1

  The MNT_IGNORE flag can be used to mark certain filesystem mounts so that utilities such as df(1) and mount(8) can filter out those mounts by default. This can be used, for instance, to reduce the noise from running container workloads inside jails which often have at least three and sometimes as many as ten mounts per container.

  The flag is supplied by the nmount(2) system call and is recorded so that it can be reported by statfs(2). Unfortunately several filesystems override the default behaviour and mask out the flag, defeating its purpose. This change preserves the MNT_IGNORE flag for those filesystems so that it can be reported correctly.

  MFC after: 1 week
* sys: Remove $FreeBSD$: one-line .h pattern | Warner Losh | 2023-08-16 | 1 file | -1/+0

  Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
* sys: Remove $FreeBSD$: two-line .h pattern | Warner Losh | 2023-08-16 | 3 files | -6/+0

  Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
* nullfs: ansify | Mateusz Guzik | 2023-02-07 | 1 file | -36/+10

  Reported by: clang 15
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* vfs: add the concept of vnode state transitions | Mateusz Guzik | 2022-12-26 | 1 file | -0/+1

  To quote from a comment above vput_final:

  <quote>
   * XXX Some filesystems pass in an exclusively locked vnode and strongly depend
   * on the lock being held all the way until VOP_INACTIVE. This in particular
   * happens with UFS which adds half-constructed vnodes to the hash, where they
   * can be found by other code.
  </quote>

  As is, there is no mechanism which allows filesystems to denote that a vnode is fully initialized; consequently, problems like the above are only found the hard way(tm).

  Add rudimentary support for state transitions, which in particular make it possible to assert that the vnode is not legally unlocked until its fate is decided (either construction finishes or vgone is called to abort it).

  The new field lands in a 1-byte hole, thus it does not grow the struct.

  Bump __FreeBSD_version to 1400077

  Reviewed by: kib (previous version)
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D37759
* Add support for mounting single files in nullfs | Doug Rabson | 2022-12-19 | 1 file | -1/+12

  The main use-case for this is to support mounting config files and secrets into OCI containers. My current workaround copies the files into the container which is messy and risks secrets leaking into container images if the cleanup fails.

  This adds a VFCF flag to indicate whether the filesystem supports file mounts and allows fspath to be either a directory or a file if the flag is set.

  Test Plan:
  $ sudo mkdir -p /mnt
  $ sudo touch /mnt/foo
  $ sudo mount -t nullfs /COPYRIGHT /mnt/foo

  Reviewed by: mjg, kib
  Tested by: pho
* nullfs: adopt VV_CROSSLOCK | Jason A. Harmening | 2022-12-11 | 1 file | -0/+11

  When the lower filesystem directory hierarchy is the same as the nullfs mount point (admittedly not likely to be a useful situation in practice), nullfs is subject to the exact deadlock between the busy count drain and the covered vnode lock that VV_CROSSLOCK is intended to address.

  Reviewed by: kib
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D37458
* nullfs(5): Fix a typo in a source code comment | Gordon Bergling | 2022-08-07 | 1 file | -1/+1

  - s/examing/examining/

  MFC after: 3 days
* null_vptocnp(): busy nullfs mp instead of refing it | Konstantin Belousov | 2022-06-14 | 1 file | -5/+7

  null_nodeget() needs a valid mount point data, otherwise we might race and dereference NULL.

  Using MBF_NOWAIT makes non-forced unmount non-transparent for vn_fullpath() over nullfs, but we make no guarantee that fullpath calculation succeeds anyway.

  Reported and tested by: pho
  Reviewed by: jah
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D35477
* vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd) | Mateusz Guzik | 2022-03-24 | 1 file | -1/+1
* nullfs: hash insertion without vnode lock upgrade | Mateusz Guzik | 2022-03-19 | 2 files | -63/+53

  Use the hash lock to serialize instead.

  This enables shared-locked ".." lookups.

  Reviewed by: markj
  Tested by: pho (previous version)
  Differential Revision: https://reviews.freebsd.org/D34466
* insmntque1(): remove useless arguments | Konstantin Belousov | 2022-01-31 | 1 file | -1/+1

  Also remove once-used functions to clean up after failed insmntque1(), which were destructor callbacks in previous life.

  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D34071
* Revert b58ca5df0bb7 ("vfs: remove the now unused insmntque1") | Mateusz Guzik | 2022-01-27 | 1 file | -2/+2

  I was somehow convinced that insmntque calls insmntque1 with a NULL destructor. Unfortunately this worked well enough to not immediately blow up in simple testing.

  Keep not using the destructor in previously patched filesystems though as it avoids unnecessary casts.

  Noted by: kib
  Reported by: pho
* nullfs: ansify fs/nullfs/null_subr.c | Mateusz Guzik | 2022-01-27 | 1 file | -20/+7
* nullfs: stop using insmntque1 | Mateusz Guzik | 2022-01-26 | 1 file | -11/+6

  It adds nothing of value over insmntque.
* vfs: remove the unused thread argument from NDINIT* | Mateusz Guzik | 2021-11-25 | 1 file | -1/+1

  See b4a58fbf640409a1 ("vfs: remove cn_thread")

  Bump __FreeBSD_version to 1400043.
* nullfs: provide custom null_rename bypass | Konstantin Belousov | 2021-07-27 | 1 file | -11/+57

  fdvp and fvp vnodes are not locked, and race with reclaim cannot be handled by the generic bypass routine.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* null_rename: some style | Konstantin Belousov | 2021-07-27 | 1 file | -5/+7

  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* null_lookup: restore dvp lock always, not only on success | Konstantin Belousov | 2021-07-27 | 1 file | -5/+6

  Caller of VOP_LOOKUP() passes dvp locked and expects it locked on return. Relock of lower vnode in any case could leave upper vnode reclaimed and unlocked.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* null_bypass(): prevent loosing the only reference to the lower vnode | Konstantin Belousov | 2021-07-27 | 1 file | -5/+20

  The upper vnode reference to the lower vnode is the only reference that keeps our pointer to the lower vnode alive. If lower vnode is relocked during the VOP call, upper vnode might become unlocked and reclaimed, which invalidates our reference.

  Add a transient vhold around VOP call.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* nullfs: provide custom null_advlock bypass | Konstantin Belousov | 2021-07-27 | 1 file | -0/+23

  The advlock VOP takes the vnode unlocked, which makes the normal bypass function racy. Same as null_pgcache_read(), nullfs implementation needs to take interlock and reference lower vnode under it.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* null_bypass(): some style | Konstantin Belousov | 2021-07-27 | 1 file | -14/+16

  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* Allow stacked filesystems to be recursively unmounted | Jason A. Harmening | 2021-07-24 | 2 files | -19/+13

  In certain emergency cases such as media failure or removal, UFS will initiate a forced unmount in order to prevent dirty buffers from accumulating against the no-longer-usable filesystem. The presence of a stacked filesystem such as nullfs or unionfs above the UFS mount will prevent this forced unmount from succeeding.

  This change addresses the situation by allowing stacked filesystems to be recursively unmounted on a taskqueue thread when the MNT_RECURSE flag is specified to dounmount(). This call will block until all upper mounts have been removed unless the caller specifies the MNT_DEFERRED flag to indicate the base filesystem should also be unmounted from the taskqueue.

  To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs have been combined with the existing 'mnt_uppers' list used by nullfs and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper(). The format of the mnt_uppers list has also been changed to accommodate filesystems such as unionfs in which a given mount may be stacked atop more than one lower mount. Additionally, management of lower FS reclaim/unlink notifications has been split into a separate list managed by a separate set of KPIs, as registration of an upper FS no longer implies interest in these notifications.

  Reviewed by: kib, mckusick
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D31016
* Add a generic mechanism for preventing forced unmount | Jason A. Harmening | 2021-06-06 | 1 file | -1/+8

  This is aimed at preventing stacked filesystems like nullfs and unionfs from "losing" their lower mounts due to forced unmount. Otherwise, VFS operations that are passed through to the lower filesystem(s) may crash or otherwise cause unpredictable behavior.

  Introduce two new functions, vfs_pin_from_vp() and vfs_unpin(), which are intended to be called on the lower mount(s) when the stacked filesystem is mounted and unmounted, respectively. Much as registration in the mnt_uppers list previously did, pinning will prevent even forced unmount of the lower FS and will allow the stacked FS to freely operate on the lower mount either by direct use of the struct mount* or indirect use through a properly-referenced vnode's v_mount field.

  vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses the mount interlock coupled with re-checking vp->v_mount to ensure that it will fail in the face of a pending unmount request, even if the concurrent unmount fully completes.

  Adopt these new functions in both nullfs and unionfs.

  Reviewed By: kib, markj
  Differential Revision: https://reviews.freebsd.org/D30401
* VFS_QUOTACTL(9): allow implementation to indicate busy state changes | Jason A. Harmening | 2021-05-30 | 1 file | -2/+28

  Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount.

  Also, add stdbool.h to libprocstat modules which #define _KERNEL before including sys/mount.h. Otherwise they'll pull in sys/types.h before defining _KERNEL and therefore won't have the bool definition they need for mp_busy.

  Reviewed By: kib, markj
  Differential Revision: https://reviews.freebsd.org/D30556
* Revert commits 6d3e78ad6c11 and 54256e7954d7 | Jason A. Harmening | 2021-05-30 | 1 file | -28/+2

  Parts of libprocstat like to pretend they're kernel components for the sake of including mount.h, and including sys/types.h in the _KERNEL case doesn't fix the build for some reason. Revert both the VFS_QUOTACTL() change and the follow-up "fix" for now.
* VFS_QUOTACTL(9): allow implementation to indicate busy state changes | Jason A. Harmening | 2021-05-29 | 1 file | -2/+28

  Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount.

  Reviewed By: kib, markj
  Differential Revision: https://reviews.freebsd.org/D30218
* nullfs: dirty v_object must imply the need for inactivation | Konstantin Belousov | 2021-05-22 | 1 file | -1/+1

  Otherwise pages are cleaned some time later when the lower fs decides that it is time to do it. This mostly manifests itself as delayed mtime update, e.g. breaking make-like programs.

  Reported by: mav
  Tested by: mav, pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* nullfs: protect against user creating inconsistent state | Konstantin Belousov | 2021-04-02 | 1 file | -4/+15

  The VFS convention is that VOP_LOOKUP() methods do not need to handle ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all). Nullfs bypasses VOP_LOOKUP() to the lower filesystem, and there, due to user actions, it is possible to get into a situation where
  - upper vnode does not have VV_ROOT set
  - lower vnode is root
  - ISDOTDOT is requested

  User just needs to nullfs-mount non-root of some filesystem, and then move some directory under mount, out of mount, using lower filesystem.

  In this case, nullfs cannot do much, but we still should and can ensure internal kernel structures are consistent. Avoid ISDOTDOT lookup forwarding when VV_ROOT is set on lower dvp, return somewhat arbitrary ENOENT.

  PR: 253593
  Reported by: Gregor Koscak <elogin41@gmail.com>
  Test by: Patrick Sullivan <sulli00777@gmail.com>
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* null_vput_pair(): release use reference on dvp earlier | Konstantin Belousov | 2021-03-12 | 1 file | -14/+31

  We might own the last use reference, and then vrele() at the end would need to take the dvp vnode lock to inactivate, which causes deadlock with vp. We cannot vrele() dvp from start since this might unlock ldvp. Handle it by holding the vnode and dropping use ref after lowerfs VOP_VPUT_PAIR() ended.

  This effectively requires unlock of the vp vnode after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true unconditionally. This opens more opportunities for vp to be reclaimed, if lvp is still alive we reinstantiate vp with null_nodeget().

  Reported and tested by: pho
  Reviewed by: mckusick
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* nullfs: provide special bypass for VOP_VPUT_PAIR | Konstantin Belousov | 2021-02-12 | 1 file | -0/+49

  Generic bypass cannot understand the rules of liveness for the VOP.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* vfs: add v_irflag accessors | Mateusz Guzik | 2021-01-03 | 2 files | -10/+6

  Reviewed by: kib (previous version)
  Differential Revision: https://reviews.freebsd.org/D27793
* nullfs: provide custom bypass for VOP_READ_PGCACHE(). | Konstantin Belousov | 2020-11-26 | 1 file | -0/+23

  Normal bypass expects locked vnode, which is not true for VOP_READ_PGCACHE(). Ensure liveness of the lower vnode by taking the upper vnode interlock, which is also taken by null_reclaim() when setting v_data to NULL.

  Reported and tested by: pho
  Reviewed by: markj, mjg
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D27327

  Notes: svn path=/head/; revision=368077
* Make it possible to mount nullfs(5) using plain mount(8) | Edward Tomasz Napierala | 2020-10-29 | 1 file | -1/+3

  instead of mount_nullfs(8).

  Obviously you'd need to force mount(8) to not call mount_nullfs(8) to make use of it.

  Reviewed by: kib
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D26934

  Notes: svn path=/head/; revision=367137
* vfs: drop spurious cred argument from VOP_VPTOCNP | Mateusz Guzik | 2020-10-20 | 1 file | -2/+1

  Notes: svn path=/head/; revision=366869
* nullfs: ensure correct lock is taken after bypass. | Konstantin Belousov | 2020-10-19 | 1 file | -0/+18

  If the lower VOP relocked the lower vnode, it is possible that the nullfs vnode was reclaimed in the meantime. In this case the nullfs vnode no longer shares its lock with the lower vnode, which breaks the locking protocol. Check for the condition and acquire the nullfs vnode lock if detected.

  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week

  Notes: svn path=/head/; revision=366849
* fs: clean up empty lines in .c and .h files | Mateusz Guzik | 2020-09-01 | 1 file | -1/+0

  Notes: svn path=/head/; revision=365070