path: root/sys/fs/nullfs/null_vnops.c
Commit message | Author | Age | Files | Lines
* vfs: Initial revision of inotify | Mark Johnston | 2025-07-04 | 1 | -1/+28

  Add an implementation of inotify_init(), inotify_add_watch(), inotify_rm_watch(), source-compatible with Linux. This provides functionality similar to kevent(2)'s EVFILT_VNODE, i.e., it lets applications monitor filesystem files for accesses. Compared to inotify, however, EVFILT_VNODE has the limitation of requiring the application to open the file to be monitored. This means that activity on a newly created file cannot be monitored reliably, and that a file descriptor per file in the hierarchy is required.

  inotify, on the other hand, allows a directory and its entries to be monitored at once. It introduces a new file descriptor type to which "watches" can be attached; a watch is a pseudo-file descriptor associated with a file or directory and a set of events to watch for. When a watched vnode is accessed, a description of the event is queued to the inotify descriptor, readable with read(2). Events for files in a watched directory include the file name.

  A watched vnode has its usecount bumped, so name cache entries originating from a watched directory are not evicted. Name cache entries are used to populate inotify events for files with a link in a watched directory. In particular, if a file is accessed with, say, read(2), an IN_ACCESS event will be generated for any watched hard link of the file.

  The inotify_add_watch_at() variant is included so that this functionality is available in capability mode; plain inotify_add_watch() is disallowed in capability mode.

  When a file in a nullfs mount is watched, the watch is attached to the lower vnode, such that accesses via either layer generate inotify events.

  Many thanks to Gleb Popov for testing this patch and finding lots of bugs.

  PR: 258010, 215011
  Reviewed by: kib
  Tested by: arrowd
  MFC after: 3 months
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D50315
* namei: Make stackable filesystems check harder for jail roots | Mark Johnston | 2025-05-23 | 1 | -10/+18

  Suppose a process has its cwd pointing to a nullfs directory, where the lower directory is also visible in the jail's filesystem namespace. Suppose that the lower directory vnode is moved out from under the nullfs mount. The nullfs vnode still shadows the lower vnode, and dotdot lookups relative to that directory will instantiate new nullfs vnodes outside of the nullfs mountpoint, effectively shadowing the lower filesystem. This phenomenon can be abused to escape a chroot, since the nullfs vnodes instantiated by these dotdot lookups defeat the root vnode check in vfs_lookup(), which uses vnode pointer equality to test for the process root.

  Fix this by extending nullfs and unionfs to perform the same check, exploiting the fact that the passed componentname is embedded in a nameidata structure to avoid changing the VOP_LOOKUP interface. That is, add a flag to indicate that containerof can be used to get the full nameidata structure, and perform the root vnode check on the lower vnode when performing a dotdot lookup.

  PR: 262180
  Reviewed by: olce, kib
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D50418
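The root check can be pictured with a small toy model. Everything below (struct toy_vnode, toy_dotdot_lower()) is hypothetical and far simpler than the kernel's namei/nameidata machinery; it only illustrates why the comparison has to be made against the lower vnode: each dotdot lookup produces a fresh upper vnode, so comparing the upper pointer with the process root does not match.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy model, not kernel code: a lower-filesystem directory
 * knows its parent, and an upper (nullfs-like) vnode knows which lower
 * vnode it shadows.
 */
struct toy_vnode {
    struct toy_vnode *parent;  /* lower fs: parent directory, NULL at top */
    struct toy_vnode *lower;   /* upper layer: shadowed lower vnode */
};

/*
 * Return the lower directory that a ".." lookup from upper directory dvp
 * should resolve to, for a process whose root directory is rootdir.
 * The root test is done on the lower vnode, which is stable, rather than
 * on the upper vnode, which dotdot lookups keep re-creating.
 */
static struct toy_vnode *
toy_dotdot_lower(const struct toy_vnode *dvp, struct toy_vnode *rootdir)
{
    struct toy_vnode *ldvp = dvp->lower;

    if (ldvp == rootdir)
        return ldvp;  /* ".." at the process root stays put */
    return ldvp->parent != NULL ? ldvp->parent : ldvp;
}
```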
* nullfs lookup: cn_flags is 64bit | Konstantin Belousov | 2025-05-18 | 1 | -3/+3

  Reviewed by: olce
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D50390
* nullfs: Use an a_gen field to cast to vop_generic_args | Konrad Witaszczyk | 2024-07-26 | 1 | -5/+5

  Instead of casting a vop_F_args object to vop_generic_args, use a vop_F_args.a_gen field when calling null_bypass(). This way we don't hardcode the vop_generic_args data type in the callers of null_bypass().

  Before this change, there were 3 null_bypass() calls using a vop_F_args.a_gen field and 5 null_bypass() calls using a cast to vop_generic_args. This change makes all null_bypass() calls consistent and easier to maintain.

  Pointed out by: jrtc27
  Reviewed by: kib, oshogbo
  Accepted by: oshogbo (mentor)
  Differential Revision: https://reviews.freebsd.org/D37359
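The difference between the two call styles is plain C. The structures below are simplified stand-ins for the generated VOP argument types (the real ones carry more fields), and toy_read()/toy_bypass() are hypothetical; the point is that every vop_F_args starts with an embedded a_gen member, so passing &ap->a_gen yields the same address as the cast while letting the compiler check the type.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-ins for the kernel's generated VOP argument
 * structures. What matters is that every vop_F_args begins with an
 * embedded struct vop_generic_args a_gen.
 */
struct vnodeop_desc {
    const char *vdesc_name;
};

struct vop_generic_args {
    struct vnodeop_desc *a_desc;
};

struct vop_toy_read_args {          /* hypothetical vop_F_args instance */
    struct vop_generic_args a_gen;
    int a_ioflag;
};

/* Generic handler, in the role null_bypass() plays. */
static const char *toy_bypass(struct vop_generic_args *ap)
{
    return ap->a_desc->vdesc_name;
}

static const char *toy_read(struct vop_toy_read_args *ap)
{
    /*
     * Pass the embedded member instead of writing
     * toy_bypass((struct vop_generic_args *)ap): the compiler checks the
     * type, and the caller no longer names vop_generic_args.
     */
    return toy_bypass(&ap->a_gen);
}
```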
* nullfs: do not allow bypass on copy_file_range() | Konstantin Belousov | 2023-11-28 | 1 | -0/+1

  There must be no callers of VOP_COPY_FILE_RANGE() except vn_copy_file_range(), which does enough to find the write vnodes on which to call the VOP.

  Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D42603
* VFS: add VOP_GETLOWVNODE() | Konstantin Belousov | 2023-11-28 | 1 | -0/+18

  It is similar to VOP_GETWRITEMOUNT(), and for a given vnode vp should return the lower vnode which would actually handle writes to vp. Flags allow specifying FREAD or FWRITE for the benefit of a possible unionfs implementation.

  Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D42603
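A toy sketch of the query's intent, assuming each layer simply forwards it to the vnode beneath it until a non-layered vnode is reached, which answers with itself. The type, the TOY_FREAD/TOY_FWRITE flags and toy_getlowvnode() are hypothetical; the real VOP also deals with locking and references, which are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: a layered vnode has a non-NULL `lower`; a vnode of
 * a non-layered filesystem has lower == NULL. The flags mirror the
 * FREAD/FWRITE hint; a pass-through layer ignores them, while a
 * unionfs-like layer could pick a branch based on them.
 */
#define TOY_FREAD  0x1
#define TOY_FWRITE 0x2

struct toy_vnode {
    const char *fsname;
    struct toy_vnode *lower;
};

/* Follow the stack down to the vnode that would actually service I/O. */
static struct toy_vnode *toy_getlowvnode(struct toy_vnode *vp, int flags)
{
    (void)flags;  /* unused by a pure pass-through layer */
    while (vp->lower != NULL)
        vp = vp->lower;
    return vp;
}
```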
* sys: Remove ancient SCCS tags. | Warner Losh | 2023-11-27 | 1 | -4/+0

  Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.

  Sponsored by: Netflix
* sys: Remove $FreeBSD$: two-line .h pattern | Warner Losh | 2023-08-16 | 1 | -2/+0

  Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
* nullfs(5): Fix a typo in a source code comment | Gordon Bergling | 2022-08-07 | 1 | -1/+1

  - s/examing/examining/

  MFC after: 3 days
* null_vptocnp(): busy nullfs mp instead of refing it | Konstantin Belousov | 2022-06-14 | 1 | -5/+7

  null_nodeget() needs valid mount point data, otherwise we might race and dereference NULL.

  Using MBF_NOWAIT makes non-forced unmount non-transparent for vn_fullpath() over nullfs, but we make no guarantee that fullpath calculation succeeds anyway.

  Reported and tested by: pho
  Reviewed by: jah
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D35477
* nullfs: provide custom null_rename bypass | Konstantin Belousov | 2021-07-27 | 1 | -11/+57

  The fdvp and fvp vnodes are not locked, and a race with reclaim cannot be handled by the generic bypass routine.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* null_rename: some style | Konstantin Belousov | 2021-07-27 | 1 | -5/+7

  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* null_lookup: restore dvp lock always, not only on success | Konstantin Belousov | 2021-07-27 | 1 | -5/+6

  The caller of VOP_LOOKUP() passes dvp locked and expects it locked on return. Relocking the lower vnode in any case could leave the upper vnode reclaimed and unlocked.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* null_bypass(): prevent losing the only reference to the lower vnode | Konstantin Belousov | 2021-07-27 | 1 | -5/+20

  The upper vnode's reference to the lower vnode is the only reference that keeps our pointer to the lower vnode alive. If the lower vnode is relocked during the VOP call, the upper vnode might become unlocked and reclaimed, which invalidates our reference.

  Add a transient vhold around the VOP call.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* nullfs: provide custom null_advlock bypass | Konstantin Belousov | 2021-07-27 | 1 | -0/+23

  The advlock VOP takes the vnode unlocked, which makes the normal bypass function racy. As with null_pgcache_read(), the nullfs implementation needs to take the interlock and reference the lower vnode under it.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
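A sketch of the interlock pattern for a VOP entered without the vnode lock. It is a hypothetical model: the interlock is a C11 atomic_flag spinlock, v_data points straight at the lower vnode (nullfs reaches it through a per-node structure instead), and the names are illustrative. Because reclaim clears v_data under the same interlock, a reference taken while the interlock is held keeps the lower vnode alive after the interlock is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_lower {
    atomic_int usecount;
};

struct toy_upper {
    atomic_flag interlock;
    struct toy_lower *v_data;   /* NULL after reclaim */
};

static void toy_ilock(struct toy_upper *vp)
{
    while (atomic_flag_test_and_set_explicit(&vp->interlock,
        memory_order_acquire))
        ;   /* spin */
}

static void toy_iunlock(struct toy_upper *vp)
{
    atomic_flag_clear_explicit(&vp->interlock, memory_order_release);
}

/* Reference the lower vnode without holding the upper vnode lock. */
static struct toy_lower *toy_ref_lower(struct toy_upper *vp)
{
    struct toy_lower *lvp;

    toy_ilock(vp);
    lvp = vp->v_data;
    if (lvp != NULL)
        atomic_fetch_add(&lvp->usecount, 1);  /* reclaim cannot run now */
    toy_iunlock(vp);
    return lvp;                               /* caller drops the reference */
}

/* Reclaim detaches the lower vnode under the same interlock. */
static void toy_reclaim(struct toy_upper *vp)
{
    toy_ilock(vp);
    vp->v_data = NULL;
    toy_iunlock(vp);
}
```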
* null_bypass(): some style | Konstantin Belousov | 2021-07-27 | 1 | -14/+16

  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D31310
* nullfs: dirty v_object must imply the need for inactivation | Konstantin Belousov | 2021-05-22 | 1 | -1/+1

  Otherwise pages are cleaned some time later, when the lower fs decides that it is time to do it. This mostly manifests itself as a delayed mtime update, e.g., breaking make-like programs.

  Reported by: mav
  Tested by: mav, pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* nullfs: protect against user creating inconsistent state | Konstantin Belousov | 2021-04-02 | 1 | -4/+15

  The VFS convention is that VOP_LOOKUP() methods do not need to handle ISDOTDOT lookups for VV_ROOT vnodes (since they cannot, after all). Nullfs bypasses VOP_LOOKUP() to the lower filesystem, and there, due to user actions, it is possible to get into a situation where:

  - the upper vnode does not have VV_ROOT set
  - the lower vnode is root
  - ISDOTDOT is requested

  The user just needs to nullfs-mount a non-root directory of some filesystem, and then move some directory under the mount out of the mount, using the lower filesystem.

  In this case, nullfs cannot do much, but we still should and can ensure that internal kernel structures are consistent.

  Avoid ISDOTDOT lookup forwarding when VV_ROOT is set on the lower dvp; return a somewhat arbitrary ENOENT.

  PR: 253593
  Reported by: Gregor Koscak <elogin41@gmail.com>
  Tested by: Patrick Sullivan <sulli00777@gmail.com>
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
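The fix amounts to one test before the lookup is forwarded. A hypothetical sketch: the flag values are invented for the toy, and cn_flags is taken as 64-bit in line with the "cn_flags is 64bit" commit listed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values for the toy model (not the kernel's). */
#define TOY_VV_ROOT   0x1
#define TOY_ISDOTDOT  0x2

struct toy_vnode {
    int v_vflag;
};

/*
 * Decide whether a lookup may be forwarded to the lower directory ldvp.
 * Lower filesystems are not required to handle ".." on their root, so
 * such a request is refused with ENOENT instead of being forwarded.
 */
static int toy_check_forward(const struct toy_vnode *ldvp, uint64_t cn_flags)
{
    if ((cn_flags & TOY_ISDOTDOT) != 0 && (ldvp->v_vflag & TOY_VV_ROOT) != 0)
        return ENOENT;
    return 0;   /* safe to call the lower VOP_LOOKUP() */
}
```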
* null_vput_pair(): release use reference on dvp earlier | Konstantin Belousov | 2021-03-12 | 1 | -14/+31

  We might own the last use reference, and then vrele() at the end would need to take the dvp vnode lock to inactivate, which causes a deadlock with vp. We cannot vrele() dvp from the start since this might unlock ldvp.

  Handle it by holding the vnode and dropping the use reference after the lower fs VOP_VPUT_PAIR() has ended. This effectively requires unlocking the vp vnode after VOP_VPUT_PAIR(), so the call is changed to set unlock_vp to true unconditionally. This opens more opportunities for vp to be reclaimed; if lvp is still alive, we reinstantiate vp with null_nodeget().

  Reported and tested by: pho
  Reviewed by: mckusick
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D29178
* nullfs: provide special bypass for VOP_VPUT_PAIR | Konstantin Belousov | 2021-02-12 | 1 | -0/+49

  The generic bypass cannot understand the liveness rules for this VOP.

  Reviewed by: chs, mckusick
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* vfs: add v_irflag accessors | Mateusz Guzik | 2021-01-03 | 1 | -5/+3

  Reviewed by: kib (previous version)
  Differential Revision: https://reviews.freebsd.org/D27793
* nullfs: provide custom bypass for VOP_READ_PGCACHE(). | Konstantin Belousov | 2020-11-26 | 1 | -0/+23

  The normal bypass expects a locked vnode, which is not true for VOP_READ_PGCACHE(). Ensure liveness of the lower vnode by taking the upper vnode interlock, which is also taken by null_reclaim() when setting v_data to NULL.

  Reported and tested by: pho
  Reviewed by: markj, mjg
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D27327

  Notes: svn path=/head/; revision=368077
* vfs: drop spurious cred argument from VOP_VPTOCNP | Mateusz Guzik | 2020-10-20 | 1 | -2/+1

  Notes: svn path=/head/; revision=366869
* nullfs: ensure correct lock is taken after bypass. | Konstantin Belousov | 2020-10-19 | 1 | -0/+18

  If the lower VOP relocked the lower vnode, it is possible that the nullfs vnode was reclaimed in the meantime. In this case the nullfs vnode no longer shares its lock with the lower vnode, which breaks the locking protocol.

  Check for the condition and acquire the nullfs vnode lock if it is detected.

  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week

  Notes: svn path=/head/; revision=366849
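A toy model of the condition being checked. It assumes, as the commit text implies, that a live nullfs vnode uses the lower vnode's lock (modeled as v_vnlock pointing at it) and that reclaim switches the nullfs vnode back to its own lock. The names and the boolean "lock" are hypothetical; no real locking happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_lock {
    bool held;
};

struct toy_vnode {
    struct toy_lock v_lock;      /* the vnode's own lock */
    struct toy_lock *v_vnlock;   /* the lock actually in use */
};

static void toy_reclaim_upper(struct toy_vnode *vp)
{
    vp->v_vnlock = &vp->v_lock;  /* stop sharing the lower lock */
}

/*
 * After a bypassed VOP returns with the lower vnode locked: if the upper
 * vnode no longer shares that lock, move ownership so the caller ends up
 * holding the upper vnode lock it expects.
 */
static void toy_fixup_lock(struct toy_vnode *vp, struct toy_vnode *lvp)
{
    if (lvp->v_vnlock->held && vp->v_vnlock != lvp->v_vnlock) {
        lvp->v_vnlock->held = false;   /* unlock the lower vnode */
        vp->v_vnlock->held = true;     /* lock the upper vnode's own lock */
    }
}
```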
* fs: clean up empty lines in .c and .h files | Mateusz Guzik | 2020-09-01 | 1 | -1/+0

  Notes: svn path=/head/; revision=365070
* VMIO reads: enable for nullfs upper vnode if the lower vnode supports it. | Konstantin Belousov | 2020-08-16 | 1 | -1/+10

  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D25968

  Notes: svn path=/head/; revision=364288
* nullfs: add missing VOP_STAT handling | Mateusz Guzik | 2020-08-10 | 1 | -1/+15

  Tested by: pho

  Notes: svn path=/head/; revision=364063
* vfs: remove the never set VDESC_VPP_WILLRELE flag | Mateusz Guzik | 2020-02-02 | 1 | -3/+0

  Notes: svn path=/head/; revision=357403
* vfs: remove the never set VDESC_NOMAP_VPP flag | Mateusz Guzik | 2020-01-30 | 1 | -3/+1

  Notes: svn path=/head/; revision=357287
* vfs: drop the mostly unused flags argument from VOP_UNLOCK | Mateusz Guzik | 2020-01-03 | 1 | -5/+5

  Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro.

  Reviewed by: kib (previous version)
  Differential Revision: https://reviews.freebsd.org/D21427

  Notes: svn path=/head/; revision=356337
* vfs: flatten vop vectors | Mateusz Guzik | 2019-12-16 | 1 | -0/+1

  This eliminates the following loop from all VOP calls:

      while(vop != NULL && \
          vop->vop_spare2 == NULL && vop->vop_bypass == NULL)
              vop = vop->vop_default;

  Reviewed by: jeff
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D22738

  Notes: svn path=/head/; revision=355790
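A hypothetical one-operation model of what flattening buys: resolving an operation by walking the vop_default chain on every call (the loop quoted above), versus resolving it once and storing the result in the vector. The struct layout and function names are illustrative, not the kernel's struct vop_vector.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*toy_vop_t)(void);

struct toy_vop_vector {
    struct toy_vop_vector *vop_default;   /* fallback vector */
    toy_vop_t vop_read;                   /* NULL: not implemented here */
    toy_vop_t vop_bypass;                 /* catch-all, e.g. a nullfs bypass */
};

/* Per-call resolution: walk the default chain until something handles it. */
static toy_vop_t toy_resolve_read(const struct toy_vop_vector *vop)
{
    while (vop != NULL && vop->vop_read == NULL && vop->vop_bypass == NULL)
        vop = vop->vop_default;
    if (vop == NULL)
        return NULL;
    return vop->vop_read != NULL ? vop->vop_read : vop->vop_bypass;
}

/* One-time flattening at registration: afterwards a call is one load. */
static void toy_flatten(struct toy_vop_vector *vop)
{
    if (vop->vop_read == NULL)
        vop->vop_read = toy_resolve_read(vop);
}

static int toy_std_read(void)    { return 1; }
static int toy_null_bypass(void) { return 2; }
```

After toy_flatten(), calling vec->vop_read() no longer depends on the chain at all; the walk is paid once per vector instead of once per call.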
* vfs: introduce v_irflag and make v_type smaller | Mateusz Guzik | 2019-12-08 | 1 | -1/+1

  The current vnode layout is not SMP-friendly, because frequently read data avoidably shares cachelines with very frequently modified fields. In particular, v_iflag, inspected for VI_DOOMED, can be found in the same line as v_usecount. Instead, make it available in the same cacheline as v_op, v_data and v_type, which all get read all the time.

  v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED.

  Reviewed by: kib, jeff
  Differential Revision: https://reviews.freebsd.org/D22715

  Notes: svn path=/head/; revision=355537
* nullfs: reduce areas protected by vnode interlock in null_lock | Mateusz Guzik | 2019-09-01 | 1 | -9/+10

  Similarly to the other routine, stop taking the interlock for the lower vnode. The interlock for the nullfs vnode is still taken to ensure stability of ->v_data.

  Reviewed by: kib
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21480

  Notes: svn path=/head/; revision=351651
* nullfs: use VOP_NEED_INACTIVE | Mateusz Guzik | 2019-08-30 | 1 | -4/+22

  Reviewed by: kib
  Tested by: pho (previous version)
  Sponsored by: The FreeBSD Foundation

  Notes: svn path=/head/; revision=351617
* vfs: add VOP_NEED_INACTIVE | Mateusz Guzik | 2019-08-28 | 1 | -0/+1

  The vnode usecount drops to 0 all the time (e.g., for directories during path lookup). When that happens, the kernel would always take the exclusive vnode lock in order to call vinactive(). This blocks other threads that want to use the vnode for lookup.

  vinactive() is very rarely needed and can be tested for without the vnode lock held. This patch gives filesystems an opportunity to do it. Sample total wait time for tmpfs over 500 minutes of poudriere -j 104:

  before: 557563641706 (lockmgr:tmpfs)
  after:   46309603301 (lockmgr:tmpfs)

  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21371

  Notes: svn path=/head/; revision=351584
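A sketch of the fast path this enables, under stated assumptions: the toy vnode, its fields, and the rule in toy_need_inactive() are hypothetical (the dirty-pages test echoes the nullfs "dirty v_object" commit listed above). Releasing the last use reference consults the cheap, lock-free predicate first and only falls back to the exclusive-lock path when inactivation is actually required.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field names are illustrative, not the kernel's. */
struct toy_vnode {
    int usecount;
    bool pages_dirty;        /* e.g. a dirty v_object */
    bool unlinked;           /* needs cleanup when the last user leaves */
    int exclusive_locks;     /* how often the slow path was taken */
    int inactivations;
};

/* Cheap test, done without the vnode lock. */
static bool toy_need_inactive(const struct toy_vnode *vp)
{
    return vp->pages_dirty || vp->unlinked;
}

/* Drop a use reference; lock exclusively only if inactivation is needed. */
static void toy_vrele(struct toy_vnode *vp)
{
    if (--vp->usecount > 0)
        return;
    if (!toy_need_inactive(vp))
        return;                   /* common case: no exclusive lock at all */
    vp->exclusive_locks++;        /* stands in for taking the exclusive lock */
    vp->inactivations++;
    vp->pages_dirty = false;      /* inactivation flushes the pages */
}
```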
* nullfs: reduce areas protected by vnode interlock | Mateusz Guzik | 2019-08-25 | 1 | -21/+4

  Some places only take the interlock to hold the vnode, which was a requirement before hold counts started being manipulated with atomics. Use the newly introduced vholdnz() to bump the count.

  Reviewed by: kib
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21358

  Notes: svn path=/head/; revision=351472
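A minimal C11 sketch of a "hold, count known non-zero" helper in the spirit of vholdnz(); the structure and function names are hypothetical. Because the caller already guarantees the count cannot be zero, a plain atomic increment suffices, and no interlock is needed to guard a 0 to 1 transition.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of a vnode hold count manipulated with atomics. */
struct toy_vnode {
    atomic_int v_holdcnt;
};

/*
 * Add a hold when the caller guarantees the count is already non-zero
 * (for example, it owns a hold or reference itself).
 */
static void toy_vholdnz(struct toy_vnode *vp)
{
    int old = atomic_fetch_add_explicit(&vp->v_holdcnt, 1,
        memory_order_relaxed);
    assert(old > 0);
    (void)old;   /* silence unused-variable warnings under NDEBUG */
}

/* Drop a hold; returns 1 if this was the last one. */
static int toy_vdrop(struct toy_vnode *vp)
{
    return atomic_fetch_sub(&vp->v_holdcnt, 1) == 1;
}
```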
* nullfs: lock the vnode with LK_SHARED in null_vptocnp | Mateusz Guzik | 2019-08-21 | 1 | -5/+1

  null_nodeget(), which follows, almost always finds the target vnode in the hash, avoiding insmntque1() altogether. Should it be needed, it already checks whether the lock needs to be upgraded.

  Reviewed by: kib
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D20244

  Notes: svn path=/head/; revision=351360
* Manually clear text references on reclaim for nullfs and tmpfs. | Konstantin Belousov | 2019-06-05 | 1 | -0/+2

  Neither filesystem uses vnode_pager_dealloc(), which would otherwise handle this case: nullfs because the vnode vm_object handle never points to a nullfs vnode, and tmpfs because its vm_object is never a vnode object at all.

  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week

  Notes: svn path=/head/; revision=348698
* Switch to use shared vnode locks for text files during image activation. | Konstantin Belousov | 2019-05-05 | 1 | -10/+12

  kern_execve() locks the text vnode exclusively to be able to set and clear the VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition.

  The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as the MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal.

  Operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. An atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around the VOP_SETATTR() call, the lack of which is arguably a bug on its own.

  nullfs always bypasses v_writecount to the lower vnode, so the nullfs vnode has its own v_writecount correct, and the lower vnode gets all references, since object->handle is always the lower vnode.

  On the text vnode's vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuits the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left.

  Reviewed by: markj, trasz
  Tested by: mjg, pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 month
  Differential revision: https://reviews.freebsd.org/D19923

  Notes: svn path=/head/; revision=347151
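The v_writecount encoding can be sketched as follows. This is a hypothetical, single-threaded model of the rules stated in the commit (positive for writers, negative for text references, each operation refusing a contradicting change); the function names are illustrative, ETXTBSY is used here as the refusal code, and the real checks run under the vnode interlock.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the encoding:
 *   > 0  number of writers
 *   < 0  -(number of text references)
 *   = 0  neither
 * Locking is omitted in this single-threaded sketch.
 */
struct toy_vnode {
    int v_writecount;
};

static int toy_set_text(struct toy_vnode *vp)
{
    if (vp->v_writecount > 0)
        return ETXTBSY;          /* open for writing: cannot become text */
    vp->v_writecount--;          /* one more text reference */
    return 0;
}

static void toy_unset_text(struct toy_vnode *vp)
{
    assert(vp->v_writecount < 0);
    vp->v_writecount++;          /* e.g. the text mapping was removed */
}

static int toy_add_writecount(struct toy_vnode *vp, int inc)
{
    if (vp->v_writecount < 0)
        return ETXTBSY;          /* being executed: refuse new writers */
    assert(vp->v_writecount + inc >= 0);
    vp->v_writecount += inc;
    return 0;
}
```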
* In null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call. | Konstantin Belousov | 2019-02-08 | 1 | -1/+7

  The vp vnode is unlocked during the execution of the VOP method and can be reclaimed, zeroing vp->v_data. Caching allows using the correct mount point.

  Reported and tested by: pho
  PR: 235549
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week

  Notes: svn path=/head/; revision=343899
* sys: further adoption of SPDX licensing ID tags. | Pedro F. Giffuni | 2017-11-20 | 1 | -0/+2

  Mainly focus on files that use the BSD 3-Clause license.

  The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well-known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.

  Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over the FreeBSD tree was useful as a starting point.

  Notes: svn path=/head/; revision=326023
* Renumber copyright clause 4 | Warner Losh | 2017-02-28 | 1 | -1/+1

  Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point.

  Submitted by: Jan Schaumann <jschauma@stevens.edu>
  Pull Request: https://github.com/freebsd/freebsd/pull/96

  Notes: svn path=/head/; revision=314436
* Fix improper use of "its". | Bryan Drewery | 2016-11-08 | 1 | -2/+2

  Sponsored by: Dell EMC Isilon

  Notes: svn path=/head/; revision=308457
* nullfs: plug vnode ref leak in null_vptocnp | Mateusz Guzik | 2016-09-09 | 1 | -1/+0

  The lower vnode is already referenced, and nodeget is supposed to consume the reference. Thus the extra vref call was causing a leak.

  Reported by: pho
  Reviewed by: kib
  MFC after: 1 week

  Notes: svn path=/head/; revision=305659
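The ownership rule can be shown with a toy cache holding one upper node. Everything here is hypothetical; toy_nodeget() simply follows the convention the commit describes: it always consumes exactly one caller reference, either keeping it in a new upper node or dropping it when an upper node already exists.

```c
#include <assert.h>
#include <stddef.h>

struct toy_lower {
    int usecount;
};

struct toy_upper {
    struct toy_lower *lower;   /* owns exactly one reference on lower */
};

static void toy_vref(struct toy_lower *lvp)  { lvp->usecount++; }
static void toy_vrele(struct toy_lower *lvp) { lvp->usecount--; }

/* Takes ownership of one reference on lvp, whichever branch is taken. */
static struct toy_upper *toy_nodeget(struct toy_upper *cache,
    struct toy_lower *lvp)
{
    if (cache->lower == lvp) {
        toy_vrele(lvp);          /* existing upper already owns a reference */
        return cache;
    }
    cache->lower = lvp;          /* new upper keeps the caller's reference */
    return cache;
}
```

An extra toy_vref() before either call, like the one the commit removed, would leave usecount one higher with no owner to release it.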
* nullfs: stop special-casing directories in null_vptocnp | Mateusz Guzik | 2016-09-06 | 1 | -3/+0

  The previous code was forcing an expensive walk in vop_stdvptocnp, which was causing performance issues on highly contended zfs.

  No objections: kib
  MFC after: 2 weeks

  Notes: svn path=/head/; revision=305504
* sys/fs: spelling fixes in comments. | Pedro F. Giffuni | 2016-04-29 | 1 | -1/+1

  No functional change.

  Notes: svn path=/head/; revision=298806
* After nullfs rmdir operation, reclaim the directory vnode which was | Konstantin Belousov | 2016-02-17 | 1 | -0/+9

  unlinked. Otherwise the vnode stays cached, causing a leak. This is similar to r292961 for regular files.

  Reported and tested by: pho (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week

  Notes: svn path=/head/; revision=295717
* Force nullfs vnode reclaim after unlinking, to potentially unlink | Konstantin Belousov | 2015-12-30 | 1 | -3/+5

  the lower vnode. Otherwise, the reference to the lower vnode from the upper one prevents the final unlink.

  PR: 178238
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week

  Notes: svn path=/head/; revision=292961
* VOP_LOOKUP() may relock the directory vnode for some reasons. Since | Konstantin Belousov | 2014-08-08 | 1 | -4/+40

  the nullfs vnode shares its vnode lock with the lower vnode, this allows the reclamation of the nullfs directory vnode in null_lookup(). In this situation, the VOP must return ENOENT.

  Moreover, since after the reclamation the locks of the nullfs directory vnode and the lower vnode are no longer shared, relocking ldvp does not restore the correct locking state of dvp, and leaks the ldvp lock. Correct this by unlocking ldvp and locking dvp.

  Use the cached value of dvp->v_mount.

  Reported by: bdrewery
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks

  Notes: svn path=/head/; revision=269708
* Assert that nullfs vnode has VV_ROOT set whenever lower vnode has. | Konstantin Belousov | 2014-07-28 | 1 | -0/+4

  Assert that dotdot lookup on the root vnode is not performed.

  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week

  Notes: svn path=/head/; revision=269187