path: root/sys/kern/vfs_mount.c
Commit message | Author | Age | Files | Lines
* More careful handling of the mount failure. | Konstantin Belousov | 2020-11-26 | 1 | -4/+21
  - VFS_UNMOUNT() requires vn_start_write() around it [*].
  - call VFS_PURGE() before unmount.
  - do not destroy mp if cleanup unmount did not succeed.
  - set MNTK_UNMOUNT, and indicate forced unmount with MNTK_UNMOUNTF for VFS_UNMOUNT() in cleanup.
  PR: 251320 [*]
  Reported by: Tong Zhang <ztong0001@gmail.com>
  Reviewed by: markj, mjg
  Discussed with: rmacklem
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D27327
  Notes: svn path=/head/; revision=368075
* vfs: group mount per-cpu vars into one struct | Mateusz Guzik | 2020-11-09 | 1 | -49/+62
  While here move frequently read stuff into the same cacheline.
  This shrinks struct mount by 64 bytes.
  Tested by: pho
  Notes: svn path=/head/; revision=367535
* Suspend all writeable local filesystems on power suspend. | Konstantin Belousov | 2020-11-05 | 1 | -0/+64
  This ensures that no writes are pending in memory, either metadata or user data, but not including dirty pages not yet converted to fs writes.
  Only filesystems declared local are suspended.
  Note that this does not guarantee absence of the metadata errors or leaks if resume is not done: for instance, on UFS unlinked but opened inodes are leaked and require fsck to gc.
  Reviewed by: markj
  Discussed with: imp
  Tested by: imp (previous version), pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D27054
  Notes: svn path=/head/; revision=367398
* Rationalize per-cpu zones. | Mateusz Guzik | 2020-11-05 | 1 | -8/+8
  The 2 provided zones had inconsistent naming between each other ("int" and "64") and other allocator zones (which use bytes).
  Follow malloc by naming them "pcpu-" + size in bytes.
  This is a step towards replacing ad-hoc per-cpu zones with general slabs.
  Notes: svn path=/head/; revision=367384
* vfs: annotate mountlist_mtx with __exclusive_cache_line | Mateusz Guzik | 2020-10-17 | 1 | -1/+1
  Notes: svn path=/head/; revision=366783
* cache: drop the force flag from purgevfs | Mateusz Guzik | 2020-09-23 | 1 | -1/+0
  The optional scan is wasteful, thus it is removed altogether from unmount.
  Callers which always want it anyway remain unaffected.
  Notes: svn path=/head/; revision=366071
* Fix a "v_seqc_users == 0 not met" panic when VFS_STATFS() fails during mount. | Rick Macklem | 2020-08-26 | 1 | -1/+8
  r363210 introduced v_seqc_users to the vnodes. This change requires a vn_seqc_write_end() to match the vn_seqc_write_begin() in vfs_cache_root_clear().
  mjg@ provided this patch which seems to fix the panic.
  Tested for an NFS mount where the VFS_STATFS() call will fail.
  Submitted by: mjg
  Reviewed by: mjg
  Differential Revision: https://reviews.freebsd.org/D26160
  Notes: svn path=/head/; revision=364844
* Use devctl.h instead of bus.h to reduce newbus pollution. | Warner Losh | 2020-08-21 | 1 | -1/+1
  There's no need for these parts of the kernel to know about newbus, so narrow what is included to devctl.h for device_notify_*.
  Suggested by: kib@
  Notes: svn path=/head/; revision=364442
* Use names suggested by kib@ in review D25969, move call for unmount to not call | Warner Losh | 2020-08-20 | 1 | -10/+11
  with vnode locked, use NOWAIT alloc and only report when we don't overflow.
  These changes were accidentally omitted from r364402, except for the not reporting on overflow. They were lumped in with a debugging commit in my tree that I omitted w/o realizing this.
  Other issues from the review are pending some other changes I need to do first.
  Notes: svn path=/head/; revision=364425
* Add VFS FS events for mount and unmount to devctl/devd | Warner Losh | 2020-08-19 | 1 | -0/+74
  Report when a filesystem is mounted, remounted or unmounted via devd, along with details about the mount point and mount options.
  Discussed with: kib@
  Reviewed by: kirk@ (prior version)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D25969
  Notes: svn path=/head/; revision=364402
* vfs: sanity check mount counters in vfs_op_enter | Mateusz Guzik | 2020-08-19 | 1 | -0/+3
  Notes: svn path=/head/; revision=364371
* vfs: introduce vnode sequence counters | Mateusz Guzik | 2020-07-25 | 1 | -15/+40
  Modified on each permission change and link/unlink.
  Reviewed by: kib
  Tested by: pho (in a patchset)
  Differential Revision: https://reviews.freebsd.org/D25573
  Notes: svn path=/head/; revision=363517
* vfs: avoid spurious memcpy in vfs_statfs | Mateusz Guzik | 2020-07-10 | 1 | -1/+2
  It is quite often called for the very same buffer.
  Notes: svn path=/head/; revision=363068
* Apply default security flavor in vfs_export | Ryan Moeller | 2020-06-16 | 1 | -5/+0
  There may be some version of mountd out there that does not supply a default security flavor when none is given for an export.
  Set the default security flavor in vfs_export if none is given, and remove the workaround for oexport compat.
  Reported by: npn
  Reviewed by: rmacklem
  Approved by: mav (mentor)
  MFC after: 3 days
  Sponsored by: iXsystems, Inc.
  Differential Revision: https://reviews.freebsd.org/D25300
  Notes: svn path=/head/; revision=362252
* Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags. | Rick Macklem | 2020-06-14 | 1 | -23/+60
  Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined.
  Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
  This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size).
  This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.)
  This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls.
  Reviewed by: kib, freqlabs
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D25088
  Notes: svn path=/head/; revision=362158
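The width mismatch described in the commit above can be illustrated with a small userland sketch. It is not the kernel code: the two flag names are hypothetical, and an unsigned 32-bit type stands in for the old "int" field so that the conversion is well defined in standard C.

```c
#include <stdint.h>

/* Hypothetical export flag above bit 31; not a real FreeBSD flag. */
#define EX_HYPOTHETICAL_HIGH    (UINT64_C(1) << 40)
/* Hypothetical flag in the low 32 bits, like those in use before the change. */
#define EX_HYPOTHETICAL_LOW     (UINT64_C(1) << 3)

/* Model of the old layout: the flags pass through a 32-bit field. */
static uint64_t
roundtrip_32(uint64_t mnt_flags)
{
    uint32_t ex_flags = (uint32_t)mnt_flags;    /* bits 32..63 are dropped */

    return (ex_flags);
}

/* Model of the new layout: the field is as wide as mnt_flags. */
static uint64_t
roundtrip_64(uint64_t mnt_flags)
{
    uint64_t ex_flags = mnt_flags;

    return (ex_flags);
}
```

In this model a low flag survives either field, while a flag at bit 40 comes back as zero from `roundtrip_32()`, which is why no new export flags could be defined while the field stayed 32 bits wide.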
* Fix build issue introduced by r361699. | Rick Macklem | 2020-06-02 | 1 | -0/+3
  Reported by: cy (and others)
  Notes: svn path=/head/; revision=361711
* Assign default security flavor when converting old export args | Ryan Moeller | 2020-06-01 | 1 | -1/+13
  vfs_export requires security flavors be explicitly listed when exporting as of r360900.
  Use the default AUTH_SYS flavor when converting old export args to ensure compatibility with the legacy mount syscall.
  Reported by: rmacklem
  Reviewed by: rmacklem
  Approved by: mav (mentor)
  MFC after: 3 days
  Sponsored by: iXsystems, Inc.
  Differential Revision: https://reviews.freebsd.org/D25045
  Notes: svn path=/head/; revision=361699
* Fix an NFS mount attempt where VFS_STATFS() fails. | Rick Macklem | 2020-03-22 | 1 | -1/+4
  r353150 added mnt_rootvnode and this seems to have broken NFS mounts when the VFS_STATFS() called just after VFS_MOUNT() returns an error. Then the code calls VFS_UNMOUNT(), which calls vflush(), which returns EBUSY. Then the thread gets stuck sleeping on "mntref" in vfs_mount_destroy().
  This patch fixes this problem.
  Reviewed by: kib, mjg
  Differential Revision: https://reviews.freebsd.org/D24022
  Notes: svn path=/head/; revision=359219
* vfs: drop remaining zpcpu casts | Mateusz Guzik | 2020-02-12 | 1 | -4/+4
  Notes: svn path=/head/; revision=357811
* vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit | Mateusz Guzik | 2020-02-12 | 1 | -19/+42
  In particular on amd64 this eliminates an atomic op in the common case, trading it for IPIs in the uncommon case of catching CPUs executing the code while the filesystem is getting suspended or unmounted.
  Notes: svn path=/head/; revision=357810
* vfs: remove now useless ENODEV handling from vn_fullpath consumers | Mateusz Guzik | 2020-02-08 | 1 | -3/+2
  Noted by: ngie
  Notes: svn path=/head/; revision=357679
* Add kern_unmount() and use in Linuxulator. No functional changes. | Edward Tomasz Napierala | 2020-01-24 | 1 | -5/+12
  Reviewed by: kib
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D22646
  Notes: svn path=/head/; revision=357075
* Peter Holm reports that his test that does an umount(8) on an active | Kirk McKusick | 2020-01-15 | 1 | -3/+1
  mount point while numerous tests are running that are writing to files on that mount point causes the umount(8) to hang forever.
  The unmount(2) system call is handled in the kernel by the dounmount() function. The cause of the hang is that prior to dounmount() calling VFS_UNMOUNT() it is calling VFS_SYNC(mp, MNT_WAIT). The MNT_WAIT flag indicates that VFS_SYNC() should not return until all the dirty buffers associated with the mount point have been written to disk. Because user processes are allowed to continue writing and can do so faster than the data can be written to disk, the call to VFS_SYNC() can never finish.
  Unlike VFS_SYNC(), the VFS_UNMOUNT() routine can suspend all processes when they request to do a write thus having a finite number of dirty buffers to write that cannot be expanded. There is no need to call VFS_SYNC() before calling VFS_UNMOUNT(), because VFS_UNMOUNT() needs to flush everything again anyway after suspending writes, to catch anything that was dirtied between the VFS_SYNC() and writes being suspended.
  The fix is to simply remove the unnecessary call to VFS_SYNC() from dounmount().
  Reported by: Peter Holm
  Analysis by: Chuck Silvers
  Tested by: Peter Holm
  MFC after: 7 days
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=356763
* vfs: rework vnode list management | Mateusz Guzik | 2020-01-13 | 1 | -8/+0
  The current notion of an active vnode is eliminated.
  Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads.
  Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0.
  Sample result from an incremental make -s -j 104 bzImage on tmpfs:
  stock:   118.55s user 3649.73s system 7479% cpu 50.382 total
  patched: 122.38s user 1780.45s system 6242% cpu 30.480 total
  Reviewed by: jeff
  Tested by: pho (in a larger patch, previous version)
  Differential Revision: https://reviews.freebsd.org/D22997
  Notes: svn path=/head/; revision=356672
* vfs: add per-mount vnode lazy list and use it for deferred inactive + msync | Mateusz Guzik | 2020-01-13 | 1 | -0/+4
  This obviates the need to scan the entire active list looking for vnodes of interest.
  msync is handled by adding all vnodes with write count to the lazy list.
  deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.
  Vnodes get dequeued from the list when their hold count reaches 0.
  Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case.
  Reviewed by: jeff
  Tested by: pho (in a larger patch, previous version)
  Differential Revision: https://reviews.freebsd.org/D22995
  Notes: svn path=/head/; revision=356670
* vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT) | Mateusz Guzik | 2020-01-07 | 1 | -1/+1
  The previous behavior of leaving VI_OWEINACT vnodes on the active list without a hold count is eliminated. Hold count is kept and inactive processing gets explicitly deferred by setting the VI_DEFINACT flag. The syncer is then responsible for vdrop.
  Reviewed by: kib (previous version)
  Tested by: pho (in a larger patch, previous version)
  Differential Revision: https://reviews.freebsd.org/D23036
  Notes: svn path=/head/; revision=356441
* vfs: drop the mostly unused flags argument from VOP_UNLOCK | Mateusz Guzik | 2020-01-03 | 1 | -10/+10
  Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro.
  Reviewed by: kib (previous version)
  Differential Revision: https://reviews.freebsd.org/D21427
  Notes: svn path=/head/; revision=356337
* vfs: add optional root vnode caching | Mateusz Guzik | 2019-10-06 | 1 | -1/+16
  Root vnodes are looked up all the time, e.g. when crossing a mount point.
  Currently used routines always perform a costly lookup which can be trivially avoided.
  Reviewed by: jeff (previous version), kib
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21646
  Notes: svn path=/head/; revision=353150
* Check the vfs option length is valid before accessing through | Andrew Turner | 2019-09-27 | 1 | -2/+2
  When a VFS option passed to nmount is present but NULL the kernel will place an empty option in its internal list. This will have a NULL pointer and a length of 0. When we come to read one of these the kernel will try to load from the last address of virtual memory. This is normally invalid so will fault resulting in a kernel panic.
  Fix this by checking if the length is valid before dereferencing.
  MFC after: 3 days
  Sponsored by: DARPA, AFRL
  Notes: svn path=/head/; revision=352796
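The ordering of the check described above can be sketched in userland C. This is a minimal model, not the code from the commit: `struct opt_model` and `opt_is_string()` are hypothetical names, and the structure is a simplified stand-in for the kernel's option record.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a mount option; not the kernel's own structure. */
struct opt_model {
    const char *value;  /* may be NULL for an empty option */
    size_t len;         /* length of value in bytes, 0 when empty */
};

/*
 * Return true only if the option holds a NUL-terminated string.
 * The pointer and length are tested first, so value[len - 1] is never
 * evaluated for an empty option, where len - 1 would wrap around and
 * index far beyond a NULL pointer.
 */
static bool
opt_is_string(const struct opt_model *opt)
{
    if (opt->value == NULL || opt->len == 0)
        return (false);
    return (opt->value[opt->len - 1] == '\0');
}
```

With `{ NULL, 0 }` the function returns false without touching memory; with `{ "ufs", 4 }` it returns true because the fourth byte is the terminator.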
* Add two options to allow mount to avoid covering up existing mount points. | Sean Eric Fagan | 2019-09-23 | 1 | -5/+34
  The two options are
  * nocover/cover: Prevent/allow mounting over an existing root mountpoint. E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local is already a mountpoint.
  * emptydir/noemptydir: Prevent/allow mounting on a non-empty directory. E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.
  Neither of these options is intended to be a default, for historical and compatibility reasons.
  Reviewed by: allanjude, kib
  Differential Revision: https://reviews.freebsd.org/D21458
  Notes: svn path=/head/; revision=352614
* vfs: group fields used for per-cpu ops in one cacheline | Mateusz Guzik | 2019-09-19 | 1 | -1/+1
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=352543
* vfs: convert struct mount counters to per-cpu | Mateusz Guzik | 2019-09-16 | 1 | -2/+114
  There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance.
  Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system:
  before: 852393 ops/s
  after: 76682077 ops/s
  Reviewed by: kib, jeff
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21637
  Notes: svn path=/head/; revision=352427
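The general idea of a per-CPU counter, where updates are cheap and the exact value is computed only on demand, can be sketched as follows. This is a single-threaded userland model under stated assumptions, not the kernel implementation: `NSLOTS`, `struct pcpu_counter_model` and both functions are hypothetical, and the synchronization needed to read a stable sum is left out.

```c
#include <stdint.h>

#define NSLOTS 4    /* stand-in for the number of CPUs; an assumption */

/*
 * One counter per slot. A real implementation would also place the
 * slots in separate cache lines; this model does not.
 */
struct pcpu_counter_model {
    int64_t slot[NSLOTS];
};

/* Fast path: modify only the slot that belongs to the calling CPU. */
static void
counter_add(struct pcpu_counter_model *c, unsigned cpu, int64_t delta)
{
    c->slot[cpu % NSLOTS] += delta;
}

/* Slow path: the exact value is the sum over all slots. */
static int64_t
counter_fetch(const struct pcpu_counter_model *c)
{
    int64_t sum = 0;

    for (unsigned i = 0; i < NSLOTS; i++)
        sum += c->slot[i];
    return (sum);
}
```

An individual slot may go negative, for example when a reference is taken on one CPU and dropped on another; only the sum is meaningful, which fits counters whose exact values are very rarely needed.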
* vfs: manage mnt_ref with atomics | Mateusz Guzik | 2019-09-16 | 1 | -1/+103
  New primitive is introduced to denote sections which can operate locklessly on aspects of struct mount, but which can also be disabled if necessary. This provides an opportunity to start scaling common case modifications while providing stable state of the struct when facing unmount, write suspension or other events.
  mnt_ref is the first counter to start being managed in this manner with the intent to make it per-cpu.
  Reviewed by: kib, jeff
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21425
  Notes: svn path=/head/; revision=352424
* De-commission the MNTK_NOINSMNTQ kernel mount flag. | Konstantin Belousov | 2019-08-23 | 1 | -4/+2
  After all the changes, its dynamic scope is same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time.
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=351435
* vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS") | Mateusz Guzik | 2019-08-19 | 1 | -0/+6
  fs-specific part of vfs_statfs routines only fill in small portion of the structure. Previous code was always copying everything at a higher layer to accommodate it and this patch does the same.
  'df' (no arguments) worked fine because the caller uses mnt_stat itself as the target buffer, making all the copying a no-op for its own case. 'df /' and similar use a different consumer which passes its own buffer and this is where you can run into trouble.
  Reported by: cy
  Fixes: r351193
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=351215
* vfs: stop always overwriting ->mnt_stat in VFS_STATFS | Mateusz Guzik | 2019-08-18 | 1 | -5/+8
  The struct is already populated on each mount (and remount). Fields are either constant or not used by filesystem in the first place.
  Some infrequently used functions use it to avoid having to allocate a new buffer and are left alone.
  The current code results in avoidable copying single-threaded and significant cache line bouncing multithreaded.
  While here deduplicate initial filling of the struct.
  Reviewed by: kib
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21317
  Notes: svn path=/head/; revision=351193
* Include ktr.h in more compilation units | Conrad Meyer | 2019-05-21 | 1 | -0/+1
  Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway.
  Like r348026, these CUs did not show up in tinderbox as missing the include.
  Reported by: peterj (arm64/mp_machdep.c)
  X-MFC-With: r347984
  Sponsored by: Dell EMC Isilon
  Notes: svn path=/head/; revision=348064
* Some filesystems (like cd9660 and ext3) require that VFS_STATFS() | Kirk McKusick | 2018-12-21 | 1 | -1/+1
  be called before VFS_ROOT() is called. Move the call for VFS_STATFS() so that it is done after VFS_MOUNT(), but before VFS_ROOT().
  This change actually improves the robustness of the mount system call because it returns an error rather than failing silently when VFS_STATFS() returns failure.
  Reported by: Rebecca Cran <rebecca@bluestop.org>
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=342290
* Under UFS/FFS the VFS_ROOT() function will return an error if the inode | Kirk McKusick | 2018-12-15 | 1 | -5/+11
  check-hash fails. Panic'ing is not an appropriate response. So, check for an error return from VFS_ROOT() and when an error is reported, unwind and return the error.
  Reported by: Gary Jennejohn (gj)
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=342135
* Remove unused argument to priv_check_cred. | Mateusz Guzik | 2018-12-11 | 1 | -1/+1
  Patch mostly generated with Coccinelle:

      @@
      expression E1,E2;
      @@

      - priv_check_cred(E1,E2,0)
      + priv_check_cred(E1,E2)

  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=341827
* Add FALLTHROUGH comments to appease Coverity. | Mark Johnston | 2018-10-25 | 1 | -8/+7
  CID: 1017862-1017864, 1017866-1017868
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=339731
* Correct condition to detect mount(2) support by a filesystem. | Konstantin Belousov | 2018-10-24 | 1 | -2/+4
  Reported and tested by: cy
  Sponsored by: The FreeBSD Foundation
  Approved by: re (rgrimes)
  Notes: svn path=/head/; revision=339694
* Only call sigdeferstop() for NFS. | Konstantin Belousov | 2018-10-23 | 1 | -3/+5
  Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported.
  The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not want to create the layered fs only to wrap NFS VOPs.
  VFS_OP()s wrap is straightforward.
  Requested and reviewed by: mjg (previous version)
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D17658
  Notes: svn path=/head/; revision=339672
* Make it easier for filesystems to count themselves as jail-enabled, | Jamie Gritton | 2018-05-04 | 1 | -2/+10
  by doing most of the work in a new function prison_add_vfs in kern_jail.c
  Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and the rest is taken care of. This includes adding a jail parameter like allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed. Both of these used to be a static list of known filesystems, with predefined permission bits.
  Reviewed by: kib
  Differential Revision: D14681
  Notes: svn path=/head/; revision=333263
* vfs_donmount: in certain cases try r/o mount if r/w mount fails | Andriy Gapon | 2018-03-27 | 1 | -3/+58
  If the operation is not an update, if neither r/w nor r/o mode is explicitly requested, if the error code hints at the possibility of the media being read-only, and if the fallback is allowed, then we can try to automatically downgrade to the readonly mode.
  This is especially useful for auto-mounting of removable media that sometimes can happen to be write-protected.
  The fallback to r/o is not enabled by default. It can be requested on a per-mount basis with a new mount option, 'autoro'. Or it can be globally allowed by setting vfs.default_autoro.
  Reviewed by: cem, kib
  MFC after: 3 weeks
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D13361
  Notes: svn path=/head/; revision=331616
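The four conditions listed in the commit above combine into a single predicate, which can be sketched in userland C. This is an illustration only, not the vfs_donmount() code: the structure and function names are hypothetical, and the set of error codes treated as a read-only hint (EROFS and EACCES here) is an assumption, since the commit message does not name them.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of a mount request; not a kernel structure. */
struct mount_req_model {
    bool is_update;       /* the operation updates an existing mount */
    bool rw_explicit;     /* read-write mode was explicitly requested */
    bool ro_explicit;     /* read-only mode was explicitly requested */
    bool autoro_allowed;  /* 'autoro' option or the global default is set */
};

/* Assumed set of errors that hint at read-only media. */
static bool
error_hints_readonly(int error)
{
    return (error == EROFS || error == EACCES);
}

/* Decide whether a failed r/w mount should be retried read-only. */
static bool
should_retry_readonly(const struct mount_req_model *req, int error)
{
    if (req->is_update)
        return (false);
    if (req->rw_explicit || req->ro_explicit)
        return (false);
    if (!req->autoro_allowed)
        return (false);
    return (error_hints_readonly(error));
}
```

A plain mount with the fallback allowed that fails with EROFS would be retried read-only in this model; the same failure on an update, or with an explicitly requested mode, would be returned to the caller unchanged.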
* Use EVENTHANDLER_DIRECT_INVOKE for [un]mount events, for better performance. | Ian Lepore | 2018-01-07 | 1 | -2/+6
  Notes: svn path=/head/; revision=327679
* sys: further adoption of SPDX licensing ID tags. | Pedro F. Giffuni | 2017-11-20 | 1 | -0/+2
  Mainly focus on files that use BSD 3-Clause license.
  The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
  Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
  Notes: svn path=/head/; revision=326023
* remove process and jail directory machinations from dounmount | Andriy Gapon | 2017-10-13 | 1 | -28/+5
  The manipulations done by mountcheckdirs() are not that useful during the unmount, they can bring about unexpected security consequences.
  This change effectively reverts the change in r73241.
  The change also allows simplifying the handling of the rootvnode global variable.
  Discussed with: mckusick, mjg, kib
  Reviewed by: trasz
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D12366
  Notes: svn path=/head/; revision=324591
* Do not vrele() covered vnode under the mp mutex. | Konstantin Belousov | 2017-09-19 | 1 | -2/+2
  If vrele() changes the hold count to zero, it needs to acquire the vnode lock.
  Sponsored by: The FreeBSD Foundation
  Discussed with: avg
  X-MFC with: r323578
  Notes: svn path=/head/; revision=323769
* dounmount: do not release the mount point's reference on the covered vnode | Andriy Gapon | 2017-09-14 | 1 | -1/+4
  As long as mnt_ref is not zero there can be a consumer that might try to access mnt_vnodecovered. For this reason the covered vnode must not be freed until mnt_ref goes to zero.
  So, move the release of the covered vnode to vfs_mount_destroy.
  Reviewed by: kib
  MFC after: 3 weeks
  Differential Revision: https://reviews.freebsd.org/D12329
  Notes: svn path=/head/; revision=323578