aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* vdev_disk: disable flushes if device does not support itvendor/openzfs/masterRob N2024-05-022-2/+32
| | | | | | | | | | | | | If the underlying device doesn't have a write-back cache, the kernel will just return a successful response. This doesn't hurt anything, but it's extra work on the IO taskqs that are unnecessary. So, detect this when we open the device for the first time. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16148
* Improve write issue taskqs utilizationAlexander Motin2024-05-018-47/+98
| | | | | | | | | | | | | | | | | | | | | | | | - Reduce number of allocators on small system down to one per 4 CPU cores, keeping maximum at 4 on 16+ core systems. Small systems should not have the lock contention multiple allocators supposed to solve, while having several metaslabs open and modified each TXG is not free. - Reduce number of write issue taskqs down to one per 16 CPU cores and an integer fraction of number of allocators. On mid- sized systems, where multiple allocators already make sense, too many write issue taskqs may reduce write speed on single-file workloads, since single file is handled by only one taskq to reduce fragmentation. On large systems, that can actually benefit from many taskq's better IOPS, the bottleneck is less important, since in worst case there will be at least 16 cores to handle it. - Distribute dnodes between allocators (and taskqs) in a round- robin fashion instead of relying on sync taskqs to be balanced. The last is not guarantied and may depend on scheduling. - Remove io_wr_iss_tq from struct zio. io_allocator is enough. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16130
* Slightly improve dnode hashAlexander Motin2024-05-011-3/+3
| | | | | | | | | | | | As I understand just for being less predictable dnode hash includes 8 bits of objset pointer, starting at 6. But since objset_t is more than 1KB in size, its allocations are likely aligned to 2KB, that means 11 lower bits provide no entropy. Just take the 8 bits starting from 11. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16131
* libspl/assert: use libunwind for backtrace when availableRob Norris2024-05-014-3/+79
| | | | | | | | | | | libunwind seems to do a better job of resolving a symbols than backtrace(), and is also useful on platforms that don't have backtrace() (eg musl). If it's available, use it. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libspl/assert: dump backtrace in assertRob Norris2024-05-014-0/+37
| | | | | | | | | | Adds a check for the backtrace() function. If available, uses it to show a stack backtrace in the assertion output. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libspl/assert: add lock around assertion outputRob Norris2024-05-011-0/+6
| | | | | | | | | | | | | | | | | | | | | If multiple threads trip an assertion at the same moment (quite common), they can be printing at the same time, and their output gets messy. This adds a simple lock around the whole thing, to prevent a second task printing assert output before the first has finished. Additionally, if libspl_assert_ok is not set, abort() is called without dropping the lock, so that any other asserting tasks will be killed before starting any output, rather than only getting part-way through. This is a tradeoff; it's assumed that multiple threads asserting at the same moment are likely the same fault in different instances of a thread, and so there won't be any more useful information from the other tasks anyway. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libspl/assert: show process/task details in assert outputRob Norris2024-05-012-3/+35
| | | | | | | | | | | | | | Makes it much easier to see what thing complained. Getting thread id, program name and thread name vary wildly between Linux and FreeBSD, so those are set up in macros. pthread_getname_np() did not appear in musl until very recently, but the same info has always been available via prctl(PR_GET_NAME), so we use that instead. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* libzpool: set thread namesRob Norris2024-05-013-7/+10
| | | | | | | | | | | | | Arrange for the thread/task name to be set when new threads are created. This makes them visible in the process table etc. pthread_setname_np() is generally available in glibc, musl and FreeBSD, so no test is required. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* find_system_library: fix var cleanup when library not foundRob Norris2024-05-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "not found" path is attempting to clear SOMELIB_CFLAGS and SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`, which is defined as "if the first arg is non-empty". The m4 "empty" construction is [], therefore, the existing AC_SUBST calls never modify the variables at all. The effect of this is that leftovers from the library test can leak out. At least, if a library header is found in the first stage, but the library itself is not, -lsomelib is added to SOMELIB_LIBS and further tests done. If that library is not found, SOMELIB_LIBS will not be cleared. For most of our library tests this hasn't been a problem, as they're either always found properly via pkg-config or set directly, or the calling test immediately aborts configure. For an optional dependency however, an apparent "partial" result where the header is found but no corresponding library causes link errors later. I think a complete fix should probably not be setting SOMELIB_xxx until the final result is known, but for now, adjusting the AC_SUBST calls to explictly set the empty shell string (which is not "empty" to m4) at least restores the intent. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
* zio: try to execute TYPE_NULL ZIOs on the current taskRob N2024-04-291-4/+6
| | | | | | | | | | | | | | | | | | Many TYPE_NULL ZIOs are used to provide a sync point for child ZIOs, and do not do any actual work themselves. However, they are still dispatched to a dedicated, single-thread taskq, which leads to their execution being entirely task switch and dequeue overhead for no actual reason. This commit changes it so that when selecting a parent ZIO to execute, if the parent is TYPE_NULL and has no done function (that is, no additional work), it is executed on the same thread. This reduces task switches and frees up CPU cores for other work. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16134
* vdev probe to slow disk can stall mmp write checkerDon Brady2024-04-2916-52/+242
| | | | | | | | | | | | | | Simplify vdev probes in the zio_vdev_io_done context to avoid holding the spa config lock for a long duration. Also allow zpool clear if no evidence of another host is using the pool. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Olaf Faaland <faaland1@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Don Brady <don.brady@klarasystems.com> Closes #15839
* Fix arcstats for FreeBSD after zfetch supportAmeer Hamza2024-04-291-2/+8
| | | | | | Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #16141
* Overflowing refreservation is badRich Ercolani2024-04-291-1/+14
| | | | | | | | | | | Someone came to me and pointed out that you could pretty readily cause the refreservation calculation to exceed 2**64, given the 2**17 multiplier in it, and produce refreservations wildly less than the actual volsize in cases where it should have failed. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #15996
* GCC: Fixes for gcc 14 on Fedora 40Tony Hutter2024-04-293-7/+14
| | | | | | | | | | | - Workaround dangling pointer in uu_list.c (#16124) - Fix calloc() transposed arguments in zpool_vdev_os.c - Make some temp variables unsigned to prevent triggering a '-Werror=alloc-size-larger-than' error. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #16124 Closes #16125
* Fix updating the zvol_htable when renaming a zvolAlan Somers2024-04-252-2/+2
| | | | | | | | | | | | When renaming a zvol, insert it into zvol_htable using the new name, not the old name. Otherwise some operations won't work. For example, "zfs set volsize" while the zvol is open. Sponsored by: Axcient Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alek Pinchuk <apinchuk@axcient.com> Signed-off-by: Alan Somers <asomers@FreeBSD.org> Closes #16127 Closes #16128
* Python 3.12 deprecated python3-distutilsBrian Behlendorf2024-04-253-117/+235
| | | | | | | | | | | | | | As for python-3.12 the distutils package has been deprecated. The latest ax_python_devel.m4 macro from the autoconf archive has been updated accordingly so let's pull in the new version. We can also drop the changes made to our customized version to continue if the development version is not installed since this functionality has been included upstream. Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #16126 Closes #16129
* Fast Dedup: ZAP ShrinkingAllan Jude2024-04-247-12/+488
| | | | | | | | | | | | | | | | | This allows ZAPs to shrink. When there are two empty sibling leafs, one of them is collapsed and its storage space is reused. This improved performance on directories that at one time contained a large number of files, but many or all of those files have since been deleted. This also applies to all other types of ZAPs as well. Sponsored-by: iXsystems, Inc. Sponsored-by: Klara, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Alexander Stetsenko <alex.stetsenko@klarasystems.com> Closes #15888
* Make more taskq parameters writableAlexander Motin2024-04-242-6/+11
| | | | | | | | | | | There is no reason for these module parameters to be read-only. Being modified they just apply on next pool import/creation, that is useful for testing different values. Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16118
* L2ARC: Cleanup buffer re-compressionAlexander Motin2024-04-231-39/+20
| | | | | | | | | | | | | | When compressed ARC is disabled, we may have to re-compress when writing into L2ARC. If doing so we can't fit it into the original physical size, we should just fail immediately, since even if it may still fit into allocation size, its checksum will never match. While there, refactor the code similar to other compression places without using abd_return_buf_copy(). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16038
* zfs-kmod: fix empty rpm requires/conflictsTodd2024-04-231-1/+1
| | | | | | | | | | Fix an error in zfs-kmod.spec that causes kmod-zfs packages not to include the correct RPM requires/conflicts relationships. With this change applied, RPM correctly no longer allows kmod-zfs & zfs-dkms packages to be installed together. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Todd Seidelmann <18294602+seidelma@users.noreply.github.com> Closes #16121
* Refactor dbuf_read() for safer decryptionAlexander Motin2024-04-221-110/+104
| | | | | | | | | | | | | | | | | | | | In dbuf_read_verify_dnode_crypt(): - We don't need original dbuf locked there. Instead take a lock on a dnode dbuf, that is actually manipulated. - Block decryption for a dnode dbuf if it is currently being written. ARC hash lock does not protect anonymous buffers, so arc_untransform() is unsafe when used on buffers being written, that may happen in case of encrypted dnode buffers, since they are not copied by dbuf_dirty()/dbuf_hold_copy(). In dbuf_read(): - If the buffer is in flight, recheck its compression/encryption status after it is cached, since it may need arc_untransform(). Tested-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16104
* zfs get: add '-t fs' and '-t vol' optionsRyan2024-04-222-7/+26
| | | | | | | | Make `zfs get` accept `fs` for `filesystem` and `vol` for `volume`. Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ryan <errornointernet@envs.net> Closes #16117
* ztest: use ASSERT3P to compare pointersBrooks Davis2024-04-221-1/+1
| | | | | | | | | | | | With a sufficiently modern gcc (I saw this with gcc13), gcc complains when casting pointers to an integer of a different type (even a larger one). On 32-bt ASSERT3U does this on 32-bit systems by casting a 32-bit pointer to uint64_t so use ASSERT3P which uses uintptr_t. Fixes: 5caeef02fa53 RAID-Z expansion feature Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Brooks Davis <brooks.davis@sri.com> Closes #16115
* ZTS: user_namespace_004.ksh avoid error in cleanup if unsupportedSeth Troisi2024-04-221-2/+2
| | | | | Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Seth Troisi <sethtroisi@google.com> Closes #16114
* Add newline to two zpool messagesSeth Troisi2024-04-221-2/+2
| | | | | Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Seth Troisi <sethtroisi@google.com> Closes #16113
* Parallel pool importGeorge Wilson2024-04-2219-72/+818
| | | | | | | | | | | | | | | | | | | | | | | | This commit allow spa_load() to drop the spa_namespace_lock so that imports can happen concurrently. Prior to dropping the spa_namespace_lock, the import logic will set the spa_load_thread value to track the thread which is doing the import. Consumers of spa_lookup() retain the same behavior by blocking when either a thread is holding the spa_namespace_lock or the spa_load_thread value is set. This will ensure that critical concurrent operations cannot take place while a pool is being imported. The zpool command is also enhanced to provide multi-threaded support when invoking zpool import -a. Lastly, zinject provides a mechanism to insert artificial delays when importing a pool and new zfs tests are added to verify parallel import functionality. Contributions-by: Don Brady <don.brady@klarasystems.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: George Wilson <gwilson@delphix.com> Closes #16093
* abd_iter_page: rework to handle multipage scatterlistsRob N2024-04-191-46/+74
| | | | | | | | | | | | | | | | | | | Previously, abd_iter_page() would assume that every scatterlist would contain a single page (compound or no), because that's all we ever create in abd_alloc_chunks(). However, scatterlists can contain multiple pages of arbitrary provenance, and if we get one of those, we'd get all the math wrong. This reworks things to handle multiple pages in a scatterlist, by properly finding the right page within it for the given offset, and understanding better where the end of the page is and not crossing it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reported-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16108
* Handle FLUSH errors as "expected"Alexander Motin2024-04-191-1/+2
| | | | | | | | | | | | Before #16061 zio_vdev_io_done() was not used for FLUSH requests. Addition of it triggers reprobe each TXG for vdevs not supporting them. Since those errors are often expected, they are normally handled by individual vdev drivers and should be ignored here. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16110
* tests/quota: consistently clear quota property between testsRob Norris2024-04-197-2/+17
| | | | | | | | | | | | | | | | | | | | | | When run in isolation, quota_005_pos would fail in cleanup because it would attempt restore the previous quota, which was 0, and so get an error (because you can't set quota to '0', you have to use 'none'). It worked as part of the quota tag set because the previous tests did not clean up their quota, so there was always a non-zero quota to return to. This adds a simple quota reset function, and has all quota tests run it at cleanup. For the ones that weren't cleaning up, they now do, and for quota_005_pos, which was trying to do the right thing, it now just resets it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16097
* tests/quota_005_pos: use a long int for doubling the quota sizeRob Norris2024-04-191-4/+3
| | | | | | | | | | | | | | | | | | | | | | When run in isolation, quota_005_pos would see an empty ~300G dataset. Doubling it's space overflows a int32, which meant it was trying to then set the quota to a negative value, and would fail. When run as part of the quota tests, the filesystem appears to have stuff in it, and so a lower available space, which doesn't overflow, and so succeeds. The bare minimum fix seems to be to use a int64 for the available space, so it can be comfortably doubled. Here it is. (Also a typo fix and a tiny bit of cleanup). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16097
* Add zfetch stats in arcstatsAmeer Hamza2024-04-191-5/+42
| | | | | | | | | | arc_summary also reports zfetch stats but it's inconvenient to monitor contiguously incrementing numbers. Adding them in arcstats allows us to observe streams more conveniently. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #16094
* Fix: FreeBSD Arm64 does not build currentlyTino Reichardt2024-04-192-2/+2
| | | | | | | | | | | | | The define LD_VERSION isn't defined on FreeBSD Arm64 when OpenZFS is build with the default compiler: clang. I used only gcc for testing - my fault. Fast fix as suggested by @mmatuska Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #16103
* Linux 6.8 compat: META (#16099)Tony Hutter2024-04-171-1/+1
| | | | | | | Update the META file to reflect compatibility with the 6.8 kernel. Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
* zts: add a debug option to get full test outputRob N2024-04-162-8/+25
| | | | | | | | | | | | | | | The test runner accumulates output from individual tests, then writes it to the log at the end. If a test hangs or crashes the system half way through, we get no insight into how it got to where it did. This adds a -D option for "debug". When set, all test output is written to stdout. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16096
* Do no use .cfi_negate_ra_state within the assembly on Arm64Tino Reichardt2024-04-152-5/+21
| | | | | | | | | | | | | | | Compiling openzfs on aarch64 with gcc-8 and gcc-9 is failing currently. See issue #14965 for deeper context. On platforms without pointer authentication, .cfi_negate_ra_state can be defined to a no-op: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/aarch64-tdep.c#l1413 I have tested this on Arm64 FreeBSD 13.2 and AlmaLinux-8. Reviewed-by: Andrew Turner <andrew.turner4@arm.com> Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Closes #14965 Closes #15784
* Add the BTI elf note to the AArch64 SHA2 assemblyAndrew Turner2024-04-152-0/+20
| | | | | | | | | | | | | | | | | | | On ELF platforms there is a note to specify when an application or library supports BTI. When linking one of these the linker needs all input object files to have the note. If not it will not include it in the output file. Normally the compiler would generate it, but for assembly files we need to do it our selves. Add the note to the aarch64 sha256 and sha512 assembly files. Tested by building with BTI enabled and using the -zbti-report=error flag to lld that makes it an error if the note is missing. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Andrew Turner <andrew.turner4@arm.com> Closes #16086
* zinject: "no-op" error injectionRob N2024-04-154-6/+19
| | | | | | | | | | | | | When injected, this causes the matching IO to appear to succeed, but the actual work is never submitted to the physical device. This can be used to simulate a write-back cache servicing a write, but the backing device has failed and the cache cannot complete the operation in the background. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16085
* zts: allow running a single test by name onlyRob N2024-04-151-2/+13
| | | | | | | | | | | | | | | | | | Specifying a single test is kind of a hassle, because the full relative path under the test suite dir has to be included, but it's not always clear what that path even is. This change allows `-t` to take the name of a single test instead of a full path. If the value has no `/` characters, we search for a file of that name under the test root, and if found, use that as the full test path instead. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16088
* bdev_discard_supported: understand discard_granularity=0Rob N2024-04-121-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel documentation for the discard_granularity property says: A discard_granularity of 0 means that the device does not support discard functionality. Some older kernels had drivers (notably loop, but also some USB-SATA adapters) that would set the QUEUE_FLAG_DISCARD capability flag, but have discard_granularity=0. Since 5.10 (torvalds/linux@b35fd7422c2f) the discard entry point blkdev_issue_discard() has had a check for this, which would immediately reject the call with EOPNOTSUPP, and throw a scary diagnostic message into the log. See #16068. Since 6.8, the block layer sets a non-zero default for discard_granularity (torvalds/linux@3c407dc723bb), and a future kernel will remove the check entirely[1]. As such, there's no good reason for us to enable discard when discard_granularity=0. The kernel will never let the request go in anyway; better that we just disable it so we can report it properly to the user. 1. https://patchwork.kernel.org/project/linux-block/patch/20240312144826.1045212-2-hch@lst.de/ Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16068 Closes #16082
* zio: rename ZIO_TYPE_IOCTL to ZIO_TYPE_FLUSHRob Norris2024-04-1213-47/+53
| | | | | | | | | | | | | The only possible ioctl is a flush, and any other kind of meta-operation introduced in the future is likely to have different semantics (much like trim did). So, lets just call it what it is. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16064
* dkio: remove kernel dkio.h compatibility headerRob Norris2024-04-125-76/+0
| | | | | | | | | | | | | Without DKIOCFLUSHWRITECACHE, we no longer need the compat header. Note that we're keeping the userspace SPL compat header, which is used by libefi. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16064
* zio: remove io_cmd and DKIOCFLUSHWRITECACHERob Norris2024-04-1210-157/+106
| | | | | | | | | | | | | | There's no other options, so we can just always assume its a flush. Includes some light refactoring where a switch statement was doing control flow that no longer works. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16064
* zio: remove zio_ioctl()Rob Norris2024-04-122-19/+17
| | | | | | | | | | | | It only had one user, zio_flush(), and there are no other vdev ioctls anyway. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16064
* Add support for zfs mount -R <filesystem>Umer Saleem2024-04-117-18/+216
| | | | | | | | | | | | | | | | This commit adds support for mounting a dataset along with all of it's children with '-R' flag for zfs mount. There can be scenarios where we want to mount all datasets under one hierarchy instead of mounting all datasets present on system with '-a' flag. '-R' flag should work on all root and non-root datasets. Usage information and man page has been updated for zfs mount. A test for verifying the behavior for '-R' flag is also added. Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Umer Saleem <usaleem@ixsystems.com> Closes #16015
* AUTHORS: refresh with recent new contributorsRob N2024-04-112-0/+53
| | | | | | Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16079
* vdev_disk: fix alignment check when buffer has non-zero starting offsetRob Norris2024-04-112-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a linear buffer spans multiple pages, and the first page has a non-zero starting offset, the checker would not include the offset, and so would think there was an alignment gap at the end of the first page, rather than at the start. That is, for a 16K buffer spread across five pages with an initial 512B offset: [.XXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXXX][XXXXXXX.] It would be interpreted as: [XXXXXXX.][XXXXXXXX]... And be rejected as misaligned. Since it's already a linear ABD, the "linearising" copy would just reuse the buffer as-is, and the second check would failing, tripping the VERIFY in vdev_disk_io_rw(). This commit fixes all this by including the offset in the check for end-of-page alignment. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16076
* tests: add test for vdev_disk page alignment checkRob Norris2024-04-116-0/+435
| | | | | | | | | | | | | | | | This provides a test driver and a set of test vectors for the page alignment check callback function vdev_disk_check_pages_cb(). Because there's no good facility for exposing this function to a userspace test right now, for now I'm just duplicating the function and adding commentary to remind people to keep them in sync. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16076
* Illumos#16463 zfs_ioc_recv leaks nvlistAndy Fiddaman2024-04-111-11/+19
| | | | | | | | | | | | | | | | In https://www.illumos.org/issues/16463 it was observed that an nvlist was being leaked in zfs_ioc_recv() due a missing call to nvlist_free for "hidden_args". For OpenZFS the same issue exists in zfs_ioc_recv_new() and is addressed by this PR. This change also properly frees nvlists in the unlikely event that a call to get_nvlist() fails. Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Igor Kozhukhov <igor@dilos.org> Signed-off-by: Andy Fiddaman <illumos@fiddaman.net> Closes #16077
* return NULL at end of send_progress_threadJason Lee2024-04-101-0/+1
| | | | | | Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Jason Lee <jasonlee@lanl.gov> Closes #16074
* Add custom debug printing for your assertsRich Ercolani2024-04-104-31/+372
| | | | | | | | | Being able to print custom debug information on assert trip seems useful. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Paul Dagnelie <pcd@delphix.com> Signed-off-by: Rich Ercolani <rincebrain@gmail.com> Closes #15792