path: root/sys/vm/uma_core.c
* uma_core: change listq to plinks.q in temp lists (Doug Moore, 2025-05-01, 1 file, -10/+6)
  Change the two functions that use local tailq variables to use the plinks.q field, instead of the listq field, for the pointers. This will resolve one source of conflict when the tailq field and the object field come to share the same space in a future change to the vm_page definition.
  Reviewed by: alc, kib
  Differential Revision: https://reviews.freebsd.org/D50094
* vm_object: drop unnecessary vm_object.h header (Doug Moore, 2025-04-30, 1 file, -1/+0)
  The header vm_object.h is included in vm_phys.h and uma_core.h, where it is not necessary. Remove it.
  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D50081
* uma: Avoid excessive per-CPU draining (Mark Johnston, 2025-03-17, 1 file, -0/+7)
  After commit 389a3fa693ef, uma_reclaim_domain(UMA_RECLAIM_DRAIN_CPU) calls uma_zone_reclaim_domain(UMA_RECLAIM_DRAIN_CPU) twice on each zone in addition to globally draining per-CPU caches. This was unintended and is unnecessarily slow; in particular, draining per-CPU caches requires binding to each CPU. Stop draining per-CPU caches when visiting each zone; just do it once in pcpu_cache_drain_safe() to minimize the number of expensive sched_bind() calls.
  Fixes: 389a3fa693ef ("uma: Add UMA_ZONE_UNMANAGED")
  MFC after: 1 week
  Sponsored by: Klara, Inc.
  Sponsored by: NetApp, Inc.
  Reviewed by: gallatin, kib
  Differential Revision: https://reviews.freebsd.org/D49349
* Introduce the UMA_ZONE_NOTRIM uma zone type (Andrew Gallatin, 2025-01-15, 1 file, -3/+8)
  The ktls buffer zone allocates 16k contiguous buffers and often needs to call vm_page_reclaim_contig_domain_ext() to free up contiguous memory, which can be expensive. Web servers with a daily pattern of peaks and troughs end up having UMA trim the ktls_buffer_zone when they are in their trough, then re-building it on the way to their peak.
  Rather than calling vm_page_reclaim_contig_domain_ext() multiple times on a daily basis, let's mark the ktls_buffer_zone with a new UMA flag, UMA_ZONE_NOTRIM. This disables UMA_RECLAIM_TRIM on the zone but allows UMA_RECLAIM_DRAIN* operations, so that if we become extremely short of memory (vm_page_count_severe()), the uma reclaim worker can still free up memory.
  Note that UMA_ZONE_UNMANAGED already exists, but can never be drained or trimmed, so it may hold on to memory during times of severe memory pressure. Using UMA_ZONE_NOTRIM rather than UMA_ZONE_UNMANAGED is an attempt to keep this zone more reactive in the face of severe memory pressure.
  Sponsored by: Netflix
  Reviewed by: jhb, kib, markj (via slack)
  Differential Revision: https://reviews.freebsd.org/D48451
* malloc(9): Introduce M_NEVERFREED (Bojan Novković, 2024-07-30, 1 file, -0/+3)
  This patch adds an additional malloc(9) flag to distinguish allocations that are never freed during runtime.
  Differential Revision: https://reviews.freebsd.org/D45045
  Reviewed by: alc, kib, markj
  Tested by: alc
* uma: Fix improper uses of UMA_MD_SMALL_ALLOC (Bojan Novković, 2024-05-26, 1 file, -3/+3)
  UMA_MD_SMALL_ALLOC was recently replaced by UMA_USE_DMAP, but da76d349b6b1 missed some improper uses of the old symbol. This change makes sure that UMA_USE_DMAP is used properly in code that selects uma_small_alloc.
  Fixes: da76d349b6b1
  Reported by: eduardo, rlibby
  Approved by: markj (mentor)
  Differential Revision: https://reviews.freebsd.org/D45368
* vm: Simplify startup page dumping conditional (Bojan Novković, 2024-05-25, 1 file, -4/+2)
  This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and uses it to simplify several instances of a complex preprocessor conditional for adding pages allocated when bootstrapping the kernel to minidumps.
  Reviewed by: markj, mhorne
  Approved by: markj (mentor)
  Differential Revision: https://reviews.freebsd.org/D45085
* uma: Deduplicate uma_small_alloc (Bojan Novković, 2024-05-25, 1 file, -3/+40)
  This commit refactors the UMA small alloc code and removes most UMA machine-dependent code. The existing machine-dependent uma_small_alloc code is almost identical across all architectures, except for powerpc, where using the direct map addresses involved extra steps in some cases. The MI/MD split was replaced by a default uma_small_alloc implementation that can be overridden by architecture-specific code by defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was introduced to replace most UMA_MD_SMALL_ALLOC uses.
  Reviewed by: markj, kib
  Approved by: markj (mentor)
  Differential Revision: https://reviews.freebsd.org/D45084
* uma: Improve memory modified after free panic messages (Alexander Motin, 2023-11-10, 1 file, -2/+2)
  - Pass the zone pointer to trash_ctor() and report the zone name in the panic message. It may be difficult to figure out the zone just by the item size.
  - Do not pass user arguments to internal trash calls; pass the zone.
  - Report the malloc type name in the same unified panic message.
  - Report the corruption offset from the beginning of the item instead of the full pointer. It makes the panic message shorter and more readable.
* uma: New check_align_mask(): Validate alignments (INVARIANTS) (Olivier Certner, 2023-11-02, 1 file, -7/+18)
  New function check_align_mask() asserts (under INVARIANTS) that the mask fits in a (signed) integer (see the comment) and that the corresponding alignment is a power of two. Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate() to replace the KASSERT() there (which was checking only for a power of 2).
  Reviewed by: kib, markj
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D42263
* uma: Make the cache alignment mask unsigned (Olivier Certner, 2023-11-02, 1 file, -5/+10)
  In uma_set_align_mask(), ensure that the passed value doesn't have its highest bit set, which would lead to problems since keg/zone alignment is internally stored as signed integers. Such big values do not make sense anyway and indicate some programming error. A future commit will introduce checks for this case and other ones.
  Reviewed by: kib, markj
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D42262
* uma: UMA_ALIGN_CACHE: Resolve the proper value at use point (Olivier Certner, 2023-11-02, 1 file, -2/+1)
  Having a special value of -1 that is resolved internally to 'uma_align_cache' provides no significant advantages and prevents changing that variable to an unsigned type, which is natural for an alignment mask. So suppress it and replace its use with a call to uma_get_align_mask(). The small overhead of the added function call is irrelevant since UMA_ALIGN_CACHE is only used when creating new zones, which is not performance critical.
  Reviewed by: markj, kib
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D42259
* uma: Hide 'uma_align_cache'; Create/rename accessors (Olivier Certner, 2023-11-02, 1 file, -5/+13)
  Create the uma_get_cache_align_mask() accessor and put it in a separate private header so as to minimize namespace pollution in header/source files that need only this function and not the whole 'uma.h' header. Make sure the accessors have '_mask' as a suffix, so that callers are aware that the real alignment is the power of two that is the mask plus one. Rename the stem to something more explicit.
  Rename uma_set_cache_align_mask()'s single parameter to 'mask'. Hide 'uma_align_cache' to ensure that it cannot be set in any other way than by a call to uma_set_cache_align_mask(), which will perform sanity checks in a further commit. While here, rename it to 'uma_cache_align_mask'.
  This is also in preparation for some further changes, such as improving the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and changing the type of the 'uma_cache_align_mask' variable.
  Reviewed by: markj, kib
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D42258
* sys: Remove $FreeBSD$: one-line .c pattern (Warner Losh, 2023-08-16, 1 file, -2/+0)
  Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
* spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD (Warner Losh, 2023-05-12, 1 file, -1/+1)
  The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.
  Discussed with: pfg
  MFC After: 3 days
  Sponsored by: Netflix
* uma: Never pass cache zones to memguard (Mark Johnston, 2022-10-19, 1 file, -2/+4)
  Items allocated from cache zones cannot usefully be protected by memguard.
  PR: 267151
  Reported and tested by: pho
  MFC after: 1 week
* kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers (John Baldwin, 2022-09-22, 1 file, -2/+2)
  Reviewed by: kib, markj
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D36549
* sys/vm: Add TSLOG to some functions (Colin Percival, 2022-08-12, 1 file, -0/+3)
  The functions pbuf_init, kva_alloc, and keg_alloc_slab are significant contributors to the kernel boot time when FreeBSD boots inside the Firecracker VMM. Instrument them so they show up on flamecharts.
* ddb: annotate some commands with DB_CMD_MEMSAFE (Mitchell Horne, 2022-07-18, 1 file, -2/+2)
  This is not completely exhaustive, but covers a large majority of commands in the tree.
  Reviewed by: markj
  Sponsored by: Juniper Networks, Inc.
  Sponsored by: Klara, Inc.
  Differential Revision: https://reviews.freebsd.org/D35583
* uma: Apply a missed piece of review feedback from D35738 (Mark Johnston, 2022-07-13, 1 file, -1/+1)
  Fixes: 93cd28ea82bb ("uma: Use a taskqueue to execute uma_timeout()")
* uma: Use a taskqueue to execute uma_timeout() (Mark Johnston, 2022-07-11, 1 file, -6/+15)
  uma_timeout() has several responsibilities; it visits every UMA zone and as of recently will drain underutilized caches, so it is rather expensive (>1ms in some cases). Currently it is executed by softclock threads and so will preempt most other CPU activity. None of this work requires a high scheduling priority, though, so defer it to a taskqueue to avoid stalling higher-priority work.
  Reviewed by: rlibby, alc, mav, kib
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D35738
* uma: Mark zeroed slabs as initialized for KMSAN (Mark Johnston, 2022-06-20, 1 file, -0/+3)
  Otherwise zone initializers can produce false positives, e.g., when lock_init() attempts to detect double initialization.
  Sponsored by: The FreeBSD Foundation
* uma_zfree_smr: uz_flags is only used if NUMA is defined (John Baldwin, 2022-04-09, 1 file, -2/+5)
* uma: Don't allow a limit to be set in a warm zone (Mark Johnston, 2022-03-30, 1 file, -0/+2)
  The limit accounting in UMA does not tolerate this.
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* uma: Use the correct type for a return value (Mark Johnston, 2022-03-30, 1 file, -1/+1)
  zone_alloc_bucket() returns a pointer, not a bool.
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* uma_zalloc_domain: call uma_zalloc_debug in multi-domain path (Eric van Gyzen, 2022-03-26, 1 file, -6/+5)
  It was only called in the non-NUMA and single-domain paths. Some of its assertions were duplicated in uma_zalloc_domain, but some things were missed, especially memguard.
  Reviewed by: markj, rstone
  MFC after: 1 week
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D34472
* uma_zalloc: assert M_NOWAIT ^ M_WAITOK (Eric van Gyzen, 2022-03-26, 1 file, -0/+28)
  The uma_zalloc functions expect exactly one of [M_NOWAIT, M_WAITOK]. If neither or both are passed, print an error and a stack dump. Only do this ten times, to prevent livelock. In the future, after this exposes enough bad callers, this will be changed to a KASSERT().
  Reviewed by: rstone, markj
  MFC after: 1 month
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D34452
* uma: Add UMA_ZONE_UNMANAGED (Mark Johnston, 2022-02-15, 1 file, -38/+35)
  Allow a zone to opt out of cache size management. In particular, uma_reclaim() and uma_reclaim_domain() will not reclaim any memory from the zone, nor will uma_timeout() purge cached items if the zone is idle. This effectively means that the zone consumer has control over when items are reclaimed from the cache. In particular, uma_zone_reclaim() will still reclaim cached items from an unmanaged zone.
  Reviewed by: hselasky, kib
  MFC after: 3 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D34142
* uma: Avoid polling for an invalid SMR sequence number (Mark Johnston, 2022-01-14, 1 file, -1/+3)
  Buckets in an SMR-enabled zone can legitimately be tagged with SMR_SEQ_INVALID. This effectively means that the zone destructor (if any) was invoked on all items in the bucket, and the contained memory is safe to reuse. If the first bucket in the full bucket list was tagged this way, UMA would unnecessarily poll per-CPU state before attempting to fetch a full bucket from the list.
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* Dump page tracking no longer needed on mips (Konstantin Belousov, 2022-01-06, 1 file, -2/+2)
  Reviewed by: imp
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D33763
* uma: with KTR trace allocs/frees from SMR zones (Gleb Smirnoff, 2021-12-30, 1 file, -0/+6)
* uma: with KTR report item being freed in uma_zfree_arg() (Gleb Smirnoff, 2021-12-30, 1 file, -1/+2)
* uma: remove unused *item argument from cache_free() (Gleb Smirnoff, 2021-12-05, 1 file, -5/+4)
  Reviewed by: markj
  Differential revision: https://reviews.freebsd.org/D33272
* uma: Fix handling of reserves in zone_import() (Mark Johnston, 2021-11-01, 1 file, -1/+2)
  Kegs with no items reserved have uk_reserve = 0. So the check keg->uk_reserve >= dom->ud_free_items will be true once all slabs are depleted. Then, rather than go and allocate a fresh slab, we return to the cache layer. The intent was to do this only when the keg actually has a reserve, so modify the check to verify this first. Another approach would be to make uk_reserve signed and set it to -1 until uma_zone_reserve() is called, but this requires a few casts elsewhere.
  Fixes: 1b2dcc8c54a8 ("uma: Avoid depleting keg reserves when filling a bucket")
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D32516
* uma: Improve M_USE_RESERVE handling in keg_fetch_slab() (Mark Johnston, 2021-11-01, 1 file, -9/+24)
  M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded recursion when the direct map is not available, as is the case on 32-bit platforms or when certain kernel sanitizers (KASAN and KMSAN) are enabled. For example, to allocate KVA, the kernel might allocate a kernel map entry, which might require a new slab, which requires KVA. For these zones, we use uma_prealloc() to populate a reserve of items, and then in certain serialized contexts M_USE_RESERVE can be used to guarantee a successful allocation.
  uma_prealloc() allocates the requested number of items, distributing them evenly among NUMA domains. Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we might have to check the slab lists of domains other than the current one to provide the semantics expected by consumers.
  So, try harder to find an item if M_USE_RESERVE is specified and the keg doesn't have anything for the current (first-touch) domain. Specifically, fall back to a round-robin slab allocation. This change fixes boot-time panics on NUMA systems with KASAN or KMSAN enabled. [1]
  Alternately we could have uma_prealloc() allocate the requested number of items for each domain, but for some existing consumers this would be quite wasteful. In general I think keg_fetch_slab() should try harder to find free slabs in other domains before trying to allocate fresh ones, but let's limit this to M_USE_RESERVE for now.
  Also fix a separate problem that I noticed: in a non-round-robin slab allocation with M_WAITOK, rather than sleeping after a failed slab allocation we simply try again. Call vm_wait_domain() before retrying.
  Reported by: mjg, tuexen [1]
  Reviewed by: alc
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D32515
* Remove some remaining references to VM_ALLOC_NOOBJ (Mark Johnston, 2021-10-20, 1 file, -1/+1)
  Reviewed by: alc, kib
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D32037
* Convert consumers to vm_page_alloc_noobj_contig() (Mark Johnston, 2021-10-20, 1 file, -12/+7)
  Remove now-unneeded page zeroing. No functional change intended.
  Reviewed by: alc, hselasky, kib
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D32006
* Convert vm_page_alloc() callers to use vm_page_alloc_noobj() (Mark Johnston, 2021-10-20, 1 file, -11/+11)
  Remove page zeroing code from consumers and stop specifying VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to simply use VM_ALLOC_WAITOK.
  Similarly, convert vm_page_alloc_domain() callers. Note that callers are now responsible for assigning the pindex.
  Reviewed by: alc, hselasky, kib
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D31986
* uma: Show the count of free slabs in each per-domain keg's sysctl tree (Mark Johnston, 2021-09-17, 1 file, -1/+4)
  This is useful for measuring the number of pages that could be freed from a NOFREE zone under memory pressure.
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
* uma: Add KMSAN hooks (Mark Johnston, 2021-08-11, 1 file, -3/+59)
  For now, just hook the allocation path: upon allocation, items are marked as initialized (absent M_ZERO). Some zones are exempted from this when it would otherwise raise false positives.
  Use kmsan_orig() to update the origin map for UMA and malloc(9) allocations. This allows KMSAN to print the return address when an uninitialized UMA item is implicated in a report. For example:
    panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe
  Sponsored by: The FreeBSD Foundation
* uma: Fix a few problems with KASAN integration (Mark Johnston, 2021-07-10, 1 file, -3/+13)
  - Ensure that all items returned by UMA are aligned to KASAN_SHADOW_SCALE (8). This was true in practice since smaller alignments are not used by any consumers, but we should enforce it anyway.
  - Use a non-zero code for marking redzones that appear naturally in items that are not a multiple of the scale factor in size. Currently we do not modify keg layouts to force the creation of redzones.
  - Use a non-zero code for marking freed per-CPU items; otherwise accesses of freed per-CPU items are not detected by the runtime.
  Sponsored by: The FreeBSD Foundation
* realloc: Fix KASAN(9) shadow map updates (Mark Johnston, 2021-05-05, 1 file, -0/+3)
  When copying from the old buffer to the new buffer, we don't know the requested size of the old allocation, but only the size of the allocation provided by UMA. This value is "alloc". Because the copy may access bytes in the old allocation's red zone, we must mark the full allocation valid in the shadow map. Do so using the correct size.
  Reported by: kp
  Tested by: kp
  Sponsored by: The FreeBSD Foundation
* Improve UMA cache reclamation (Alexander Motin, 2021-05-02, 1 file, -56/+123)
  When estimating working set size, measure only allocation batches, not free batches. Allocation and free patterns can be very different. For example, ZFS on a vm_lowmem event can free a few gigabytes of memory to UMA in one call, but that does not mean it will request the same amount back that fast too; in fact it won't.
  Update working set size on every reclamation call, shrinking caches faster under pressure. Lack of this caused repeating vm_lowmem events squeezing more and more memory out of real consumers only to leave it stuck in UMA caches. I saw ZFS drop ARC size in half before the previous algorithm, after a periodic WSS update, decided to reclaim UMA caches.
  Introduce voluntary reclamation of UMA caches not used for a long time. For each zdom, track a long-term minimal cache size watermark, freeing some unused items every UMA_TIMEOUT after the first 15 minutes without cache misses. Freed memory can be put to better use by other consumers. For example, ZFS won't grow its ARC unless it sees free memory, since it does not know the memory is not really used. And even if the memory is not really needed, periodic freeing during inactivity periods should reduce its fragmentation.
  Reviewed by: markj, jeff (previous version)
  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  Differential Revision: https://reviews.freebsd.org/D29790
* uma: Introduce per-domain reclamation functions (Mark Johnston, 2021-04-14, 1 file, -62/+90)
  Make it possible to reclaim items from a specific NUMA domain.
  - Add uma_zone_reclaim_domain() and uma_reclaim_domain().
  - Permit parallel reclamations. Use a counter instead of a flag to synchronize with zone_dtor().
  - Use the zone lock to protect cache_shrink() now that parallel reclaims can happen.
  - Add a sysctl that can be used to trigger reclamation from a specific domain.
  Currently the new KPIs are unused, so there should be no functional change.
  Reviewed by: mav
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D29685
* uma: Split bucket_cache_drain() to permit per-domain reclamation (Mark Johnston, 2021-04-14, 1 file, -36/+42)
  Note that the per-domain variant does not shrink the target bucket size. No functional change intended.
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
* uma: Add KASAN state transitions (Mark Johnston, 2021-04-13, 1 file, -20/+138)
  - Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular zone should not be sanitized. This is applied implicitly for NOFREE and cache zones.
  - Add KASAN call backs which get invoked:
    1) when a slab is imported into a keg
    2) when an item is allocated from a zone
    3) when an item is freed to a zone
    4) when a slab is freed back to the VM
    In state transitions 1 and 3, memory is poisoned so that accesses will trigger a panic. In state transitions 2 and 4, memory is marked valid.
  - Disable trashing if KASAN is enabled. It just adds extra CPU overhead to catch problems that are detected by KASAN.
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D29456
* uma: allow uma_zfree_pcpu(..., NULL) (Kristof Provost, 2021-03-12, 1 file, -0/+5)
  We already allow free(NULL) and uma_zfree(..., NULL). Make uma_zfree_pcpu(..., NULL) work as well. This also means that counter_u64_free(NULL) will work. These make cleanup code simpler.
  MFC after: 1 week
  Sponsored by: Rubicon Communications, LLC ("Netgate")
  Differential Revision: https://reviews.freebsd.org/D29189
* uma: Update the comment above startup_alloc() to reflect reality (Mark Johnston, 2021-02-22, 1 file, -3/+3)
  The scheme used for early slab allocations changed in commit a81c400e75.
  Reported by: alc
  Reviewed by: alc
  MFC after: 1 week
* uma: Avoid unmapping direct-mapped slabs (Mark Johnston, 2021-01-03, 1 file, -1/+7)
  startup_alloc() uses pmap_map() to map slabs used for bootstrapping the VM. pmap_map() may ignore the hint address and simply return a range from the direct map. In this case we must not unmap the range in startup_free().
  UMA uses bootstart and bootmem to track the range of KVA into which slabs are mapped if the direct map is not used. Unmap a startup slab only if it was mapped into that range.
  Reported by: alc
  Reviewed by: alc, kib
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D27885
* uma dbg: catch more corruption with atomics (Ryan Libby, 2020-12-31, 1 file, -5/+4)
  Use atomic testandset and testandclear to catch concurrent double free, and to reduce the number of atomic operations.
  Submitted by: jeff
  Reviewed by: cem, kib, markj (all previous version)
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D22703