Change the two functions that use local tailq variables to use the
plinks.q field, instead of the listq field, for the pointers.
This will resolve one source of conflict when the tailq field and the
object field come to share the same space in a future change to the
vm_page definition.
Reviewed by: alc, kib
Differential Revision: https://reviews.freebsd.org/D50094
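
A minimal sketch of the pattern in question (not taken from the commit;
assumes the usual <vm/vm_page.h> definitions):

    struct pglist pgl;      /* TAILQ_HEAD(pglist, vm_page) */
    vm_page_t m, next;

    TAILQ_INIT(&pgl);
    /* ... gather pages onto the local queue ... */
    TAILQ_INSERT_TAIL(&pgl, m, plinks.q);   /* formerly: listq */

    TAILQ_FOREACH_SAFE(m, &pgl, plinks.q, next) {
            TAILQ_REMOVE(&pgl, m, plinks.q);
            /* ... process the page ... */
    }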
The header vm_object.h is included in vm_phys.h and uma_core.h, where
it is not necessary. Remove it.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D50081
After commit 389a3fa693ef, uma_reclaim_domain(UMA_RECLAIM_DRAIN_CPU)
calls uma_zone_reclaim_domain(UMA_RECLAIM_DRAIN_CPU) twice on each zone
in addition to globally draining per-CPU caches. This was unintended
and is unnecessarily slow; in particular, draining per-CPU caches
requires binding to each CPU.
Stop draining per-CPU caches when visiting each zone; instead, do it
once in pcpu_cache_drain_safe() to minimize the number of expensive
sched_bind() calls.
Fixes: 389a3fa693ef ("uma: Add UMA_ZONE_UNMANAGED")
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Reviewed by: gallatin, kib
Differential Revision: https://reviews.freebsd.org/D49349
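
Sketch of the intended flow (pcpu_cache_drain_safe() is the internal
helper named above; 'domain' stands for the caller's argument, and this
is not the committed diff):

    /* Drain per-CPU caches exactly once, binding to each CPU... */
    pcpu_cache_drain_safe(NULL);
    /* ...then visit each zone without repeating the per-CPU drain. */
    uma_reclaim_domain(UMA_RECLAIM_DRAIN, domain);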
The ktls buffer zone allocates 16k contiguous buffers, and often needs
to call vm_page_reclaim_contig_domain_ext() to free up contiguous
memory, which can be expensive. Web servers with a daily pattern of
peaks and troughs end up having UMA trim the ktls_buffer_zone while in
their trough, only to rebuild it on the way back to their peak.
Rather than calling vm_page_reclaim_contig_domain_ext() multiple times
a day, let's mark the ktls_buffer_zone with a new UMA flag,
UMA_ZONE_NOTRIM. This disables UMA_RECLAIM_TRIM on the zone but still
allows UMA_RECLAIM_DRAIN* operations, so that if we become extremely
short of memory (vm_page_count_severe()), the UMA reclaim worker can
still free up memory.
Note that UMA_ZONE_UNMANAGED already exists, but can never be drained
or trimmed, so it may hold on to memory during times of severe memory
pressure. Using UMA_ZONE_NOTRIM rather than UMA_ZONE_UNMANAGED is an
attempt to keep this zone more reactive in the face of severe memory
pressure.
Sponsored by: Netflix
Reviewed by: jhb, kib, markj (via slack)
Differential Revision: https://reviews.freebsd.org/D48451
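
A minimal sketch of using the new flag (the real ktls_buffer_zone is a
cache zone with a custom contiguous-buffer import/release; the zone
below is illustrative):

    zone = uma_zcreate("example 16k bufs", 16 * 1024, NULL, NULL,
        NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOTRIM);

    /* Periodic UMA_RECLAIM_TRIM now skips this zone, but a severe
     * shortage can still drain it: */
    uma_zone_reclaim(zone, UMA_RECLAIM_DRAIN);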
This patch adds an additional malloc(9) flag to distinguish allocations
that are never freed during runtime.
Differential Revision: https://reviews.freebsd.org/D45045
Reviewed by: alc, kib, markj
Tested by: alc
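
A sketch of the intended use; the flag name M_NEVERFREED is taken from
the review cited above rather than from this message, so treat it as an
assumption:

    /* A lookup table sized once at boot and never freed. */
    tbl = malloc(n * sizeof(*tbl), M_DEVBUF,
        M_WAITOK | M_ZERO | M_NEVERFREED);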
UMA_MD_SMALL_ALLOC was recently replaced by UMA_USE_DMAP, but
da76d349b6b1 missed some improper uses of the old symbol.
This change makes sure that UMA_USE_DMAP is used properly in
code that selects uma_small_alloc.
Fixes: da76d349b6b1
Reported by: eduardo, rlibby
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45368
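
The corrected pattern, sketched (uk_allocf/uk_freef are the keg hooks
through which the small-slab allocator is selected):

    #ifdef UMA_USE_DMAP     /* formerly: #ifdef UMA_MD_SMALL_ALLOC */
            keg->uk_allocf = uma_small_alloc;
            keg->uk_freef = uma_small_free;
    #endif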
This commit introduces the MINIDUMP_STARTUP_PAGE_TRACKING symbol and
uses it to simplify several instances of a complex preprocessor
conditional for adding pages allocated while bootstrapping the kernel
to minidumps.
Reviewed by: markj, mhorne
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45085
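
A sketch of the kind of simplification involved; the architecture list
in the "before" form is illustrative:

    /* Before: open-coded in several places. */
    #if defined(__aarch64__) || defined(__amd64__) /* || ... */
            dump_add_page(pa);
    #endif

    /* After: */
    #if MINIDUMP_STARTUP_PAGE_TRACKING
            dump_add_page(pa);
    #endif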
This commit refactors the UMA small alloc code and
removes most UMA machine-dependent code.
The existing machine-dependent uma_small_alloc code is almost identical
across all architectures, except for powerpc, where using direct map
addresses involved extra steps in some cases.
The MI/MD split was replaced by a default uma_small_alloc
implementation that can be overridden by architecture-specific code by
defining the UMA_MD_SMALL_ALLOC symbol. Furthermore, UMA_USE_DMAP was
introduced to replace most UMA_MD_SMALL_ALLOC uses.
Reviewed by: markj, kib
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D45084
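
A sketch of the resulting MI default (assuming a direct map; error
handling abridged), which an architecture can replace by defining
UMA_MD_SMALL_ALLOC:

    #ifndef UMA_MD_SMALL_ALLOC
    void *
    uma_small_alloc(uma_zone_t zone, vm_size_t bytes, int domain,
        uint8_t *flags, int wait)
    {
            vm_page_t m;

            *flags = UMA_SLAB_PRIV;
            m = vm_page_alloc_noobj_domain(domain,
                malloc2vm_flags(wait) | VM_ALLOC_WIRED);
            if (m == NULL)
                    return (NULL);
            return ((void *)PHYS_TO_DMAP(VM_PAGE_TO_PHYS(m)));
    }
    #endif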
- Pass the zone pointer to trash_ctor() and report the zone name in the
  panic message. It may be difficult to figure out the zone from the
  item size alone.
- Do not pass user arguments to internal trash calls; pass the zone.
- Report the malloc type name in the same unified panic message.
- Report the corruption offset from the beginning of the item instead
  of the full pointer. This makes the panic message shorter and more
  readable.
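
A sketch of the resulting check (trash pattern value and message
wording illustrative):

    static const uint32_t uma_junk = 0xdeadc0de;

    /* Verify a free item was not modified; name the zone and offset. */
    for (off = 0; off < size / 4; off++)
            if (((uint32_t *)item)[off] != uma_junk)
                    panic("Memory modified after free: zone %s, "
                        "offset %d", zone->uz_name, off * 4);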
New function check_align_mask() asserts (under INVARIANTS) that the mask
fits in a (signed) integer (see the comment) and that the corresponding
alignment is a power of two.
Use check_align_mask() in uma_set_align_mask() and also in uma_zcreate()
to replace the KASSERT() there (which was checking only for a power of
two).
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42263
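
A sketch of such a helper (KASSERT messages illustrative):

    #ifdef INVARIANTS
    static void
    check_align_mask(unsigned int mask)
    {
            /* Keg/zone alignment is stored in signed integers. */
            KASSERT(mask <= INT_MAX,
                ("UMA: alignment mask %u too large", mask));
            /* mask + 1 is the actual alignment. */
            KASSERT(powerof2(mask + 1),
                ("UMA: %u is not a valid alignment mask", mask));
    }
    #endif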
In uma_set_align_mask(), ensure that the passed value doesn't have its
highest bit set, which would lead to problems since keg/zone alignment
is stored internally in signed integers. Such large values make no
sense anyway and indicate a programming error. A future commit will
introduce checks for this case and others.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42262
Having a special value of -1 that is resolved internally to
'uma_align_cache' provides no significant advantages and prevents
changing that variable to an unsigned type, which is natural for an
alignment mask. So remove it and replace its uses with calls to
uma_get_align_mask(). The small overhead of the added function call is
irrelevant since UMA_ALIGN_CACHE is only used when creating new zones,
which is not performance critical.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42259
Create the uma_get_cache_align_mask() accessor and put it in a separate
private header so as to minimize namespace pollution in header/source
files that need only this function and not the whole 'uma.h' header.
Make sure the accessors have '_mask' as a suffix, so that callers are
aware that the real alignment is the power of two that is the mask plus
one. Rename the stem to something more explicit. Rename
uma_set_cache_align_mask()'s single parameter to 'mask'.
Hide 'uma_align_cache' to ensure that it cannot be set other than by a
call to uma_set_cache_align_mask(), which will perform sanity checks in
a further commit. While here, rename it to 'uma_cache_align_mask'.
This is also in preparation for some further changes, such as improving
the sanity checks, eliminating internal resolving of UMA_ALIGN_CACHE and
changing the type of the 'uma_cache_align_mask' variable.
Reviewed by: markj, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D42258
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC after: 3 days
Sponsored by: Netflix
Items allocated from cache zones cannot usefully be protected by
memguard.
PR: 267151
Reported and tested by: pho
MFC after: 1 week
Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36549
The functions pbuf_init, kva_alloc, and keg_alloc_slab are significant
contributors to the kernel boot time when FreeBSD boots inside the
Firecracker VMM. Instrument them so they show up on flamecharts.
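
The usual TSLOG instrumentation, sketched:

    #include <sys/tslog.h>

    static void
    pbuf_init(void *arg)
    {
            TSENTER();      /* logged with __func__ when TSLOG is on */
            /* ... existing initialization ... */
            TSEXIT();
    }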
This is not completely exhaustive, but covers a large majority of
commands in the tree.
Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D35583
Fixes: 93cd28ea82bb ("uma: Use a taskqueue to execute uma_timeout()")
uma_timeout() has several responsibilities; it visits every UMA zone
and, as of recently, drains underutilized caches, so it is rather
expensive (>1ms in some cases). Currently it is executed by softclock
threads and so will preempt most other CPU activity. None of this work
requires a high scheduling priority, though, so defer it to a taskqueue
to avoid stalling higher-priority work.
Reviewed by: rlibby, alc, mav, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35738
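
A sketch of the approach (task and handler names illustrative):

    static struct timeout_task uma_timeout_task;

    static void
    uma_handle_timeout(void *context, int pending)
    {
            /* ... the expensive per-zone work, at normal priority ... */
            taskqueue_enqueue_timeout(taskqueue_thread,
                &uma_timeout_task, UMA_TIMEOUT * hz);   /* reschedule */
    }

    /* At startup, instead of scheduling via callout/softclock: */
    TIMEOUT_TASK_INIT(taskqueue_thread, &uma_timeout_task, 0,
        uma_handle_timeout, NULL);
    taskqueue_enqueue_timeout(taskqueue_thread, &uma_timeout_task,
        UMA_TIMEOUT * hz);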
Otherwise zone initializers can produce false positives, e.g., when
lock_init() attempts to detect double initialization.
Sponsored by: The FreeBSD Foundation
The limit accounting in UMA does not tolerate this.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
zone_alloc_bucket() returns a pointer, not a bool.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
It was only called in the non-NUMA and single-domain paths.
Some of its assertions were duplicated in uma_zalloc_domain,
but some things were missed, especially memguard.
Reviewed by: markj, rstone
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D34472
The uma_zalloc functions expect exactly one of [M_NOWAIT, M_WAITOK].
If neither or both are passed, print an error and a stack dump.
Only do this ten times, to prevent livelock. In the future, after
this exposes enough bad callers, this will be changed to a KASSERT().
Reviewed by: rstone, markj
MFC after: 1 month
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D34452
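
A sketch of the check, as a fragment of a hypothetical debug path in
uma_zalloc (limit mechanism and message illustrative):

    static int uma_flag_reports = 10;
    int waitflags = flags & (M_NOWAIT | M_WAITOK);

    if (__predict_false(waitflags != M_NOWAIT &&
        waitflags != M_WAITOK) &&
        atomic_fetchadd_int(&uma_flag_reports, -1) > 0) {
            printf("uma_zalloc: zone \"%s\": invalid WAIT flags %#x\n",
                zone->uz_name, flags);
            kdb_backtrace();
    }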
Allow a zone to opt out of cache size management. In particular,
uma_reclaim() and uma_reclaim_domain() will not reclaim any memory from
the zone, nor will uma_timeout() purge cached items if the zone is idle.
This effectively means that the zone consumer has control over when
items are reclaimed from the cache. Note that uma_zone_reclaim() will
still reclaim cached items from an unmanaged zone.
Reviewed by: hselasky, kib
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34142
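
Sketch of consumer-driven reclamation with the new flag (the zone and
its item type are illustrative):

    zone = uma_zcreate("frobs", sizeof(struct frob), NULL, NULL, NULL,
        NULL, UMA_ALIGN_PTR, UMA_ZONE_UNMANAGED);

    /* uma_reclaim() and uma_timeout() leave this zone alone; the
     * consumer frees cached items explicitly when it sees fit: */
    uma_zone_reclaim(zone, UMA_RECLAIM_DRAIN);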
Buckets in an SMR-enabled zone can legitimately be tagged with
SMR_SEQ_INVALID. This effectively means that the zone destructor (if
any) was invoked on all items in the bucket, and the contained memory is
safe to reuse. If the first bucket in the full bucket list was tagged
this way, UMA would unnecessarily poll per-CPU state before attempting
to fetch a full bucket from the list.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Reviewed by: imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33763
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D33272
Kegs with no items reserved have uk_reserve = 0. So the check
keg->uk_reserve >= dom->ud_free_items will be true once all slabs are
depleted. Then, rather than allocating a fresh slab, we return to the
cache layer.
The intent was to do this only when the keg actually has a reserve, so
modify the check to verify this first. Another approach would be to
make uk_reserve signed and set it to -1 until uma_zone_reserve() is
called, but this requires a few casts elsewhere.
Fixes: 1b2dcc8c54a8 ("uma: Avoid depleting keg reserves when filling a bucket")
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32516
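
The fix, sketched as a before/after of the condition (field names as
above; surrounding code elided):

    /* Before: true for every keg once its slabs are depleted. */
    if (keg->uk_reserve >= dom->ud_free_items)
            break;  /* give up and return to the cache layer */

    /* After: only kegs with a real reserve hold items back. */
    if (keg->uk_reserve > 0 && dom->ud_free_items <= keg->uk_reserve)
            break;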
M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded
recursion when the direct map is not available, as is the case on 32-bit
platforms or when certain kernel sanitizers (KASAN and KMSAN) are
enabled. For example, to allocate KVA, the kernel might allocate a
kernel map entry, which might require a new slab, which requires KVA.
For these zones, we use uma_prealloc() to populate a reserve of items,
and then in certain serialized contexts M_USE_RESERVE can be used to
guarantee a successful allocation. uma_prealloc() allocates the
requested number of items, distributing them evenly among NUMA domains.
Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we
might have to check the slab lists of other domains than the current one
to provide the semantics expected by consumers.
So, try harder to find an item if M_USE_RESERVE is specified and the
keg doesn't have anything for the current (first-touch) domain.
Specifically, fall back to a round-robin slab allocation. This change
fixes boot-time panics on NUMA systems with KASAN or KMSAN enabled. [1]
Alternately we could have uma_prealloc() allocate the requested number
of items for each domain, but for some existing consumers this would be
quite wasteful. In general I think keg_fetch_slab() should try harder
to find free slabs in other domains before trying to allocate fresh
ones, but let's limit this to M_USE_RESERVE for now.
Also fix a separate problem that I noticed: in a non-round-robin slab
allocation with M_WAITOK, rather than sleeping after a failed slab
allocation we simply try again. Call vm_wait_domain() before retrying.
Reported by: mjg, tuexen [1]
Reviewed by: alc
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32515
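
For context, the reserve pattern this serves, sketched (the reserve
size is illustrative):

    /* At initialization: set aside items for recursion-prone paths. */
    uma_zone_reserve(kmapentzone, reserve_count);
    uma_prealloc(kmapentzone, reserve_count);

    /* Later, in a serialized context that must not recurse: */
    entry = uma_zalloc(kmapentzone, M_NOWAIT | M_USE_RESERVE);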
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32037
Remove now-unneeded page zeroing. No functional change intended.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32006
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986
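
The shape of the conversion, sketched:

    /* Before: */
    m = vm_page_alloc(NULL, 0, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED);
    if (m != NULL && (m->flags & PG_ZERO) == 0)
            pmap_zero_page(m);

    /* After: no object, zeroing handled by the allocator, and the
     * caller assigns the pindex if it needs one. */
    m = vm_page_alloc_noobj(VM_ALLOC_WIRED | VM_ALLOC_WAITOK |
        VM_ALLOC_ZERO);
    m->pindex = 0;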
This is useful for measuring the number of pages that could be freed
from a NOFREE zone under memory pressure.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
For now, just hook the allocation path: upon allocation, items are
marked as uninitialized (absent M_ZERO). Some zones are exempted from
this when it would otherwise raise false positives.
Use kmsan_orig() to update the origin map for UMA and malloc(9)
allocations. This allows KMSAN to print the return address when an
uninitialized UMA item is implicated in a report. For example:
panic: MSan: Uninitialized UMA memory from m_getm2+0x7fe
Sponsored by: The FreeBSD Foundation
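
Sketch of the hooks (interfaces per sys/kmsan.h; KMSAN_RET_ADDR records
the caller for origin tracking):

    /* On allocation: poison the item unless it was zeroed. */
    kmsan_mark(item, size, (flags & M_ZERO) != 0 ?
        KMSAN_STATE_INITED : KMSAN_STATE_UNINIT);
    /* Record the origin so reports can name the allocating caller. */
    kmsan_orig(item, size, KMSAN_TYPE_UMA, KMSAN_RET_ADDR);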
- Ensure that all items returned by UMA are aligned to
KASAN_SHADOW_SCALE (8). This was true in practice since smaller
alignments are not used by any consumers, but we should enforce it
anyway.
- Use a non-zero code for marking redzones that appear naturally in
items that are not a multiple of the scale factor in size. Currently
we do not modify keg layouts to force the creation of redzones.
- Use a non-zero code for marking freed per-CPU items, otherwise
accesses of freed per-CPU items are not detected by the runtime.
Sponsored by: The FreeBSD Foundation
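
Sketch of the markings described (the size variables are illustrative):

    /* Valid prefix plus the natural redzone at the end of the item. */
    kasan_mark(item, usable_size, full_item_size,
        KASAN_GENERIC_REDZONE);

    /* Freed per-CPU item: a non-zero code makes accesses trap. */
    kasan_mark(item, 0, full_item_size, KASAN_UMA_FREED);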
When copying from the old buffer to the new buffer, we don't know the
requested size of the old allocation, but only the size of the
allocation provided by UMA. This value is "alloc". Because the copy
may access bytes in the old allocation's red zone, we must mark the full
allocation valid in the shadow map. Do so using the correct size.
Reported by: kp
Tested by: kp
Sponsored by: The FreeBSD Foundation
When estimating working set size, measure only allocation batches, not
free batches. Allocation and free patterns can be very different. For
example, on a vm_lowmem event ZFS can free a few gigabytes of memory to
UMA in one call, but that does not mean it will request the same amount
back as quickly; in fact, it won't.
Update the working set size on every reclamation call, shrinking caches
faster under pressure. The lack of this caused repeated vm_lowmem
events squeezing more and more memory out of real consumers only to
leave it stuck in UMA caches. I saw ZFS drop its ARC size in half
before the previous algorithm, after a periodic WSS update, decided to
reclaim the UMA caches.
Introduce voluntary reclamation of UMA caches not used for a long time.
For each zdom, track a long-term minimal cache size watermark, and free
some unused items every UMA_TIMEOUT after the first 15 minutes without
cache misses. Freed memory can be put to better use by other consumers.
For example, ZFS won't grow its ARC unless it sees free memory, since
it does not know that the cached memory is not really used. And even if
the memory is not really needed, periodically freeing it during
inactivity periods should reduce its fragmentation.
Reviewed by: markj, jeff (previous version)
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D29790
Make it possible to reclaim items from a specific NUMA domain.
- Add uma_zone_reclaim_domain() and uma_reclaim_domain().
- Permit parallel reclamations. Use a counter instead of a flag to
synchronize with zone_dtor().
- Use the zone lock to protect cache_shrink() now that parallel reclaims
can happen.
- Add a sysctl that can be used to trigger reclamation from a specific
domain.
Currently the new KPIs are unused, so there should be no functional
change.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29685
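
The new KPIs, sketched:

    /* Drain cached items belonging to NUMA domain 'dom' only. */
    uma_zone_reclaim_domain(zone, UMA_RECLAIM_DRAIN, dom);

    /* Or trim cached memory for that domain across all zones. */
    uma_reclaim_domain(UMA_RECLAIM_TRIM, dom);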
Note that the per-domain variant does not shrink the target bucket size.
No functional change intended.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
- Add a UMA_ZONE_NOKASAN flag to indicate that items from a particular
zone should not be sanitized. This is applied implicitly for NOFREE
and cache zones.
- Add KASAN callbacks which get invoked:
1) when a slab is imported into a keg
2) when an item is allocated from a zone
3) when an item is freed to a zone
4) when a slab is freed back to the VM
In state transitions 1 and 3, memory is poisoned so that accesses will
trigger a panic. In state transitions 2 and 4, memory is marked
valid.
- Disable trashing if KASAN is enabled. It just adds extra CPU overhead
to catch problems that are detected by KASAN.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29456
We already allow free(NULL) and uma_zfree(..., NULL). Make
uma_zfree_pcpu(..., NULL) work as well.
This also means that counter_u64_free(NULL) will work.
These make cleanup code simpler.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D29189
The scheme used for early slab allocations changed in commit a81c400e75.
Reported by: alc
Reviewed by: alc
MFC after: 1 week
startup_alloc() uses pmap_map() to map slabs used for bootstrapping the
VM. pmap_map() may ignore the hint address and simply return a range
from the direct map. In this case we must not unmap the range in
startup_free().
UMA uses bootstart and bootmem to track the range of KVA into which
slabs are mapped if the direct map is not used. Unmap a startup slab
only if it was mapped into that range.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27885
Use atomic testandset and testandclear to catch concurrent double free,
and to reduce the number of atomic operations.
Submitted by: jeff
Reviewed by: cem, kib, markj (all previous version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22703
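
Sketch of the scheme (bitset type and names simplified; the bitmap
tracks which items are currently free):

    /* Free: set the item's bit; it must not already be set. */
    if (atomic_testandset_long(&slab->us_debugfree, item_idx) != 0)
            panic("Duplicate free of %p from zone %s", item,
                zone->uz_name);

    /* Allocate: clear the bit; it must have been set by a free. */
    if (atomic_testandclear_long(&slab->us_debugfree, item_idx) == 0)
            panic("Duplicate alloc of %p from zone %s", item,
                zone->uz_name);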