path: root/lib/libc/stdlib/malloc.c
Commit message | Author | Age | Files | Lines
* Fix cut/paste brain-o. Spell #endif correctly.
  Author: Peter Wemm | Age: 2014-08-07 | Files: 1 | Lines: -1/+1
  Notes: svn path=/stable/8/; revision=269684
* Like on stable/9 and later, don't redefine MALLOC_PRODUCTION
  Author: Peter Wemm | Age: 2014-08-07 | Files: 1 | Lines: -0/+2
  Notes: svn path=/stable/8/; revision=269683
* MFC r203077:
  Add missing return, in a rare case where we can't allocate memory in deallocate.
  Submitted by: Ryan Stone (rysto32 at gmail dot com)
  Approved by: jasone
  Author: Ed Maste | Age: 2010-02-09 | Files: 1 | Lines: -0/+1
  Notes: svn path=/stable/8/; revision=203701
* MFC r197524
  Make malloc(3) superpage aware.
  Author: Alan Cox | Age: 2009-11-02 | Files: 1 | Lines: -0/+15
  Notes: svn path=/stable/8/; revision=198815
* MFC r196861:
  Handle zero size for posix_memalign. Return NULL or unique address according to the 'V' option.
  Approved by: re (kensmith)
  Author: Konstantin Belousov | Age: 2009-09-12 | Files: 1 | Lines: -0/+9
  Notes: svn path=/stable/8/; revision=197127
* Remove extra debugging support that is turned on for head but turned off for stable branches:
  - shift to MALLOC_PRODUCTION
  - turn off automatic crash dumps
  - Remove kernel debuggers, INVARIANTS*[1], WITNESS* from GENERIC kernel config files[2]
  [1] INVARIANTS* left on for ia64 by request marcel
  [2] sun4v was left as-is
  Reviewed by: marcel, kib
  Approved by: re (implicit)
  Author: Ken Smith | Age: 2009-09-10 | Files: 1 | Lines: -1/+1
  Notes: svn path=/stable/8/; revision=197065
* Fix a lock order reversal bug that could cause deadlock during fork(2).
  Reported by: kib
  Author: Jason Evans | Age: 2008-12-01 | Files: 1 | Lines: -11/+37
  Notes: svn path=/head/; revision=185514
* Adjust an assertion to handle the case where a lock is contested, but spinning is avoided due to running on a single-CPU system.
  Reported by: stefanf
  Author: Jason Evans | Age: 2008-11-30 | Files: 1 | Lines: -1/+1
  Notes: svn path=/head/; revision=185483
* Do not spin when trying to lock on a single-CPU system.
  Reported by: davidxu
  Author: Jason Evans | Age: 2008-11-30 | Files: 1 | Lines: -11/+13
  Notes: svn path=/head/; revision=185468
* Revert to preferring mmap(2) over sbrk(2) when mapping memory, due to potential extreme contention in the kernel for multi-threaded applications on SMP systems.
  Reported by: kris
  Author: Jason Evans | Age: 2008-11-03 | Files: 1 | Lines: -12/+17
  Notes: svn path=/head/; revision=184602
* Use PAGE_{SIZE,MASK,SHIFT} from machine/param.h rather than hard-coding page size and using sysconf(3).
  Suggested by: marcel
  Author: Jason Evans | Age: 2008-09-10 | Files: 1 | Lines: -120/+88
  Notes: svn path=/head/; revision=182906
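A minimal sketch of the kind of page arithmetic that compile-time page constants make possible. The `EX_PAGE_*` macros are hypothetical stand-ins for the values that machine/param.h would supply (4 kB pages are assumed here), and the helper names are illustrative, not taken from malloc.c.

```c
#include <stddef.h>

/* Hypothetical stand-ins for PAGE_SHIFT, PAGE_SIZE and PAGE_MASK. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  ((size_t)1 << EX_PAGE_SHIFT)
#define EX_PAGE_MASK  (EX_PAGE_SIZE - 1)

/* Round s up to the next multiple of the page size. */
size_t ex_page_ceiling(size_t s)
{
    return (s + EX_PAGE_MASK) & ~EX_PAGE_MASK;
}

/* Number of whole pages needed to hold s bytes. */
size_t ex_page_count(size_t s)
{
    return ex_page_ceiling(s) >> EX_PAGE_SHIFT;
}
```

Because the constants are known at compile time, the rounding reduces to an add, a mask, and a shift, with no run-time sysconf(3) lookup.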
* Unbreak ia64: pages are 8KB.
  Author: Marcel Moolenaar | Age: 2008-09-06 | Files: 1 | Lines: -1/+1
  Notes: svn path=/head/; revision=182809
* Add thread-specific caching for small size classes, based on magazines. This caching allows for completely lock-free allocation/deallocation in the steady state, at the expense of likely increased memory use and fragmentation.
  Reduce the default number of arenas to 2*ncpus, since thread-specific caching typically reduces arena contention.
  Modify size class spacing to include ranges of 2^n-spaced, quantum-spaced, cacheline-spaced, and subpage-spaced size classes. The advantages are: fewer size classes, reduced false cacheline sharing, and reduced internal fragmentation for allocations that are slightly over 512, 1024, etc.
  Increase RUN_MAX_SMALL, in order to limit fragmentation for the subpage-spaced size classes.
  Add a size-->bin lookup table for small sizes to simplify translating sizes to size classes. Include a hard-coded constant table that is used unless custom size class spacing is specified at run time.
  Add the ability to disable tiny size classes at compile time via MALLOC_TINY.
  Author: Jason Evans | Age: 2008-08-27 | Files: 1 | Lines: -231/+1080
  Notes: svn path=/head/; revision=182225
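The magazine idea can be sketched as a small per-thread stack of cached object pointers. This is an illustration only: the type, the capacity `EX_MAG_ROUNDS`, and the function names are hypothetical, and the actual layout used in malloc.c is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_MAG_ROUNDS 4 /* hypothetical capacity of one magazine */

typedef struct {
    size_t nrounds;               /* number of objects currently cached */
    void  *rounds[EX_MAG_ROUNDS]; /* cached object pointers */
} ex_mag_t;

/* Cache a freed object. Returns false when the magazine is full, in
 * which case the caller would fall back to the (locked) arena path. */
bool ex_mag_put(ex_mag_t *mag, void *ptr)
{
    if (mag->nrounds == EX_MAG_ROUNDS)
        return false;
    mag->rounds[mag->nrounds++] = ptr;
    return true;
}

/* Hand out the most recently cached object, or NULL when empty. */
void *ex_mag_get(ex_mag_t *mag)
{
    if (mag->nrounds == 0)
        return NULL;
    return mag->rounds[--mag->nrounds];
}
```

Since each thread would own its magazines, neither operation needs a lock, which is what makes the steady state lock-free; the cost is that cached objects stay unavailable to other threads.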
* Move CPU_SPINWAIT into the innermost spin loop, in order to allow faster preemption while busy-waiting.
  Submitted by: Mike Schuster <schuster@adobe.com>
  Author: Jason Evans | Age: 2008-08-14 | Files: 1 | Lines: -2/+3
  Notes: svn path=/head/; revision=181733
* Re-order the terms of an expression in arena_run_reg_dalloc() to correctly detect whether the integer division table is large enough to handle the divisor. Before this change, the last two table elements were never used, thus causing the slow path to be used for those divisors.
  Author: Jason Evans | Age: 2008-08-14 | Files: 1 | Lines: -2/+2
  Notes: svn path=/head/; revision=181732
* Remove variables which are assigned values and never used thereafter.
  Found by: LLVM/Clang Static Checker
  Approved by: jasone
  Author: Colin Percival | Age: 2008-08-08 | Files: 1 | Lines: -5/+1
  Notes: svn path=/head/; revision=181438
* Enhance arena_chunk_map_t to directly support run coalescing, and use the chunk map instead of red-black trees where possible. Remove the red-black trees and node objects that are obsoleted by this change.
  The net result is a ~1-2% memory savings, and a substantial allocation speed improvement.
  Author: Jason Evans | Age: 2008-07-18 | Files: 1 | Lines: -394/+338
  Notes: svn path=/head/; revision=180599
* In the error path through base_alloc(), release base_mtx [1].
  Fix bit vector initialization for run headers.
  Submitted by: [1] Mike Schuster <schuster@adobe.com>
  Author: Jason Evans | Age: 2008-06-10 | Files: 1 | Lines: -3/+7
  Notes: svn path=/head/; revision=179704
* Add a separate tree to track arena chunks that contain dirty pages. This substantially improves worst case allocation performance, since O(lg n) tree search can be used instead of O(n) tree iteration.
  Use rb_wrap() instead of directly calling rb_*() macros.
  Author: Jason Evans | Age: 2008-05-01 | Files: 1 | Lines: -157/+133
  Notes: svn path=/head/; revision=178709
* Set QUANTUM_2POW_MIN and SIZEOF_PTR_2POW parameters for MIPS
  Approved by: imp
  Author: Oleksandr Tymoshenko | Age: 2008-04-29 | Files: 1 | Lines: -0/+5
  Notes: svn path=/head/; revision=178683
* Check for integer overflow before calling sbrk(2), since it uses a signed increment argument, but the size is an unsigned integer.
  Author: Jason Evans | Age: 2008-04-29 | Files: 1 | Lines: -0/+7
  Notes: svn path=/head/; revision=178645
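A minimal sketch of such a check, assuming the increment type is `intptr_t` as in the sbrk(2) prototype. The function name is hypothetical and the exact test used in malloc.c is not shown in this log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* sbrk(2) takes a signed increment. An unsigned size larger than
 * INTPTR_MAX would turn negative when converted, which would shrink
 * the data segment instead of growing it. */
bool ex_size_fits_sbrk(size_t size)
{
    return size <= (size_t)INTPTR_MAX;
}
```

A caller would test `ex_size_fits_sbrk(size)` first and treat a false result as an allocation failure rather than passing the value on to sbrk(2).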
* Implement red-black trees without using parent pointers, and store the color bit in the least significant bit of the right child pointer, in order to reduce red-black tree linkage overhead by ~2X as compared to sys/tree.h.
  Use the new red-black tree implementation in malloc, which drops memory usage by ~0.5 or ~1%, for 32- and 64-bit systems, respectively.
  Author: Jason Evans | Age: 2008-04-23 | Files: 1 | Lines: -116/+171
  Notes: svn path=/head/; revision=178440
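The pointer-tagging trick can be sketched as follows. It assumes nodes are aligned to at least 2 bytes, so the lowest bit of a node address is always zero and is free to hold the color. The struct and accessor names are hypothetical, not those of the actual tree implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ex_node ex_node_t;
struct ex_node {
    ex_node_t *left;
    uintptr_t  right_red; /* right child address, with bit 0 = 1 if red */
};

/* Right child with the color bit masked off. */
ex_node_t *ex_right_get(const ex_node_t *n)
{
    return (ex_node_t *)(n->right_red & ~(uintptr_t)1);
}

/* Replace the right child while preserving the color bit. */
void ex_right_set(ex_node_t *n, ex_node_t *r)
{
    n->right_red = (uintptr_t)r | (n->right_red & (uintptr_t)1);
}

bool ex_red_get(const ex_node_t *n)
{
    return (n->right_red & (uintptr_t)1) != 0;
}

/* Change the color while preserving the right child. */
void ex_red_set(ex_node_t *n, bool red)
{
    n->right_red = (n->right_red & ~(uintptr_t)1) | (red ? 1u : 0u);
}
```

With no parent pointer and no separate color field, each node carries two words of linkage.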
* Remove stale #include <machine/atomic.h>, which was needed by lazy deallocation.
  Author: Jason Evans | Age: 2008-03-07 | Files: 1 | Lines: -4/+4
  Notes: svn path=/head/; revision=176909
* Fix a race condition in arena_ralloc() for shrinking in-place large reallocation, when junk filling is enabled. Junk filling must occur prior to shrinking, since any deallocated trailing pages are immediately available for use by other threads.
  Reported by: Mats Palmgren <mats.palmgren@bredband.net>
  Author: Jason Evans | Age: 2008-02-17 | Files: 1 | Lines: -25/+41
  Notes: svn path=/head/; revision=176369
* Remove support for lazy deallocation. Benchmarks across a wide range of allocation patterns, number of CPUs, and MALLOC_OPTIONS settings indicate that lazy deallocation has the potential to worsen throughput dramatically. Performance degradation occurs when multiple threads try to clear the lazy free cache simultaneously. Various experiments to avoid this bottleneck failed to completely solve this problem, while adding yet more complexity.
  Author: Jason Evans | Age: 2008-02-17 | Files: 1 | Lines: -209/+3
  Notes: svn path=/head/; revision=176368
* Fix a bug in lazy deallocation that was introduced when arena_dalloc_lazy_hard() was split out of arena_dalloc_lazy() in revision 1.162.
  Reduce thundering herd problems in lazy deallocation by randomly varying how many probes a thread does before taking the slow path.
  Author: Jason Evans | Age: 2008-02-08 | Files: 1 | Lines: -7/+10
  Notes: svn path=/head/; revision=176103
* Clean up manipulation of chunk page map elements to remove some tenuous assumptions about whether bits are set at various times. This makes adding other flags safe.
  Reorganize functions in order to inline i{m,c,p,s,re}alloc(). This allows the entire fast-path call chains for malloc() and free() to be inlined. [1]
  Suggested by: [1] Stuart Parmenter <stuart@mozilla.com>
  Author: Jason Evans | Age: 2008-02-08 | Files: 1 | Lines: -362/+357
  Notes: svn path=/head/; revision=176100
* Track dirty unused pages so that they can be purged if they exceed a threshold, according to the 'F' MALLOC_OPTIONS flag. This obsoletes the 'H' flag.
  Try to realloc() large objects in place. This substantially speeds up incremental large reallocations in the common case.
  Fix a bug in arena_ralloc() that caused relocation of sub-page objects even if the old and new sizes were in the same size class.
  Maintain trees of runs and simplify the per-chunk page map. This allows logarithmic-time searching for sufficiently large runs in arena_run_alloc(), whereas the previous algorithm required linear time in the worst case.
  Break various large functions into smaller sub-functions, and inline only the functions that are in the fast path for small object allocation/deallocation.
  Remove an unnecessary check in base_pages_alloc_mmap().
  Avoid integer division in choose_arena() for the NO_TLS case on single-CPU systems.
  Author: Jason Evans | Age: 2008-02-06 | Files: 1 | Lines: -664/+956
  Notes: svn path=/head/; revision=176022
* Enable both sbrk(2)- and mmap(2)-based memory acquisition methods by default. This has the disadvantage of rendering the datasize resource limit irrelevant, but without this change, legitimate uses of more memory than will fit in the data segment are thwarted by default.
  Fix chunk_alloc_mmap() to work correctly if initial mapping is not chunk-aligned and mapping extension fails.
  Author: Jason Evans | Age: 2008-01-03 | Files: 1 | Lines: -7/+8
  Notes: svn path=/head/; revision=175075
* Fix a major chunk-related memory leak in chunk_dealloc_dss_record(). [1]
  Clean up DSS-related locking and protect all pertinent variables with dss_mtx (remove dss_chunks_mtx). This fixes race conditions that could cause chunk leaks.
  Reported by: [1] kris
  Author: Jason Evans | Age: 2007-12-31 | Files: 1 | Lines: -65/+56
  Notes: svn path=/head/; revision=175011
* Fix a bug related to sbrk() calls that could cause address space leaks. This is a long-standing bug, but until recent changes it was difficult to trigger, and even then its impact was non-catastrophic, with the exception of revision 1.157.
  Optimize chunk_alloc_mmap() to avoid the need for unmapping pages in the common case. Thanks go to Kris Kennaway for a patch that inspired this change.
  Do not maintain a record of previously mmap'ed chunk address ranges. The original intent was to avoid the extra system call overhead in chunk_alloc_mmap(), which is no longer a concern. This also allows some simplifications for the tree of unused DSS chunks.
  Introduce huge_mtx and dss_chunks_mtx to replace chunks_mtx. There was no compelling reason to use the same mutex for these disjoint purposes.
  Avoid memset() for huge allocations when possible.
  Maintain two trees instead of one for tracking unused DSS address ranges. This allows scalable allocation of multi-chunk huge objects in the DSS. Previously, multi-chunk huge allocation requests failed if the DSS could not be extended.
  Author: Jason Evans | Age: 2007-12-31 | Files: 1 | Lines: -186/+268
  Notes: svn path=/head/; revision=175004
* Back out premature commit of previous version.
  Author: Jason Evans | Age: 2007-12-28 | Files: 1 | Lines: -183/+113
  Notes: svn path=/head/; revision=174957
* Maintain two trees instead of one (old_chunks --> old_chunks_{ad,szad}) in order to support re-use of multi-chunk unused regions within the DSS for huge allocations. This generalization is important to correct function when mmap-based allocation is disabled.
  Avoid zeroing re-used memory in the DSS unless it really needs to be zeroed.
  Author: Jason Evans | Age: 2007-12-28 | Files: 1 | Lines: -113/+183
  Notes: svn path=/head/; revision=174956
* Release chunks_mtx for all paths through chunk_dealloc().
  Reported by: kris
  Author: Jason Evans | Age: 2007-12-28 | Files: 1 | Lines: -1/+4
  Notes: svn path=/head/; revision=174953
* Add the 'D' and 'M' run time options, and use them to control whether memory is acquired from the system via sbrk(2) and/or mmap(2). By default, use sbrk(2) only, in order to support traditional use of resource limits. Additionally, when both options are enabled, prefer the data segment to anonymous mappings, in order to coexist better with large file mappings in applications on 32-bit platforms. This change has the potential to increase memory fragmentation due to the linear nature of the data segment, but from a performance perspective this is mitigated by the use of madvise(2). [1]
  Add the ability to interpret integer prefixes in MALLOC_OPTIONS processing. For example, MALLOC_OPTIONS=lllllllll can now be specified as MALLOC_OPTIONS=9l.
  Reported by: [1] rwatson
  Design review: [1] alc, peter, rwatson
  Author: Jason Evans | Age: 2007-12-27 | Files: 1 | Lines: -291/+435
  Notes: svn path=/head/; revision=174950
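The integer-prefix syntax can be illustrated with a small expander that turns a string such as "9l" back into its long form. This is a sketch under assumptions: the function name and the output-buffer interface are hypothetical, and the real option parser presumably acts on each flag directly instead of building a string.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Expand integer prefixes: "9l" becomes "lllllllll". Writes a
 * NUL-terminated result to out and returns its length, or (size_t)-1
 * if out is too small. A flag with no prefix is repeated once. */
size_t ex_expand_opts(const char *opts, char *out, size_t outlen)
{
    size_t n = 0;

    while (*opts != '\0') {
        size_t reps = 0;
        bool have_prefix = false;

        while (isdigit((unsigned char)*opts)) {
            reps = reps * 10 + (size_t)(*opts - '0');
            have_prefix = true;
            opts++;
        }
        if (*opts == '\0')
            break; /* trailing digits with no flag are ignored */
        if (!have_prefix)
            reps = 1;
        for (size_t i = 0; i < reps; i++) {
            if (n + 1 >= outlen)
                return (size_t)-1;
            out[n++] = *opts;
        }
        opts++;
    }
    if (outlen == 0)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For instance, expanding "a2Jz" into a sufficiently large buffer yields "aJJz".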
* Use fixed point integer math instead of floating point math when calculating run sizes. Use of the floating point unit was a potential pessimization to context switching for applications that do not otherwise use floating point math. [1]
  Reformat cpp macro-related comments to improve consistency.
  Submitted by: das
  Author: Jason Evans | Age: 2007-12-18 | Files: 1 | Lines: -42/+47
  Notes: svn path=/head/; revision=174745
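A sketch of how an overhead ratio can be compared against a bound using only integer math. The macro names, the choice of 12 fractional bits, and the bound of 61/4096 (roughly 1.5%) are illustrative assumptions, not values confirmed by this log.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_BFP       12 /* hypothetical: 12 fractional bits */
#define EX_MAX_OVRHD 61 /* hypothetical bound: 61/4096 */

/* True if hdr_size / run_size <= EX_MAX_OVRHD / 2^EX_BFP.
 * Cross-multiplying removes the division, so no floating point is
 * needed. Assumes hdr_size is small enough that the shift cannot
 * overflow. */
bool ex_overhead_ok(size_t hdr_size, size_t run_size)
{
    return (hdr_size << EX_BFP) <= (size_t)EX_MAX_OVRHD * run_size;
}
```

With a 4096-byte run, for example, a header of up to 61 bytes satisfies this particular bound and a 64-byte header does not.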
* Refactor features a bit in order to make it possible to disable lazy deallocation and dynamic load balancing via the MALLOC_LAZY_FREE and MALLOC_BALANCE knobs. This is a non-functional change, since these features are still enabled when possible.
  Clean up a few things that more pedantic compiler settings would cause complaints over.
  Author: Jason Evans | Age: 2007-12-17 | Files: 1 | Lines: -52/+127
  Notes: svn path=/head/; revision=174695
* Only zero large allocations when necessary (for calloc()).
  Author: Jason Evans | Age: 2007-11-28 | Files: 1 | Lines: -1/+1
  Notes: svn path=/head/; revision=174002
* Implement dynamic load balancing of thread-->arena mapping, based on lock contention. The intent is to dynamically adjust to load imbalances, which can cause severe contention.
  Use pthread mutexes where possible instead of libc "spinlocks" (they aren't actually spin locks). Conceptually, this change is meant only to support the dynamic load balancing code by enabling the use of spin locks, but it has the added apparent benefit of substantially improving performance due to reduced context switches when there is moderate arena lock contention.
  Proper tuning parameter configuration for this change is a finicky business, and it is very much machine-dependent. One seemingly promising solution would be to run a tuning program during operating system installation that computes appropriate settings for load balancing. (The pthreads adaptive spin locks should probably be similarly tuned.)
  Author: Jason Evans | Age: 2007-11-27 | Files: 1 | Lines: -58/+297
  Notes: svn path=/head/; revision=173968
* Implement lazy deallocation of small objects. For each arena, maintain a vector of slots for lazily freed objects. For each deallocation, before doing the hard work of locking the arena and deallocating, try several times to randomly insert the object into the vector using atomic operations.
  This approach is particularly effective at reducing contention for multi-threaded applications that use the producer-consumer model, wherein one producer thread allocates objects, then multiple consumer threads deallocate those objects.
  Author: Jason Evans | Age: 2007-11-27 | Files: 1 | Lines: -0/+218
  Notes: svn path=/head/; revision=173966
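The probing scheme described above might look roughly like the following, written here with C11 atomics. This is a sketch under assumptions and not the code from this revision: the slot count, the probe count, the xorshift generator, and all names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NSLOTS  8 /* hypothetical; must be a power of two */
#define EX_NPROBES 4 /* hypothetical number of attempts */

typedef struct {
    _Atomic(void *) slots[EX_NSLOTS]; /* NULL means the slot is empty */
} ex_lazy_cache_t;

void ex_lazy_init(ex_lazy_cache_t *c)
{
    for (size_t i = 0; i < EX_NSLOTS; i++)
        atomic_init(&c->slots[i], NULL);
}

/* xorshift32 pseudo-random generator; *state must be nonzero. */
static uint32_t ex_prn(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Try to park ptr in a randomly chosen empty slot. On false, the
 * caller would lock the arena and deallocate the slow way. */
bool ex_lazy_insert(ex_lazy_cache_t *c, void *ptr, uint32_t *prn_state)
{
    for (int i = 0; i < EX_NPROBES; i++) {
        size_t slot = ex_prn(prn_state) & (EX_NSLOTS - 1);
        void *expected = NULL;
        if (atomic_compare_exchange_strong(&c->slots[slot], &expected, ptr))
            return true;
    }
    return false;
}
```

The compare-and-swap only succeeds on an empty slot, so concurrent inserters never overwrite each other; a thread holding the arena lock would later drain the slots in bulk.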
* Avoid re-zeroing memory in calloc() when possible.
  Author: Jason Evans | Age: 2007-11-27 | Files: 1 | Lines: -143/+218
  Notes: svn path=/head/; revision=173965
* Fix stats printing of the amount of memory currently consumed by huge allocations. [1]
  Fix calculation of the number of arenas when 'n' is specified via MALLOC_OPTIONS.
  Clean up various style inconsistencies.
  Obtained from: [1] NetBSD
  Author: Jason Evans | Age: 2007-11-27 | Files: 1 | Lines: -36/+37
  Notes: svn path=/head/; revision=173964
* Fix junk/zero filling for realloc(). Junk filling was missing in one case, and zero filling was broken in a way that could cause memory corruption.
  Update comments.
  Author: Jason Evans | Age: 2007-06-15 | Files: 1 | Lines: -36/+48
  Notes: svn path=/head/; revision=170796
* Use size_t instead of unsigned for pagesize-related values, in order to avoid downcasting issues. In particular, this change fixes posix_memalign(3) for alignments greater than 2^31 on LP64 systems.
  Make sure that NDEBUG is always set to be compatible with MALLOC_DEBUG. [1]
  Reported by: [1] Lee Hyo geol <hyogeollee@gmail.com>
  Author: Jason Evans | Age: 2007-03-29 | Files: 1 | Lines: -4/+8
  Notes: svn path=/head/; revision=168029
* Remove the run promotion/demotion machinery. Replace it with red-black trees that track all non-full runs for each bin. Use the red-black trees to be able to guarantee that each new allocation is placed in the lowest address available in any non-full run. This change completes the transition to allocating from low addresses in order to reduce the retention of sparsely used chunks.
  If the run in current use by a bin becomes empty, deallocate the run rather than retaining it for later use. The previous behavior had the tendency to spread empty runs across multiple chunks, thus preventing the release of chunks that were completely unused.
  Generalize base_chunk_alloc() (and rename it to base_pages_alloc()) to handle allocation sizes larger than the chunk size, so that it is possible to support chunk sizes that are smaller than an arena object.
  Reduce the minimum chunk size from 64kB to 8kB.
  Optimize tracking of addresses for deleted chunks.
  Fix a statistics bug for huge allocations.
  Author: Jason Evans | Age: 2007-03-28 | Files: 1 | Lines: -430/+219
  Notes: svn path=/head/; revision=168003
* Fix some subtle bugs for posix_memalign() having to do with integer rounding and overflow. Carefully document what the various overflow tests actually detect.
  The bugs mostly canceled out, such that the worst possible failure cases resulted in non-fatal over-allocations.
  Author: Jason Evans | Age: 2007-03-24 | Files: 1 | Lines: -18/+43
  Notes: svn path=/head/; revision=167872
* Fix posix_memalign() for large objects. Now that runs are extents rather than binary buddies, the alignment guarantees are weaker, which requires a more complex aligned allocation algorithm, similar to that used for alignment greater than the chunk size.
  Reported by: matteo
  Author: Jason Evans | Age: 2007-03-23 | Files: 1 | Lines: -151/+297
  Notes: svn path=/head/; revision=167853
* Use extents rather than binary buddies to track free pages within chunks. This allows runs to be any multiple of the page size. The primary advantage is that large objects are no longer constrained to be 2^n pages, which can dramatically decrease internal fragmentation for large objects. This also allows the sizes for runs that back small objects to be more finely tuned.
  Free runs are searched for linearly using the chunk page map (with the help of some heuristic optimizations). This changes the allocation policy from "first best fit" to "first fit". A prototype red-black tree implementation for tracking free runs that implemented "first best fit" did not cause a measurable speed or memory usage difference for realistic chunk sizes (though of course it is possible to construct benchmarks that favor one allocation policy over another).
  Refine the handling of fullness constraints for small runs to be more tunable.
  Restructure the per chunk page map to contain only two fields per entry, rather than four. Also, increase each entry from 4 to 8 bytes, since it allows for 32-bit integers, without increasing the number of chunk header pages.
  Relax the maximum chunk size constraint. This is of no practical interest; it is merely fallout from the chunk page map restructuring.
  Revamp statistics gathering and reporting to be faster, clearer and more informative. Statistics gathering is fast enough now to have little to no impact on application speed, but it still requires approximately two extra pages of memory per arena (per process). This memory overhead may be acceptable for most systems, but we still need to leave statistics gathering disabled by default in RELENG branches.
  Rename NO_MALLOC_EXTRAS to MALLOC_PRODUCTION in order to make its intent clearer (i.e. it should be defined in RELENG branches).
  Author: Jason Evans | Age: 2007-03-23 | Files: 1 | Lines: -323/+332
  Notes: svn path=/head/; revision=167828
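The internal fragmentation argument can be made concrete with a little arithmetic. Under a binary-buddy scheme a large object occupies the next power-of-two number of pages, whereas an extent occupies exactly the pages requested. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pages reserved by a binary-buddy scheme: the next power of two that
 * is >= npages. */
size_t ex_buddy_pages(size_t npages)
{
    size_t p = 1;

    /* Guard against shifting p past the top bit. */
    assert(npages >= 1 && npages <= SIZE_MAX / 2 + 1);
    while (p < npages)
        p <<= 1;
    return p;
}

/* Pages wasted by buddy rounding; an extent of npages wastes none. */
size_t ex_buddy_waste(size_t npages)
{
    return ex_buddy_pages(npages) - npages;
}
```

A 5-page object takes 8 pages under buddy rounding (3 wasted), and a 33-page object takes 64 (31 wasted), so the waste approaches 50% just above each power of two.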
* Avoid using vsnprintf(3) unless MALLOC_STATS is defined, in order to avoid substantial potential bloat for static binaries that do not otherwise use any printf(3)-family functions. [1]
  Rearrange arena_run_t so that the region bitmask can be minimally sized according to constraints related to each bin's size class. Previously, the region bitmask was the same size for all run headers, which wasted a measurable amount of memory.
  Rather than making runs for small objects as large as possible, make runs as small as possible such that header overhead stays below a certain bound. There are two exceptions that override the header overhead bound:
  1) If the bound is impossible to honor, it is relaxed on a per-size-class basis. Since there is one bit of header overhead per object (plus a constant), it is impossible to achieve a header overhead less than or equal to 1/(# of bits per object). For the current setting of maximum 0.5% header overhead, this relaxation comes into play for {2, 4, 8, 16}-byte objects, for which header overhead is (on 64-bit systems) {7.1, 4.3, 2.2, 1.2}%, respectively.
  2) There is still a cap on small run size, still set to 64kB. This comes into play for {1024, 2048}-byte objects, for which header overhead is {1.6, 3.1}%, respectively.
  In practice, this reduces the run sizes, which makes worst case low-water memory usage due to fragmentation less bad. It also reduces worst case high-water run fragmentation due to non-full runs, but this is only a constant improvement (most important to small short-lived processes).
  Reduce the default chunk size from 2MB to 1MB. Benchmarks indicate that the external fragmentation reduction makes 1MB the new sweet spot (as small as possible without adversely affecting performance).
  Reported by: [1] kientzle
  Author: Jason Evans | Age: 2007-03-20 | Files: 1 | Lines: -152/+233
  Notes: svn path=/head/; revision=167733
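The first exception follows from a simple inequality, sketched here. With one bitmap bit per object, the overhead cannot be at or below 1/(bits per object), so a 0.5% (1/200) bound is only attainable when an object has more than 200 bits. The names are hypothetical, and the sketch ignores the constant part of the header, so it gives the lower-bound argument only, not the exact percentages quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* The 0.5% bound written as a fraction: 1/200. */
#define EX_BOUND_DENOM 200

/* True if a header overhead of 1/EX_BOUND_DENOM can be honored at all
 * for objects of obj_size bytes, i.e. if
 *   1/EX_BOUND_DENOM > 1/(8 * obj_size). */
bool ex_bound_attainable(size_t obj_size)
{
    return obj_size * 8 > EX_BOUND_DENOM;
}
```

For 16-byte objects the floor is 1/128, about 0.78%, which is above 0.5%, while for 32-byte objects it is 1/256, about 0.39%. This is consistent with the relaxation applying to {2, 4, 8, 16}-byte objects only.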
* Modify chunk_alloc() to prefer mmap()ed memory over sbrk()ed memory. This has no impact unless USE_BRK is defined (32-bit platforms), in which case user allocations are allocated via mmap() if at all possible, in order to avoid the possibility of unreclaimable chunks in the data segment.
  Fix an obscure bug in base_alloc() that could have allowed undefined behavior if an application were to use sbrk() in conjunction with a USE_BRK-enabled malloc.
  Author: Jason Evans | Age: 2007-02-22 | Files: 1 | Lines: -36/+40
  Notes: svn path=/head/; revision=166890