aboutsummaryrefslogtreecommitdiff
path: root/sys/vm/vm_init.c
Commit message (Collapse)AuthorAgeFilesLines
* vm: Assert that pagesizes[] is sortedAlan Cox2024-08-041-0/+22
| | | | | | | | Ensure that pmap_init() properly initialized pagesizes[]. In part, we are making this change to document the requirement that the non-zero elements of pagesizes[] must be in ascending order. Reviewed by: kib, markj
* Adjust comments referencing vm_mem_init()Mitchell Horne2024-05-271-1/+1
| | | | | | | | | I cannot find a time where the function was not named this. Reviewed by: kib, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45383
* sys: Automated cleanup of cdefs and other formattingWarner Losh2023-11-271-1/+0
| | | | | | | | | | | | | | | | Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row. Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/ Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/ Remove /\n+#if.*\n#endif.*\n+/ Remove /^#if.*\n#endif.*\n/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/ Sponsored by: Netflix
* sys: Remove ancient SCCS tags.Warner Losh2023-11-271-2/+0
| | | | | | | | Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
* sys: Remove $FreeBSD$: one-line .c patternWarner Losh2023-08-161-2/+0
| | | | Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
* tslog: Annotate parts of SYSINIT cpuColin Percival2023-06-041-0/+2
| | | | | | | | | | | | | | | | | | | | | Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM, SYSINIT cpu takes roughly 2770 us: * 2280 us in vm_ksubmap_init * 535 us in kmem_malloc * 450 us in pmap_zero_page * 1720 us in pmap_growkernel * 1620 us in pmap_zero_page * 80 us in bufinit * 480 us in cpu_setregs * 430 us in cpu_setregs calling load_cr0 Much of this is hypervisor overhead: load_cr0 is slow because it traps to the hypervisor, and 99% of the time in pmap_zero_page is spent when we first touch the page, presumably due to the host Linux kernel faulting in backing pages one by one. Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D40327
* kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.John Baldwin2022-09-221-16/+11
| | | | | | Reviewed by: kib, markj Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D36549
* vm: Initialize the transient buffer mapping arena with M_WAITOKMark Johnston2022-04-141-1/+1
| | | | | | | | | The wait flag is passed to UMA when allocating boundary tags for the initial span, and UMA expects either M_WAITOK or M_NOWAIT to be present. Reported by: cperciva MFC after: 1 week Sponsored by: The FreeBSD Foundation
* vm_ksubmap_init: pass M_WAITOK to vmem_init -> uma_zalloc_argEric van Gyzen2022-03-261-1/+1
| | | | | | | | | | uma_zalloc_arg expects exactly one of the two WAIT flags. A future commit will assert this. Reviewed by: rstone MFC after: 1 month Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D34450
* Make MAXPHYS tunable. Bump MAXPHYS to 1M.Konstantin Belousov2020-11-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (*). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav (*) Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225 Notes: svn path=/head/; revision=368124
* Remove the VM map zone.Mark Johnston2020-08-171-4/+4
| | | | | | | | | | | | | | | | | | | | Today, the zone is only used to allocate a trio of kernel maps: the kernel map itself, and the exec and pipe submaps. Maps for user processes are dynamically allocated but are embedded in the vmspace structure, which is allocated from its own zone. Make the aforementioned kernel maps statically allocated and get rid of the zone. While here, remove a stale comment above vmspace_alloc() and change the names of locks initialized in vm_map_init() to match vmspace_zinit(). Reported by: alc Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26052 Notes: svn path=/head/; revision=364302
* Simplify VM and UMA startup by eliminating boot pages. Instead use carefulJeff Roberson2020-01-161-15/+4
| | | | | | | | | | | | | | | ordering to allocate early pages in the same way boot pages were but only as needed. After the KVA allocator has started up we allocate the KVA that we consumed during boot. This also makes the boot pages freeable since they have vm_page structures allocated with the rest of memory. Parts of this patch were written and tested by markj. Reviewed by: glebius, markj Differential Revision: https://reviews.freebsd.org/D23102 Notes: svn path=/head/; revision=356776
* Do not reserve KVA for paging bufs in vm_ksubmap_init(), since nowGleb Smirnoff2019-01-161-11/+2
| | | | | | | | they allocate it in pbuf_init(). This should have been done together with r343030. Notes: svn path=/head/; revision=343100
* Fix some problems that manifest when NUMA domain 0 is empty.Mark Johnston2018-10-301-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | - In uma_prealloc(), we need to check for an empty domain before the first allocation attempt, not after. Fix this by switching uma_prealloc() to use a vm_domainset iterator, which addresses the secondary issue of using a signed domain identifier in round-robin iteration. - Don't automatically create a page daemon for domain 0. - In domainset_empty_vm(), recompute ds_cnt and ds_order after excluding empty domains; otherwise we may frequently specify an empty domain when calling in to the page allocator, wasting CPU time. Convert DOMAINSET_PREF() policies for empty domains to round-robin. - When freeing bootstrap pages, don't count them towards the per-domain total page counts for now: some vm_phys segments are created before the SRAT is parsed and are thus always identified as being in domain 0 even when they are not. Then, when bootstrap pages are freed, they are added to a domain that we had previously thought was empty. Until this is corrected, we simply exclude them from the per-domain page count. Reported and tested by: Rajesh Kumar <rajfbsd@gmail.com> Reviewed by: gallatin MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17704 Notes: svn path=/head/; revision=339925
* Initialize static domainsets regardless of whether an SRAT is present.Mark Johnston2018-10-231-0/+6
| | | | | | | | | Reported by: yuripv X-MFC with: r339452 Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=339664
* Move kernel vmem arena initialization to vm_kern.c.Mark Johnston2018-09-191-83/+1
| | | | | | | | | | | | | | | This keeps the initialization coupled together with the kmem_* KPI implementation, which is the main user of these arenas. No functional change intended. Reviewed by: alc Approved by: re (gjb) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17247 Notes: svn path=/head/; revision=338806
* Remove {max/min}_offset() macros, use vm_map_{max/min}() inlines.Konstantin Belousov2018-08-291-2/+2
| | | | | | | | | | | | | | | Exposing max_offset and min_offset defines in public headers is causing clashes with variable names, for example when building QEMU. Based on the submission by: royger Reviewed by: alc, markj (previous version) Sponsored by: The FreeBSD Foundation (kib) MFC after: 1 week Approved by: re (marius) Differential revision: https://reviews.freebsd.org/D16881 Notes: svn path=/head/; revision=338370
* Eliminate kmem_malloc()'s unused arena parameter. (The arena parameterAlan Cox2018-08-211-2/+1
| | | | | | | | | | | | became unused in FreeBSD 12.x as a side-effect of the NUMA-related changes.) Reviewed by: kib, markj Discussed with: jeff, re@ Differential Revision: https://reviews.freebsd.org/D16825 Notes: svn path=/head/; revision=338143
* Eliminate the unused arena parameter from kmem_alloc_attr().Alan Cox2018-08-181-3/+2
| | | | | | | | Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D16793 Notes: svn path=/head/; revision=338030
* Make UMA and malloc(9) return non-executable memory in most cases.Jonathan T. Looney2018-06-131-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most kernel memory that is allocated after boot does not need to be executable. There are a few exceptions. For example, kernel modules do need executable memory, but they don't use UMA or malloc(9). The BPF JIT compiler also needs executable memory and did use malloc(9) until r317072. (Note that a side effect of r316767 was that the "small allocation" path in UMA on amd64 already returned non-executable memory. This meant that some calls to malloc(9) or the UMA zone(9) allocator could return executable memory, while others could return non-executable memory. This change makes the behavior consistent.) This change makes malloc(9) return non-executable memory unless the new M_EXEC flag is specified. After this change, the UMA zone(9) allocator will always return non-executable memory, and a KASSERT will catch attempts to use the M_EXEC flag to allocate executable memory using uma_zalloc() or its variants. Allocations that do need executable memory have various choices. They may use the M_EXEC flag to malloc(9), or they may use a different VM interfact to obtain executable pages. Now that malloc(9) again allows executable allocations, this change also reverts most of r317072. PR: 228927 Reviewed by: alc, kib, markj, jhb (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D15691 Notes: svn path=/head/; revision=335068
* Fix boot_pages exhaustion on machines with many domains and cores, whereGleb Smirnoff2018-02-091-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | size of UMA zone allocation is greater than page size. In this case zone of zones can not use UMA_MD_SMALL_ALLOC, and we need to postpone switch off of this zone from startup_alloc() until full launch of VM. o Always supply number of VM zones to uma_startup_count(). On machines with UMA_MD_SMALL_ALLOC ignore it completely, unless zsize goes over a page. In the latter case account VM zones for number of allocations from the zone of zones. o Rewrite startup_alloc() so that it will immediately switch off from itself any zone that is already capable of running real alloc. In worst case scenario we may leak a single page here. See comment in uma_startup_count(). o Hardcode call to uma_startup2() into vm_mem_init(). Otherwise some extra SYSINITs, e.g. vm_page_init() may sneak in before. o While here, remove uma_boot_pages_mtx. With recent changes to boot pages calculation, we are guaranteed to use all of the boot_pages in the early single threaded stage. Reported & tested by: mav Notes: svn path=/head/; revision=329058
* Use per-domain locks for vm page queue free. Move paging control fromJeff Roberson2018-02-061-0/+1
| | | | | | | | | | | | | | global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object. Reviewed by: markj, kib, gallatin Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14000 Notes: svn path=/head/; revision=328954
* Fix boot_pages calculation for machines that don't have UMA_MD_SMALL_ALLOC.Gleb Smirnoff2018-02-061-1/+13
| | | | | | | | | | | | | o Call uma_startup1() after initializing kmem, vmem and domains. o Include 8 eight VM startup pages into uma_startup_count() calculation. o Account for vmem_startup() and vm_map_startup() preallocating pages. o Account for extra two allocations done by kmem_init() and vmem_create(). o Hardcode the place of execution of vm_radix_reserve_kva(). Using SYSINIT allowed several other SYSINITs to sneak in before it, thus bumping requirement for amount of boot pages. Notes: svn path=/head/; revision=328952
* Implement NUMA policy for kmem_*(9). This maintains compatibility withJeff Roberson2018-01-121-8/+23
| | | | | | | | | | | | | reservations by giving each memory domain its own KVA space in vmem that is naturally aligned on superpage boundaries. Reviewed by: alc, markj, kib (some objections) Sponsored by: Netflix, Dell/EMC Isilon Tested by; pho Differential Revision: https://reviews.freebsd.org/D13289 Notes: svn path=/head/; revision=327899
* SPDX: Consider code from Carnegie-Mellon University.Pedro F. Giffuni2017-11-301-1/+1
| | | | | | | Interesting cases, most likely from CMU Mach sources. Notes: svn path=/head/; revision=326403
* sys: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326023
* Renumber copyright clause 4Warner Losh2017-02-281-1/+1
| | | | | | | | | | | | Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96 Notes: svn path=/head/; revision=314436
* Remove a redundant use of min().Mark Johnston2017-01-051-2/+1
| | | | | | | | Reported by: rpokala X-MFC With: r311346 Notes: svn path=/head/; revision=311352
* Add a small allocator for exec_map entries.Mark Johnston2017-01-051-6/+12
| | | | | | | | | | | | | | | | | | | | | | Upon each execve, we allocate a KVA range for use in copying data to the new image. Pages must be faulted into the range, and when the range is freed, the backing pages are freed and their mappings are destroyed. This is a lot of needless overhead, and the exec_map management becomes a bottleneck when many CPUs are executing execve concurrently. Moreover, the number of available ranges is fixed at 16, which is insufficient on large systems and potentially excessive on 32-bit systems. The new allocator reduces overhead by making exec_map allocations persistent. When a range is freed, pages backing the range are marked clean and made easy to reclaim. With this change, the exec_map is sized based on the number of CPUs. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8921 Notes: svn path=/head/; revision=311346
* Conditionally move initial vfs bio alloc above 4GAndrew Gallatin2016-10-031-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On machines with just the wrong amount of physical memory (enough to have a lot of bufs, but not enough to use VM_FREELIST_DMA32) it is possible for 32-bit address limited devices to have little to no memory left when attaching, due to potentially large vfs bio configs consuming all memory below 4GB not protected by VM_FREELIST_ISADMA. This causes the 32-bit devices to allocate from VM_FREELIST_ISADMA, leaving that freelist emtpy when ISA devices need DMAable memory. Rather than decrease VM_DMA32_NPAGES_THRESHOLD, use the time honored technique of putting initially allocated kernel data structs at the end (or at least not the beginning) of memory. Since this allocation is done at boot and is wired, is not freed, so the system is low on 32-bit (and ISA) dma'ble memory forever. So it is a good candidate to move above 4GB. While here, remove an unneeded round_page() from kmem_malloc's size argument as suggested by alc. The first thing kmem_malloc() does is a round_page(size), so there is no need to do it before the call. Reviewed by: alc Sponsored by: Netflix Notes: svn path=/head/; revision=306637
* Parallelize the buffer cache and rewrite getnewbuf(). This results in aJeff Roberson2015-10-141-1/+5
| | | | | | | | | | | | | | | | | | | | | | | 8x performance improvement in a micro benchmark on a 4 socket machine. - Get buffer headers from a per-cpu uma cache that sits in from of the free queue. - Use a per-cpu quantum cache in vmem to eliminate contention for kva. - Use multiple clean queues according to buffer cache size to eliminate clean queue lock contention. - Introduce a bufspace daemon that attempts to prevent getnewbuf() callers from blocking or doing direct recycling. - Close some bufspace allocation races that could lead to endless recycling. - Further the transition to a more modern style of small functions grouped by prefix in order to improve growing complexity. Sponsored by: EMC / Isilon Reviewed by: kib Tested by: pho Notes: svn path=/head/; revision=289279
* Pull in r267961 and r267973 again. Fix for issues reported will follow.Hans Petter Selasky2014-06-281-2/+1
| | | | Notes: svn path=/head/; revision=267992
* Revert r267961, r267973:Glen Barber2014-06-271-1/+2
| | | | | | | | | | | | | These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory Notes: svn path=/head/; revision=267985
* Extend the meaning of the CTLFLAG_TUN flag to automatically check ifHans Petter Selasky2014-06-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies Notes: svn path=/head/; revision=267961
* Add a mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping useJohn Baldwin2013-09-091-1/+1
| | | | | | | | | | | | | | | | an address in the first 2GB of the process's address space. This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the findspace argument instead of TRUE and FALSE. Reviewed by: alc Approved by: re (kib) Notes: svn path=/head/; revision=255426
* - Use an arbitrary but reasonably large import size for kva on architecturesJeff Roberson2013-08-191-1/+2
| | | | | | | | | | | | | | that don't support superpages. This keeps the number of spans and internal fragmentation lower. - When the user asks for alignment from vmem_xalloc adjust the imported size by 2*align to be certain we can satisfy the allocation. This comes at the expense of potential failures when the backend can't supply enough memory but could supply the requested size and alignment. Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=254543
* Add new mmap(2) flags to permit applications to request specific virtualJohn Baldwin2013-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | address alignment of mappings. - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD. - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size to align to that seems best for the mapping request). However, if the object being mapped is already using large pages, then it will align the virtual mapping to match the existing large pages in the object instead. - Internally, VMFS_ALIGNED_SPACE is now renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to using VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE. - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent. Reviewed by: alc MFC after: 1 month Notes: svn path=/head/; revision=254430
* Replace kernel virtual address space allocation with vmem. This providesJeff Roberson2013-08-071-13/+64
| | | | | | | | | | | | | | | | transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=254025
* - Add a general purpose resource allocator, vmem, from NetBSD. It wasJeff Roberson2013-06-281-17/+21
| | | | | | | | | | | | | | | | | originally inspired by the Solaris vmem detailed in the proceedings of usenix 2001. The NetBSD version was heavily refactored for bugs and simplicity. - Use this resource allocator to allocate the buffer and transient maps. Buffer cache defrags are reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low KVA machines. Discussed with: alc, kib, attilio Tested by: pho Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=252330
* Only size and create the bio_transient_map when unmapped buffers areKonstantin Belousov2013-03-211-4/+6
| | | | | | | | | | enabled. Now, disabling the unmapped buffers should result in the kernel memory map identical to pre-r248550. Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=248569
* Implement the concept of the unmapped VMIO buffers, i.e. buffers whichKonstantin Belousov2013-03-191-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks Notes: svn path=/head/; revision=248508
* Switch the vm_object mutex to be a rwlock. This will enable in theAttilio Rao2013-03-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho Notes: svn path=/head/; revision=248084
* Move the callout subsystem initialization to its own SYSINIT()Andre Oppermann2013-03-081-7/+0
| | | | | | | | | | | | | | | | | | from being indirectly called via cpu_startup()+vm_ksubmap_init(). The boot order position remains the same at SI_SUB_CPU. Allocation of the callout array is changed to stardard kernel malloc from a slightly obscure direct kernel_map allocation. kern_timeout_callwheel_alloc() is renamed to callout_callwheel_init() to better describe its purpose. kern_timeout_callwheel_init() is removed simplifying the per-cpu initialization. Reviewed by: davide Notes: svn path=/head/; revision=248032
* Introduce exec_alloc_args(). The objective being to encapsulate theAlan Cox2010-07-271-3/+1
| | | | | | | | | | | | | details of the string buffer allocation in one place. Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string. Reviewed by: kib Notes: svn path=/head/; revision=210545
* Change the order in which the file name, arguments, environment, andAlan Cox2010-07-251-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space. Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string. Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.) Reviewed by: kib Notes: svn path=/head/; revision=210475
* Align the start of the clean submap to a superpage boundary. AlthoughAlan Cox2010-02-211-1/+1
| | | | | | | | | no superpage mappings are created within the clean submap, aligning the start of the clean submap helps to prevent interference with kmem_alloc()'s use of superpages. Notes: svn path=/head/; revision=204181
* Adjust some variables (mostly related to the buffer cache) that holdJohn Baldwin2009-03-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | address space sizes to be longs instead of ints. Specifically, the follow values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require a some gross shims to allow for that. MFC after: 1 month Notes: svn path=/head/; revision=189595
* Introduce a new parameter "superpage_align" to kmem_suballoc() that isAlan Cox2008-05-101-5/+6
| | | | | | | | | | | | | | used to request superpage alignment for the submap. Request superpage alignment for the kmem_map. Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent but VMFS_ANY_SPACE is the new preferred spelling.) Remove a stale comment from kmem_malloc(). Notes: svn path=/head/; revision=178933
* In keeping with style(9)'s recommendations on macros, use a ';'Robert Watson2008-03-161-1/+1
| | | | | | | | | | | | after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink Notes: svn path=/head/; revision=177253
* Add the vm.exec_map_entries tunable and read-only sysctl, which controlsKris Kennaway2005-04-251-1/+7
| | | | | | | | | | | | | | | the number of entries in exec_map (maximum number of simultaneous execs that can be handled by the kernel). The default value of 16 is insufficient on heavily loaded machines (particularly SMP machines), and if it is exceeded then executing further processes will generate a SIGABRT. This is a workaround until a better solution can be implemented. Reviewed by: alc MFC after: 3 days Notes: svn path=/head/; revision=145530