summaryrefslogtreecommitdiff
path: root/sys/kern/subr_param.c
Commit message (Collapse)AuthorAgeFilesLines
* Make MAXPHYS tunable. Bump MAXPHYS to 1M.Konstantin Belousov2020-11-281-5/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (*). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav (*) Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225 Notes: svn path=/head/; revision=368124
* Make max ticks for pause in vn_lock_pair() adjustable at runtime.Konstantin Belousov2020-11-261-0/+5
| | | | | | | | | | | Reduce default value from hz / 10 to hz / 100. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=368073
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)Pawel Biernacki2020-02-261-2/+3
| | | | | | | | | | | | | | | | | | | r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes. This is non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT Approved by: kib (mentor, blanket) Commented by: kib, gallatin, melifaro Differential Revision: https://reviews.freebsd.org/D23718 Notes: svn path=/head/; revision=358333
* Remove sparc64 kernel supportWarner Losh2020-02-031-1/+1
| | | | | | | | | Remove all sparc64 specific files Remove all sparc64 ifdefs Removee indireeect sparc64 ifdefs Notes: svn path=/head/; revision=357455
* riscv: restore default HZ=1000, keep QEMU at HZ=100Philip Paeps2019-09-071-1/+1
| | | | | | | | | This reverts r351918 and r351919. Discussed with: br, ian, imp Notes: svn path=/head/; revision=351969
* riscv: default to HZ=100Philip Paeps2019-09-061-1/+1
| | | | | | | | | | Most current RISC-V development platforms are not fast enough to benefit from the increased granularity provided by HZ=1000. Sponsored by: Axiado Notes: svn path=/head/; revision=351918
* The older detection methods (smbios.bios.vendor and smbios.system.product)Stephen J. Kiernan2019-05-211-9/+10
| | | | | | | | | | | | | | | | | | | | | are able to determine some virtual machines, but the vm_guest variable was still only being set to VM_GUEST_VM. Since we do know what some of them specifically are, we can set vm_guest appropriately. Also, if we see the CPUID has the HV flag, but we were unable to find a definitive vendor in the Hypervisor CPUID Information Leaf, fall back to the older detection methods, as they may be able to determine a specific HV type. Add VM_GUEST_PARALLELS value to VM_GUEST for Parallels. Approved by: cem Differential Revision: https://reviews.freebsd.org/D20305 Notes: svn path=/head/; revision=348051
* Instead of individual conditional statements to look for each hypervisorStephen J. Kiernan2019-05-171-0/+1
| | | | | | | | | | | | | | | | | type, use a table to make it easier to add more in the future, if needed. Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST enumeration to indicate VirtualBox. Save the CPUID base for the hypervisor entry that we detected. Driver code may need to know about it in order to obtain additional CPUID features. Approved by: bryanv, jhb Differential Revision: https://reviews.freebsd.org/D16305 Notes: svn path=/head/; revision=347932
* Fix mistake in r343030: move nswbuf calculation back toGleb Smirnoff2019-01-161-3/+0
| | | | | | | | | | kern_vfs_bio_buffer_alloc(), because in init_param2() nbuf isn't really initialized yet. Pointed out by: bde Notes: svn path=/head/; revision=343101
* Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.Gleb Smirnoff2019-01-151-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many pbufs are we going to have set. In various subsystems that are going to utilize pbufs create private zones via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(), and sets a limit on created zone. After startup preallocate pbufs according to requirements of all pbuf zones. Subsystems that used to have a private limit with old allocator now have private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS, swap, vnode pager. The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9), aio(4). They should have their private limits, but changing that is out of scope of this commit. o Fetch tunable value of kern.nswbuf from init_param2() and while here move NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only this option. Default values aren't touched by this commit, but they probably should be reviewed wrt to modern hardware. This change removes a tight bottleneck from sendfile(2) operation, that uses pbufs in vnode pager. Other pagers also would benefit from faster allocation. Together with: gallatin Tested by: pho Notes: svn path=/head/; revision=343030
* sys: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326023
* Allow sysctl kern.vm_guest to return bhyve when running under bhyve.Marcelo Araujo2017-06-081-0/+1
| | | | | | | | | | | Submitted by: Sean Fagan <sef@ixsystems.com> Reviewed by: grehan MFH: 4 weeks. Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D11090 Notes: svn path=/head/; revision=319678
* Initialize 'ticks' earlier in boot after 'hz' is set.John Baldwin2016-11-221-0/+6
| | | | | | | | | | | | | | This avoids the time-warp after kthreads have started running and the required fixup to td_slptick and td_blktick in the EARLY_AP_STARTUP case. Now, 'ticks' is initialized before any kthreads are created or any context switches are performed. Tested by: gavin MFC after: 2 weeks Sponsored by: Netflix Notes: svn path=/head/; revision=308948
* Renumber license clauses in sys/kern to avoid skipping #3Ed Maste2016-09-151-1/+1
| | | | Notes: svn path=/head/; revision=305832
* Add explicit detection of KVM hypervisorEric Badger2016-07-131-0/+1
| | | | | | | | | | | | | | | | | | Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use vm_guest in conditionals testing for KVM. Also, fix a conditional checking if we're running in a VM which caught only the generic VM case, but not more specific VMs (KVM, VMWare, etc.). (Spotted by: vangyzen). Differential revision: https://reviews.freebsd.org/D7172 Sponsored by: Dell Inc. Approved by: kib (mentor), vangyzen (mentor) Reviewed by: alc MFC after: 4 weeks Notes: svn path=/head/; revision=302783
* Ensure that maxproc does not exceed pid_max, at the time of boot.Konstantin Belousov2015-09-211-1/+3
| | | | | | | | | | | | | | | | | | | Changes to kern.pid_max mib after the boot can break this relation. The maxfiles value was calculated by the MAXFILES formula based on maxproc value, but this change decouples them, and MAXFILES now references maxusers. Without manual tuning, the maxfiles default value remains as it was prior to this commit. But for systems which have tuned maxproc and rely on maxfiles to adjust, additional reconfiguration is needed. Reported by: rwatson Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=288068
* Also make kern.maxfilesperproc a boot time tunable.Adrian Chadd2015-09-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Auto-tuning threshold discussions aside, it turns out that if you want to lower this on say, rather memory-packed machines, you either set maxusers or kern.maxfiles, or you set it in sysctl. The former is a non-exact way to tune this; the latter doesn't actually affect anything in the startup scripts. This first occured because I wondered why the hell screen would take upwards of 10 seconds to spawn a new screen. I then found python doing the same thing during fork/exec of child processes - it calls close() on each FD up to the current openfiles limit. On a 1TB machine this is like, 26 million FDs per process. Ugh. So: * This allows it to be set early in /boot/loader.conf; * It can be used to work around the ridiculous situation of screen, python, etc doing a close() on potentially millions of FDs even though you only have four open. Tested: * 4GB, 32GB, 64GB, 128GB, 384GB, 1TB systems with autotune, ensuring screen and python forking doesn't result in some pretty hilariously bad behaviour. TODO: * Note that the default login.conf sets openfiles-cur to unlimited, effectively obeying kern.maxfilesperproc. Perhaps we should fix this. * .. and even if we do, we need to also ensure that daemons get a soft limit of something reasonable and capped - they can request more FDs themselves. MFC after: 1 week Sponsored by: Norse Corp, Inc. Notes: svn path=/head/; revision=287606
* Make kstack_pages a tunable on arm, x86, and powepc. On i386, theKonstantin Belousov2015-08-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | initial thread stack is not adjusted by the tunable, the stack is allocated too early to get access to the kernel environment. See TD0_KSTACK_PAGES for the thread0 stack sizing on i386. The tunable was tested on x86 only. From the visual inspection, it seems that it might work on arm and powerpc. The arm USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already incorrect for the threads with non-default kstack size. I only changed the macros to use variable instead of constant, since I cannot test. On arm64, mips and sparc64, some static data structures are sized by KSTACK_PAGES, so the tunable is disabled. Sponsored by: The FreeBSD Foundation MFC after: 2 week Notes: svn path=/head/; revision=286584
* - Make 'struct buf *buf' private to vfs_bio.c. Having a global variableJeff Roberson2015-07-291-7/+0
| | | | | | | | | | | | | | | | | 'buf' is inconvenient and has lead me to some irritating to discover bugs over the years. It also makes it more challenging to refactor the buf allocation system. - Move swbuf and declare it as an extern in vfs_bio.c. This is still not perfect but better than it was before. - Eliminate the unused ffs function that relied on knowledge of the buf array. - Move the shutdown code that iterates over the buf array into vfs_bio.c. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=285993
* Remove support for Xen PV domU kernels. Support for HVM domU kernelsJohn Baldwin2015-04-301-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes Notes: svn path=/head/; revision=282274
* Use SYSCTL_OUT_STR() to return strings.Ian Lepore2015-03-141-2/+1
| | | | | | | PR: 195668 Notes: svn path=/head/; revision=280006
* Rework virtual machine hypervisor detection.John Baldwin2014-10-281-59/+8
| | | | | | | | | | | | | | | | | | - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel Notes: svn path=/head/; revision=273800
* Follow up to r225617. In order to maximize the re-usability of kernel codeDavide Italiano2014-10-161-2/+2
| | | | | | | | | | | in userland rename in-kernel getenv()/setenv() to kern_setenv()/kern_getenv(). This fixes a namespace collision with libc symbols. Submitted by: kmacy Tested by: make universe Notes: svn path=/head/; revision=273174
* Pull in r267961 and r267973 again. Fix for issues reported will follow.Hans Petter Selasky2014-06-281-13/+13
| | | | Notes: svn path=/head/; revision=267992
* Revert r267961, r267973:Glen Barber2014-06-271-13/+13
| | | | | | | | | | | | | These changes prevent sysctl(8) from returning proper output, such as: 1) no output from sysctl(8) 2) erroneously returning ENOMEM with tools like truss(1) or uname(1) truss: can not get etype: Cannot allocate memory Notes: svn path=/head/; revision=267985
* Extend the meaning of the CTLFLAG_TUN flag to automatically check ifHans Petter Selasky2014-06-271-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | there is an environment variable which shall initialize the SYSCTL during early boot. This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable. The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating childrens of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, hence some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path. The motivation behind this patch is to avoid parameter loading cludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel. Other changes: - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask". - Removed redundant TUNABLE statements throughout the kernel. - Some minor code rewrites in connection to removing not needed TUNABLE statements. - Added a missing SYSCTL_DECL(). - Wrapped two very long lines. - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, hence malloc()/free() is not ready when sysctls from the sysctl dataset are registered. - Bumped FreeBSD version to indicate SYSCTL API change. MFC after: 2 weeks Sponsored by: Mellanox Technologies Notes: svn path=/head/; revision=267961
* Add VM_LAST, a special last element in enum VM_GUEST and use it in CTASSERTSergey Kandaurov2013-11-121-0/+1
| | | | | | | | | to ensure that vm_guest range is covered by vm_guest_sysctl_names. Suggested by: mjg Notes: svn path=/head/; revision=258069
* Set description string for VM_GUEST_HV (HyperV guest).Sergey Kandaurov2013-11-111-0/+1
| | | | | | | | | | | | This fixes fallout from r256425. Reported by: Pavel Timofeev <timp87@gmail com> Tested by: Pavel Timofeev <timp87@gmail com> Reviewed by: Roger Pau Monnц╘ MFC after: 3 days Notes: svn path=/head/; revision=257996
* Fix typo.Konstantin Belousov2013-10-271-1/+1
| | | | | | | MFC after: 3 days Notes: svn path=/head/; revision=257221
* Implement the concept of the unmapped VMIO buffers, i.e. buffers whichKonstantin Belousov2013-03-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminate the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitely requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitely specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested. In the rework, filesystem metadata is not the subject to maxbufspace limit anymore. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, is accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not worked, because buffer_map fragmentation does not allow the limit to be reached. By Jeff Roberson request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks Notes: svn path=/head/; revision=248508
* Move the auto-sizing of the callout array from init_param2() toAndre Oppermann2013-03-081-12/+0
| | | | | | | | | | | | | | | kern_timeout_callwheel_alloc() where it is actually used. This is a mechanical move and no tuning parameters are changed. The pre-allocated callout array is only used for legacy timeout(9) calls and is only allocated and active on cpu0. Eventually all remaining users of timeout(9) should switch to the callout_* API. Reviewed by: davide Notes: svn path=/head/; revision=248031
* - Make callout(9) tickless, relying on eventtimers(4) as backend forDavide Italiano2013-03-041-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | precise time event generation. This greatly improves granularity of callouts which are not anymore constrained to wait next tick to be scheduled. - Extend the callout KPI introducing a set of callout_reset_sbt* functions, which take a sbintime_t as timeout argument. The new KPI also offers a way for consumers to specify precision tolerance they allow, so that callout can coalesce events and reduce number of interrupts as well as potentially avoid scheduling a SWI thread. - Introduce support for dispatching callouts directly from hardware interrupt context, specifying an additional flag. This feature should be used carefully, as long as interrupt context has some limitations (e.g. no sleeping locks can be held). - Enhance mechanisms to gather informations about callwheel, introducing a new sysctl to obtain stats. This change breaks the KBI. struct callout fields has been changed, in particular 'int ticks' (4 bytes) has been replaced with 'sbintime_t' (8 bytes) and another 'sbintime_t' field was added for precision. Together with: mav Reviewed by: attilio, bde, luigi, phk Sponsored by: Google Summer of Code 2012, iXsystems inc. Tested by: flo (amd64, sparc64), marius (sparc64), ian (arm), markj (amd64), mav, Fabian Keil Notes: svn path=/head/; revision=247777
* Move the mbuf memory limit calculations from init_param2() toAndre Oppermann2013-01-171-14/+0
| | | | | | | | | | | | | | | | tunable_mbinit() where it is next to where it is used later. Change the sysinit level of tunable_mbinit() from SI_SUB_TUNABLES to SI_SUB_KMEM after the VM is running. This allows to use better methods to determine the effectively available physical and virtual memory available to the kernel. Update comments. In a second step it can be merged into mbuf_init(). Notes: svn path=/head/; revision=245575
* Do not autotune ncallout to be greater than 18508.Alfred Perlstein2013-01-151-1/+4
| | | | | | | | | | | | | | | | | | | | | | When maxusers was unrestricted and maxfiles was allowed to autotune much higher the result was that ncallout which was based on maxfiles and maxproc grew much higher than was needed. To fix this clip autotuning to the same number we would get with the old maxusers algorithm which would stop scaling at 384 maxusers. Growing ncalout higher is not likely to be needed since most consumers of timeout(9) are gone and any higher value for ncallout causes the callwheel hashes to be much larger than will even be needed for most applications. MFC after: 1 month Reviewed by: mav Notes: svn path=/head/; revision=245469
* - Detect when we are in KVM.Andrey Zonov2013-01-151-0/+2
| | | | | | | | | Silence on: emulation Approved by: kib (mentor) MFC after: 1 week Notes: svn path=/head/; revision=245457
* Teach the kernel to recognize that it is executing inside a bhyve virtualNeel Natu2013-01-051-0/+1
| | | | | | | | | machine. Obtained from: NetApp Notes: svn path=/head/; revision=245066
* Prevent long type overflow of realmem calculation on ILP32 by forcingAndre Oppermann2012-12-101-2/+2
| | | | | | | | | | | calculation to be in quad_t space. Fix style issue with second parameter to qmin(). Reported by: alc Reviewed by: bde, alc Notes: svn path=/head/; revision=244080
* Using a long is the wrong type to represent the realmem and maxmbufmemAndre Oppermann2012-11-291-4/+4
| | | | | | | | | | | | variable as they may overflow on i386/PAE and i386 with > 2GB RAM. Use 64bit quad_t instead. It has broader kernel infrastructure support with TUNABLE_QUAD_FETCH() and qmin/qmax() than other available types. Pointed out by: alc, bde Notes: svn path=/head/; revision=243668
* Base the mbuf related limits on the available physical memory orAndre Oppermann2012-11-271-8/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel memory, whichever is lower. The overall mbuf related memory limit must be set so that mbufs (and clusters of various sizes) can't exhaust physical RAM or KVM. The limit is set to half of the physical RAM or KVM (whichever is lower) as the baseline. In any normal scenario we want to leave at least half of the physmem/kvm for other kernel functions and userspace to prevent it from swapping too easily. Via a tunable kern.maxmbufmem the limit can be upped to at most 3/4 of physmem/kvm. At the same time divorce maxfiles from maxusers and set maxfiles to physpages / 8 with a floor based on maxusers. This way busy servers can make use of the significantly increased mbuf limits with a much larger number of open sockets. Tidy up ordering in init_param2() and check up on some users of those values calculated here. Out of the overall mbuf memory limit 2K clusters and 4K (page size) clusters to get 1/4 each because these are the most heavily used mbuf sizes. 2K clusters are used for MTU 1500 ethernet inbound packets. 4K clusters are used whenever possible for sends on sockets and thus outbound packets. The larger cluster sizes of 9K and 16K are limited to 1/6 of the overall mbuf memory limit. When jumbo MTU's are used these large clusters will end up only on the inbound path. They are not used on outbound, there it's still 4K. Yes, that will stay that way because otherwise we run into lots of complications in the stack. And it really isn't a problem, so don't make a scene. Normal mbufs (256B) weren't limited at all previously. This was problematic as there are certain places in the kernel that on allocation failure of clusters try to piece together their packet from smaller mbufs. The mbuf limit is the number of all other mbuf sizes together plus some more to allow for standalone mbufs (ACK for example) and to send off a copy of a cluster. Unfortunately there isn't a way to set an overall limit for all mbuf memory together as UMA doesn't support such a limiting. NB: Every cluster also has an mbuf associated with it. Two examples on the revised mbuf sizing limits: 1GB KVM: 512MB limit for mbufs 419,430 mbufs 65,536 2K mbuf clusters 32,768 4K mbuf clusters 9,709 9K mbuf clusters 5,461 16K mbuf clusters 16GB RAM: 8GB limit for mbufs 33,554,432 mbufs 1,048,576 2K mbuf clusters 524,288 4K mbuf clusters 155,344 9K mbuf clusters 87,381 16K mbuf clusters These defaults should be sufficient for even the most demanding network loads. MFC after: 1 month Notes: svn path=/head/; revision=243631
* Allow maxusers to scale on machines with large address space.Alfred Perlstein2012-11-101-11/+11
| | | | | | | | | | | | | | | | | | | | | Some hooks are added to clamp down maxusers and nmbclusters for small address space systems. VM_MAX_AUTOTUNE_MAXUSERS - the max maxusers that will be autotuned based on physical memory. VM_MAX_AUTOTUNE_NMBCLUSTERS - max nmbclusters based on physical memory. These are set to the old values on i386 to preserve the clamping that was being done to all arches. Another macro VM_AUTOTUNE_NMBCLUSTERS is provided to allow an override for the calculation on a MD basis. Currently no arch defines this. Reviewed by: peter MFC after: 2 weeks Notes: svn path=/head/; revision=242847
* Allow autotune maxusers > 384 on 64 bit machinesAlfred Perlstein2012-10-251-2/+10
| | | | | | | | | | | | A default install on large memory machines with multiple 10gigE interfaces were not being given enough mbufs to do full bandwidth TCP or NFS traffic. To keep the value somewhat reasonable, we scale back the number of maxuers by 1/6 past the 384 point. This gives us enough mbufs for most of our pretty basic 10gigE line-speed tests to complete. Notes: svn path=/head/; revision=242029
* - Mark some sysctls with CTLFLAG_TUN flag instead of CTLFLAG_RDTUN.Andrey Zonov2012-09-031-7/+7
| | | | | | | | | Pointed out by: avg Approved by: kib (mentor) MFC after: 1 week Notes: svn path=/head/; revision=240068
* - Make kern.maxtsiz, kern.dfldsiz, kern.maxdsiz, kern.dflssiz, kern.maxssizAndrey Zonov2012-09-021-7/+7
| | | | | | | | | and kern.sgrowsiz sysctls writable. Approved by: kib (mentor) Notes: svn path=/head/; revision=240026
* As a safety measure, disable lowering pid_max too much.Konstantin Belousov2012-08-161-0/+3
| | | | | | | | Requested by: Peter Jeremy <peter@rulingia.com> MFC after: 1 week Notes: svn path=/head/; revision=239329
* Add a sysctl kern.pid_max, which limits the maximum pid the system isKonstantin Belousov2012-08-151-2/+11
| | | | | | | | | | allowed to allocate, and corresponding tunable with the same name. Note that existing processes with higher pids are left intact. MFC after: 1 week Notes: svn path=/head/; revision=239301
* Modestly increase the maximum allowed size of the kmem map on i386.Alan Cox2011-03-231-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Also, express this new maximum as a fraction of the kernel's address space size rather than a constant so that increasing KVA_PAGES will automatically increase this maximum. As a side-effect of this change, kern.maxvnodes will automatically increase by a proportional amount. While I'm here ensure that this change doesn't result in an unintended increase in maxpipekva on i386. Calculate maxpipekva based upon the size of the kernel address space and the amount of physical memory instead of the size of the kmem map. The memory backing pipes is not allocated from the kmem map. It is allocated from its own submap of the kernel map. In short, it has no real connection to the kmem map. (In fact, the commit messages for the maxpipekva auto-sizing talk about using the kernel map size, cf. r117325 and r117391, even though the implementation actually used the kmem map size.) Although the calculation is now done differently, the resulting value for maxpipekva should remain almost the same on i386. However, on amd64, the value will be reduced by 2/3. This is intentional. The recent change to VM_KMEM_SIZE_SCALE on amd64 for the benefit of ZFS also had the unnecessary side-effect of increasing maxpipekva. This change is effectively restoring maxpipekva on amd64 to its prior value. Eliminate init_param3() since it is no longer used. Notes: svn path=/head/; revision=219920
* Make MSGBUF_SIZE kernel option a loader tunable kern.msgbufsize.Sergey Kandaurov2011-01-211-0/+7
| | | | | | | | | | Submitted by: perryh pluto.rain.com (previous version) Reviewed by: jhb Approved by: kib (mentor) Tested by: universe Notes: svn path=/head/; revision=217688
* Add Xen to the list of virtual vendors. In the non PV (HVM) case this fixesChristian S.J. Peron2010-08-061-0/+1
| | | | | | | | | | | the virtualization detection successfully disabling the clflush instruction. This fixes insta-panics for XEN hvm users when the hw.clflush_disable tunable is -1 or 0 (-1 by default). Discussed with: jhb Notes: svn path=/head/; revision=210935
* Reverse the logic of the if statement that sets the default value ofNathan Whitehorn2010-06-241-3/+3
| | | | | | | | | HZ; the list of 1000 Hz platforms was getting unwieldy. Suggested by: marcel Notes: svn path=/head/; revision=209492
* Move default HZ from 100 to 1000 on powerpc.Nathan Whitehorn2010-06-231-1/+1
| | | | | | | | Reviewed by: marcel MFC after: 2 weeks Notes: svn path=/head/; revision=209490