path: root/sys/vm/vm_object.c
...
* Use an atomic reference count for paging in progress so that callers do not
  require the object lock.
  Author: Jeff Roberson   Date: 2019-08-19   Files: 1   Lines: -36/+30

  Reviewed by: markj
  Tested by: pho (as part of a larger branch)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D21311
  Notes: svn path=/head/; revision=351241
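  A minimal user-space sketch of the atomic reference-count pattern described
  above, using only C11 atomics; the function names mirror the kernel's
  paging-in-progress KPI, but the bodies here are illustrative and the
  sleep/wakeup side of the real KPI is omitted:

      #include <assert.h>
      #include <stdatomic.h>

      struct obj {
              atomic_uint pip;        /* paging-in-progress references */
      };

      /* Take a reference without holding any object lock. */
      static void
      obj_pip_add(struct obj *o, unsigned int n)
      {
              atomic_fetch_add_explicit(&o->pip, n, memory_order_acquire);
      }

      /* Drop a reference; a real implementation would wake waiters at zero. */
      static void
      obj_pip_wakeup(struct obj *o)
      {
              unsigned int old;

              old = atomic_fetch_sub_explicit(&o->pip, 1, memory_order_release);
              assert(old > 0);
      }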
* Merge the vm_page hold and wire mechanisms.
  Author: Mark Johnston   Date: 2019-07-08   Files: 1   Lines: -1/+1

  The hold_count and wire_count fields of struct vm_page are separate
  reference counters with similar semantics.  The remaining essential
  differences are that holds are not counted as a reference with respect to
  LRU, and holds have an implicit free-on-last unhold semantic whereas
  vm_page_unwire() callers must explicitly determine whether to free the
  page once the last reference to the page is released.

  This change removes the KPIs which directly manipulate hold_count.
  Functions such as vm_fault_quick_hold_pages() now return wired pages
  instead.  Since r328977 the overhead of maintaining LRU for wired pages is
  lower, and in many cases vm_fault_quick_hold_pages() callers would swap
  holds for wirings on the returned pages anyway, so with this change we
  remove a number of page lock acquisitions.

  No functional change is intended.  __FreeBSD_version is bumped.

  Reviewed by: alc, kib
  Discussed with: jeff
  Discussed with: jhb, np (cxgbe)
  Tested by: pho (previous version)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D19247
  Notes: svn path=/head/; revision=349846
* Add a return value to vm_page_remove().
  Author: Mark Johnston   Date: 2019-06-26   Files: 1   Lines: -6/+2

  Use it to indicate whether the page may be safely freed following its
  removal from the object.  Also change vm_page_remove() to assume that the
  page's object pointer is non-NULL, and have callers perform this check
  instead.

  This is a step towards an implementation of an atomic reference counter
  for each physical page structure.

  Reviewed by: alc, dougm, kib
  MFC after: 1 week
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20758
  Notes: svn path=/head/; revision=349432
* Add a vm_page_wired() predicate.
  Author: Mark Johnston   Date: 2019-06-02   Files: 1   Lines: -4/+4

  Use it instead of accessing the wire_count field directly.  No functional
  change intended.

  Reviewed by: alc, kib
  MFC after: 1 week
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20485
  Notes: svn path=/head/; revision=348502
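  A hedged sketch of such a predicate in user-space C; the struct layout is
  invented for illustration and is not the real struct vm_page:

      #include <stdbool.h>

      struct page {
              unsigned int wire_count;        /* illustrative field only */
      };

      /* Callers ask "is the page wired?" instead of reading wire_count. */
      static inline bool
      page_wired(const struct page *p)
      {
              return (p->wire_count > 0);
      }

  Centralizing the access behind a predicate lets the representation change
  later without touching every caller.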
* Noted by: alc
  Author: Konstantin Belousov   Date: 2019-05-06   Files: 1   Lines: -3/+2

  Reviewed by: alc, markj (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 6 days
  Notes: svn path=/head/; revision=347180
* Switch to use shared vnode locks for text files during image activation.
  Author: Konstantin Belousov   Date: 2019-05-05   Files: 1   Lines: -27/+5

  kern_execve() locks the text vnode exclusive to be able to set and clear
  the VV_TEXT flag.  VV_TEXT is mutually exclusive with the v_writecount > 0
  condition.

  The change removes VV_TEXT, replacing it with the condition
  v_writecount <= -1, and puts v_writecount under the vnode interlock.  Each
  text reference decrements v_writecount.  To clear the text reference when
  the segment is unmapped, it is recorded in the vm_map_entry backed by the
  text file as the MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented
  on the map entry removal.

  Operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
  v_writecount does not contradict the desired change.  vn_writecheck() is
  now racy and its use was eliminated everywhere except access.  The atomic
  check for writeability and increment of v_writecount is performed by the
  VOP.  vn_truncate() now increments v_writecount around the VOP_SETATTR()
  call, the lack of which is arguably a bug on its own.

  nullfs bypasses v_writecount to the lower vnode always, so the nullfs
  vnode has its own v_writecount correct, and the lower vnode gets all
  references, since object->handle is always the lower vnode.  On the text
  vnode's vm object dealloc, the v_writecount value is reset to zero, and
  the deadfs vop_unset_text short-circuits the operation.  Reclamation of
  lowervp always reclaims all nullfs vnodes referencing lowervp first, so no
  stray references are left.

  Reviewed by: markj, trasz
  Tested by: mjg, pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 month
  Differential revision: https://reviews.freebsd.org/D19923
  Notes: svn path=/head/; revision=347151
* Do not collapse objects with OBJ_NOSPLIT backing swap object.
  Author: Konstantin Belousov   Date: 2019-05-05   Files: 1   Lines: -1/+2

  NOSPLIT swap objects are not anonymous; they are used by tmpfs regular
  files and POSIX shared memory.  For such objects, collapse is not
  permitted.

  Reported by: mjg
  Reviewed by: markj, trasz
  Tested by: mjg, pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D19923
  Notes: svn path=/head/; revision=347150
* Include path for tmpfs objects in vm.objects sysctl
  Author: Eric van Gyzen   Date: 2018-11-30   Files: 1   Lines: -31/+37

  This applies the fix in r283924 to the vm.objects sysctl added by r283624
  so the output will include the vnode information (i.e. path) for tmpfs
  objects.

  Reviewed by: kib, dab
  MFC after: 2 weeks
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D2724
  Notes: svn path=/head/; revision=341282
* Add assertions and comment to vm_object_vnode()
  Author: Eric van Gyzen   Date: 2018-11-30   Files: 1   Lines: -5/+17

  Reviewed by: kib
  MFC after: 2 weeks
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D2724
  Notes: svn path=/head/; revision=341281
* Re-apply r336984, reverting r339934.
  Author: Mark Johnston   Date: 2018-11-10   Files: 1   Lines: -2/+3

  r336984 exposed the bug fixed in r340241, leading to the initial revert
  while the bug was being hunted down.  Now that the bug is fixed, we can
  revert the revert.

  Discussed with: alc
  MFC after: 3 days
  Notes: svn path=/head/; revision=340331
* Revert r336984.
  Author: Mark Johnston   Date: 2018-10-30   Files: 1   Lines: -3/+2

  It appears to be responsible for random segfaults observed when lots of
  paging activity is taking place, but the root cause is not yet understood.

  Requested by: alc
  MFC after: now
  Notes: svn path=/head/; revision=339934
* Refactor domainset iterators for use by malloc(9) and UMA.
  Author: Mark Johnston   Date: 2018-10-23   Files: 1   Lines: -0/+1

  Before this change we had two flavours of vm_domainset iterators: "page"
  and "malloc".  The latter was only used for kmem_*() and hard-coded its
  behaviour based on kernel_object's policy.  Moreover, its use contained a
  race similar to that fixed by r338755, since the kernel_object's iterator
  was being run without the object lock.

  In some cases it is useful to be able to explicitly specify a policy
  (domainset) or policy+iterator (domainset_ref) when performing memory
  allocations.  To that end, refactor the vm_domainset_* KPI to permit this,
  and get rid of the "malloc" domainset_iter KPI in the process.

  Reviewed by: jeff (previous version)
  Tested by: pho (part of a larger patch)
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D17417
  Notes: svn path=/head/; revision=339661
* Allow vm object coalescing to occur in the midst of a vm object when the
  OBJ_ONEMAPPING flag is set.
  Author: Alan Cox   Date: 2018-07-31   Files: 1   Lines: -2/+3

  In other words, allow recycling of existing but unused subranges of a vm
  object when the OBJ_ONEMAPPING flag is set.

  Such situations are increasingly common with jemalloc >= 5.0.  This change
  has the expected effect of reducing the number of vm map entry and object
  allocations and increasing the number of superpage promotions.

  Reviewed by: kib, markj
  Tested by: pho
  MFC after: 6 weeks
  Differential Revision: https://reviews.freebsd.org/D16501
  Notes: svn path=/head/; revision=336984
* Fix some races introduced in r332974.
  Author: Mark Johnston   Date: 2018-05-04   Files: 1   Lines: -2/+2

  With r332974, when performing a synchronized access of a page's "queue"
  field, one must first check whether the page is logically dequeued.  If
  so, then the page lock does not prevent the page from being removed from
  its page queue.  Introduce vm_page_queue(), which returns the page's
  logical queue index.  In some cases, direct access to the "queue" field is
  still required, but such accesses should be confined to sys/vm.

  Reported and tested by: pho
  Reviewed by: kib
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D15280
  Notes: svn path=/head/; revision=333256
* Improve VM page queue scalability.
  Author: Mark Johnston   Date: 2018-04-24   Files: 1   Lines: -53/+6

  Currently both the page lock and a page queue lock must be held in order
  to enqueue, dequeue or requeue a page in a given page queue.  The queue
  locks are a scalability bottleneck in many workloads.  This change reduces
  page queue lock contention by batching queue operations.  To detangle the
  page and page queue locks, per-CPU batch queues are used to reference
  pages with pending queue operations.  The requested operation is encoded
  in the page's aflags field with the page lock held, after which the page
  is enqueued for a deferred batch operation.  Page queue scans are
  similarly optimized to minimize the amount of work performed with a page
  queue lock held.

  Reviewed by: kib, jeff (previous versions)
  Tested by: pho
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D14893
  Notes: svn path=/head/; revision=332974
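  The batching idea can be sketched in user-space C as follows; one
  per-thread batch stands in for the per-CPU batch queues, and all names are
  illustrative rather than the kernel's:

      #include <pthread.h>

      #define BATCH_MAX 32

      struct pagequeue {
              pthread_mutex_t lock;
              /* ... the queue itself ... */
      };

      struct batch {
              void *pages[BATCH_MAX];
              int   cnt;
      };

      static _Thread_local struct batch curbatch;

      /* Apply all deferred operations under a single lock acquisition. */
      static void
      batch_flush(struct pagequeue *pq, struct batch *b)
      {
              pthread_mutex_lock(&pq->lock);
              for (int i = 0; i < b->cnt; i++) {
                      /* perform the deferred enqueue/dequeue/requeue */
              }
              pthread_mutex_unlock(&pq->lock);
              b->cnt = 0;
      }

      /*
       * Record a deferred queue operation; the queue lock is taken only
       * when the batch fills, amortizing it over BATCH_MAX pages.
       */
      static void
      batch_defer(struct pagequeue *pq, void *page)
      {
              curbatch.pages[curbatch.cnt++] = page;
              if (curbatch.cnt == BATCH_MAX)
                      batch_flush(pq, &curbatch);
      }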
* Dequeue wired pages lazily.
  Author: Mark Johnston   Date: 2018-02-07   Files: 1   Lines: -1/+1

  Previously, wiring a page would cause it to be removed from its page
  queue.  In the common case, unwiring causes it to be enqueued at the tail
  of that page queue.  This change modifies vm_page_wire() to not dequeue
  the page, thus avoiding the highly contended page queue locks.  Instead,
  vm_page_unwire() takes care of requeuing the page as a single operation,
  and the page daemon dequeues wired pages as they are encountered during a
  queue scan to avoid needlessly revisiting them later.  For pages in
  PQ_ACTIVE we do even better, since a requeue is unnecessary.

  The change improves scalability for some common workloads.  For instance,
  threads wiring pages into the buffer cache no longer need to modify global
  page queues, and unwiring is usually done by the bufspace thread, so
  concurrency is not as much of an issue.  As another example, many sysctl
  handlers wire the output buffer to avoid faults on copyout, and since the
  buffer is likely to be in PQ_ACTIVE, we now entirely avoid modifying the
  page queue in this case.

  The change also adds a block comment describing some properties of struct
  vm_page's reference counters, and the busy lock.

  Reviewed by: jeff
  Discussed with: alc, kib
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D11943
  Notes: svn path=/head/; revision=328977
* Use per-domain locks for vm page queue free.  Move paging control from
  global to per-domain state.
  Author: Jeff Roberson   Date: 2018-02-06   Files: 1   Lines: -0/+2

  Protect reservations with the free lock from the domain that they belong
  to.  Refactor to make vm domains more of a first class object.

  Reviewed by: markj, kib, gallatin
  Tested by: pho
  Sponsored by: Netflix, Dell/EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D14000
  Notes: svn path=/head/; revision=328954
* On munlock(), unwire correct page.
  Author: Konstantin Belousov   Date: 2018-02-05   Files: 1   Lines: -5/+16

  It is possible, for complex fork()/collapse situations, to have sibling
  address spaces which partially share shadow chains.  If one sibling
  performs wiring, it can happen that a transient page, invalid and busy, is
  installed into a shadow object which is visible to the other sibling for
  the duration of vm_fault_hold().  When the backing object contains the
  valid page, and the wiring is performed on a read-only entry, the
  transient page is eventually removed.

  But the sibling which observed the transient page might perform the
  unwire, executing vm_object_unwire().  There, the first page found in the
  shadow chain is considered as the page that was wired for the mapping.  It
  is really the page below it which is wired.  So we unwire the wrong page,
  either triggering the asserts or breaking the page's wire counter.

  As the fix, wait for the busy state to finish if we find such a page
  during unwire, and restart the shadow chain walk after the sleep.

  Reported and tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D14184
  Notes: svn path=/head/; revision=328880
* Implement 'domainset', a cpuset based NUMA policy mechanism.  This allows
  userspace to control NUMA policy administratively and programmatically.
  Author: Jeff Roberson   Date: 2018-01-12   Files: 1   Lines: -0/+3

  Implement domainset based iterators in the page layer.

  Remove the now legacy numa_* syscalls.

  Clean up some header pollution created by having seq.h in proc.h.

  Reviewed by: markj, kib
  Discussed with: alc
  Tested by: pho
  Sponsored by: Netflix, Dell/EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D13403
  Notes: svn path=/head/; revision=327895
* Make the vm object bypass and collapse counters per CPU.
  Author: Alan Cox   Date: 2017-12-25   Files: 1   Lines: -8/+19

  Requested by: mjg
  Reviewed by: kib, markj
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D13611
  Notes: svn path=/head/; revision=327179
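  The per-CPU counter pattern trades a little read-side work (summing the
  slots) for uncontended writes.  A user-space approximation with per-thread
  slots; the kernel uses its counter(9) facility, and everything named here
  is illustrative:

      #include <stdatomic.h>
      #include <stdint.h>

      #define NSLOTS 64               /* stands in for the number of CPUs */

      struct pcpu_slot {
              _Alignas(64) atomic_uint_fast64_t val;  /* one cache line per slot */
      };

      struct pcpu_counter {
              struct pcpu_slot slot[NSLOTS];
      };

      /* Writers touch only their own slot. */
      static void
      pcpu_counter_add(struct pcpu_counter *c, unsigned int cpu, uint64_t v)
      {
              atomic_fetch_add_explicit(&c->slot[cpu % NSLOTS].val, v,
                  memory_order_relaxed);
      }

      /* Readers pay for the aggregation: sum all slots. */
      static uint64_t
      pcpu_counter_fetch(struct pcpu_counter *c)
      {
              uint64_t sum = 0;

              for (int i = 0; i < NSLOTS; i++)
                      sum += atomic_load_explicit(&c->slot[i].val,
                          memory_order_relaxed);
              return (sum);
      }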
* SPDX: Consider code from Carnegie-Mellon University.
  Author: Pedro F. Giffuni   Date: 2017-11-30   Files: 1   Lines: -1/+1

  Interesting cases, most likely from CMU Mach sources.

  Notes: svn path=/head/; revision=326403
* Eliminate kmem_arena and kmem_object in preparation for further NUMA commits.
  Author: Jeff Roberson   Date: 2017-11-28   Files: 1   Lines: -9/+0

  The arena argument to kmem_*() is now only used in an assert.  A follow-up
  commit will remove the argument altogether before we freeze the API for
  the next release.

  This replaces the hard limit on kmem size with a soft limit imposed by
  UMA.  When the soft limit is exceeded we periodically wake up the UMA
  reclaim thread to attempt to shrink KVA.  On 32bit architectures this
  should behave much more gracefully as we exhaust KVA.  On 64bit the limits
  are likely never hit.

  Reviewed by: markj, kib (some objections)
  Discussed with: alc
  Tested by: pho
  Sponsored by: Netflix / Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D13187
  Notes: svn path=/head/; revision=326347
* sys: further adoption of SPDX licensing ID tags.
  Author: Pedro F. Giffuni   Date: 2017-11-20   Files: 1   Lines: -0/+2

  Mainly focus on files that use the BSD 3-Clause license.

  The Software Package Data Exchange (SPDX) group provides a specification
  to make it easier for automated tools to detect and summarize well known
  opensource licenses.  We are gradually adopting the specification, noting
  that the tags are considered only advisory and do not, in any way,
  supersede or replace the license texts.

  Special thanks to Wind River for providing access to "The Duke of
  Highlander" tool: an older (2014) run over the FreeBSD tree was useful as
  a starting point.

  Notes: svn path=/head/; revision=326023
* Replace many instances of VM_WAIT with blocking page allocation flags
  similar to the kernel memory allocator.
  Author: Jeff Roberson   Date: 2017-11-08   Files: 1   Lines: -2/+3

  This simplifies NUMA allocation because the domain will be known at wait
  time and races between failure and sleeping are eliminated.  This also
  reduces boilerplate code and simplifies callers.

  A wait primitive is supplied for uma zones for similar reasons.  This
  eliminates some non-specific VM_WAIT calls in favor of more explicit
  sleeps that may be satisfied without new pages.

  Reviewed by: alc, kib, markj
  Tested by: pho
  Sponsored by: Netflix, Dell/EMC Isilon
  Notes: svn path=/head/; revision=325530
* Batch atomic updates to the number of active, inactive, and laundry pages
  by vm_object_terminate_pages().
  Author: Alan Cox   Date: 2017-10-19   Files: 1   Lines: -2/+12

  For example, for a "buildworld" workload, this batching reduces
  vm_object_terminate_pages()'s average execution time by 12%.  (The total
  savings were about 11.7 billion processor cycles.)

  Reviewed by: kib
  MFC after: 1 week
  Notes: svn path=/head/; revision=324743
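  The batching amounts to tallying per-queue deltas locally while the
  object's pages are walked and applying each total with one atomic update
  at the end.  An illustrative sketch, not the kernel code:

      #include <stdatomic.h>

      enum pq { PQ_ACT, PQ_INACT, PQ_LAUND, PQ_NONE };

      /* Global per-queue page counts (illustrative). */
      static atomic_long active_count, inactive_count, laundry_count;

      static void
      terminate_pages(const enum pq *pages, int npages)
      {
              long dact = 0, dinact = 0, dlaund = 0;

              /* One local increment per page... */
              for (int i = 0; i < npages; i++) {
                      if (pages[i] == PQ_ACT)
                              dact++;
                      else if (pages[i] == PQ_INACT)
                              dinact++;
                      else if (pages[i] == PQ_LAUND)
                              dlaund++;
              }
              /* ...and at most three atomic updates per object. */
              if (dact != 0)
                      atomic_fetch_sub(&active_count, dact);
              if (dinact != 0)
                      atomic_fetch_sub(&inactive_count, dinact);
              if (dlaund != 0)
                      atomic_fetch_sub(&laundry_count, dlaund);
      }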
* Optimize vm_object_page_remove() by eliminating pointless calls to
  pmap_remove_all().
  Author: Alan Cox   Date: 2017-09-28   Files: 1   Lines: -4/+6

  If the object to which a page belongs has no references, then that page
  cannot possibly be mapped.

  Reviewed by: kib
  MFC after: 1 week
  Notes: svn path=/head/; revision=324087
* For unlinked files, do not msync(2) or sync on the vnode deactivation.
  Author: Konstantin Belousov   Date: 2017-09-19   Files: 1   Lines: -2/+2

  One consequence of the patch is that msyncing unlinked file mappings no
  longer reduces the amount of the dirty memory in the system, but I do not
  think that there are users of msync(2) that utilize it for such
  side-effect.

  Reported and tested by: tjil
  PR: 222356
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D12411
  Notes: svn path=/head/; revision=323768
* Batch freeing of the pages in vm_object_page_remove() under the same free
  queue mutex lock owning session, same as it was done for the object
  termination in r323561.
  Author: Konstantin Belousov   Date: 2017-09-15   Files: 1   Lines: -1/+6

  Reported and tested by: mjg
  Reviewed by: alc, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=323616
* Do not relock free queue mutex for each page, free whole terminating
  object's page queue under a single mutex lock.
  Author: Konstantin Belousov   Date: 2017-09-13   Files: 1   Lines: -12/+47

  First, all pages on the queue are prepared for free by calls to
  vm_page_free_prep(), and pages which should not be returned to the
  physical allocator (e.g. wired or fictitious) are simply removed from the
  queue.  On the second pass, vm_page_free_phys_pglist() inserts all pages
  from the queue without relocking the mutex.

  The change improves the object termination, e.g. on process exit, where
  large anonymous memory objects otherwise caused the free queue mutex to be
  relocked for each page.  Moreover, if several such processes are exiting
  or execing in parallel, the mutex was highly contended on the address
  space demolition.

  Diagnosed and tested by: mjg (previous version)
  Reviewed by: alc, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=323561
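  The two-pass structure reads roughly like this user-space sketch, with a
  pthread mutex standing in for the free queue mutex; vm_page_free_prep()
  and vm_page_free_phys_pglist() are the real function names, but the bodies
  below are placeholders:

      #include <pthread.h>
      #include <stdbool.h>
      #include <sys/queue.h>

      struct page {
              TAILQ_ENTRY(page) listq;
              bool wired;             /* simplified "do not free" condition */
      };
      TAILQ_HEAD(pglist, page);

      static pthread_mutex_t free_queue_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Pass 1: prepare each page, dropping those that must not be freed. */
      static void
      prepare_pages(struct pglist *pq)
      {
              struct page *m, *tmp;

              TAILQ_FOREACH_SAFE(m, pq, listq, tmp) {
                      if (m->wired)
                              TAILQ_REMOVE(pq, m, listq);
                      /* per-page vm_page_free_prep()-style work goes here */
              }
      }

      /* Pass 2: hand the whole list to the allocator under one lock. */
      static void
      free_pages(struct pglist *pq)
      {
              struct page *m;

              pthread_mutex_lock(&free_queue_lock);
              TAILQ_FOREACH(m, pq, listq) {
                      /* insert into the physical allocator's free lists */
              }
              pthread_mutex_unlock(&free_queue_lock);
      }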
* Add a vm_page_change_lock() helper, the common code to not relock page
  lock if both old and new pages use the same underlying lock.
  Author: Konstantin Belousov   Date: 2017-09-09   Files: 1   Lines: -17/+9

  Convert existing places to use the helper instead of inlining it.  Use the
  optimization in vm_object_page_remove().

  Suggested and reviewed by: alc, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=323368
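  The helper boils down to "drop the old lock and take the new one only when
  they differ."  A sketch with pthread mutexes, illustrative rather than the
  kernel code:

      #include <pthread.h>
      #include <stddef.h>

      /*
       * Switch from holding *lockp to holding newlock, skipping the
       * unlock/lock cycle when consecutive pages map to the same lock.
       */
      static void
      change_lock(pthread_mutex_t **lockp, pthread_mutex_t *newlock)
      {
              if (*lockp == newlock)
                      return;
              if (*lockp != NULL)
                      pthread_mutex_unlock(*lockp);
              *lockp = newlock;
              if (newlock != NULL)
                      pthread_mutex_lock(newlock);
      }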
* Replace global swhash in swap pager with per-object trie to track swap
  blocks assigned to the object pages.
  Author: Konstantin Belousov   Date: 2017-08-25   Files: 1   Lines: -0/+12

  - The global swhash_mtx is removed; the trie is synchronized by the
    corresponding object lock.
  - The swp_pager_meta_free_all() function used during object termination is
    optimized by only looking at the trie instead of having to search the
    whole hash for the swap blocks owned by the object.
  - On swap_pager_swapoff(), instead of iterating over the swhash, the
    global object list has to be inspected.  There, we have to ensure that
    we do see valid trie content if we see that the object type is swap.

  Sizing of the swblk zone is the same as for the swblock zone; each swblk
  maps SWAP_META_PAGES pages.

  Proposed by: alc
  Reviewed by: alc, markj (previous version)
  Tested by: alc, pho (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 month
  Differential revision: https://reviews.freebsd.org/D11435
  Notes: svn path=/head/; revision=322913
* Add OBJ_PG_DTOR flag to VM object.
  Author: Ruslan Bukin   Date: 2017-08-16   Files: 1   Lines: -36/+50

  Setting this flag allows us to skip page removal from the VM object queue
  during object termination and to leave that for the cdev_pg_dtor function.

  Move the page removal code to a separate function,
  vm_object_terminate_pages(), as the comments do not survive reindentation.

  This will be required for Intel SGX support, where we will have to remove
  pages from the VM object manually.

  Reviewed by: kib, alc
  Sponsored by: DARPA, AFRL
  Differential Revision: https://reviews.freebsd.org/D11688
  Notes: svn path=/head/; revision=322571
* Do not allocate struct kinfo_vmobject on stack.
  Author: Konstantin Belousov   Date: 2017-07-22   Files: 1   Lines: -30/+32

  Its size is 1184 bytes.

  Noted by: eugen
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=321371
* Add pctrie_init() and vm_radix_init() to initialize generic pctrie and
  vm_radix trie.
  Author: Konstantin Belousov   Date: 2017-07-19   Files: 1   Lines: -2/+2

  Existing vm_radix_init() function is renamed to vm_radix_zinit().  Inlines
  moved out of the _ headers.

  Reviewed by: alc, markj (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D11661
  Notes: svn path=/head/; revision=321247
* Commit the 64-bit inode project.
  Author: Konstantin Belousov   Date: 2017-05-23   Files: 1   Lines: -0/+3

  Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify struct
  dirent layout to add d_off, increase the size of d_fileno to 64-bits,
  increase the size of d_namlen to 16-bits, and change the required
  alignment.  Increase struct statfs f_mntfromname[] and f_mntonname[] array
  length MNAMELEN to 1024.

  ABI breakage is mitigated by providing compatibility using versioned
  symbols, ingenious use of the existing padding in structures, and by
  employing other tricks.  Unfortunately, not everything can be fixed,
  especially outside the base system.  For instance, third-party APIs which
  pass struct stat around are broken in backward and forward incompatible
  ways.

  Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is
  no general mechanism to handle other sysctl MIBS which return structures
  where the layout has changed.  It was considered that the breakage is
  either in the management interfaces, where we usually allow ABI slip, or
  is not important.

  Struct xvnode changed layout; no compat shims are provided.

  For struct xtty, the dev_t tty device member was reduced to uint32_t.  It
  was decided that keeping ABI compat in this case is more useful than
  reporting 64-bit dev_t, for the sake of pstat.

  Update note: strictly follow the instructions in UPDATING.  Build and
  install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot,
  and only then install new world.

  Credits: The 64-bit inode project, also known as ino64, started life many
  years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick (mckusick)
  then picked up and updated the patch, and acted as a flag-waver.
  Feedback, suggestions, and discussions were carried by Ed Maste (emaste),
  John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem).
  Kris Moore (kris) performed an initial ports investigation followed by an
  exp-run by Antoine Brodin (antoine).  Essential and all-embracing testing
  was done by Peter Holm (pho).  The heavy lifting of coordinating all these
  efforts and bringing the project to completion were done by Konstantin
  Belousov (kib).

  Sponsored by: The FreeBSD Foundation (emaste, kib)
  Differential revision: https://reviews.freebsd.org/D10439
  Notes: svn path=/head/; revision=318736
* Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter in
  place.
  Author: Gleb Smirnoff   Date: 2017-04-17   Files: 1   Lines: -1/+1

  To do per-cpu stats, convert all fields that previously were maintained in
  the vmmeters that sit in pcpus to counter(9).

  - Since some vmmeter stats may be touched at very early stages of boot,
    before we have set up UMA and we can do counter_u64_alloc(), provide an
    early counter mechanism:
    o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
    o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
      so that at early stages of boot, before counters are allocated, we
      already point to a counter that can be safely written to.
    o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

  Further related changes:
  - Don't include vmmeter.h into pcpu.h.
  - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
    to match kernel representation.
  - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

  This is based on benno@'s 4-year old patch:
  https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

  Reviewed by: kib, gallatin, marius, lidl
  Differential Revision: https://reviews.freebsd.org/D10156
  Notes: svn path=/head/; revision=317061
* Relax the locking requirements for vm_object_page_noreuse().
  Author: Alan Cox   Date: 2017-03-15   Files: 1   Lines: -1/+1

  While reviewing all uses of OFF_TO_IDX(), I observed that
  vm_object_page_noreuse() is requiring an exclusive lock on the object
  when, in fact, a shared lock suffices.

  Reviewed by: kib, markj
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D10011
  Notes: svn path=/head/; revision=315318
* Use atop() instead of OFF_TO_IDX() for conversion of addresses or address
  offsets, as intended.
  Author: Konstantin Belousov   Date: 2017-03-14   Files: 1   Lines: -5/+5

  Suggested and reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=315281
* Renumber copyright clause 4
  Author: Warner Losh   Date: 2017-02-28   Files: 1   Lines: -1/+1

  Renumber clause 4 to 3, per what everybody else did when BSD granted them
  permission to remove clause 3.  My insistence on keeping the same
  numbering for legal reasons is too pedantic, so give up on that point.

  Submitted by: Jan Schaumann <jschauma@stevens.edu>
  Pull Request: https://github.com/freebsd/freebsd/pull/96
  Notes: svn path=/head/; revision=314436
* Avoid page lookups in the top-level object in vm_object_madvise().
  Author: Mark Johnston   Date: 2017-01-30   Files: 1   Lines: -69/+98

  We can iterate over consecutive resident pages in the top-level object
  using the object's page list rather than by performing lookups in the
  object radix tree.  This extends one of the optimizations in r312208 to
  the case where a shadow chain is present.

  Suggested by: alc
  Reviewed by: alc, kib (previous version)
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D9282
  Notes: svn path=/head/; revision=312994
* Avoid unnecessary page lookups in vm_object_madvise().
  Author: Mark Johnston   Date: 2017-01-15   Files: 1   Lines: -23/+34

  vm_object_madvise() is frequently used to apply advice to a contiguous set
  of pages in an object with no backing object.  Optimize this case by
  skipping non-resident subranges in constant time, and by iterating over
  resident pages using the object memq, thus avoiding radix tree lookups on
  each page index in the specified range.

  While here, move MADV_WILLNEED handling to vm_page_advise(), and rename
  the "advise" parameter to vm_object_madvise() to "advice."

  Reviewed by: alc, kib
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D9098
  Notes: svn path=/head/; revision=312208
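  The optimization replaces one lookup per page index with a single "first
  resident page at or after the start" lookup followed by a walk of the
  object's resident-page list.  An illustrative sketch; "memq" stands in for
  the object's page list and lookup_ge() for the radix-tree lookup:

      #include <stddef.h>
      #include <sys/queue.h>

      struct page {
              TAILQ_ENTRY(page) listq;
              unsigned long pindex;
      };
      TAILQ_HEAD(pglist, page);

      struct object {
              struct pglist memq;     /* resident pages, sorted by pindex */
      };

      /* Stand-in for a radix-tree lookup of the first resident page >= start. */
      struct page *lookup_ge(struct object *obj, unsigned long start);

      static void
      advise_range(struct object *obj, unsigned long start, unsigned long end)
      {
              struct page *m;

              /* One lookup, then list traversal; non-resident subranges cost nothing. */
              for (m = lookup_ge(obj, start);
                  m != NULL && m->pindex < end;
                  m = TAILQ_NEXT(m, listq)) {
                      /* apply the advice to the resident page m */
              }
      }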
* Improve vm_object_scan_all_shadowed() to also check swap backing objects.
  Author: Konstantin Belousov   Date: 2016-12-18   Files: 1   Lines: -20/+24

  As noted in the removed comment, it is possible and not prohibitively
  costly to look up the swap blocks for the given page index.  Implement a
  swap_pager_find_least() function to do that, and use it to iterate
  simultaneously over both the backing object page queue and swap
  allocations when looking for shadowed pages.

  Testing shows that the number of new successful scans enabled by this
  addition is small but non-zero.  When worked out, the change both further
  reduces the depth of the shadow object chain, and frees unused but
  allocated swap and memory.

  Suggested and reviewed by: alc
  Tested by: pho (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=310234
* Eliminate every mention of PG_CACHED pages from the comments in the
  machine-independent layer of the virtual memory system.
  Author: Alan Cox   Date: 2016-12-12   Files: 1   Lines: -4/+3

  Update some of the nearby comments to eliminate redundancy and improve
  clarity.

  In vm/vm_reserv.c, do not use hyphens after adverbs ending in -ly per The
  Chicago Manual of Style.

  Update the comment in vm/vm_page.h defining the four types of page queues
  to reflect the elimination of PG_CACHED pages and the introduction of the
  laundry queue.

  Reviewed by: kib, markj
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D8752
  Notes: svn path=/head/; revision=309898
* During vm_page_cache()'s call to vm_radix_insert(), if vm_page_alloc() was
  called to allocate a new page of radix trie nodes, there could be a call
  to vm_radix_remove() on the same trie (of PG_CACHED pages) as the
  in-progress vm_radix_insert().
  Author: Alan Cox   Date: 2016-12-01   Files: 1   Lines: -1/+0

  With the removal of PG_CACHED pages, we can simplify vm_radix_insert() and
  vm_radix_remove() by removing the flags on the root of the trie that were
  used to detect this case and the code for restarting vm_radix_insert()
  when it happened.

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D8664
  Notes: svn path=/head/; revision=309365
* Remove most of the code for implementing PG_CACHED pages.
  Author: Alan Cox   Date: 2016-11-15   Files: 1   Lines: -38/+1

  (This change does not remove user-space visible fields from vm_cnt or all
  of the references to cached pages from comments.  Those changes will come
  later.)

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D8497
  Notes: svn path=/head/; revision=308691
* Introduce a new page queue, PQ_LAUNDRY, for storing unreferenced, dirty
  pages, specifically, dirty pages that have passed once through the
  inactive queue.
  Author: Alan Cox   Date: 2016-11-09   Files: 1   Lines: -2/+2

  A new, dedicated thread is responsible for both deciding when to launder
  pages and actually laundering them.  The new policy uses the relative
  sizes of the inactive and laundry queues to determine whether to launder
  pages at a given point in time.  In general, this leads to more
  intelligent swapping behavior, since the laundry thread will avoid
  pageouts when the marginal benefit of doing so is low.  Previously,
  without a dedicated queue for dirty pages, the page daemon didn't have the
  information to determine whether pageout provides any benefit to the
  system.  Thus, the previous policy often resulted in small but steadily
  increasing amounts of swap usage when the system is under memory pressure,
  even when the inactive queue consisted mostly of clean pages.  This change
  addresses that issue, and also paves the way for some future virtual
  memory system improvements by removing the last source of object-cached
  clean pages, i.e., PG_CACHE pages.

  The new laundry thread sleeps while waiting for a request from the page
  daemon thread(s).  A request is raised by setting the variable
  vm_laundry_request and waking the laundry thread.  We request launderings
  for two reasons: to try and balance the inactive and laundry queue sizes
  ("background laundering"), and to quickly make up for a shortage of free
  pages and clean inactive pages ("shortfall laundering").

  When background laundering is requested, the laundry thread computes the
  number of page daemon wakeups that have taken place since the last
  laundering.  If this number is large enough relative to the ratio of the
  laundry and (global) inactive queue sizes, we will launder
  vm_background_launder_target pages at vm_background_launder_rate KB/s.
  Otherwise, the laundry thread goes back to sleep without doing any work.
  When scanning the laundry queue during background laundering, reactivated
  pages are counted towards the laundry thread's target.

  In contrast, shortfall laundering is requested when an inactive queue scan
  fails to meet its target.  In this case, the laundry thread attempts to
  launder enough pages to meet v_free_target within 0.5s, which is the
  inactive queue scan period.

  A laundry request can be latched while another is currently being
  serviced.  In particular, a shortfall request will immediately preempt a
  background laundering.

  This change also redefines the meaning of vm_cnt.v_reactivated and removes
  the functions vm_page_cache() and vm_page_try_to_cache().  The new meaning
  of vm_cnt.v_reactivated now better reflects its name.  It represents the
  number of inactive or laundry pages that are returned to the active queue
  on account of a reference.

  In collaboration with: markj
  Reviewed by: kib
  Tested by: pho
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D8302
  Notes: svn path=/head/; revision=308474
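  The background-laundering policy described above is essentially a ratio
  test.  A hedged sketch of the decision; the names and the exact threshold
  are illustrative, not the kernel's tuning:

      #include <stdbool.h>
      #include <stdint.h>

      /*
       * Launder in the background only when enough page daemon wakeups have
       * accumulated relative to how small the laundry queue is compared to
       * the inactive queue.
       */
      static bool
      should_background_launder(uint64_t wakeups_since_last_laundering,
          uint64_t laundry_len, uint64_t inactive_len)
      {
              if (laundry_len == 0)
                      return (false);
              return (wakeups_since_last_laundering * laundry_len >=
                  inactive_len);
      }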
* Fix a race in vm_page_busy_sleep(9).
  Author: Konstantin Belousov   Date: 2016-10-13   Files: 1   Lines: -5/+5

  Suppose that we have an exclusively busy page, and a thread which can
  accept a shared-busy page.  In this case, typical code waiting for the
  page xbusy state to pass is again:

      VM_OBJECT_WLOCK(object);
      ...
      if (vm_page_xbusied(m)) {
              vm_page_lock(m);
              VM_OBJECT_WUNLOCK(object);    <---1
              vm_page_busy_sleep(p, "vmopax");
              goto again;
      }

  Suppose that the xbusy state owner locked the object, unbusied the page
  and unlocked the object after we are at the line [1], but before we
  executed the load of the busy_lock word in vm_page_busy_sleep().  If it
  happens that there are still no waiters recorded for the busy state, the
  xbusy owner did not acquire the page lock, so it proceeded.

  More, suppose that some other thread happened to share-busy the page after
  the xbusy state was relinquished but before the m->busy_lock is read in
  vm_page_busy_sleep().  Again, that thread only needs the vm_object lock to
  proceed.  Then, vm_page_busy_sleep() reads a busy_lock value equal to
  VPB_SHARERS_WORD(1).

  In this case, all tests in vm_page_busy_sleep(9) pass and we are going to
  sleep, despite the page being share-busied.

  Update the check for m->busy_lock == VPB_UNBUSIED in vm_page_busy_sleep(9)
  to also accept the shared-busy state if we only wait for the xbusy state
  to pass.

  Merge sequential if()s with the same 'then' clause in
  vm_page_busy_sleep().

  Note that the current code does not share-busy pages from parallel
  threads; the only way to have more than one sbusy owner right now is to
  recurse.

  Reported and tested by: pho (previous version)
  Reviewed by: alc, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D8196
  Notes: svn path=/head/; revision=307218
* Replace all remaining calls to vprint(9) with vn_printf(9), and remove the
  old macro.
  Author: Edward Tomasz Napierala   Date: 2016-08-10   Files: 1   Lines: -1/+1

  MFC after: 1 month
  Notes: svn path=/head/; revision=303924
* In vgonel(), postpone setting BO_DEAD until VOP_RECLAIM() is called, if the
  vnode is VMIO.
  Author: Konstantin Belousov   Date: 2016-07-11   Files: 1   Lines: -0/+4

  For VMIO vnodes, set BO_DEAD in vm_object_terminate().

  The vnode_destroy_object(), when calling into vm_object_terminate(), must
  be able to flush buffers.  BO_DEAD's purpose is to quickly destroy buffers
  on write when the underlying vnode is not operable any more (one example
  is the devfs node after geom is gone).  Setting BO_DEAD for a reclaiming
  vnode before the object is terminated is premature, and results in an
  inability to flush buffers with live SU dependencies from vinvalbuf() in
  vm_object_terminate().

  Reported by: David Cross <dcrosstech@gmail.com>
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=302567
* Do not leak the vm object lock when swap reservation failed, in
  vm_object_coalesce().
  Author: Konstantin Belousov   Date: 2016-05-29   Files: 1   Lines: -0/+1

  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=300959