summaryrefslogtreecommitdiff
path: root/sys/vm/vm_object.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Prevent parallel object collapses. Both vm_object_collapse_scan() andKonstantin Belousov2016-05-261-0/+5
| | | | | | | | | | | | | | | | | | | | swap_pager_copy() might unlock the object, which allows the parallel collapse to execute. Besides destroying the object, it also might move the reference from parent to the backing object, firing the assertion ref_count == 1. Collapses are prevented by bumping paging_in_progress counters on both the object and its backing object. Reported by: cem Tested by: pho (previous version) Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D6085 Notes: svn path=/head/; revision=300758
* Style changes to some most outrageous violations in vm_object_collapse().Konstantin Belousov2016-05-261-9/+6
| | | | | | | | | Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=300756
* Add implementation of robust mutexes, hopefully close enough to theKonstantin Belousov2016-05-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013. A robust mutex is guaranteed to be cleared by the system upon either thread or process owner termination while the mutex is held. The next mutex locker is then notified about inconsistent mutex state and can execute (or abandon) corrective actions. The patch mostly consists of small changes here and there, adding neccessary checks for the inconsistent and abandoned conditions into existing paths. Additionally, the thread exit handler was extended to iterate over the userspace-maintained list of owned robust mutexes, unlocking and marking as terminated each of them. The list of owned robust mutexes cannot be maintained atomically synchronous with the mutex lock state (it is possible in kernel, but is too expensive). Instead, for the duration of lock or unlock operation, the current mutex is remembered in a special slot that is also checked by the kernel at thread termination. Kernel must be aware about the per-thread location of the heads of robust mutex lists and the current active mutex slot. When a thread touches a robust mutex for the first time, a new umtx op syscall is issued which informs about location of lists heads. The umtx sleep queues for PP and PI mutexes are split between non-robust and robust. Somewhat unrelated changes in the patch: 1. Style. 2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared pi mutexes. 3. Removal of the userspace struct pthread_mutex m_owner field. 4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls the lifetime of the shared mutex associated with a vnode' page. Reviewed by: jilles (previous version, supposedly the objection was fixed) Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects) Tested by: pho Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=300043
* sys/vm: minor spelling fixes in comments.Pedro F. Giffuni2016-05-021-1/+1
| | | | | | | No functional change. Notes: svn path=/head/; revision=298940
* Implement process-shared locks support for libthr.so.3, withoutKonstantin Belousov2016-02-281-0/+5
| | | | | | | | | | | | | | | breaking the ABI. Special value is stored in the lock pointer to indicate shared lock, and offline page in the shared memory is allocated to store the actual lock. Reviewed by: vangyzen (previous version) Discussed with: deischen, emaste, jhb, rwatson, Martin Simmons <martin@lispworks.com> Tested by: pho Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=296162
* A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().Gleb Smirnoff2015-12-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceeded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strenghtening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix Notes: svn path=/head/; revision=292373
* Pull vm_object_scan_all_shadowed out of vm_object_backing_scanConrad Meyer2015-12-031-155/+146
| | | | | | | | | | | | | | | | These two functions were largely unrelated, they just used the same same loop logic to walk through a backing object's memq. Pull out the all_shadowed test as its own function and eliminate OBSC_TEST_ALL_SHADOWED. Rename vm_object_backing_scan to vm_object_collapse_scan. No functional change. Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4335 Notes: svn path=/head/; revision=291704
* r221714 fixed the situation when the collapse scan improperly handledKonstantin Belousov2015-12-011-97/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | invalid (busy) page supposedly inserted by the vm_fault(), in the OBSC_COLLAPSE_NOWAIT case. As a continuation to r221714, fix a case when invalid page is found by the object scan in OBSC_COLLAPSE_WAIT case as well. But, since this is waitable scan, we should wait for the termination of the busy state and restart from the beginning of the backing object' page queue. [*] Do not free the shadow page swap space when the parent page is invalid, otherwise this action potentially corrupts user data. Combine all instances of the collapse scan sleep code fragments into the new helper vm_object_backing_scan_wait(). Improve style compliance and comments. Change the return type of vm_object_backing_scan() to bool. Initial submission by: cem, https://reviews.freebsd.org/D4103 [*] Reviewed by: alc, cem Tested by: cem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D4146 Notes: svn path=/head/; revision=291576
* As a step towards the elimination of PG_CACHED pages, rework the handlingMark Johnston2015-09-301-11/+11
| | | | | | | | | | | | | | | | | | | | | | | of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to the head of the inactive queue instead of being cached. This affects the implementation of POSIX_FADV_NOREUSE as well, since it works by applying POSIX_FADV_DONTNEED to file ranges after they have been read or written. At that point the corresponding buffers may still be dirty, so the previous implementation would coalesce successive ranges and apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the dirty buffers would eventually be cached. To preserve this behaviour in an efficient manner, this change adds a new buf flag, B_NOREUSE, which causes the pages backing a VMIO buf to be placed at the head of the inactive queue when the buf is released. POSIX_FADV_NOREUSE then works by setting this flag in bufs that underlie the specified range. Reviewed by: alc, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3726 Notes: svn path=/head/; revision=288431
* Revert r173708's modifications to vm_object_page_remove().Konstantin Belousov2015-07-251-23/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Assume that a vnode is mapped shared and mlocked(), and then the vnode is truncated, or truncated and then again extended past the mapping point EOF. Truncation removes the pages past the truncation point, and if pages are later created at this range, they are not properly mapped into the mlocked region, and their wiring count is wrong. The revert leaves the invalidated but wired pages on the object queue, which means that the pages are found by vm_object_unwire() when the mapped range is munlock()ed, and reused by the buffer cache when the vnode is extended again. The changes in r173708 were required since then vm_map_unwire() looked at the page tables to find the page to unwire. This is no longer needed with the vm_object_unwire() introduction, which follows the objects shadow chain. Also eliminate OBJPR_NOTWIRED flag for vm_object_page_remove(), which is now redundand, we do not remove wired pages. Reported by: trasz, Dmitry Sivachenko <trtrmitya@gmail.com> Suggested and reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=285878
* Make KPI of vm_pager_get_pages() more strict: if a pager changes a pageGleb Smirnoff2015-06-121-6/+2
| | | | | | | | | | | | | | in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages(). Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=284310
* Provide vnode in memory map info for files on tmpfsEric van Gyzen2015-06-021-0/+12
| | | | | | | | | | | | | | | | | | | When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor) Notes: svn path=/head/; revision=283924
* Export a list of VM objects in the system via a sysctl. The list can beJohn Baldwin2015-05-271-0/+137
| | | | | | | | | | | | | examined via 'vmstat -o'. It can be used to determine which files are using physical pages of memory and how much each is using. Differential Revision: https://reviews.freebsd.org/D2277 Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: Norse Corp, Inc. (forward porting to HEAD/10) Notes: svn path=/head/; revision=283624
* Place VM objects on the object list when created and never remove them.John Baldwin2015-05-081-14/+17
| | | | | | | | | | | | | | | | | | This is ok since objects come from a NOFREE zone and allows objects to be locked while traversing the object list without triggering a LOR. Ensure that objects on the list are marked DEAD while free or stillborn, and that they have a refcount of zero. This required updating most of the pagers to explicitly mark an object as dead when deallocating it. (Only the vnode pager did this previously.) Differential Revision: https://reviews.freebsd.org/D2423 Reviewed by: alc, kib (earlier version) MFC after: 2 weeks Sponsored by: Norse Corp, Inc. Notes: svn path=/head/; revision=282660
* Correct a typo in vm_object_backing_scan() that originated in r254141.Alan Cox2015-03-071-1/+1
| | | | | | | | | | Specifically, change a lock acquire into a lock release. MFC after: 3 days Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=279720
* Use RW_NEW rather than calling bzero().Alan Cox2015-03-011-2/+1
| | | | Notes: svn path=/head/; revision=279475
* Update mtime for tmpfs files modified through memory mapping. SimilarKonstantin Belousov2015-01-281-1/+6
| | | | | | | | | | | | | | | | | | | | to UFS, perform updates during syncer scans, which in particular means that tmpfs now performs scan on sync. Also, this means that a mtime update may be delayed up to 30 seconds after the write. The vm_object' OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that object could have been dirtied. Adapt fast page fault handler and vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as OBJT_VNODE. Reported by: Ronald Klop <ronald-lists@klop.ws> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=277828
* When the last reference on the vnode' vm object is dropped, read theKonstantin Belousov2014-12-051-1/+6
| | | | | | | | | | | | | | | | | | | | | vp->v_vflag without taking vnode lock and without bypass. We do know that vp is the lowest level in the stack, since the pointer is obtained from the object' handle. Stale VV_TEXT flag read can only happen if parallel execve() is performed and not yet activated the image, since process takes reference for text mapping. In this case, the execve() code manages the VV_TEXT flag on its own already. It was observed that otherwise read-only sendfile(2) requires exclusive vnode lock and contending on it on some loads for VV_TEXT handling. Reported by: glebius, scottl Tested by: glebius, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=275513
* When unwiring a region of an address space, do not assume that theAlan Cox2014-07-261-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | underlying physical pages are mapped by the pmap. If, for example, the application has performed an mprotect(..., PROT_NONE) on any part of the wired region, then those pages will no longer be mapped by the pmap. So, using the pmap to lookup the wired pages in order to unwire them doesn't always work, and when it doesn't work wired pages are leaked. To avoid the leak, introduce and use a new function vm_object_unwire() that locates the wired pages by traversing the object and its backing objects. At the same time, switch from using pmap_change_wiring() to the recently introduced function pmap_unwire() for unwiring the region's mappings. pmap_unwire() is faster, because it operates a range of virtual addresses rather than a single virtual page at a time. Moreover, by operating on a range, it is superpage friendly. It doesn't waste time performing unnecessary demotions. Reported by: markj Reviewed by: kib Tested by: pho, jmg (arm) Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=269134
* Correct assertion. The shadowing object cannot be tmpfs vm object,Konstantin Belousov2014-07-241-2/+4
| | | | | | | | | | | | and tmpfs object cannot shadow. In other words, tmpfs vm object is always at the bottom of the shadow chain. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=269053
* The OBJ_TMPFS flag of vm_object means that there is unreclaimed tmpfsKonstantin Belousov2014-07-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | vnode for the tmpfs node owning this object. The flag is currently used for two purposes. First, it allows to correctly handle VV_TEXT for tmpfs vnode when the ref count on the object is decremented to 1, similar to vnode_pager_dealloc() for regular filesystems. Second, it prevents some operations, which are done on OBJT_SWAP vm objects backing user anonymous memory, but are incorrect for the object owned by tmpfs node. The second kind of use of the OBJ_TMPFS flag is incorrect, since the vnode might be reclaimed, which clears the flag, but vm object operations must still be disallowed. Introduce one more flag, OBJ_TMPFS_NODE, which is permanently set on the object for VREG tmpfs node, and used instead of OBJ_TMPFS to test whether vm object collapse and similar actions should be disabled. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=268615
* Rename global cnt to vm_cnt to avoid shadowing.Bryan Drewery2014-03-221-1/+1
| | | | | | | | | | | | | | | | | To reduce the diff struct pcu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version_ in case some out-of-tree module/code relies on the the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=263620
* Do not vdrop() the tmpfs vnode until it is unlocked. The holdKonstantin Belousov2014-03-121-1/+2
| | | | | | | | | | reference might be the last, and then vdrop() would free the vnode. Reported and tested by: bdrewery MFC after: 1 week Notes: svn path=/head/; revision=263092
* Fix-up r254141: in the process of making a failing vm_page_rename()Attilio Rao2014-02-141-2/+4
| | | | | | | | | | | | | | | a call of pager_swap_freespace() was moved around, now leading to freeing the incorrect page because of the pindex changes after vm_page_rename(). Get back to use the correct pindex when destroying the swap space. Sponsored by: EMC / Isilon storage division Reported by: avg Tested by: pho MFC after: 7 days Notes: svn path=/head/; revision=261867
* Fix function name in KASSERT().Gleb Smirnoff2014-02-121-2/+1
| | | | | | | Submitted by: hiren Notes: svn path=/head/; revision=261811
* Do not coalesce if the swap object belongs to tmpfs vnode. TheKonstantin Belousov2013-11-051-2/+3
| | | | | | | | | | | | | | | | | | | coalesce would extend the object to keep pages for the anonymous mapping created by the process. The pages has no relations to the tmpfs file content which could be written into the corresponding range, causing anonymous mapping and file content aliasing and subsequent corruption. Another lesser problem created by coalescing is over-accounting on the tmpfs node destruction, since the object size is substracted from the total count of the pages owned by the tmpfs mount. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=257680
* Drain for the xbusy state for two places which potentially doKonstantin Belousov2013-09-081-0/+6
| | | | | | | | | | | | | | | pmap_remove_all(). Not doing the drain allows the pmap_enter() to proceed in parallel, making the pmap_remove_all() effects void. The race results in an invalidated page mapped wired by usermode. Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation Approved by: re (glebius) Notes: svn path=/head/; revision=255396
* Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).Konstantin Belousov2013-08-221-2/+1
| | | | | | | | | | | The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=254649
* On all the architectures, avoid to preallocate the physical memoryAttilio Rao2013-08-091-27/+49
| | | | | | | | | | | | | | | | | | | | | | | | for nodes used in vm_radix. On architectures supporting direct mapping, also avoid to pre-allocate the KVA for such nodes. In order to do so make the operations derived from vm_radix_insert() to fail and handle all the deriving failure of those. vm_radix-wise introduce a new function called vm_radix_replace(), which can replace a leaf node, already present, with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if operations in vm_radix_insert() recursed vm_radix_insert() will start from scratch again. Sponsored by: EMC / Isilon storage division Reviewed by: alc (older version) Reviewed by: jeff Tested by: pho, scottl Notes: svn path=/head/; revision=254141
* The soft and hard busy mechanism rely on the vm object lock to work.Attilio Rao2013-08-091-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl Notes: svn path=/head/; revision=254138
* Replace kernel virtual address space allocation with vmem. This providesJeff Roberson2013-08-071-1/+1
| | | | | | | | | | | | | | | | transparent layering and better fragmentation. - Normalize functions that allocate memory to use kmem_* - Those that allocate address space are named kva_* - Those that operate on maps are named kmap_* - Implement recursive allocation handling for kmem_arena in vmem. Reviewed by: alc Tested by: pho Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=254025
* Never remove user-wired pages from an object when doingKonstantin Belousov2013-07-111-9/+10
| | | | | | | | | | | | | | | | | | msync(MS_INVALIDATE). The vm_fault_copy_entry() requires that object range which corresponds to the user-wired vm_map_entry, is always fully populated. Add OBJPR_NOTWIRED flag for vm_object_page_remove() to request the preserving behaviour, use it when calling vm_object_page_remove() from vm_object_sync(). Reported and tested by: pho Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=253189
* - Add a general purpose resource allocator, vmem, from NetBSD. It wasJeff Roberson2013-06-281-6/+0
| | | | | | | | | | | | | | | | | originally inspired by the Solaris vmem detailed in the proceedings of usenix 2001. The NetBSD version was heavily refactored for bugs and simplicity. - Use this resource allocator to allocate the buffer and transient maps. Buffer cache defrags are reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low KVA machines. Discussed with: alc, kib, attilio Tested by: pho Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=252330
* Revise the interface between vm_object_madvise() and vm_page_dontneed() soAlan Cox2013-06-101-22/+2
| | | | | | | | | | that pointless calls to pmap_is_modified() can be easily avoided when performing madvise(..., MADV_FREE). Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=251591
* In vm_object_split(), busy and consequently unbusy the pages only whenAttilio Rao2013-06-041-3/+4
| | | | | | | | | | | swap_pager_copy() is invoked, otherwise there is no reason to do so. This will eliminate the necessity to busy pages most of the times. Sponsored by: EMC / Isilon storage division Reviewed by: alc Notes: svn path=/head/; revision=251397
* After the object lock was dropped, the object' reference count couldKonstantin Belousov2013-05-301-5/+5
| | | | | | | | | | | | | | change. Retest the ref_count and return from the function to not execute the further code which assumes that ref_count == 1 if it is not. Also, do not leak vnode lock if other thread cleared OBJ_TMPFS flag meantime. Reported by: bdrewery Tested by: bdrewery, pho Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=251151
* Remove the capitalization in the assertion message. Print the addressKonstantin Belousov2013-05-301-1/+1
| | | | | | | of the object to get useful information from optimizated kernels dump. Notes: svn path=/head/; revision=251150
* Rework the handling of the tmpfs node backing swap object and tmpfsKonstantin Belousov2013-04-281-1/+23
| | | | | | | | | | | | | | | | | | | | | vnode v_object to avoid double-buffering. Use the same object both as the backing store for tmpfs node and as the v_object. Besides reducing memory use up to 2x times for situation of mapping files from tmpfs, it also makes tmpfs read and write operations copy twice bytes less. VM subsystem was already slightly adapted to tolerate OBJT_SWAP object as v_object. Now the vm_object_deallocate() is modified to not reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle VV_TEXT flag on the last dereference of the tmpfs backing object. Reviewed by: alc Tested by: pho, bf MFC after: 1 month Notes: svn path=/head/; revision=250030
* Make vm_object_page_clean() and vm_mmap_vnode() tolerate the vnode'Konstantin Belousov2013-04-281-1/+6
| | | | | | | | | | | | | | | | | | | | v_object of non OBJT_VNODE type. For vm_object_page_clean(), simply do not assert that object type must be OBJT_VNODE, and add a comment explaining how the check for OBJ_MIGHTBEDIRTY prevents the rest of function from operating on such objects. For vm_mmap_vnode(), if the object type is not OBJT_VNODE, require it to be for swap pager (or default), handle the bypass filesystems, and correctly acquire the object reference in this case. Reviewed by: alc Tested by: pho, bf MFC after: 1 week Notes: svn path=/head/; revision=250029
* Sync back vmcontention branch into HEAD:Attilio Rao2013-03-181-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie. This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections. The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed. The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property. This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done. The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped. Further technical notes broken into mealpieces can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/ Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc) Notes: svn path=/head/; revision=248449
* Switch the vm_object mutex to be a rwlock. This will enable in theAttilio Rao2013-03-091-103/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho Notes: svn path=/head/; revision=248084
* Merge from vmc-playground:Attilio Rao2013-03-091-5/+6
| | | | | | | | | | | | | | | | Introduce a new KPI that verifies if the page cache is empty for a specified vm_object. This KPI does not make assumptions about the locking in order to be used also for building assertions at init and destroy time. It is mostly used to hide implementation details of the page cache. Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: alc (vm_radix based version) Tested by: flo, pho, jhb, davide Notes: svn path=/head/; revision=248082
* Merge from vmcontention:Attilio Rao2013-03-041-4/+5
| | | | | | | | | | | | | | | | As vm objects are type-stable there is no need to initialize the resident splay tree pointer and the cache splay tree pointer in _vm_object_allocate() but this could be done in the init UMA zone handler. The destructor UMA zone handler, will further check if the condition is retained at every destruction and catch for bugs. Sponsored by: EMC / Isilon storage division Submitted by: alc Notes: svn path=/head/; revision=247788
* The value held by the vm object's field pg_color is only consideredAlan Cox2013-03-021-1/+0
| | | | | | | | | | valid if the flag OBJ_COLORED is set. Since _vm_object_allocate() doesn't set this flag, it needn't initialize pg_color. Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=247659
* Merge from vmc-playground branch:Attilio Rao2013-02-261-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the sub-optimal uma_zone_set_obj() primitive with more modern uma_zone_reserve_kva(). The new primitive reserves before hand the necessary KVA space to cater the zone allocations and allocates pages with ALLOC_NOOBJ. More specifically: - uma_zone_reserve_kva() does not need an object to cater the backend allocator. - uma_zone_reserve_kva() can cater M_WAITOK requests, in order to serve zones which need to do uma_prealloc() too. - When possible, uma_zone_reserve_kva() uses directly the direct-mapping by uma_small_alloc() rather than relying on the KVA / offset combination. The removal of the object attribute allows 2 further changes: 1) _vm_object_allocate() becomes static within vm_object.c 2) VM_OBJECT_LOCK_INIT() is removed. This function is replaced by direct calls to mtx_init() as there is no need to export it anymore and the calls aren't either homogeneous anymore: there are now small differences between arguments passed to mtx_init(). Sponsored by: EMC / Isilon storage division Reviewed by: alc (which also offered almost all the comments) Tested by: pho, jhb, davide Notes: svn path=/head/; revision=247360
* Remove white spaces.Attilio Rao2013-02-261-2/+2
| | | | | | | Sponsored by: EMC / Isilon storage division Notes: svn path=/head/; revision=247346
* Wrap the sleeps synchronized by the vm_object lock into the specificAttilio Rao2013-02-261-7/+5
| | | | | | | | | | | | | | macro VM_OBJECT_SLEEP(). This hides some implementation details like the usage of the msleep() primitive and the necessity to access to the lock address directly. For this reason VM_OBJECT_MTX() macro is now retired. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho Notes: svn path=/head/; revision=247323
* In the past four years, we've added two new vm object types. Each time,Alan Cox2012-12-091-8/+27
| | | | | | | | | | | | | | | | | | | | | similar changes had to be made in various places throughout the machine- independent virtual memory layer to support the new vm object type. However, in most of these places, it's actually not the type of the vm object that matters to us but instead certain attributes of its pages. For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain fictitious pages. In other words, in most of these places, we were testing the vm object's type to determine if it contained fictitious (or unmanaged) pages. To both simplify the code in these places and make the addition of future vm object types easier, this change introduces two new vm object flags that describe attributes of the vm object's pages, specifically, whether they are fictitious or unmanaged. Reviewed and tested by: kib Notes: svn path=/head/; revision=244043
* Add support for the (relatively) new object type OBJT_MGTDEVICE toAlan Cox2012-11-281-0/+4
| | | | | | | | | | | vm_object_set_memattr(). Also, add a "safety belt" so that vm_object_set_memattr() doesn't silently modify undefined object types. Reviewed by: kib MFC after: 10 days Notes: svn path=/head/; revision=243659
* Remove the support for using non-mpsafe filesystem modules.Konstantin Belousov2012-10-221-32/+1
| | | | | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho Notes: svn path=/head/; revision=241896