path: root/sys/vm/device_pager.c
* Don't hold the object lock while calling getpages. (Jeff Roberson, 2020-01-19; 1 file, -1/+2)
  The vnode pager does not want the object lock held. Moving this out allows
  further object lock scope reduction in callers. While here add some missing
  paging in progress calls and an assert. The object handle is now protected
  explicitly with pip.

  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D23033
  Notes: svn path=/head/; revision=356902
* Make page busy state deterministic on free. (Jeff Roberson, 2019-12-22; 1 file, -4/+6)
  Pages must be xbusy when removed from objects including calls to free. Pages
  must not be xbusy when freed and not on an object. Strengthen assertions to
  match these expectations. In practice very little code had to change busy
  handling to meet these rules but we can now make stronger guarantees to busy
  holders and avoid conditionally dropping busy in free.

  Refine vm_page_remove() and vm_page_replace() semantics now that we have
  stronger guarantees about busy state. This removes redundant and potentially
  problematic code that has proliferated.

  Discussed with: markj
  Reviewed by: kib
  Differential Revision: https://reviews.freebsd.org/D22822
  Notes: svn path=/head/; revision=356002
* (4/6) Protect page valid with the busy lock. (Jeff Roberson, 2019-10-15; 1 file, -1/+1)
  Atomics are used for page busy and valid state when the shared busy is held.
  The details of the locking protocol and valid and dirty synchronization are
  in the updated vm_page.h comments.

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D21594
  Notes: svn path=/head/; revision=353539
* (2/6) Don't release xbusy in vm_page_remove(), defer to vm_page_free_prep(). (Jeff Roberson, 2019-10-15; 1 file, -0/+1)
  This persists busy state across operations like rename and replace.

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D21549
  Notes: svn path=/head/; revision=353537
* Change synchronization rules for vm_page reference counting. (Mark Johnston, 2019-09-09; 1 file, -4/+0)
  There are several mechanisms by which a vm_page reference is held, preventing
  the page from being freed back to the page allocator. In particular, holding
  the page's object lock is sufficient to prevent the page from being freed;
  holding the busy lock or a wiring is sufficient as well. These references are
  protected by the page lock, which must therefore be acquired for many
  per-page operations. This results in false sharing since the page locks are
  external to the vm_page structures themselves and each lock protects multiple
  structures.

  Transition to using an atomically updated per-page reference counter. The
  object's reference is counted using a flag bit in the counter. A second flag
  bit is used to atomically block new references via pmap_extract_and_hold()
  while removing managed mappings of a page. Thus, the reference count of a
  page is guaranteed not to increase if the page is unbusied, unmapped, and the
  object's write lock is held. As a consequence of this, the page lock no
  longer protects a page's identity; operations which move pages between
  objects are now synchronized solely by the objects' locks.

  The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires
  that either the object lock or the busy lock is held. The latter no longer
  has a return value and may free the page if it releases the last reference to
  that page. vm_page_unwire_noq() behaves the same as before; the caller is
  responsible for checking its return value and freeing or enqueuing the page
  as appropriate. vm_page_wire_mapped() is introduced for use in
  pmap_extract_and_hold(). It fails if the page is concurrently being unmapped,
  typically triggering a fallback to the fault handler. vm_page_wire() no
  longer requires the page lock and vm_page_unwire() now internally acquires
  the page lock when releasing the last wiring of a page (since the page lock
  still protects a page's queue state). In particular, synchronization details
  are no longer leaked into the caller.

  The change excises the page lock from several frequently executed code paths.
  In particular, vm_object_terminate() no longer bounces between page locks as
  it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE)
  completions no longer require the page lock. In these latter cases we now get
  linear scalability in the common scenario where different threads are
  operating on different files.

  __FreeBSD_version is bumped. The DRM ports have been updated to accommodate
  the KPI changes.

  Reviewed by: jeff (earlier version)
  Tested by: gallatin (earlier version), pho
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20486
  Notes: svn path=/head/; revision=352110
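  A minimal sketch of the reworked wiring KPI described above (the wrapper
  function and its locking choices are hypothetical; only the vm_page_wire()
  and vm_page_unwire() calls reflect the new rules):

  static void
  example_wire_and_release(vm_object_t obj, vm_pindex_t pindex)
  {
      vm_page_t m;

      /* A wiring may now be taken under the object lock (or the busy lock). */
      VM_OBJECT_WLOCK(obj);
      m = vm_page_lookup(obj, pindex);
      if (m != NULL)
          vm_page_wire(m);        /* no page lock required anymore */
      VM_OBJECT_WUNLOCK(obj);
      if (m == NULL)
          return;

      /* ... access the wired page without holding the object lock ... */

      /*
       * Dropping the last wiring may free or requeue the page; those
       * details are no longer leaked to the caller.
       */
      vm_page_unwire(m, PQ_INACTIVE);
  }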
* Add a return value to vm_page_remove(). (Mark Johnston, 2019-06-26; 1 file, -1/+1)
  Use it to indicate whether the page may be safely freed following its removal
  from the object. Also change vm_page_remove() to assume that the page's
  object pointer is non-NULL, and have callers perform this check instead.

  This is a step towards an implementation of an atomic reference counter for
  each physical page structure.

  Reviewed by: alc, dougm, kib
  MFC after: 1 week
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20758
  Notes: svn path=/head/; revision=349432
* Change the vm_ooffset_t type to unsigned. (Konstantin Belousov, 2018-12-02; 1 file, -3/+3)
  The type represents a byte offset in the vm_object_t data space, which does
  not span negative offsets in the FreeBSD VM. The change matches the byte
  offset's signedness with the unsignedness of vm_pindex_t, the type of the
  page indexes in the objects. This allows removing the UOFF_TO_IDX() macro,
  which was used when we had to forcibly interpret the type as unsigned anyway.
  It also fixes a lot of implicit bugs in the device drivers' d_mmap methods.

  Reviewed by: alc, markj (previous version)
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=341398
* sys: further adoption of SPDX licensing ID tags. (Pedro F. Giffuni, 2017-11-20; 1 file, -0/+2)
  Mainly focus on files that use the BSD 3-Clause license.

  The Software Package Data Exchange (SPDX) group provides a specification to
  make it easier for automated tools to detect and summarize well-known open
  source licenses. We are gradually adopting the specification, noting that the
  tags are considered only advisory and do not, in any way, supersede or
  replace the license texts.

  Special thanks to Wind River for providing access to "The Duke of Highlander"
  tool: an older (2014) run over the FreeBSD tree was useful as a starting
  point.

  Notes: svn path=/head/; revision=326023
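  For reference, the tag added to BSD-3-Clause sources takes this form (a
  generic illustration of the convention, not the literal diff of this commit):

  /*-
   * SPDX-License-Identifier: BSD-3-Clause
   *
   * Copyright (c) ...
   */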
* All these files need sys/vmmeter.h, but so far they have been getting it implicitly included via sys/pcpu.h. (Gleb Smirnoff, 2017-04-17; 1 file, -0/+1)
  Notes: svn path=/head/; revision=317055
* Renumber copyright clause 4. (Warner Losh, 2017-02-28; 1 file, -1/+1)
  Renumber clause 4 to 3, per what everybody else did when BSD granted them
  permission to remove clause 3. My insistence on keeping the same numbering
  for legal reasons is too pedantic, so give up on that point.

  Submitted by: Jan Schaumann <jschauma@stevens.edu>
  Pull Request: https://github.com/freebsd/freebsd/pull/96
  Notes: svn path=/head/; revision=314436
* Consistently handle negative or wrapping offsets in the mmap(2) syscalls. (Konstantin Belousov, 2017-02-12; 1 file, -1/+11)
  For regular files and POSIX shared memory, POSIX requires that the
  [offset, offset + size) range is legitimate. At mapping time, check that the
  offset is not negative. Allowing negative offsets might expose data that the
  filesystem put into the vm_object for internal use, esp. due to the
  OFF_TO_IDX() signedness treatment. The fault handler verifies that the mapped
  range is valid, assuming that mmap(2) checked that the arithmetic gives no
  undefined results.

  For device mappings, leave the semantic of negative offsets to the driver.
  Correct the object page index calculation to not erroneously propagate the
  sign.

  In either case, disallow overflow of offset + size.

  Update the mmap(2) man page to explain the requirement of range validity, and
  the behaviour when the range becomes invalid after mapping.

  Reported and tested by: royger (previous version)
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=313690
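  A minimal sketch of the kind of range validation described above (an
  illustrative helper, not the committed diff; OFF_MAX comes from
  <sys/limits.h> and the helper name is made up):

  static int
  example_check_mmap_range(off_t pos, size_t size)
  {
      /* Reject negative offsets for file-backed mappings. */
      if (pos < 0)
          return (EINVAL);
      /* Disallow overflow of offset + size for any mapping type. */
      if (size > (size_t)(OFF_MAX - pos))
          return (EINVAL);
      return (0);
  }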
* Add a new populate() pager method and extend the device pager ops vector with cdev_pg_populate() to provide device drivers access to it. (Konstantin Belousov, 2016-12-08; 1 file, -0/+21)
  It gives drivers fine control of the pages' ownership and allows drivers to
  implement arbitrary prefault policies.

  The populate method is called on a page fault and is supposed to populate the
  vm object with the page at the fault location and some amount of pages around
  it, at the pager's discretion. The VM provides the pager with hints about the
  current range of the object mapping, to avoid instantiation of immediately
  unused pages, if the pager decides so. Also, the VM passes the fault type and
  map entry protection to the pager, allowing it to force the optimal required
  ownership of the mapped pages.

  Installed pages must contiguously fill the returned region, be fully valid
  and exclusively busied. Of course, the pages must be compatible with the
  object's type.

  After populate() successfully returns, the VM fault handler installs as many
  instantiated pages into the process page tables as it sees reasonable, while
  still obeying the correct semantic for COW and vm map locking.

  The method is opt-in; the pager sets the OBJ_POPULATE flag to indicate that
  the method can be called. If the pager's vm objects can be shadowed, the
  pager must implement the traditional getpages() method in addition to
  populate(). Populate() might fall back to getpages() on a per-call basis as
  well, by returning the VM_PAGER_BAD error code.

  For now, for device pagers, the populate() method is only allowed to be used
  by managed device pagers, but the limitation is only made because there are
  no unmanaged fault handlers which could use it right now.

  KPI designed together with, and reviewed by: alc
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 weeks
  Notes: svn path=/head/; revision=309710
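  A compact, hypothetical driver-side sketch of a cdev_pg_populate() handler
  following the contract above (the driver name and the single-page policy are
  illustrative assumptions, not part of this commit):

  static int
  mydrv_cdev_pg_populate(vm_object_t object, vm_pindex_t pidx, int fault_type,
      vm_prot_t max_prot, vm_pindex_t *first, vm_pindex_t *last)
  {
      vm_page_t m;

      if (pidx >= object->size)
          return (VM_PAGER_BAD);    /* let the VM fall back to getpages() */

      /* Instantiate exactly one page at the fault location. */
      m = vm_page_grab(object, pidx, VM_ALLOC_NORMAL | VM_ALLOC_ZERO);
      if (m->valid != VM_PAGE_BITS_ALL) {
          /* ... fill the page from device or backing memory here ... */
          m->valid = VM_PAGE_BITS_ALL;
      }

      /* Pages in [*first, *last] are returned valid and exclusively busied. */
      *first = *last = pidx;
      return (VM_PAGER_OK);
  }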
* Split long line instead of unindenting it. Add KASSERT() verifying that a device object with the same handle has the same ops vector. (Konstantin Belousov, 2016-10-30; 1 file, -1/+4)
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=308108
* Avoid duplicated calls to pmap_page_get_memattr(). (Konstantin Belousov, 2016-05-01; 1 file, -5/+13)
  Avoid logging the inconsistency for the /dev/mem device at all. The driver
  leaves memattr intact, and the corrective action in the device pager handles
  it right.

  In the logged warning, name the driver we blame, and show the memory
  attribute values.

  Reported and tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D6149
  Notes: svn path=/head/; revision=298891
* vm_page_replace: add wrapper to KASSERT about old page. (Conrad Meyer, 2015-12-17; 1 file, -2/+1)
  It turns out the callers of vm_page_replace know exactly which page they are
  replacing and would like to assert about it. Change those from hard panics to
  KASSERTs, and provide them with a wrapper so they don't have to deal with
  warnings from an INVARIANTS-dependent dead store of the return value of
  vm_page_replace.

  Submitted by: Ryan Libby <rlibby@gmail.com>
  Reviewed by: alc, kib (earlier version)
  Sponsored by: EMC / Isilon Storage Division
  Differential Revision: https://reviews.freebsd.org/D4497
  Notes: svn path=/head/; revision=292406
* A change to the KPI of vm_pager_get_pages() and the underlying VOP_GETPAGES(). (Gleb Smirnoff, 2015-12-16; 1 file, -9/+14)
  o With the new KPI consumers can request contiguous ranges of pages, and
    unlike before, all pages will be kept busied on return, like it was done
    before with the 'reqpage' only. Now the reqpage goes away. With the new
    interface it is easier to implement code protected from race conditions.

    Such arrayed requests for now should be preceded by a call to
    vm_pager_haspage() to make sure that the request is possible. This could be
    improved later, making vm_pager_haspage() obsolete. Strengthening the
    promises on the busy state of the array of pages allows us to remove such
    hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq().

  o The new KPI accepts two integer pointers that may optionally point at
    values for read ahead and read behind, that a pager may do, if it can.
    These pages are completely owned by the pager, and not controlled by the
    caller.

    This shifts the UFS-specific readahead logic from vm_fault.c, which should
    be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP()
    request per hard fault.

  Discussed with: kib, alc, jeff, scottl
  Sponsored by: Nginx, Inc.
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=292373
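  An illustrative call sequence for the new KPI (a hypothetical caller; object
  locking and page busying are elided for brevity):

  static int
  example_getpages(vm_object_t object, vm_page_t *ma, int count)
  {
      int rahead, rbehind, rv;

      /* Arrayed requests should first be checked with the haspage query. */
      if (!vm_pager_has_page(object, ma[0]->pindex, NULL, NULL))
          return (VM_PAGER_FAIL);

      /* Optional read-behind/read-ahead; those extra pages belong to the pager. */
      rbehind = rahead = 0;
      rv = vm_pager_get_pages(object, ma, count, &rbehind, &rahead);

      /* On VM_PAGER_OK all 'count' requested pages come back valid and busied. */
      return (rv);
  }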
* Minor cleanup. (Konstantin Belousov, 2015-11-29; 1 file, -26/+12)
  Systematically use ANSI C function definitions. Correct the type of the flags
  argument to the dev_pager_putpages() function. Use vm_pager_free_nonreq().

  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=291446
* Place VM objects on the object list when created and never remove them. (John Baldwin, 2015-05-08; 1 file, -0/+2)
  This is OK since objects come from a NOFREE zone, and it allows objects to be
  locked while traversing the object list without triggering a LOR.

  Ensure that objects on the list are marked DEAD while free or stillborn, and
  that they have a refcount of zero. This required updating most of the pagers
  to explicitly mark an object as dead when deallocating it. (Only the vnode
  pager did this previously.)

  Differential Revision: https://reviews.freebsd.org/D2423
  Reviewed by: alc, kib (earlier version)
  MFC after: 2 weeks
  Sponsored by: Norse Corp, Inc.
  Notes: svn path=/head/; revision=282660
* Eliminate an unused variable. (Alan Cox, 2015-04-19; 1 file, -2/+0)
  MFC after: 1 week
  Notes: svn path=/head/; revision=281720
* Initialize paddr to handle the case of zero size. (Konstantin Belousov, 2014-03-12; 1 file, -0/+1)
  Reported and reviewed by: Conrad Meyer <cemeyer@uw.edu>
  MFC after: 1 week
  Notes: svn path=/head/; revision=263095
* Correct assertion to assert that the existing device VM object uses the same type rather than asserting in the case where we just created a new VM object. (John Baldwin, 2014-02-11; 1 file, -2/+4)
  Reviewed by: kib
  Notes: svn path=/head/; revision=261782
* Different consumers of the struct vm_page abuse the pageq member to keep additional information, when the page is guaranteed to not belong to a paging queue. (Konstantin Belousov, 2013-08-10; 1 file, -2/+2)
  Usually, this results in a lot of type casts which make reasoning about the
  code correctness harder. Sometimes m->object is used instead of pageq, which
  could cause real and confusing bugs if a non-NULL m->object is leaked. See
  r141955 and r253140 for examples.

  Change the pageq member into a union containing explicitly-typed members.
  Use them instead of type-punning or abusing m->object in the x86 pmaps, uma
  and vm_page_alloc_contig().

  Requested and reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=254182
* On all the architectures, avoid preallocating the physical memory for nodes used in vm_radix. (Attilio Rao, 2013-08-09; 1 file, -1/+2)
  On architectures supporting direct mapping, also avoid pre-allocating the KVA
  for such nodes.

  In order to do so, make the operations derived from vm_radix_insert() able to
  fail, and handle all the resulting failures in the callers. vm_radix-wise,
  introduce a new function called vm_radix_replace(), which can replace an
  already-present leaf node with a new one, and take into account the
  possibility, during vm_radix_insert() allocation, that the operations on the
  radix trie can recurse. This means that if operations in vm_radix_insert()
  recursed, vm_radix_insert() will start from scratch again.

  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc (older version)
  Reviewed by: jeff
  Tested by: pho, scottl
  Notes: svn path=/head/; revision=254141
* Switch the vm_object mutex to be a rwlock. (Attilio Rao, 2013-03-09; 1 file, -11/+12)
  This will enable further future optimizations where the vm_object lock will
  be held in read mode most of the time the page cache resident pool of pages
  is accessed for reading purposes.

  The change is mostly mechanical, but a few notes are reported:

  * The KPI changes as follows:
    - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
    - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
    - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
    - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
      (in order to avoid visibility of implementation details)
    - The read-mode operations are added: VM_OBJECT_RLOCK(),
      VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(),
      VM_OBJECT_ASSERT_LOCKED()
  * The vm/vm_pager.h namespace pollution avoidance (forcing consumers to
    require sys/mutex.h directly to cater to its inlined functions using
    VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must
    also include sys/rwlock.h.
  * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the
    compat layer because the name clash between the FreeBSD and Solaris
    versions must be avoided. For this purpose zfs redefines the vm_object
    locking functions directly, isolating the FreeBSD components in specific
    compat stubs.

  The KPI is heavily broken by this commit. Third-party ports must be updated
  accordingly (I can think off-hand of VirtualBox, for example).

  Sponsored by: EMC / Isilon storage division
  Reviewed by: jeff
  Reviewed by: pjd (ZFS specific review)
  Discussed with: alc
  Tested by: pho
  Notes: svn path=/head/; revision=248084
* Fix a bug in the device pager code that can trigger an assertion in devfs if a particular race condition is hit in the device pager code. (Kenneth D. Merry, 2013-01-09; 1 file, -1/+2)
  This was a side effect of change 227530 which changed the device pager
  interface to call a new destructor routine for the cdev. That destructor
  routine, old_dev_pager_dtor(), takes a VM object handle. The object handle is
  cast to a struct cdev *, and passed into dev_rel().

  That works in most cases, except the case in cdev_pager_allocate() where
  there is a race condition between two threads allocating an object backed by
  the same device. The loser of the race deallocates its object at the end of
  the function. The problem is that before inserting the object into the
  dev_pager_object_list, the object's handle is changed from the struct cdev
  pointer to the object's own address. This is to avoid conflicts with the
  winner of the race, which already inserted an object in the list with a
  handle that is a pointer to the same cdev structure.

  The object is then passed to vm_object_deallocate(), and eventually makes its
  way down to old_dev_pager_dtor(). That function passes the handle pointer
  (which is actually a VM object, not a struct cdev as usual) into dev_rel().
  dev_rel() decrements the reference count in the assumed struct cdev (which
  happens to be 0), and that triggers the assertion in dev_rel() that the
  reference count is greater than or equal to 0.

  The fix is to add a cdev pointer to the VM object, and use that pointer when
  calling the cdev_pg_dtor() routine.

  vm_object.h:    Add a struct cdev pointer to the VM object structure.
  device_pager.c: In cdev_pager_allocate(), populate the new cdev pointer.
                  In dev_pager_dealloc(), use the new cdev pointer when calling
                  the object's cdev_pg_dtor() routine.

  Reviewed by: kib
  Sponsored by: Spectra Logic Corporation
  MFC after: 1 week
  Notes: svn path=/head/; revision=245226
* Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h to vm/vm_phys.h, where it belongs. (Konstantin Belousov, 2012-11-16; 1 file, -0/+1)
  Requested and reviewed by: alc
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=243132
* After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull in vm_param.h was removed. (Konstantin Belousov, 2012-08-05; 1 file, -0/+1)
  The other big dependency of vm_page.h on vm_param.h is the PA_LOCK*
  definitions, which are only needed for in-kernel code, because modules use
  KBI-safe functions to lock the pages.

  Stop including vm_param.h in vm_page.h. Include vm_param.h explicitly for the
  kernel code which needs it.

  Suggested and reviewed by: alc
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=239065
* Do not double-reference the found vm object in cdev_pager_lookup(). (Konstantin Belousov, 2012-05-18; 1 file, -1/+0)
  vm_pager_object_lookup() already referenced the object.

  Note that there are no in-tree consumers of cdev_pager_lookup(). The only
  known user of the function is the i915 gem driver, which is not yet imported.
  This should make the KPI change minor.

  Submitted by: avg
  MFC after: 1 week
  Notes: svn path=/head/; revision=235603
* Add new pager type, OBJT_MGTDEVICE. (Konstantin Belousov, 2012-05-12; 1 file, -8/+46)
  It provides the device pager which carries fictitious managed pages. In
  particular, the consumers of the new object type can remove all mappings of
  the device page with pmap_remove_all().

  The range of physical addresses used for fake page allocation shall be
  registered with the vm_phys_fictitious_reg_range() interface to allow
  PHYS_TO_VM_PAGE() to work in pmap.

  Most likely, only the i386 and amd64 pmaps can handle fictitious managed
  pages right now.

  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc
  MFC after: 1 month
  Notes: svn path=/head/; revision=235375
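  An illustration of the registration step mentioned above (a hedged sketch;
  the helper, its arguments, and the chosen memory attribute are hypothetical):

  static int
  mydrv_register_aperture(vm_paddr_t base, vm_size_t size)
  {
      /* Register the range so PHYS_TO_VM_PAGE() works for its pages in pmap. */
      return (vm_phys_fictitious_reg_range(base, base + size,
          VM_MEMATTR_WRITE_COMBINING));
  }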
* Update the device pager interface, while keeping the compatibility layer for the old KPI and KBI. (Konstantin Belousov, 2011-11-15; 1 file, -75/+160)
  The new interface should be used together with the d_mmap_single cdevsw
  method.

  A device pager can be allocated with the cdev_pager_allocate(9) function,
  which takes a struct cdev_pager_ops, containing constructor/destructor and
  page fault handler methods supplied by the driver.

  Constructor and destructor, called at pager allocation and deallocation time,
  allow the driver to handle per-object private data.

  The pager handler is called to handle a page fault on the vm map entry backed
  by the driver pager. The driver shall return either the vm_page_t which
  should be mapped, or an error code (which no longer causes a kernel panic).
  The page handler interface has a placeholder to specify the access mode
  causing the fault, but currently PROT_READ is always passed there.

  Sponsored by: The FreeBSD Foundation
  Reviewed by: alc
  MFC after: 1 month
  Notes: svn path=/head/; revision=227530
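  A hypothetical driver skeleton for the interface described above (a sketch
  under assumptions: the signatures follow the cdev_pager_allocate(9) KPI as
  described, while the mydrv_* names and the error handling are made up):

  static int
  mydrv_pg_ctor(void *handle, vm_ooffset_t size, vm_prot_t prot,
      vm_ooffset_t foff, struct ucred *cred, u_short *color)
  {
      *color = 0;        /* set up per-object private data here */
      return (0);
  }

  static void
  mydrv_pg_dtor(void *handle)
  {
      /* release per-object private data */
  }

  static int
  mydrv_pg_fault(vm_object_t object, vm_ooffset_t offset, int prot,
      vm_page_t *mres)
  {
      /*
       * A real driver looks up or creates the page backing 'offset' and
       * returns it via *mres; returning an error no longer panics the kernel.
       */
      return (VM_PAGER_FAIL);
  }

  static struct cdev_pager_ops mydrv_pager_ops = {
      .cdev_pg_ctor = mydrv_pg_ctor,
      .cdev_pg_dtor = mydrv_pg_dtor,
      .cdev_pg_fault = mydrv_pg_fault,
  };

  static int
  mydrv_mmap_single(struct cdev *cdev, vm_ooffset_t *offset, vm_size_t size,
      struct vm_object **objp, int nprot)
  {
      *objp = cdev_pager_allocate(cdev, OBJT_DEVICE, &mydrv_pager_ops, size,
          nprot, *offset, curthread->td_ucred);
      return (*objp != NULL ? 0 : EINVAL);
  }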
* Fix a race in the device pager allocation. (Konstantin Belousov, 2011-07-30; 1 file, -2/+9)
  If another thread won and allocated the device pager for the given handle,
  then the object fictitious pages list and the object membership in the global
  object list still need to be initialized. Otherwise, dev_pager_dealloc() will
  traverse uninitialized pointers.

  Reported and tested by: pho
  Reviewed by: jhb
  Approved by: re (kensmith)
  MFC after: 1 week
  Notes: svn path=/head/; revision=224522
* Handle a race between device_pager and devsw in a more graceful manner: return an error code rather than panic the kernel. (Attilio Rao, 2011-07-06; 1 file, -2/+4)
  Sponsored by: Sandvine Incorporated
  Reviewed by: kib
  Tested by: pho
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=223823
* Eliminate duplication of the fake page code and zone by the device and sg pagers. (Alan Cox, 2011-03-11; 1 file, -61/+3)
  Reviewed by: jhb
  Notes: svn path=/head/; revision=219476
* Explicitly initialize the page's queue field to PQ_NONE instead of relying on PQ_NONE being zero. (Alan Cox, 2011-01-17; 1 file, -0/+1)
  Redefine PQ_NONE and PQ_COUNT so that a page queue isn't allocated for
  PQ_NONE.

  Reviewed by: kib@
  Notes: svn path=/head/; revision=217508
* Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that the created cdev will never be destroyed. (Konstantin Belousov, 2010-08-06; 1 file, -6/+7)
  Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid
  acquiring devmtx and taking a thread reference on such nodes.

  In collaboration with: pho
  MFC after: 1 month
  Notes: svn path=/head/; revision=210923
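  An illustrative use of the new flag inside a hypothetical attach path (the
  cdevsw, owner, permissions, and device name are made-up placeholders):

  static int
  mydrv_create_dev(struct cdev **devp)
  {
      /*
       * MAKEDEV_ETERNAL tells devfs this cdev is never destroyed; a loadable
       * module would use MAKEDEV_ETERNAL_KLD instead.
       */
      return (make_dev_p(MAKEDEV_ETERNAL | MAKEDEV_CHECKNAME, devp,
          &mydrv_cdevsw, NULL, UID_ROOT, GID_WHEEL, 0600, "mydrv"));
  }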
* Eliminate page queues locking around most calls to vm_page_free(). (Alan Cox, 2010-05-06; 1 file, -4/+0)
  Notes: svn path=/head/; revision=207728
* On Alan's advice, rather than do a wholesale conversion on a single
  architecture from page queue lock to a hashed array of page locks (based on a
  patch by Jeff Roberson), I've implemented page lock support in the MI code
  and have only moved vm_page's hold_count out from under page queue mutex to
  page lock. (Kip Macy, 2010-04-30; 1 file, -6/+13)
  This changes pmap_extract_and_hold on all pmaps.

  Supported by: Bitgravity Inc.
  Discussed with: alc, jeffr, and kib
  Notes: svn path=/head/; revision=207410
* Update d_mmap() to accept vm_ooffset_t and vm_memattr_t. (Robert Noland, 2009-12-29; 1 file, -14/+3)
  This replaces d_mmap() with the d_mmap2() implementation and also changes the
  type of offset to vm_ooffset_t. Purge d_mmap2(). All driver modules will need
  to be rebuilt since D_VERSION is also bumped.

  Reviewed by: jhb@
  MFC after: Not in this lifetime...
  Notes: svn path=/head/; revision=201223
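  The resulting d_mmap shape, shown as a hypothetical driver handler (the
  mydrv_base_paddr global and the attribute override are illustrative
  assumptions):

  static int
  mydrv_mmap(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr,
      int nprot, vm_memattr_t *memattr)
  {
      /* Translate the mapping offset into a physical address. */
      *paddr = mydrv_base_paddr + offset;

      /* Optionally override the caching attribute for this page. */
      *memattr = VM_MEMATTR_UNCACHEABLE;
      return (0);
  }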
* Extend the device pager to support different memory attributes on different pages in an object. (John Baldwin, 2009-08-28; 1 file, -5/+15)
  - Add a new variant of d_mmap() currently called d_mmap2() which accepts an
    additional in/out parameter that is the memory attribute to use for the
    requested page.
  - A driver either uses d_mmap() or d_mmap2() for all requests but not both.
    The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate
    that the driver provides a d_mmap2() handler instead of d_mmap(). This is
    done to make the change ABI compatible with existing drivers and MFC'able
    to 7 and 8.

  Submitted by: alc
  MFC after: 1 month
  Notes: svn path=/head/; revision=196615
* Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. (Alan Cox, 2009-07-19; 1 file, -12/+4)
  Essentially, fictitious pages provide a mechanism for creating aliases for
  either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a
  fictitious page needn't update the direct map or flush the cache. Such
  actions are the responsibility of the "primary" instance of the page or the
  device driver that "owns" the physical address. For example, these actions
  are already performed by pmap_mapdev().

  The device pager needn't restore the memory attributes on a fictitious page
  before releasing it. It's now pointless.

  Add pmap_page_set_memattr() to the Xen pmap.

  Approved by: re (kib)
  Notes: svn path=/head/; revision=195774
* Add support to the virtual memory system for configuring machine-dependent memory attributes. (Alan Cox, 2009-07-12; 1 file, -26/+43)
  Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that
  there are machine-dependent memory attributes that have nothing to do with
  controlling the cache's behavior.

  Introduce vm_object_set_memattr() for setting the default memory attributes
  that will be given to an object's pages.

  Introduce and use pmap_page_{get,set}_memattr() for getting and setting a
  page's machine-dependent memory attributes. Add full support for these
  functions on amd64 and i386 and stubs for them on the other architectures.
  The function pmap_page_set_memattr() is also responsible for any other
  machine-dependent aspects of changing a page's memory attributes, such as
  flushing the cache or updating the direct map.

  The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager:
  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386. vm_page_alloc() and the
  device pager will set the memory attributes for the real or fictitious page
  according to the object's default memory attributes.

  Update the various pmap functions on amd64 and i386 that map pages to
  incorporate each page's memory attributes in the mapping.

  Notes: (1) Inherent to this design are safety features that prevent the
  specification of inconsistent memory attributes by different mappings on
  amd64 and i386. In addition, the device pager provides a warning when a
  device driver creates a fictitious page with memory attributes that are
  inconsistent with the real page that the fictitious page is an alias for.
  (2) Storing the machine-dependent memory attributes for amd64 and i386 as a
  dedicated "int" in "struct md_page" represents a compromise between space
  efficiency and the ease of MFCing these changes to RELENG_7.

  In collaboration with: jhb
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195649
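  A small illustrative use of the new attribute KPIs (the helper and the
  attribute values are hypothetical examples):

  static void
  example_set_attrs(vm_object_t object, vm_page_t m)
  {
      /* Default attribute applied to pages later placed into the object. */
      VM_OBJECT_LOCK(object);
      (void)vm_object_set_memattr(object, VM_MEMATTR_WRITE_COMBINING);
      VM_OBJECT_UNLOCK(object);

      /* Per-page query and override. */
      if (pmap_page_get_memattr(m) != VM_MEMATTR_UNCACHEABLE)
          pmap_page_set_memattr(m, VM_MEMATTR_UNCACHEABLE);
  }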
* Implement global and per-uid accounting of the anonymous memory. (Konstantin Belousov, 2009-06-23; 1 file, -2/+3)
  Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
  for the uid.

  The accounting information (charge) is associated with either the map entry,
  or the vm object backing the entry, assuming the object is the first one in
  the shadow chain and the entry does not require COW. Charge is moved from
  entry to object on allocation of the object, e.g. during the mmap, assuming
  the object is allocated, or on the first page fault on the entry. It moves
  back to the entry on forks due to the COW setup.

  The per-entry granularity of accounting makes the charge process fair for
  processes that change uid during lifetime, and decrements charge for the
  proper uid when a region is unmapped.

  The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
  which is used to charge the appropriate uid when allocation is performed by
  the kernel, e.g. md(4).

  Several syscalls, among them fork(2), may now return ENOMEM when global or
  per-uid limits are enforced.

  In collaboration with: pho
  Reviewed by: alc
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=194766
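  An illustrative call through the extended interface (a sketch; the helper,
  the size, and charging the current thread's credential are assumptions):

  static vm_object_t
  example_swap_object(vm_pindex_t npages)
  {
      /*
       * The added final argument names the credential charged for the swap
       * reservation (passing NULL skips the charge).
       */
      return (vm_pager_allocate(OBJT_SWAP, NULL, IDX_TO_OFF(npages),
          VM_PROT_DEFAULT, 0, curthread->td_ucred));
  }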
* Validate the page in one place, dev_pager_getpages(), rather than doing it in two places, dev_pager_getfake() and dev_pager_updatefake(). (Alan Cox, 2009-06-22; 1 file, -7/+6)
  Compare a pointer to "NULL" rather than "0".

  Notes: svn path=/head/; revision=194642
* Strive for greater consistency among the places that implement real, fictitious, and contiguous page allocation. (Alan Cox, 2009-06-21; 1 file, -0/+1)
  Eliminate unnecessary reinitialization of a page's fields.

  Notes: svn path=/head/; revision=194562
* Save the previous content of td_fpop before storing the current file descriptor into it. (Konstantin Belousov, 2008-09-26; 1 file, -0/+6)
  Make sure that td_fpop is NULL when calling d_mmap from dev_pager_getpages().

  The change guards against the td_fpop field being non-NULL with private state
  for another device, and against suddenly clearing td_fpop. This could occur
  when either a driver method calls another driver through the file descriptor
  operation, or a page fault happens while a driver is writing to memory backed
  by another driver.

  Noted by: rwatson
  Tested by: rnoland
  MFC after: 3 days
  Notes: svn path=/head/; revision=183383
* Preset a device object's alignment ("pg_color") based upon the physical address of the device's memory. (Alan Cox, 2008-05-17; 1 file, -1/+5)
  This enables pmap_align_superpage() to propose a virtual address for mapping
  the device memory that permits the use of superpage mappings.

  Notes: svn path=/head/; revision=179074
* Remove comment that is no longer quite true. (Konstantin Belousov, 2007-08-18; 1 file, -3/+0)
  Noted by: alc
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=171889
* Protect the creation of the device pager with the dev_pager_mtx. (Konstantin Belousov, 2007-08-07; 1 file, -12/+24)
  Lookup of the device pager in the pagers list by handle is now synchronized
  with its removal from the list, and dev_pager_mtx is put before the vm object
  lock in the lock order. Dispose of the dev_pager_sx lock, since dev_pager_mtx
  now covers the same block.

  Noted by: kensmith
  Reviewed by: alc
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=171779
* Consider a scenario in which one processor, call it Pt, is performing
  vm_object_terminate() on a device-backed object at the same time that another
  processor, call it Pa, is performing dev_pager_alloc() on the same device.
  (Alan Cox, 2007-08-05; 1 file, -4/+0)
  The problem is that vm_pager_object_lookup() should not be allowed to return
  a doomed object, i.e., an object with OBJ_DEAD set, but it does.

  In detail, the unfortunate sequence of events is: Pt in vm_object_terminate()
  holds the doomed object's lock and sets OBJ_DEAD on the object. Pa in
  dev_pager_alloc() holds dev_pager_sx and calls vm_pager_object_lookup(),
  which returns the doomed object. Next, Pa calls vm_object_reference(), which
  requires the doomed object's lock, so Pa waits for Pt to release the doomed
  object's lock. Pt proceeds to the point in vm_object_terminate() where it
  releases the doomed object's lock. Pa is now able to complete
  vm_object_reference() because it can now complete the acquisition of the
  doomed object's lock. So, now the doomed object has a reference count of one!
  Pa releases dev_pager_sx and returns the doomed object from
  dev_pager_alloc(). Pt now acquires dev_pager_mtx, removes the doomed object
  from dev_pager_object_list, releases dev_pager_mtx, and finally calls
  uma_zfree with the doomed object. However, the doomed object is still in use
  by Pa.

  Repeating my key point, vm_pager_object_lookup() must not return a doomed
  object. Moreover, the test for the object's state, i.e., doomed or not, and
  the increment of the object's reference count should be carried out
  atomically.

  Reviewed by: kib
  Approved by: re (kensmith)
  MFC after: 3 weeks
  Notes: svn path=/head/; revision=171737
* Do not acquire Giant unconditionally around the calls to the cdevsw d_mmap methods. (Konstantin Belousov, 2007-08-05; 1 file, -5/+0)
  prep_cdevsw() already installs the shims that acquire/drop Giant for the
  methods of a driver that specified the D_NEEDGIANT flag.

  Reviewed by: alc
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=171725