path: root/sys/fs/devfs/devfs_vnops.c
Commit message | Author | Age | Files | Lines
* devfs: rework si_usecount to track opensMateusz Guzik2020-08-111-16/+144
| | | | | | | | | | | This removes a lot of special casing from the VFS layer. Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D25612 Notes: svn path=/head/; revision=364113
* devfs: save on spurious relocking for devfs_populateMateusz Guzik2020-08-101-0/+6
| | | | | | | Tested by: pho Notes: svn path=/head/; revision=364069
* devfs: use cheaper lockmgr entry pointsMateusz Guzik2020-08-101-0/+6
| | | | | | | Tested by: pho Notes: svn path=/head/; revision=364068
* devfs: use vget_prep/vget_finishMateusz Guzik2020-08-101-7/+8
| | | | | | | Tested by: pho Notes: svn path=/head/; revision=364067
* vfs: remove the obsolete privused argument from vaccessMateusz Guzik2020-08-051-1/+1
| | | | | | | | This brings argument count down to 6, which is passable without the stack on amd64. Notes: svn path=/head/; revision=363893
* devfs: fix a vnode use-after-free in devfs_ioctlMateusz Guzik2020-07-041-8/+9
| | | | | | | | | | The vnode to be replaced was read with a shared lock, meaning 2 racing threads can find the same one. While here clean it up a little bit. Notes: svn path=/head/; revision=362923
* vfs: track sequential reads and writes separatelyThomas Munro2020-06-211-2/+2
| | | | | | | | | | | | | | | | | | For software like PostgreSQL and SQLite that sometimes reads sequentially while also writing sequentially some distance behind with interleaved syscalls on the same fd, performance is better on UFS if we do sequential access heuristics separately for reads and writes. Patch originally by Andrew Gierth in 2008, updated and proposed by me with his permission. Reviewed by: mjg, kib, tmunro Approved by: mjg (mentor) Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk> Differential Revision: https://reviews.freebsd.org/D25024 Notes: svn path=/head/; revision=362460
* Fix up various vnode-related asserts which did not dump the used vnodeMateusz Guzik2020-02-031-2/+1
| | | | Notes: svn path=/head/; revision=357446
* Provide O_SEARCHKyle Evans2020-02-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247 Notes: svn path=/head/; revision=357412
* vfs: consistently use size_t for buflen around VOP_VPTOCNPMateusz Guzik2020-02-011-1/+1
| | | | Notes: svn path=/head/; revision=357383
* vfs: drop the mostly unused flags argument from VOP_UNLOCKMateusz Guzik2020-01-031-9/+9
| | | | | | | | | | | Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427 Notes: svn path=/head/; revision=356337
* vfs: flatten vop vectorsMateusz Guzik2019-12-161-0/+2
| | | | | | | | | | | | | | | This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738 Notes: svn path=/head/; revision=355790
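The effect of flattening can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct toy_vop_vector` and every `toy_*` name below are hypothetical stand-ins, not the kernel's `struct vop_vector` or its real flattening routine. The walking lookup mirrors the loop quoted in the commit message; the flatten step resolves the default chain once so later calls can use the slot directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a vop vector. */
struct toy_vop_vector {
	struct toy_vop_vector *vop_default;	/* fallback vector, or NULL */
	int (*vop_open)(void);
	int (*vop_close)(void);
};

static int toy_std_open(void) { return (1); }
static int toy_std_close(void) { return (2); }
static int toy_dev_open(void) { return (10); }

/* Pre-change shape: every call walks the default chain. */
static int
toy_open_walk(const struct toy_vop_vector *vop)
{
	while (vop != NULL && vop->vop_open == NULL)
		vop = vop->vop_default;
	return (vop != NULL ? vop->vop_open() : -1);
}

/* Post-change shape: fill empty slots from the chain once, up front. */
static void
toy_flatten(struct toy_vop_vector *vop)
{
	const struct toy_vop_vector *d;

	assert(vop != NULL);
	for (d = vop->vop_default; d != NULL; d = d->vop_default) {
		if (vop->vop_open == NULL)
			vop->vop_open = d->vop_open;
		if (vop->vop_close == NULL)
			vop->vop_close = d->vop_close;
	}
}
```

After `toy_flatten()` runs on a vector, a caller can invoke `vop->vop_close()` without consulting `vop_default` at all, which is the per-call saving the commit describes.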
* vfs: introduce v_irflag and make v_type smallerMateusz Guzik2019-12-081-5/+5
| | | | | | | | | | | | | | | | | | The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715 Notes: svn path=/head/; revision=355537
* tty: implement TIOCNOTTYKyle Evans2019-11-301-2/+9
| | | | | | | | | | | | | | | | | | Generally, it's preferred that an application fork/setsid if it doesn't want to keep its controlling TTY, but it could be that a debugger is trying to steal it instead -- so it would hook in, drop the controlling TTY, then do some magic to set things up again. In this case, TIOCNOTTY is quite handy and still respected by at least OpenBSD, NetBSD, and Linux as far as I can tell. I've dropped the note about obsoletion, as I intend to support TIOCNOTTY as long as it doesn't impose a major burden. Reviewed by: bcr (manpages), kib Differential Revision: https://reviews.freebsd.org/D22572 Notes: svn path=/head/; revision=355248
* vfs: change si_usecount management to count used vnodesMateusz Guzik2019-11-201-8/+8
| | | | | | | | | | | | | | | | | | | | Currently si_usecount is effectively a sum of usecounts from all associated vnodes. This is maintained by special-casing for VCHR every time usecount is modified. Apart from complicating the code a little bit, it has a scalability impact since it forces a read from a cacheline shared with said count. There are no consumers of the feature in the ports tree. In head there are only 2: revoke and devfs_close. Both can get away with a weaker requirement than the exact usecount, namely just the count of active vnodes. Changing the meaning to the latter means we only need to modify it on 0<->1 transitions, avoiding the check plenty of times (and entirely in something like vrefact). Reviewed by: kib, jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22202 Notes: svn path=/head/; revision=354890
* devfs_vptocnp(): correct the component name when node is not at top.Konstantin Belousov2019-10-111-27/+16
| | | | | | | | | | | | | | | | The node's cdp.si_name is the full path as provided by make_dev(9); it should not be returned by VOP_VPTOCNP() when only the last component is requested. Use the dirent entry instead. With this note, handling of VDIR and VCHR nodes only differs in handling of the root vnode, which simplifies and unifies the logic. Reported by: Li, Zhichao1 <Zhichao_Li1@Dell.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=353447
* devfs: plug redundant bwillwrite avoidanceMateusz Guzik2019-10-051-11/+0
| | | | | | | | | | | | | vn_write already checks for vnode type to see if bwillwrite should be called. This effectively reverts r244643. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21905 Notes: svn path=/head/; revision=353126
* Rework v_object lifecycle for vnodes.Konstantin Belousov2019-08-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | The current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it is prepared to handle the vm object destruction for a live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier; as a result, all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is a valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412 Notes: svn path=/head/; revision=351598
* Avoid relying on header pollution from sys/refcount.h.Mark Johnston2019-07-291-0/+1
| | | | | | | | MFC after: 3 days Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=350421
* Extract eventfilter declarations to sys/_eventfilter.hConrad Meyer2019-05-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped. Notes: svn path=/head/; revision=347984
* Add d_off support for multiple filesystems.Konstantin Belousov2018-11-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | The d_off field has been added to the dirent structure recently. Currently filesystems don't support this feature. Support has been added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs. A stub implementation is available for cd9660, nandfs, udf and pseudofs but hasn't been tested. Motivation for this feature: our usecase is for a userspace nfs server (nfs-ganesha) with zfs. At the moment we cache direntry offsets by calling lseek once per entry, with this patch we can get the offset directly from getdirentries(2) calls which provides a significant speedup. Submitted by: Jack Halford <jack@gandi.net> Reviewed by: mckusick, pfg, rmacklem (previous versions) Sponsored by: Gandi.net MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17917 Notes: svn path=/head/; revision=340431
* Move 32-bit compat support for FIODGNAME to the right place.Brooks Davis2018-10-261-8/+34
| | | | | | | | | | | | | | | | | | | ioctl(2) commands only have meaning in the context of a file descriptor so translating them in the syscall layer is incorrect. The new handler uses an accessor to retrieve/construct a pointer from the last member of the passed structure and relies on type punning to access the other member which requires no translation. Unlike r339174 this change supports both places FIODGNAME is handled. Reviewed by: kib Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D17475 Notes: svn path=/head/; revision=339779
* Revert r339174: Move 32-bit compat support for FIODGNAME to the right place.Brooks Davis2018-10-041-42/+8
| | | | | | | | | | A case was missed in this commit which breaks sshing into a 32-bit sshd on a 64-bit system. Approved by: re (gjb) Notes: svn path=/head/; revision=339186
* Move 32-bit compat support for FIODGNAME to the right place.Brooks Davis2018-10-031-8/+42
| | | | | | | | | | | | | | | | | | ioctl(2) commands only have meaning in the context of a file descriptor so translating them in the syscall layer is incorrect. The new handler uses an accessor to retrieve/construct a pointer from the last member of the passed structure and relies on type punning to access the other member which requires no translation. Reviewed by: kib Approved by: re (rgrimes, gjb) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Review: https://reviews.freebsd.org/D17388 Notes: svn path=/head/; revision=339174
* Report INT_MAX for LINK_MAX for devfs' VOP_PATHCONF().John Baldwin2017-12-191-1/+1
| | | | | | | | | | devfs uses ints for link counts internally and already reports the full link count via stat() post ino64. Sponsored by: Chelsio Communications Notes: svn path=/head/; revision=326996
* Handle _PC_FILESIZEBITS and _PC_SYMLINK_MAX for devfs' VOP_PATHCONF().John Baldwin2017-12-191-0/+6
| | | | | | | | MFC after: 1 month Sponsored by: Chelsio Communications Notes: svn path=/head/; revision=326994
* Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf().John Baldwin2017-12-191-0/+9
| | | | | | | | | | | | | | | | | | | Having all filesystems fall through to default values isn't always correct and these values can vary for different filesystem implementations. Most of these changes just use the existing default values with a few exceptions: - Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact permissions check this claims for chown(). - Use NANDFS_NAME_LEN for NAME_MAX for nandfs. - Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to indicate hard links aren't supported. Requested by: bde (though perhaps not this exact implementation) Reviewed by: kib (earlier version) MFC after: 1 month Sponsored by: Chelsio Communications Notes: svn path=/head/; revision=326993
* In devfs_lookupx() dotdot lookup case, avoid dereferencingKonstantin Belousov2017-12-141-5/+6
| | | | | | | | | | | | | | | | | | | dvp->v_mount after dvp is unlocked. The vnode might be reclaimed after unlock, so v_mount becomes NULL. Cache the struct mount pointer before the unlock, the struct is type-stable. Note that devfs_allocv() reads mp->mnt_data but does not operate on it further when dirent is doomed. The unmount cannot proceed until all dirents are reclaimed. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=326851
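The ordering fix described above (read `v_mount` while the vnode is still locked, then unlock) can be illustrated with a toy model. All types and functions here are hypothetical simplifications, and the "reclaim" is simulated synchronously inside the unlock helper rather than by a racing thread; the sketch only shows why the cached pointer stays usable when the field itself has been cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mount and struct vnode. */
struct toy_mount {
	int mnt_id;
};

struct toy_vnode {
	struct toy_mount *v_mount;
	int locked;
};

/* Simulates an unlock immediately followed by reclamation of the vnode. */
static void
toy_unlock_and_reclaim(struct toy_vnode *vp)
{
	vp->locked = 0;
	vp->v_mount = NULL;	/* reclaim clears the back-pointer */
}

/* Dotdot-style lookup step: cache the mount pointer before unlocking. */
static int
toy_dotdot_mount_id(struct toy_vnode *dvp)
{
	struct toy_mount *mp;

	assert(dvp->locked);
	mp = dvp->v_mount;		/* read while still locked */
	toy_unlock_and_reclaim(dvp);
	/* dvp->v_mount is NULL now; mp still points at the mount. */
	return (mp->mnt_id);
}
```

Dereferencing `dvp->v_mount` after the unlock in this model would be a NULL dereference, which is the failure mode the commit avoids. The cached pointer is safe in the real code because, as the message notes, the struct is type-stable.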
* sys/fs: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-271-0/+2
| | | | | | | | | | | | | | | Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Notes: svn path=/head/; revision=326268
* Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices.John Baldwin2017-09-211-0/+18
| | | | | | | | | | | | | Move handling of these three pathconf() variables out of vop_stdpathconf() and into devfs_pathconf() as TTY devices can only be devfs files. In addition, only return settings for these three variables for devfs devices whose device switch has the D_TTY flag set. Discussed with: bde, kib Sponsored by: Chelsio Communications Notes: svn path=/head/; revision=323882
* Commit the 64-bit inode project.Konstantin Belousov2017-05-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). 
Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439 Notes: svn path=/head/; revision=318736
* Simplify devfs_fsync() by removing it. This might also be a minorEdward Tomasz Napierala2017-02-201-27/+1
| | | | | | | | | | | | | optimization, as vn_isdisk() needs to lock a global mutex. Reviewed by: imp Tested by: pho MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9628 Notes: svn path=/head/; revision=313994
* Apply noexec mount option for mmap(PROT_EXEC).Konstantin Belousov2017-02-191-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | Right now the noexec mount option disallows image activators to try execve the files on the mount point. Also, after r127187, noexec also limits max_prot map entries permissions for mappings of files from such mounts, but not the actual mapping permissions. As result, the API behaviour is inconsistent. The files from noexec mount can be mapped with PROT_EXEC, but if mprotect(2) drops execution permission, it cannot be re-enabled later. Make this consistent logically and aligned with behaviour of other systems, by disallowing PROT_EXEC for mmap(2). Note that this change only ensures aligned results from mmap(2) and mprotect(2), it does not prevent actual code execution from files coming from noexec mount. Such files can always be read into anonymous executable memory and executed from there. Reported by: shamaz.mazum@gmail.com PR: 217062 Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=313967
* Change the "devfs_fsync: vop_stdfsync failed" from panic to a printf.Edward Tomasz Napierala2017-02-151-1/+1
| | | | | | | | | | | | | It's not a proper fix, but should be better than what we have now. Since it got broken some six months ago it results in an incredibly annoying and trivially reproducible panic every time, e.g., a USB disk gets disconnected. MFC after: 2 weeks Sponsored by: DARPA, AFRL Notes: svn path=/head/; revision=313775
* Hide the boottime and boottimebin globals, provide the getboottime(9)Konstantin Belousov2016-07-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302 Notes: svn path=/head/; revision=303382
* devfs: Move most ioctl logic down to vnode layerConrad Meyer2016-07-251-24/+41
| | | | | | | | | | | Devfs' file layer ioctl is now just a thin shim around the vnode layer. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7286 Notes: svn path=/head/; revision=303310
* Another follow-up to r291460. Only access vp->v_rdev for VCHR vnodesKonstantin Belousov2016-06-151-13/+20
| | | | | | | | | | | | in devfs_reclaim(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week Notes: svn path=/head/; revision=301928
* sys/devfs: unsign an index to prevent signed integer overflow.Pedro F. Giffuni2016-04-281-1/+1
| | | | | | | | | | cdp_maxdirent in struct:cdev_priv is of type u_int. Use the same type for the corresponding index in devfs_revoke(). MFC after: 1 week Notes: svn path=/head/; revision=298732
* Assert that the linkage between struct cdev_privdata and structKonstantin Belousov2016-01-171-0/+2
| | | | | | | | | | | file is consistent. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=294204
* Make devfs_fpdrop() static. It was not a public KPI, and it has noKonstantin Belousov2016-01-131-1/+1
| | | | | | | | | | reason to remain exported for some time. Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=293827
* Hide transient EBADF errors caused by the parallel revoke(2) or forcedKonstantin Belousov2016-01-021-3/+3
| | | | | | | | | | | | | | | | unmount of devfs mounts, by restarting the failed syscall. When restarted, failing syscalls eventually either stop finding the node and return ENOENT, or the vnode op vectors finally transition to the deadfs vop. The latter returns EIO or another error more appropriate for the operation. Submitted by: bde Tested by: pho MFC after: 3 weeks Notes: svn path=/head/; revision=293059
* Minor style cleanup.Konstantin Belousov2016-01-011-1/+1
| | | | | | | | Submitted by: bde MFC after: 1 week Notes: svn path=/head/; revision=293042
* Make it possible for the cdevsw d_close() driver method to detect lastKonstantin Belousov2015-12-221-3/+9
| | | | | | | | | | | | | | | | | | | close and close due to revoke(2)-like operation. A new FLASTCLOSE flag indicates that this is last close. FREVOKE is set for revokes, and FNONBLOCK is also set, same as is already done for VOP_CLOSE() call from vgonel(). The flags reuse user open(2) flags which are never stored in f_flag, to not consume bit space in the ABI visible way. Assert this with the static check. Requested and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=292624
* Keep devfs mount locked for the whole duration of the devfs_setattr(),Konstantin Belousov2015-12-221-7/+14
| | | | | | | | | | | and ensure that our dirent is instantiated. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=292621
* The cdevpriv_dtr_t typedef was not able to be used in a function prototypeJohn Baldwin2015-12-021-1/+1
| | | | | | | | | | | | | | | | | | like the various d_*_t typedefs since it declared a function pointer rather than a function. Add a new d_priv_dtor_t typedef that declares the function and can be used as a function prototype. The previous typedef wasn't useful outside of the cdevpriv implementation, so retire it. The name d_priv_dtor_t was chosen to be more consistent with cdev methods since it is commonly used in place of d_close_t even though it is not a direct pointer in struct cdevsw. Reviewed by: kib, imp MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D4340 Notes: svn path=/head/; revision=291653
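The distinction between a typedef for a function pointer and a typedef for a function type is plain C and can be shown outside the kernel. The names below (`toy_dtr_ptr_t`, `toy_dtor_t`, and so on) are hypothetical; only the language rule is the point: a function-type typedef can serve as a prototype, while a pointer typedef cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-to-function typedef: usable for variables, not for prototypes. */
typedef void (*toy_dtr_ptr_t)(void *data);

/* Function-type typedef: usable directly as a prototype. */
typedef void toy_dtor_t(void *data);

/* Prototype written with the function-type typedef. */
static toy_dtor_t toy_priv_dtor;

static int toy_dtor_calls;

static void
toy_priv_dtor(void *data)
{
	(void)data;
	toy_dtor_calls++;
}

/* The pointer typedef still works for storing the callback. */
static toy_dtr_ptr_t toy_saved = toy_priv_dtor;

/* A parameter of type "toy_dtor_t *" is the matching pointer type. */
static void
toy_run_dtor(toy_dtor_t *dtor, void *data)
{
	assert(dtor != NULL);
	dtor(data);
}
```

Writing `static toy_dtr_ptr_t toy_priv_dtor;` instead would declare a pointer variable, not a function, which is why the commit describes the old typedef as unusable for prototypes.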
* After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;Edward Tomasz Napierala2015-08-231-1/+2
| | | | | | | | | | remove KASSERT that would prevent forced devfs unmount from working. MFC after: 1 month Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=287033
* The changes that introduced fo_mmap() treated all character deviceJohn Baldwin2015-08-061-6/+17
| | | | | | | | | | | | | | | | | mappings as if MAP_SHARED was always present since in general MAP_PRIVATE is not permitted for character devices. However, there is one exception in that MAP_PRIVATE mappings are permitted for /dev/zero. Only require a writable file descriptor (FWRITE) for shared, writable mappings of character devices. vm_mmap_cdev() will reject any private mappings for other devices. Reviewed by: kib Reported by: sbruno (broke qemu cross-builds), peter Differential Revision: https://reviews.freebsd.org/D3316 Notes: svn path=/head/; revision=286371
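The rule stated above (a writable descriptor is needed only when the mapping is both shared and writable) can be written as a small predicate. This is a sketch under assumptions: `TOY_FREAD`/`TOY_FWRITE` are hypothetical flag values, the function name is invented, and returning `EACCES` is an illustrative choice rather than a statement about the exact kernel code path.

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical stand-ins for the kernel's FREAD/FWRITE f_flag bits. */
#define	TOY_FREAD	0x1
#define	TOY_FWRITE	0x2

/*
 * Returns 0 if the requested character-device mapping passes the
 * descriptor-mode check, or EACCES (assumed error code) otherwise.
 */
static int
toy_cdev_mmap_check(int f_flag, int prot, int flags)
{
	assert((f_flag & ~(TOY_FREAD | TOY_FWRITE)) == 0);
	if ((flags & MAP_SHARED) != 0 && (prot & PROT_WRITE) != 0 &&
	    (f_flag & TOY_FWRITE) == 0)
		return (EACCES);
	return (0);
}
```

With this predicate a read-only descriptor may still request a private writable mapping, which is the /dev/zero case the commit restores; rejecting private mappings of other devices is left to a later stage, as the message says of vm_mmap_cdev().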
* Add a new file operations hook for mmap operations. File type-specificJohn Baldwin2015-06-041-0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptor's fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. 
Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio Notes: svn path=/head/; revision=283998
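The optional-hook behaviour described in this commit (dispatch through a per-file-type `fo_mmap` slot, failing with ENODEV when the slot is empty) follows a common pattern that can be sketched as below. The structures and names are hypothetical miniatures, not the kernel's `struct fileops` or `struct file`, and the hook signature is reduced to a single size argument for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_file;

/* Function-type typedef for the hypothetical mmap hook. */
typedef int toy_fo_mmap_t(struct toy_file *fp, size_t size);

struct toy_fileops {
	toy_fo_mmap_t *fo_mmap;		/* optional; NULL if unsupported */
};

struct toy_file {
	const struct toy_fileops *f_ops;
	size_t mapped;			/* records the last mapped size */
};

/* Type-specific hook for a file type that supports mapping. */
static int
toy_cdev_mmap(struct toy_file *fp, size_t size)
{
	fp->mapped = size;
	return (0);
}

/* Generic dispatcher: an unset hook means the type cannot be mapped. */
static int
toy_fo_mmap(struct toy_file *fp, size_t size)
{
	assert(fp != NULL && fp->f_ops != NULL);
	if (fp->f_ops->fo_mmap == NULL)
		return (ENODEV);
	return (fp->f_ops->fo_mmap(fp, size));
}

static const struct toy_fileops toy_cdev_ops = { toy_cdev_mmap };
static const struct toy_fileops toy_pipe_ops = { NULL };
```

Keeping the type-specific checks inside each hook is what lets a new file type gain mmap() support without changes to the generic system call path, which is the motivation the commit gives.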
* Refine r280308. Do not completely disable timestamping of devfs nodesKonstantin Belousov2015-04-011-7/+25
| | | | | | | | | | | | | | | | | | on reads or writes, the time marks are used to display idle time by w(1) [1]. Instead, use vfs.devfs.dotimes as the selector of default precision vs. using time_second. The latter gives seconds precision, which is good enough for the purpose. Note that timestamp updates are unlocked and the updates themselves, as well as the check in devfs_timestamp, are non-atomic. Noted by: truckman [1] Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=280949
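The selection between a precise timestamp and a cheap seconds-only one might look roughly like the sketch below. It is an assumption-laden model: `toy_dotimes` stands in for the vfs.devfs.dotimes sysctl, the precise and coarse clock readings are passed in as parameters instead of being read from kernel clocks, and skipping the store when the second has not changed is a guess at what "the check in devfs_timestamp" refers to.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the vfs.devfs.dotimes sysctl value. */
static int toy_dotimes;

/*
 * Update *tsp from either the precise clock reading or the coarse
 * seconds counter, depending on toy_dotimes.
 */
static void
toy_devfs_timestamp(struct timespec *tsp, const struct timespec *precise,
    time_t coarse_sec)
{
	assert(tsp != NULL && precise != NULL);
	if (toy_dotimes) {
		*tsp = *precise;		/* default-precision path */
	} else if (tsp->tv_sec != coarse_sec) {
		tsp->tv_sec = coarse_sec;	/* seconds precision only */
		tsp->tv_nsec = 0;
	}
}
```

The coarse path never reads an expensive timer, yet it still advances the timestamp once per second, which is enough for idle-time displays such as the one in w(1).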
* Disable timestamping on devfs read/write operations by default.Xin LI2015-03-211-2/+11
| | | | | | | | | | | | | | | | | | | | | Currently we update timestamps unconditionally when doing read or write operations. This may slow things down on hardware where reading timestamps is expensive (e.g. HPET, because the default vfs.timestamp_precision setting is nanosecond now) with limited benefit. A new sysctl variable, vfs.devfs.dotimes, is added, which can be set to a non-zero value when the old behavior is desirable. Differential Revision: https://reviews.freebsd.org/D2104 Reported by: Mike Tancsa <mike sentex net> Reviewed by: kib Relnotes: yes Sponsored by: iXsystems, Inc. MFC after: 2 weeks Notes: svn path=/head/; revision=280308