path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.
  Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker.
  Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.
  This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.
  Bump __FreeBSD_version and update UPDATING.
  Portions submitted by: bz
  Reviewed by: bz, zec
  Discussed with: gnn, jamie, jeff, jhb, julian, sam
  Suggested by: peter
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=195699
  Author: Robert Watson; Age: 2009-07-14; Files: 8; Lines: -226/+119
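The accessor scheme described in the commit above can be modeled in user space. The following is a simplified, hypothetical illustration, not the kernel's VNET() implementation. A struct stands in for the linker-set region, `reference_copy` holds the static initializers, each instance gets a private copy of that region, and `VNET_MODEL()` adds a symbol's offset to the current instance's base address. All names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of all virtualized globals; a struct stands in
 * for the contiguous linker-set region used by the kernel. */
struct vnet_globals {
	int	ip_forwarding;
	long	tcp_sendspace;
};

/* The "reference copy": static initializers live here. */
static const struct vnet_globals reference_copy = {
	.ip_forwarding = 0,
	.tcp_sendspace = 32768,
};

/* One network stack instance owns a private copy of the region. */
struct vnet_model {
	unsigned char	*base;
};

/* The instance currently in use (cf. curvnet). */
static struct vnet_model *curvnet_model;

/* Accessor: current instance's base plus the symbol's offset in the
 * region, converted back to an lvalue of the symbol's type. */
#define	VNET_MODEL(type, sym)						\
	(*(type *)(void *)(curvnet_model->base +			\
	    offsetof(struct vnet_globals, sym)))

/* Create an instance, initialized statically from the reference copy. */
static struct vnet_model *
vnet_model_create(void)
{
	struct vnet_model *vnet;

	vnet = malloc(sizeof(*vnet));
	if (vnet == NULL)
		return (NULL);
	vnet->base = malloc(sizeof(reference_copy));
	if (vnet->base == NULL) {
		free(vnet);
		return (NULL);
	}
	memcpy(vnet->base, &reference_copy, sizeof(reference_copy));
	return (vnet);
}

static void
vnet_model_destroy(struct vnet_model *vnet)
{
	free(vnet->base);
	free(vnet);
}
```

Switching `curvnet_model` between two instances changes which copy `VNET_MODEL(int, ip_forwarding)` reads and writes. The reference copy stays untouched.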
* Add support to the virtual memory system for configuring machine-dependent memory attributes:
  Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact that there are machine-dependent memory attributes that have nothing to do with controlling the cache's behavior.
  Introduce vm_object_set_memattr() for setting the default memory attributes that will be given to an object's pages.
  Introduce and use pmap_page_{get,set}_memattr() for getting and setting a page's machine-dependent memory attributes. Add full support for these functions on amd64 and i386 and stubs for them on the other architectures. The function pmap_page_set_memattr() is also responsible for any other machine-dependent aspects of changing a page's memory attributes, such as flushing the cache or updating the direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and the device pager:
  - kmem_alloc_contig() can now be used to allocate kernel memory with non-default memory attributes on amd64 and i386.
  - vm_page_alloc() and the device pager will set the memory attributes for the real or fictitious page according to the object's default memory attributes.
  Update the various pmap functions on amd64 and i386 that map pages to incorporate each page's memory attributes in the mapping.
  Notes:
  (1) Inherent to this design are safety features that prevent the specification of inconsistent memory attributes by different mappings on amd64 and i386. In addition, the device pager provides a warning when a device driver creates a fictitious page with memory attributes that are inconsistent with the real page that the fictitious page is an alias for.
  (2) Storing the machine-dependent memory attributes for amd64 and i386 as a dedicated "int" in "struct md_page" represents a compromise between space efficiency and the ease of MFCing these changes to RELENG_7.
  In collaboration with: jhb
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195649
  Author: Alan Cox; Age: 2009-07-12; Files: 1; Lines: -1/+1
* The control terminal revocation at session leader exit does not correctly check for a reclaimed vnode, possibly calling VOP_REVOKE for such a vnode. If the terminal is already revoked, or the devfs mount was forcibly unmounted, revoking the doomed ctty vnode causes a panic.
  Reported and tested by: lstewart
  Approved by: re (kensmith)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=195509
  Author: Konstantin Belousov; Age: 2009-07-09; Files: 1; Lines: -3/+4
* Remove crcopy call from seteuid now that it calls crcopysafe.
  Reviewed by: brooks
  Approved by: re (kib), bz (mentor)
  Notes: svn path=/head/; revision=195477
  Author: Jamie Gritton; Age: 2009-07-08; Files: 1; Lines: -1/+0
* Regenerate after lpathconf(2) addition.
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195459
  Author: Edward Tomasz Napierala; Age: 2009-07-08; Files: 3; Lines: -2/+25
* chmod(1) contains an optimization that skips the chmod(2) call if the new file mode is the same as the old one. However, this optimization must be disabled for filesystems that support NFSv4 ACLs. chmod(1) uses pathconf(2) to determine whether this is the case. However, pathconf(2) always follows symbolic links, while 'chmod -h' doesn't. This change adds lpathconf(2) so that the problem can be solved cleanly.
  Reviewed by: rwatson (earlier version)
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195458
  Author: Edward Tomasz Napierala; Age: 2009-07-08; Files: 2; Lines: -4/+24
* Fix regressions in return events of poll() on TTYs. As pointed out, POLLHUP should be generated, even if it hasn't been specified on input. It is also not allowed to return both POLLOUT and POLLHUP at the same time.
  Reported by: jilles
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195444
  Author: Ed Schouten; Age: 2009-07-08; Files: 2; Lines: -11/+8
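The two rules above can be expressed as a post-processing step on the event mask. `tty_poll_revents()` is a hypothetical helper, not the TTY layer's actual code. The terminal state is passed in as booleans to keep the example self-contained.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/*
 * Compute revents for a terminal from the caller's requested events
 * and the terminal's state (illustrative helper only).
 */
static short
tty_poll_revents(short events, bool readable, bool writable, bool hungup)
{
	short revents = 0;

	if (readable)
		revents |= events & POLLIN;
	if (hungup)
		revents |= POLLHUP;		/* reported even if not requested */
	else if (writable)
		revents |= events & POLLOUT;	/* never alongside POLLHUP */
	return (revents);
}
```

Readable data may still be reported together with POLLHUP. Only the POLLOUT/POLLHUP combination is excluded.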
* Increase HZ_VM from 10 to 100. While 10 Hz saves CPU time under VM environments, it's too slow for FreeBSD to work properly. For example, ping at 10 Hz pings about every 600 ms instead of about every second.
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195430
  Author: Mike Silbersack; Age: 2009-07-08; Files: 1; Lines: -1/+1
* Fix poll(2) and select(2) for named pipes to return "ready for read" when all writers observed by the reader have exited. Use a writer generation counter for the fifo, and store a snapshot of the fifo generation in the f_seqcount field of struct file, which is otherwise unused for fifos. Set the FreeBSD-undocumented POLLINIGNEOF flag only when the file's f_seqcount is equal to the fifo's fi_wgen, and revert r89376.
  Fix POLLINIGNEOF for sockets and pipes, and return POLLHUP for them. Note that the patch does not fix the failure to return POLLHUP for fifos.
  PR: kern/94772
  Submitted by: bde (original version)
  Reviewed by: rwatson, jilles
  Approved by: re (kensmith)
  MFC after: 6 weeks (might be)
  Notes: svn path=/head/; revision=195423
  Author: Konstantin Belousov; Age: 2009-07-07; Files: 2; Lines: -15/+19
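The writer-generation scheme can be modeled as follows, using hypothetical names (`fifo_model`, `reader_model`, `fifo_reader_ready()`). Each writer open bumps the generation. A reader records a snapshot when it opens. An empty fifo with no writers is reported as readable (EOF) only if a writer has come and gone since that snapshot. Otherwise EOF is ignored, which is the condition POLLINIGNEOF expresses. One detail is an assumption of this sketch: the snapshot is taken as the generation minus the writers already open, so writers present at open time count as observed.

```c
#include <assert.h>
#include <stdbool.h>

struct fifo_model {
	unsigned	wgen;		/* bumped on every writer open */
	unsigned	writers;	/* writers currently open */
	unsigned	bytes;		/* unread data in the fifo */
};

struct reader_model {
	unsigned	seq;		/* generation snapshot, cf. f_seqcount */
};

static void
fifo_writer_open(struct fifo_model *f)
{
	f->wgen++;
	f->writers++;
}

static void
fifo_writer_close(struct fifo_model *f)
{
	assert(f->writers > 0);
	f->writers--;
}

/* Writers already open when the reader arrives count as observed. */
static void
fifo_reader_open(const struct fifo_model *f, struct reader_model *r)
{
	r->seq = f->wgen - f->writers;
}

/* "Ready for read" as poll(2)/select(2) would report it. */
static bool
fifo_reader_ready(const struct fifo_model *f, const struct reader_model *r)
{
	if (f->bytes > 0)
		return (true);		/* data available */
	if (f->writers > 0)
		return (false);		/* empty, but a writer may still write */
	/*
	 * Empty with no writers: report EOF only if a writer has come and
	 * gone since the snapshot; otherwise ignore EOF (POLLINIGNEOF).
	 */
	return (r->seq != f->wgen);
}
```

A reader that opens an idle fifo therefore keeps waiting instead of seeing an immediate EOF. Once a writer has opened and closed, the same reader is woken with EOF.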
* In vn_vget_ino() and its inline equivalents, mnt_ref() the mount point around the sequence that drops the vnode lock and then busies the mount point. Without either a locked vnode or a direct reference to the mp, a forced unmount can proceed, leaving the mp unmounted or reused.
  Tested by: pho
  Reviewed by: jeff
  Approved by: re (kensmith)
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=195294
  Author: Konstantin Belousov; Age: 2009-07-02; Files: 1; Lines: -0/+2
* Call prison_check from vfs_suser rather than re-implementing it.
  Approved by: re (kib), bz (mentor)
  Notes: svn path=/head/; revision=195285
  Author: Jamie Gritton; Age: 2009-07-02; Files: 1; Lines: -2/+1
* Audit file descriptor and command arguments to ioctl(2).
  Approved by: re (audit argument blanket)
  MFC after: 1 week
  Notes: svn path=/head/; revision=195281
  Author: Robert Watson; Age: 2009-07-02; Files: 1; Lines: -0/+2
* Clean up a number of aspects of token generation from audit arguments to system calls:
  - Centralize generation of argument tokens for VM addresses in a macro, ADDR_TOKEN(), and properly encode 64-bit addresses in 64-bit arguments.
  - Fix up argument numbers across a large number of syscalls so that they match the numeric argument into the system call.
  - Don't audit the address argument to ioctl(2) or ptrace(2), but do keep generating tokens for mmap(2) and minherit(2), since they relate to passing object access across execve(2).
  Approved by: re (audit argument blanket)
  Obtained from: TrustedBSD Project
  MFC after: 1 week
  Notes: svn path=/head/; revision=195280
  Author: Robert Watson; Age: 2009-07-02; Files: 1; Lines: -1/+0
* For access(2) and eaccess(2), audit the requested access mode.
  Approved by: re (audit argument blanket)
  MFC after: 3 days
  Notes: svn path=/head/; revision=195267
  Author: Robert Watson; Age: 2009-07-01; Files: 1; Lines: -0/+1
* - Use fd_lastfile + 1 as the upper bound on nd. This is more correct than using the size of the descriptor array.
  - A lock is not needed to fetch fd_lastfile. The results are stale the instant it is dropped.
  - Use a private mutex pool for select since the pool mutex is not used as a leaf.
  - Fetch the si_mtx pointer first before resorting to hashing to compute the mutex address.
  Reviewed by: McKusick
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195259
  Author: Jeff Roberson; Age: 2009-07-01; Files: 1; Lines: -6/+8
* Audit file descriptor numbers for various socket-related system calls.
  Approved by: re (audit argument blanket)
  MFC after: 3 days
  Notes: svn path=/head/; revision=195255
  Author: Robert Watson; Age: 2009-07-01; Files: 1; Lines: -0/+17
* Define the missing audit argument macro AUDIT_ARG_SOCKET(), and capture the domain, type, and protocol arguments to socket(2) and socketpair(2).
  Approved by: re (audit argument blanket)
  MFC after: 3 days
  Notes: svn path=/head/; revision=195252
  Author: Robert Watson; Age: 2009-07-01; Files: 1; Lines: -0/+3
* Improve the handling of cpuset with interrupts.
  - For x86, change the interrupt source method to assign an interrupt source to a specific CPU to return an error value instead of void, thus allowing it to fail.
  - If moving an interrupt to a CPU fails due to a lack of IDT vectors in the destination CPU, fail the request with ENOSPC rather than panicking.
  - For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used on the first interrupt in a group. Moving the first interrupt in a group moves the entire group.
  - Use the icu_lock to protect intr_next_cpu() on x86 instead of the intr_table_lock to fix a LOR introduced in the last set of MSI changes.
  - Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with interrupts. Previously, binding an interrupt to a CPU only performed a privilege check if the interrupt had an interrupt thread. Interrupts without a thread could be bound by non-root users as a result.
  - If an interrupt event's assign_cpu method fails, then restore the original cpuset mask for the associated interrupt thread.
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195249
  Author: John Baldwin; Age: 2009-07-01; Files: 1; Lines: -3/+23
* When auditing unmount(2), capture FSID arguments as regular text strings rather than as paths, which would lead to them being treated as relative pathnames and hence confusingly converted into absolute pathnames. Capture flags to unmount(2) via an argument token.
  Approved by: re (audit argument blanket)
  MFC after: 3 days
  Notes: svn path=/head/; revision=195247
  Author: Robert Watson; Age: 2009-07-01; Files: 1; Lines: -1/+3
* Audit the file descriptor number passed to lseek(2).
  Approved by: re (kib)
  MFC after: 3 days
  Notes: svn path=/head/; revision=195242
  Author: Robert Watson; Age: 2009-07-01; Files: 1; Lines: -0/+1
* Fix link(2) auditing: use the second audit record path for the new object name.
  Approved by: re (kib)
  MFC after: 3 days
  Notes: svn path=/head/; revision=195238
  Author: Robert Watson; Age: 2009-07-01; Files: 1; Lines: -1/+1
* Audit the 'options' argument to wait4(2).
  Approved by: re (kib)
  MFC after: 3 days
  Notes: svn path=/head/; revision=195235
  Author: Robert Watson; Age: 2009-07-01; Files: 1; Lines: -0/+1
* Remove a stale comment. The very same revision (r85511) that introduced this comment also implemented the proposed change to the code.
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195209
  Author: Alan Cox; Age: 2009-06-30; Files: 1; Lines: -3/+0
* Add FIONSPACE from NetBSD. FIONSPACE is provided so that programs may easily determine how much space is left in the send queue; they do not need to know the send queue size.
  NetBSD revisions: sys_socket.c r1.41, 1.42; filio.h r1.9
  Obtained from: NetBSD
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=195191
  Author: Ed Maste; Age: 2009-06-30; Files: 1; Lines: -0/+8
* Free struct ucreds allocated in vfs_hang_addrlist() when deleting the export element. While there, remove register storage-class specifiers.
  Reported and tested by: pho
  Reviewed by: kan
  Approved by: re (kensmith)
  Notes: svn path=/head/; revision=195166
  Author: Konstantin Belousov; Age: 2009-06-29; Files: 1; Lines: -4/+14
* When loading hwpmc, don't assume a default number (currently 15) of preloaded klds. Instead, calculate the number at runtime and allocate the necessary space. The current logic is also wrong, as it can lead to an endless loop.
  Sponsored by: Sandvine Incorporated
  Reported by: Ryan Stone <rstone at sandvine dot com>
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195159
  Author: Attilio Rao; Age: 2009-06-29; Files: 1; Lines: -39/+19
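The runtime sizing described above can be sketched as a two-pass walk: count the entries, then allocate exactly that many slots plus a terminator. `struct kld_model` and `snapshot_modules()` are hypothetical. A kernel version would also have to keep the list stable between the two passes (for example, under the linker lock). This single-threaded sketch does not model that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's list of loaded files. */
struct kld_model {
	const char		*name;
	struct kld_model	*next;
};

/*
 * Build a NULL-terminated array with one slot per loaded module, sized
 * by counting the list first instead of assuming a default. Each loop
 * advances one node per iteration, so both terminate.
 */
static const char **
snapshot_modules(const struct kld_model *head, size_t *countp)
{
	const struct kld_model *k;
	const char **names;
	size_t count, i;

	count = 0;
	for (k = head; k != NULL; k = k->next)
		count++;
	names = calloc(count + 1, sizeof(*names));
	if (names == NULL)
		return (NULL);
	i = 0;
	for (k = head; k != NULL; k = k->next)
		names[i++] = k->name;
	assert(i == count);
	names[i] = NULL;
	*countp = count;
	return (names);
}
```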
* - Turn the third (islocked) argument of the knote call into a flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without the KQ_LOCK mtx held.
  - Modify VFS knote calls to always use the KNF_NOKQLOCK flag. This is required for ZFS, as its getattr implementation may sleep.
  Approved by: re (rwatson)
  Reviewed by: kib
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=195148
  Author: Stanislav Sedov; Age: 2009-06-28; Files: 1; Lines: -6/+18
* Add FIONWRITE support to TTYs. TTYs already supported TIOCOUTQ, but FIONWRITE seems to be a more generic name for this.
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195136
  Author: Ed Schouten; Age: 2009-06-28; Files: 1; Lines: -3/+4
* There are a number of ways an application can check whether inbound data is waiting on a file descriptor, such as a pipe or a socket: for instance, by using select(2), poll(2), kqueue(2), ioctl(FIONREAD), etc.
  But we have no way of finding out whether written data has yet to be disposed of, for instance transmitted (and ack'ed!) to some remote host, or read by the application at the far end of the pipe.
  The closest we get is calling shutdown(2) on a TCP socket in non-blocking mode, but this has the undesirable side effect of preventing future communication.
  Add a complement to FIONREAD, called FIONWRITE, which returns the number of bytes not yet properly disposed of. Implement it for all sockets.
  Background: An HTTP server will want to time out connections if no new request arrives within a certain period after the last transmitted response has actually been sent (and ack'ed). For a busy HTTP server, this timeout can be of subsecond duration. To signal to a load-balancer that the connection is truly dead, TCP_RST will be the preferred method. It avoids the RTT delay of a FIN handshake with a client that, surprisingly often, is no longer at the remote IP number. A slow, distant client may be served a response that is big enough to fill the window but small enough to fit in the socket buffer. In that case, the write(2) call will return immediately. If the session timeout is armed at that time, not all bytes in the response may have been transmitted by the time it fires. FIONWRITE allows the timeout to check that no data is outstanding on the connection before it TCP_RSTs it.
  Input & Idea from: rwatson
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195134
  Author: Poul-Henning Kamp; Age: 2009-06-28; Files: 1; Lines: -0/+5
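The timeout check from the background above can be sketched as a small helper. It assumes a FreeBSD system where `FIONWRITE` is defined via <sys/filio.h>. On other systems the helper, `connection_drained()` (a hypothetical name), reports that the check is unavailable. The timeout handler would reset the connection only when the helper returns 1.

```c
#include <assert.h>
#include <sys/ioctl.h>
#if defined(__FreeBSD__)
#include <sys/filio.h>
#endif

/*
 * Returns 1 if no written data is outstanding on 'fd', 0 if some is,
 * or -1 if FIONWRITE is unavailable or the ioctl fails.
 */
static int
connection_drained(int fd)
{
#ifdef FIONWRITE
	int pending;

	if (ioctl(fd, FIONWRITE, &pending) == -1)
		return (-1);
	return (pending == 0 ? 1 : 0);
#else
	(void)fd;
	return (-1);
#endif
}
```

How the reset itself is sent is left open by the commit message. One common approach is to set SO_LINGER with a zero linger time before close(2).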
* Correct a long-standing performance bug in cluster_rbuild(). Specifically, in the case of a file system with a block size that is less than the page size, cluster_rbuild() looks at too many of the page's valid bits. Consequently, it may terminate prematurely, resulting in poor performance.
  Reported by: bde
  Reviewed by: tegge
  Approved by: re (kib)
  Notes: svn path=/head/; revision=195122
  Author: Alan Cox; Age: 2009-06-27; Files: 1; Lines: -4/+15
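To see why the block size matters, assume 4096-byte pages whose valid bits each cover one 512-byte chunk, giving eight bits per page. A 2048-byte block owns only four of those bits, so a check that looks at every bit in the page can treat a fully valid block as incomplete. The helpers below are hypothetical. They loosely mirror the range-to-mask computation that FreeBSD's vm_page_bits() performs.

```c
#include <assert.h>
#include <stdbool.h>

#define	MODEL_PAGE_SIZE	4096	/* assumed page size */
#define	MODEL_CHUNK	512	/* assumed bytes per valid bit (DEV_BSIZE) */

/* Valid-bit mask covering bytes [off, off + size) of one page. */
static unsigned
block_valid_mask(unsigned off, unsigned size)
{
	unsigned first, last;

	assert(size > 0 && off + size <= MODEL_PAGE_SIZE);
	first = off / MODEL_CHUNK;
	last = (off + size - 1) / MODEL_CHUNK;
	return (((1u << (last + 1)) - 1) & ~((1u << first) - 1));
}

/* A block is valid when every bit it owns is set; other bits are ignored. */
static bool
block_is_valid(unsigned page_valid, unsigned off, unsigned size)
{
	unsigned mask = block_valid_mask(off, size);

	return ((page_valid & mask) == mask);
}
```

Take a page with `page_valid == 0x0F`. Its first 2048-byte block is complete even though the page as a whole is not, so a whole-page test (`page_valid == 0xFF`) would wrongly stop the cluster at that block.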
* Replace the variable-argument AUDIT_ARG() macro with a set of more specific macros, one for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr).
  In the MFC, we should leave the existing AUDIT_ARG() macros, as they may be used by third-party kernel modules.
  Suggested by: brooks
  Approved by: re (kib)
  Obtained from: TrustedBSD Project
  MFC after: 1 week
  Notes: svn path=/head/; revision=195104
  Author: Robert Watson; Age: 2009-06-27; Files: 13; Lines: -113/+110
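The difference between the two macro styles can be sketched as follows. Everything here is hypothetical (`struct audit_args_model` and the `_MODEL` macros). The old style pastes the argument kind into one generic macro, so a call site never spells out the function it reaches. The new style gives each argument type its own macro, and each one names a distinct function that a cross-referencing tool can follow.

```c
#include <assert.h>

/* Hypothetical per-syscall audit argument record. */
struct audit_args_model {
	int		fd;
	int		cmd;
	unsigned	valid;		/* which fields were captured */
};

#define	ARG_FD_MODEL	0x1u
#define	ARG_CMD_MODEL	0x2u

static struct audit_args_model cur_args;

static void
audit_arg_fd_model(int fd)
{
	cur_args.fd = fd;
	cur_args.valid |= ARG_FD_MODEL;
}

static void
audit_arg_cmd_model(int cmd)
{
	cur_args.cmd = cmd;
	cur_args.valid |= ARG_CMD_MODEL;
}

/*
 * Old style: one variadic macro; the callee name is built by token
 * pasting, so a call site such as AUDIT_ARG_MODEL(fd, 3) never spells
 * out audit_arg_fd_model.
 */
#define	AUDIT_ARG_MODEL(kind, ...)	audit_arg_##kind##_model(__VA_ARGS__)

/* New style: one macro per argument type, each naming its function. */
#define	AUDIT_ARG_FD_MODEL(fd)		audit_arg_fd_model(fd)
#define	AUDIT_ARG_CMD_MODEL(cmd)	audit_arg_cmd_model(cmd)
```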
* This change is the next step in implementing the cache control functionality required by video card drivers. Specifically, this change introduces vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all architectures. In addition, this change adds a vm_cache_mode_t parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These will be the interfaces for allocating mapped kernel memory and physical memory, respectively, with non-default cache modes.
  In collaboration with: jhb
  Notes: svn path=/head/; revision=195033
  Author: Alan Cox; Age: 2009-06-26; Files: 1; Lines: -1/+1
* In lf_iteratelocks_vnode, increment state->ls_threads around the iteration over the vnode advisory lock list. This prevents deallocation of state while inside the loop.
  Reported and tested by: pho
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=194993
  Author: Konstantin Belousov; Age: 2009-06-25; Files: 1; Lines: -1/+10
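The pattern above amounts to a hold count on shared state. The following single-threaded sketch uses hypothetical names. An iterator takes a hold before walking the list and drops it afterwards, and teardown requested in between does not free the state while the hold is outstanding. Here the free is deferred to the last releaser. A kernel implementation could instead make the teardown path sleep until the count drains. Locking around the counter is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical shared state, loosely modeled on the lock list state. */
struct lock_state_model {
	int	threads;	/* walkers currently using this state */
	bool	doomed;		/* teardown has been requested */
};

static int states_freed;	/* lets the example observe the free */

/* A walker takes a hold before iterating over the list. */
static void
state_hold(struct lock_state_model *s)
{
	s->threads++;
}

/* Drop the hold; the last walker out frees a doomed state. */
static void
state_release(struct lock_state_model *s)
{
	assert(s->threads > 0);
	if (--s->threads == 0 && s->doomed) {
		free(s);
		states_freed++;
	}
}

/* Teardown: free now only if nobody is iterating. */
static void
state_destroy(struct lock_state_model *s)
{
	s->doomed = true;
	if (s->threads == 0) {
		free(s);
		states_freed++;
	}
}
```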
* Return errors from intr_event_bind() to the caller of intr_set_affinity(). Specifically, if a non-root user attempts to bind an interrupt the request will now report failure with EPERM rather than silently failing with a successful return code.
  MFC after: 1 week
  Notes: svn path=/head/; revision=194987
  Author: John Baldwin; Age: 2009-06-25; Files: 1; Lines: -2/+1
* Use the correct cast for the arguments passed to freebsd_shmctl() in oshmctl().
  Submitted by: kib
  Notes: svn path=/head/; revision=194976
  Author: John Baldwin; Age: 2009-06-25; Files: 1; Lines: -1/+1
* Tweak the oshmctl() compile fix: convert the K&R definition to ANSI.
  Notes: svn path=/head/; revision=194959
  Author: John Baldwin; Age: 2009-06-25; Files: 1; Lines: -7/+1
* oshmctl() now requires a sysv_shm.c-local function prototype.
  Notes: svn path=/head/; revision=194941
  Author: Robert Watson; Age: 2009-06-25; Files: 1; Lines: -0/+4
* - Use DPCPU for SCHED_STATS. This is somewhat awkward because the offset of the stat is not known until link time, so we must emit a function to call SYSCTL_ADD_PROC rather than using SYSCTL_PROC directly.
  - Eliminate the atomic from SCHED_STAT_INC now that it's using per-cpu variables. Sched stats are always incremented while we're holding a spinlock, so no further protection is required.
  Reviewed by: sam
  Notes: svn path=/head/; revision=194936
  Author: Jeff Roberson; Age: 2009-06-25; Files: 1; Lines: -18/+37
* - Add a sysctl_dpcpu_long to support long typed pcpu stats.
  - Remove the #ifndef SMP case as the SMP code works on UP as well.
  Reviewed by: sam
  Notes: svn path=/head/; revision=194935
  Author: Jeff Roberson; Age: 2009-06-25; Files: 1; Lines: -10/+19
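The per-CPU statistics described in the two DPCPU commits (SCHED_STATS and sysctl_dpcpu_long) can be modeled in user space. The sketch assumes a fixed CPU count and uses hypothetical names. Each CPU increments only its own slot, so the increment needs no atomic as long as nothing else updates that slot concurrently. In the kernel, the spinlock held at the call site provides that guarantee. A reader such as a sysctl handler sums every slot and accepts a possibly stale total.

```c
#include <assert.h>

#define	NCPU_MODEL	4	/* assumed CPU count for the example */

/* One slot per CPU, standing in for a DPCPU variable. */
static long sched_stat_model[NCPU_MODEL];

/*
 * Plain (non-atomic) increment of the running CPU's own slot. Safe only
 * because no other context updates that slot concurrently.
 */
static void
sched_stat_inc_model(int cpu)
{
	assert(cpu >= 0 && cpu < NCPU_MODEL);
	sched_stat_model[cpu]++;
}

/* Reader, e.g. a sysctl handler: sum all slots; the total may be stale. */
static long
sched_stat_read_model(void)
{
	long total = 0;
	int cpu;

	for (cpu = 0; cpu < NCPU_MODEL; cpu++)
		total += sched_stat_model[cpu];
	return (total);
}
```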
* Wrap a PR_VNET inside "#ifdef VIMAGE" since that is the only place it applies. bz wants the blame for this.
  Noticed by: rwatson
  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=194923
  Author: Jamie Gritton; Age: 2009-06-24; Files: 1; Lines: -0/+2
* Regen.
  Notes: svn path=/head/; revision=194919
  Author: John Baldwin; Age: 2009-06-24; Files: 3; Lines: -87/+99
* For prisons with their own network stack, permit additional privileges and do not restrict the type of sockets a user can open.
  Note: the VIMAGE/vnet feature of jails is still considered experimental and cannot guarantee that privileged users can be kept imprisoned if enabled.
  Reviewed by: rwatson
  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=194915
  Author: Jamie Gritton; Age: 2009-06-24; Files: 1; Lines: -0/+128
* Change the ABI of some of the structures used by the SYSV IPC API:
  - The uid/cuid members of struct ipc_perm are now uid_t instead of unsigned short.
  - The gid/cgid members of struct ipc_perm are now gid_t instead of unsigned short.
  - The mode member of struct ipc_perm is now mode_t instead of unsigned short (this is merely a style bug).
  - The rather dubious padding fields for ABI compat with SV/I386 have been removed from struct msqid_ds and struct semid_ds.
  - The shm_segsz member of struct shmid_ds is now a size_t instead of an int. This removes the need for the shm_bsegsz member in struct shmid_kernel and should allow for complete support of SYSV SHM regions >= 2GB.
  - The shm_nattch member of struct shmid_ds is now an int instead of a short.
  - The shm_internal member of struct shmid_ds is now gone. The internal VM object pointer for SHM regions has been moved into struct shmid_kernel.
  - The existing __semctl(), msgctl(), and shmctl() system call entries are now marked COMPAT7 and new versions of those system calls which support the new ABI are now present.
  - The new system calls are assigned to the FBSD-1.1 version in libc. The FBSD-1.0 symbols in libc now refer to the old COMPAT7 system calls.
  - A simplistic framework for tagging system calls with compatibility symbol versions has been added to libc. Version tags are added to system calls by adding an appropriate __sym_compat() entry to src/lib/libc/include/compat.h. [1]
  PR: kern/16195 kern/113218 bin/129855
  Reviewed by: arch@, rwatson
  Discussed with: kan, kib [1]
  Notes: svn path=/head/; revision=194910
  Author: John Baldwin; Age: 2009-06-24; Files: 5; Lines: -20/+290
* Deprecate the msgsys(), semsys(), and shmsys() system calls by moving them under COMPAT_FREEBSD[4567]. Starting with FreeBSD 5.0 the SYSV IPC API was implemented via direct system calls (e.g. msgctl(), msgget(), etc.) rather than indirecting through the var-args *sys() system calls. The shmsys() system call was already effectively deprecated for all but COMPAT_FREEBSD4, as its implementation for the !COMPAT_FREEBSD4 case simply invoked nosys().
  Notes: svn path=/head/; revision=194894
  Author: John Baldwin; Age: 2009-06-24; Files: 4; Lines: -185/+185
* - Similar to the previous commit, but for CURRENT: Fix a bug where a FIFO vnode use count was increased twice, but only decreased once.
  Notes: svn path=/head/; revision=194881
  Author: Ulf Lilleengen; Age: 2009-06-24; Files: 1; Lines: -1/+0
* - Fix a bug where a FIFO vnode use count was increased twice, but only decreased once.
  MFC after: 1 week
  Notes: svn path=/head/; revision=194879
  Author: Ulf Lilleengen; Age: 2009-06-24; Files: 1; Lines: -1/+0
* Fix a race in vi_if_move, where a vnet is used after the prison that referred to it has been released.
  Approved by: bz (mentor)
  Notes: svn path=/head/; revision=194841
  Author: Jamie Gritton; Age: 2009-06-24; Files: 1; Lines: -21/+28
* Add a new COMPAT7 flag for FreeBSD 7.x compatibility system calls.
  Notes: svn path=/head/; revision=194833
  Author: John Baldwin; Age: 2009-06-24; Files: 2; Lines: -5/+36
* - Move syscall function argument structure types to be just above the relevant system call function.
  - Whitespace fixes.
  Notes: svn path=/head/; revision=194832
  Author: John Baldwin; Age: 2009-06-24; Files: 3; Lines: -20/+17
* Add stack_print_short() and stack_print_short_ddb() interfaces to stack(9), which generate a more compact rendition of a stack trace via the kernel's printf.
  MFC after: 1 week
  Notes: svn path=/head/; revision=194828
  Author: Robert Watson; Age: 2009-06-24; Files: 1; Lines: -10/+54