path: root/sys/kern/subr_trap.c
Commit message | Author | Age | Files | Lines
* Move KTRUSERRET() from userret() to ast(). It's a really long | Edward Tomasz Napierala | 2020-10-03 | 1 | -3/+4
| | | | | | | | | | | | detour - it writes ktrace entries to the filesystem - so the overhead of ast() won't make any difference. Reviewed by: kib Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26404 Notes: svn path=/head/; revision=366391
* Move td_softdep_cleanup() from userret() to ast(); it's infrequent | Edward Tomasz Napierala | 2020-09-14 | 1 | -5/+3
| | | | | | | | | | | | | at best. The schedule_cleanup() function already sets TDF_ASTPENDING. Reviewed by: kib, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26375 Notes: svn path=/head/; revision=365712
* Move TDP_GEOM check from userret() to ast(); this code path is quite | Edward Tomasz Napierala | 2020-09-14 | 1 | -7/+7
| | | | | | | | | | | | | | infrequent. Reviewed by: kib No objections: mav Tested by: pho MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26374 Notes: svn path=/head/; revision=365711
* Move racct/rctl throttling from userret() to ast(). There's no reason | Edward Tomasz Napierala | 2020-09-14 | 1 | -4/+5
| | | | | | | | | | | | for it to sit in the syscall fast path. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26368 Notes: svn path=/head/; revision=365710
* In r354148 the goal was to check THREAD_CAN_SLEEP() only once for the | Gleb Smirnoff | 2020-09-09 | 1 | -1/+1
| | | | | | | | | | | | purpose of epoch_trace() and for calling subsequent panic, but to keep code fully under INVARIANTS, so don't use bare function call to panic(). However, at the last stage of review a true value slipped in, while always false was assumed. I checked that in email archive with kib@. Noticed by: trasz Notes: svn path=/head/; revision=365504
* Retire procfs-based process debugging. | John Baldwin | 2020-04-01 | 1 | -1/+0
| | | | | | | | | | | | | | | | | | Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a superset of functionality available via procfs and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years. PR: 244939 (exp-run) Reviewed by: kib, mjg (earlier version) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D23837 Notes: svn path=/head/; revision=359530
* Return reschedule_signals() to being static again. | Konstantin Belousov | 2020-03-10 | 1 | -6/+2
| | | | | | | | | | | | | It was used after sigfastblock_setpend() call in ast() when current thread fast-blocks signals. Add a flag to sigfastblock_setpend() to request reschedule, and remove the direct use of the function from subr_trap.c. Tested by: pho Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=358855
* Fix a bug in r358168, do not call sigfastblock_setpend() under a mutex. | Konstantin Belousov | 2020-02-20 | 1 | -5/+7
| | | | | | | | | PR: 244250 Reported and tested by: lwhsu Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=358182
* Do not read sigfastblock word on syscall entry. | Konstantin Belousov | 2020-02-20 | 1 | -29/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On machines with SMAP, fueword executes two serializing instructions which can be seen in microbenchmarks. As a measure to restore microbenchmark numbers, only read the word on the attempt to deliver signal in ast(). If the word is set, signal is not delivered and word is kept, preventing interruption of interruptible sleeps by signals until userspace calls sigfastblock(UNBLOCK) which clears the word. This way, the spurious EINTR that userspace can see while in critical section is on first interruptible sleep, if a signal is pending, and on signal posting. It is believed that it is not important for rtld and libthr critical sections. It might be visible for the application code e.g. for the callback of dl_iterate_phdr(3), but again the belief is that the non-compliance is acceptable. Most important is that the retry of the sleeping syscall does not interrupt unless additional signal is posted. For now I added the knob kern.sigfastblock_fetch_always to enable the word read on syscall entry to be able to diagnose possible issues due to spurious EINTR. While there, do some code restructuring to have all sigfastblock() handling located in kern_sig.c. Reviewed by: jeff Discussed with: mjg Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D23622 Notes: svn path=/head/; revision=358168
* Annotate branches in the syscall path | Mateusz Guzik | 2020-02-14 | 1 | -2/+2
| | | | | | | | | | | This in particular significantly shortens amd64_syscall, which otherwise keeps jumping forward over 2KB of code in total. Note some of these branches should be either eliminated altogether or coalesced. Notes: svn path=/head/; revision=357911
* Add a way to manage thread signal mask using shared word, instead of syscall. | Konstantin Belousov | 2020-02-09 | 1 | -17/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new syscall sigfastblock(2) is added which registers a uint32_t variable as containing the count of blocks for signal delivery. Its content is read by kernel on each syscall entry and on AST processing, non-zero count of blocks is interpreted same as the signal mask blocking all signals. The biggest downside of the feature that I see is that memory corruption that affects the registered fast sigblock location, would cause quite strange application misbehavior. For instance, the process would be immune to ^C (but killable by SIGKILL). With consumers (rtld and libthr added), benchmarks do not show a slow-down of the syscalls in micro-measurements, and macro benchmarks like buildworld do not demonstrate a difference. Part of the reason is that buildworld time is dominated by compiler, and clang already links to libthr. On the other hand, small utilities typically used by shell scripts have the total number of syscalls cut by half. The syscall is not exported from the stable libc version namespace on purpose. It is intended to be used only by our C runtime implementation internals. Tested by: pho Discussed with: cem, emaste, jilles Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D12773 Notes: svn path=/head/; revision=357693
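The block-count idea in the entry above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_thread`, `toy_block()`, `toy_unblock()` and `toy_try_deliver()` are hypothetical names invented for illustration, and the code does not reproduce the kernel's or libthr's actual implementation or the real layout of the shared word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a thread; not a kernel structure. */
struct toy_thread {
	uint32_t fast_block_word;	/* stands in for the registered uint32_t */
	bool	 signal_pending;	/* a signal is waiting for delivery */
	int	 delivered;		/* signals delivered so far */
};

/* Enter a critical section: no syscall, only an increment of the word. */
static void
toy_block(struct toy_thread *t)
{
	t->fast_block_word++;
}

/* Leave a critical section; the count must not underflow. */
static void
toy_unblock(struct toy_thread *t)
{
	assert(t->fast_block_word > 0);
	t->fast_block_word--;
}

/*
 * Model of a delivery point: a non-zero count is treated the same as
 * a signal mask that blocks all signals, so the signal stays pending.
 */
static bool
toy_try_deliver(struct toy_thread *t)
{
	if (!t->signal_pending || t->fast_block_word != 0)
		return (false);
	t->signal_pending = false;
	t->delivered++;
	return (true);
}
```

In this model a pending signal is held back while the count is non-zero and is delivered at the first delivery point after the matching `toy_unblock()`.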
* vfs: prealloc vnodes in getnewvnode_reserve | Mateusz Guzik | 2020-01-11 | 1 | -2/+2
| | | | | | | | | | | | Having a reserved vnode count does not guarantee that getnewvnode() won't block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instead. The only consumer was always passing "1" as count and never nesting reservations. Notes: svn path=/head/; revision=356643
* schedlock 4/4 | Jeff Roberson | 2019-12-15 | 1 | -2/+1
| | | | | | | | | | | | | | | | | | | | | Don't hold the scheduler lock while doing context switches. Instead we unlock after selecting the new thread and switch within a spinlock section leaving interrupts and preemption disabled to prevent local concurrency. This means that mi_switch() is entered with the thread locked but returns without. This dramatically simplifies scheduler locking because we will not hold the schedlock while spinning on blocked lock in switch. This change has not been made to 4BSD but in principle it would be more straightforward. Discussed with: markj Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D22778 Notes: svn path=/head/; revision=355784
* Merge td_epochnest with td_no_sleeping. | Gleb Smirnoff | 2019-10-29 | 1 | -8/+6
| | | | | | | | | | | | | | | | | | | Epoch itself doesn't rely on the counter and it is provided merely for sleeping subsystems to check it. - In functions that sleep use THREAD_CAN_SLEEP() to assert correctness. With EPOCH_TRACE compiled print epoch info. - _sleep() was a wrong place to put the assertion for epoch, right place is sleepq_add(), as there are ways to call the latter bypassing _sleep(). - Do not increase td_no_sleeping in non-preemptible epochs. The critical section would trigger all possible safeguards, no sleeping counter is extraneous. Reviewed by: kib Notes: svn path=/head/; revision=354148
* Use THREAD_CAN_SLEEP() macro to check if thread can sleep. There is no | Gleb Smirnoff | 2019-10-24 | 1 | -1/+1
| | | | | | | | | functional change. Discussed with: kib Notes: svn path=/head/; revision=354052
* When assertion for a thread not being in an epoch fails also print all | Gleb Smirnoff | 2019-10-15 | 1 | -0/+4
| | | | | | | | | | entered epochs. Works with EPOCH_TRACE only. Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D22017 Notes: svn path=/head/; revision=353596
* assert that td_lk_slocks is not leaked upon return from kernel | Andriy Gapon | 2019-08-19 | 1 | -0/+3
| | | | | | | | | | | | This is similar to checks for td_sx_slocks and td_rw_rlocks. Although td_lk_slocks is an implementation detail, it still makes sense to validate it. MFC after: 1 week Sponsored by: Panzura Notes: svn path=/head/; revision=351213
* Deinline racct throttling out of syscall exit path. | Mateusz Guzik | 2018-11-29 | 1 | -10/+2
| | | | | | | | | | | | racct is not enabled by default and even when it is enabled processes are typically not throttled. The order of checks is left unchanged since racct_enable will be annotated as __read_frequently, while checking for the flag in the processes would probably require an extra fetch. Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=341181
* hwpmc: support sampling both kernel and user stacks when interrupted in kernel | Matt Macy | 2018-06-04 | 1 | -0/+5
| | | | | | | | | | | | | | | | This adds the -U option to pmcstat which will attribute in-kernel samples back to the user stack that invoked the system call. It is not the default, because when looking at kernel profiles it is generally more desirable to merge all instances of a given system call together. Although heavily revised, this change is directly derived from D7350 by Jonathan T. Looney. Obtained from: jtl Sponsored by: Juniper Networks, Limelight Networks Notes: svn path=/head/; revision=334595
* sx: port over writer starvation prevention measures from rwlock | Mateusz Guzik | 2018-05-22 | 1 | -0/+3
| | | | | | | | | | | | | | | | | | A constant stream of readers could completely starve writers and this is not a hypothetical scenario. The 'poll2_threads' test from the will-it-scale suite reliably starves writers even with concurrency < 10 threads. The problem was run into and diagnosed by dillon@backplane.com There was next to no change in lock contention profile during -j 128 pkg build, despite an sx lock being at the top. Tested by: pho Notes: svn path=/head/; revision=334024
* Add simple preempt safe epoch API | Matt Macy | 2018-05-10 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | Read locking is overused in the kernel to guarantee liveness. This API makes it easy to provide liveness guarantees without atomics. Includes epoch_test kernel module to stress test the API. Documentation will follow initial use case. Test case and improvements to preemption handling in response to discussion with mjg@ Reviewed by: imp@, shurd@ Approved by: sbruno@ Notes: svn path=/head/; revision=333466
* Account the size of the vslock-ed memory by the thread. | Konstantin Belousov | 2018-03-24 | 1 | -0/+2
| | | | | | | | | | | | | Assert that all such memory is unwired on return to usermode. The count of the wired memory will be used to detect the copyout mode. Tested by: pho (as part of the larger patch) Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=331490
* spdx: initial adoption of licensing ID tags. | Pedro F. Giffuni | 2017-11-18 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Initially, only tag files that use BSD 4-Clause "Original" license. RelNotes: yes Differential Revision: https://reviews.freebsd.org/D13133 Notes: svn path=/head/; revision=325966
* - Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter | Gleb Smirnoff | 2017-04-17 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | in place. To do per-cpu stats, convert all fields that previously were maintained in the vmmeters that sit in pcpus to counter(9). - Since some vmmeter stats may be touched at very early stages of boot, before we have set up UMA and we can do counter_u64_alloc(), provide an early counter mechanism: o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter. o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so that at early stages of boot, before counters are allocated we already point to a counter that can be safely written to. o For sparc64 that required a whole dummy pcpu[MAXCPU] array. Further related changes: - Don't include vmmeter.h into pcpu.h. - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit, to match kernel representation. - struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion. This is based on benno@'s 4-year old patch: https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html Reviewed by: kib, gallatin, marius, lidl Differential Revision: https://reviews.freebsd.org/D10156 Notes: svn path=/head/; revision=317061
* Do not leak mount references for dying threads. | Konstantin Belousov | 2017-02-25 | 1 | -3/+3
| | | | | | | | | | | | | | | | | | | | Thread might create a condition for delayed SU cleanup, which creates a reference to the mount point in td_su, but exit without returning through userret(), e.g. when terminating due to single-threading or process exit. In this case, td_su reference is not dropped and mount point cannot be freed. Handle the situation by clearing td_su also in the thread destructor and in exit1(). softdep_ast_cleanup() has to receive the thread as argument, since e.g. thread destructor is executed in different context. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=314253
* The assertion re-added in r302614 was triggered when stopping signal | Konstantin Belousov | 2016-07-18 | 1 | -10/+18
| | | | | | | | | | | | | | | | | | is delivered to vforked child. Issue is that we avoid stopping such children in issignal() to not block parents. But executed AST, which ignored stops, leaves the child with the signal pending but no AST pending. On first exec after vfork(), call signotify() to handle pending reenabled signals. Adjust the assert to not check vfork children until exec. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=302999
* Revive the check, disabled in r197963. | Konstantin Belousov | 2016-07-12 | 1 | -10/+37
| | | | | | | | | | | | | | | | | | | | Despite the implication (process has pending signals -> the current thread marked for AST and has TDF_NEEDSIGCHK set) is not true due to other thread might manipulate its signal blocking mask, it should still hold for the single-threaded processes. Enable check for the condition for single-threaded case, and replicate it from userret() to ast() as well, where we check that ast indeed has no signal to deliver. Note that the check is under DIAGNOSTIC, it is not enabled for INVARIANTS but !DIAGNOSTIC since it imposes too heavy-weight locking for day-to-day used debugging kernel. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=302614
* Add assert to complement r302328. | Konstantin Belousov | 2016-07-12 | 1 | -1/+3
| | | | | | | | | | | | | AST must not execute with TDF_SBDRY or TDF_SEINTR/TDF_SERESTART thread flags set, which is asserted in userret(). As the consequence, -1 return from cursig() must not be possible. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=302613
* Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible | Konstantin Belousov | 2016-06-26 | 1 | -1/+1
| | | | | | | | | | | | | | | | framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART. Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb) Notes: svn path=/head/; revision=302215
* Add four new RCTL resources - readbps, readiops, writebps and writeiops, | Edward Tomasz Napierala | 2016-04-07 | 1 | -3/+7
| | | | | | | | | | | | | | | | | | for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080 Notes: svn path=/head/; revision=297633
* racct: perform a lockless check for p_throttled | Mateusz Guzik | 2015-07-13 | 1 | -1/+1
| | | | | | | | | This reduces proc lock contention. Reviewed by: trasz Notes: svn path=/head/; revision=285511
* Generalised support for copy-on-write structures shared by threads. | Mateusz Guzik | 2015-06-10 | 1 | -2/+2
| | | | | | | | | | | | Thread credentials are maintained as follows: each thread has a pointer to creds and a reference on them. The pointer is compared with proc's creds on userspace<->kernel boundary and updated if needed. This patch introduces a counter which can be compared instead, so that more structures can use this scheme without adding more comparisons on the boundary. Notes: svn path=/head/; revision=284214
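The generation-counter scheme described in the entry above can be sketched as follows. This is a minimal illustration under assumptions: all `toy_*` names and fields are hypothetical, reference counting and locking are omitted, and the code is not taken from the FreeBSD sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_cred {
	int uid;
};

/* Hypothetical process: owns the authoritative copy of shared data. */
struct toy_proc {
	struct toy_cred	*cred;
	uint32_t	 cowgen;	/* bumped on every replacement */
};

/* Hypothetical thread: caches pointers plus the generation it saw. */
struct toy_thread {
	struct toy_proc	*proc;
	struct toy_cred	*cred;
	uint32_t	 cowgen;
	unsigned	 refreshes;	/* how often the cache was refreshed */
};

/* Replace the process credentials and advance the generation. */
static void
toy_proc_set_cred(struct toy_proc *p, struct toy_cred *c)
{
	p->cred = c;
	p->cowgen++;
}

/*
 * Boundary check: one integer comparison covers every cached
 * structure, instead of one pointer comparison per structure.
 */
static void
toy_boundary_check(struct toy_thread *td)
{
	assert(td->proc != NULL);
	if (td->cowgen == td->proc->cowgen)
		return;
	td->cred = td->proc->cred;
	td->cowgen = td->proc->cowgen;
	td->refreshes++;
}
```

Adding another shared structure to this model would mean refreshing one more cached pointer in the slow path, while the fast-path comparison stays a single test.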
* Currently, softupdate code detects overstepping on the workitems | Konstantin Belousov | 2015-05-27 | 1 | -0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread. Instead of synchronously waiting for the worker, perform the worker's tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast. There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitly checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nfsd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup. Reviewed by: jhb, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=283600
* Remove support for Xen PV domU kernels. Support for HVM domU kernels | John Baldwin | 2015-04-30 | 1 | -9/+0
| | | | | | | | | | | | | | | | | | | | | | | | remains. Xen is planning to phase out support for PV upstream since it is harder to maintain and has more overhead. Modern x86 CPUs include virtualization extensions that support HVM guests instead of PV guests. In addition, the PV code was i386 only and not as well maintained recently as the HVM code. - Remove the i386-only NATIVE option that was used to disable certain components for PV kernels. These components are now standard as they are on amd64. - Remove !XENHVM bits from PV drivers. - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3, etc.) - Remove duplicate copy of <xen/features.h>. - Remove unused, i386-only xenstored.h. Differential Revision: https://reviews.freebsd.org/D2362 Reviewed by: royger Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0) Relnotes: yes Notes: svn path=/head/; revision=282274
* Add kern.racct.enable tunable and RACCT_DISABLED config option. | Edward Tomasz Napierala | 2015-04-29 | 1 | -5/+8
| | | | | | | | | | | | | | The point of this is to be able to add RACCT (with RACCT_DISABLED) to GENERIC, to avoid having to rebuild the kernel to use rctl(8). Differential Revision: https://reviews.freebsd.org/D2369 Reviewed by: kib@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=282213
* Revert r263475: TDP_DEVMEMIO no longer needed, since amd64 /dev/kmem | Konstantin Belousov | 2015-01-12 | 1 | -2/+0
| | | | | | | | | | | does not access kernel mappings directly. Reviewed by: alc Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=277055
* Fix two issues with /dev/mem access on amd64, both causing kernel page | Konstantin Belousov | 2014-03-21 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | faults. First, for accesses to direct map region should check for the limit by which direct map is instantiated. Second, for accesses to the kernel map, success returned from the kernacc(9) does not guarantee that consequent attempt to read or write to the checked address succeed, since other thread might invalidate the address meantime. Add a new thread private flag TDP_DEVMEMIO, which instructs vm_fault() to return error when fault happens on the MAP_ENTRY_NOFAULT entry, instead of panicking. The trap handler would then see a page fault from access, and recover in normal way, making /dev/mem access safer. Remove GIANT_REQUIRED from the amd64 memrw(), since it is not needed and having Giant locked does not solve issues for amd64. Note that at least the second issue exists on other architectures, and requires similar patching for md code. Reported and tested by: clusteradm (gjb, sbruno) Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=263475
* Update kernel inclusions of capability.h to use capsicum.h instead; some | Robert Watson | 2014-03-16 | 1 | -1/+1
| | | | | | | | | | | further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks Notes: svn path=/head/; revision=263233
* - Assert for not leaking readers rw locks counter on userland return. | Attilio Rao | 2013-12-17 | 1 | -0/+3
| | | | | | | | | - Use a correct spin_cnt for KDTRACE_HOOK case in rw read lock. Sponsored by: EMC / Isilon storage division Notes: svn path=/head/; revision=259509
* - For kernel compiled only with KDTRACE_HOOKS and not any lock debugging | Attilio Rao | 2013-11-25 | 1 | -1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip Notes: svn path=/head/; revision=258541
* Partially revert r195702. Deferring stops is now implemented via a set of | John Baldwin | 2013-03-18 | 1 | -1/+1
| | | | | | | | | | | calls to toggle TDF_SBDRY rather than passing PBDRY to individual sleep calls. - Remove the stop_allowed parameters from cursig() and issignal(). issignal() checks TDF_SBDRY directly. - Remove the PBDRY and SLEEPQ_STOP_ON_BDRY flags. Notes: svn path=/head/; revision=248470
* When throttling a process to enforce RACCT limits, use neither | Edward Tomasz Napierala | 2013-03-14 | 1 | -9/+2
| | | | | | | | | | | PBDRY (which simply doesn't make any sense) nor PCATCH (which could be used by a malicious process to work around the PCPU limit). Submitted by: Rudo Tomori Reviewed by: kib Notes: svn path=/head/; revision=248300
* Replace the TDP_NOSLEEPING flag with a counter so that the | John Baldwin | 2013-03-01 | 1 | -1/+1
| | | | | | | | | THREAD_NO_SLEEPING() and THREAD_SLEEPING_OK() macros can nest. Reviewed by: attilio Notes: svn path=/head/; revision=247588
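Why a counter nests where a single flag cannot is shown in the sketch below. It is a hypothetical model only: `toy_no_sleeping()`, `toy_sleeping_ok()` and `toy_can_sleep()` are invented stand-ins for illustration and are not the kernel macros named in the entry above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thread with a "no sleeping" nesting depth. */
struct toy_thread {
	int no_sleeping;
};

/* Enter a section in which the thread must not sleep. */
static void
toy_no_sleeping(struct toy_thread *td)
{
	td->no_sleeping++;
}

/* Leave the section; an unbalanced leave trips the assertion. */
static void
toy_sleeping_ok(struct toy_thread *td)
{
	assert(td->no_sleeping > 0);
	td->no_sleeping--;
}

/* Sleeping is allowed only once every nested section has been left. */
static bool
toy_can_sleep(const struct toy_thread *td)
{
	return (td->no_sleeping == 0);
}
```

With a boolean flag, the inner leave would clear the restriction while the outer section is still active; with the counter, the restriction lasts until the outermost leave.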
* Further refine the handling of stop signals in the NFS client. The | John Baldwin | 2013-02-21 | 1 | -0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_*() and VOP_*() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month Notes: svn path=/head/; revision=247116
* Fixup r240246: hwpmc needs to retain the pinning until ASTs are not | Attilio Rao | 2012-10-30 | 1 | -1/+6
| | | | | | | | | | | | | | | | executed. This means past the point where userret() is generally executed. Skip the td_pinned check if a callchain tracing is currently happening and add a more robust check to pmc_capture_user_callchain() in order to catch td_pinned leak past ast() in hwpmc case. Reported and tested by: fabient MFC after: 1 week X-MFC: r240246 Notes: svn path=/head/; revision=242361
* Add CPU percentage limit enforcement to RCTL. The resource name is "pcpu". | Edward Tomasz Napierala | 2012-10-26 | 1 | -0/+13
| | | | | | | It was implemented by Rudolf Tomori during Google Summer of Code 2012. Notes: svn path=/head/; revision=242139
* Add a KPI to allow to reserve some amount of space in the numvnodes | Konstantin Belousov | 2012-10-14 | 1 | -0/+2
| | | | | | | | | | | | | | | | counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week Notes: svn path=/head/; revision=241556
* Move the checks for td_pinned, td_critnest, TDP_NOFAULTING and | Attilio Rao | 2012-09-08 | 1 | -1/+14
| | | | | | | | | | | | | TDP_NOSLEEPING leaking from syscallret() to userret() so that also trap handling is covered. Also, the check on td_locks is not duplicated between the two functions. Reported by: avg Reviewed by: kib MFC after: 1 week Notes: svn path=/head/; revision=240246
* Move PT_UPDATED_FLUSH() before td_locks check in order to have more | Attilio Rao | 2012-09-08 | 1 | -3/+3
| | | | | | | | | | coverage also in the XEN case. Reviewed by: kib MFC after: 1 week Notes: svn path=/head/; revision=240245
* userret() already checks for td_locks when INVARIANTS is enabled, so | Attilio Rao | 2012-09-08 | 1 | -1/+0
| | | | | | | | | | there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week Notes: svn path=/head/; revision=240244