path: root/sys/kern/kern_intr.c
* Allow swi_sched() to be called from NMI context.
  Alexander Motin, 2020-07-25 (1 file changed, -7/+27)

  For purposes of handling hardware errors reported via NMIs I need a way to
  escape NMI context, which is too restrictive to do anything significant.
  To do that, this change introduces a new swi_sched() flag, SWI_FROMNMI,
  which makes it careful about the KPIs it uses. On platforms that allow
  sending IPIs from NMI context (x86 for now) it immediately wakes
  clk_intr_event via a new IPI_SWI; otherwise it works just like SWI_DELAY.
  To handle the delayed SWIs, this patch runs clk_intr_event on every
  hardclock() tick.

  MFC after: 2 weeks
  Sponsored by: iXsystems, Inc.
  Differential Revision: https://reviews.freebsd.org/D25754
  Notes: svn path=/head/; revision=363527
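A minimal usage sketch of the new flag follows; the hwerr_* names are made up
for illustration, while swi_add(), swi_sched(), clk_intr_event and the
SWI_FROMNMI flag itself are the KPIs discussed in this entry.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/bus.h>
    #include <sys/interrupt.h>

    static void *hwerr_swi_cookie;

    /* Runs later, in ordinary ithread context, where normal KPIs are safe. */
    static void
    hwerr_swi(void *arg)
    {
            /* ... examine and report the state saved by the NMI handler ... */
    }

    /* Called once at attach/init time. */
    static void
    hwerr_swi_init(void)
    {
            /* clk_intr_event is the event the new IPI_SWI path wakes up. */
            if (swi_add(&clk_intr_event, "hwerr", hwerr_swi, NULL, SWI_CLOCK,
                INTR_MPSAFE, &hwerr_swi_cookie) != 0)
                    printf("hwerr: swi_add failed\n");
    }

    /* Called from the NMI handler: defer all real work out of NMI context. */
    static void
    hwerr_nmi(void)
    {
            swi_sched(hwerr_swi_cookie, SWI_FROMNMI);
    }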
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
  Pawel Biernacki, 2020-02-26 (1 file changed, -4/+8)

  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
  still not MPSAFE (or already are but aren't properly marked). Use it in
  preparation for a general review of all nodes.

  This is a non-functional change that adds annotations to SYSCTL_NODE and
  SYSCTL_PROC nodes using one of the soon-to-be-required flags. Mark all
  obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE
  before are by default marked as NEEDGIANT.

  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718
  Notes: svn path=/head/; revision=358333
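The annotations themselves look like the following; the node and handler
names here are hypothetical, only the CTLFLAG_MPSAFE / CTLFLAG_NEEDGIANT
flags are what this entry is about.

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    static int sysctl_example_handler(SYSCTL_HANDLER_ARGS);
    static int sysctl_legacy_handler(SYSCTL_HANDLER_ARGS);

    /* Handler known not to rely on Giant: mark the node MPSAFE. */
    SYSCTL_PROC(_kern, OID_AUTO, example_mpsafe,
        CTLTYPE_INT | CTLFLAG_RD | CTLFLAG_MPSAFE, NULL, 0,
        sysctl_example_handler, "I", "Example MPSAFE sysctl");

    /* Handler not reviewed yet: the sysctl code will wrap it in Giant. */
    SYSCTL_PROC(_kern, OID_AUTO, example_legacy,
        CTLTYPE_INT | CTLFLAG_RD | CTLFLAG_NEEDGIANT, NULL, 0,
        sysctl_legacy_handler, "I", "Example sysctl that still needs Giant");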
* Fix for unbalanced EPOCH(9) usage in the generic kernel interrupt handler.
  Hans Petter Selasky, 2020-02-10 (1 file changed, -6/+10)

  Interrupt handlers are removed via intr_event_execute_handlers() when
  IH_DEAD is set. The thread removing the interrupt is woken up, and calls
  intr_event_update(). When this happens, ie_hflags is cleared and rebuilt
  from all the remaining handlers sharing the event.

  When the last IH_NET handler is removed, the IH_NET flag will be cleared
  from ie_hflags (or ie_hflags may still be being rebuilt in a different
  context), and ithread_execute_handlers() may return with ie_hflags missing
  IH_NET. This can lead to a scenario where IH_NET was present before calling
  ithread_execute_handlers() and is no longer present at its return, meaning
  the need for epoch must be cached locally. This can happen when loading and
  unloading network drivers.

  Also make sure ie_hflags is not cleared before being updated.

  This is a regression after r357004.

  Backtrace:
  panic()  # trying to access epoch tracker on stack of dead thread
  _epoch_enter_preempt()
  ifunit_ref()
  ifioctl()
  fo_ioctl()
  kern_ioctl()
  sys_ioctl()
  syscallenter()
  amd64_syscall()

  Differential Revision: https://reviews.freebsd.org/D23483
  Reviewed by: glebius@, gallatin@, mav@, jeff@ and kib@
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=357744
* Enter the network epoch for interrupt handlers of INTR_TYPE_NET.
  Gleb Smirnoff, 2020-01-23 (1 file changed, -2/+22)

  Provide a tunable to limit how many times handlers may be executed without
  re-entering the epoch.

  Differential Revision: https://reviews.freebsd.org/D23242
  Notes: svn path=/head/; revision=357004
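Conceptually the dispatch now looks like the sketch below when the event
carries IH_NET handlers. This is an illustration only, not the committed
code; in particular, the committed version also keeps executing handlers for
up to the tunable's number of iterations before leaving and re-entering the
epoch.

    /* Conceptual sketch; ithread_execute_handlers() is the existing dispatcher. */
    static void
    ithread_execute_in_epoch(struct proc *p, struct intr_event *ie)
    {
            struct epoch_tracker et;
            bool net;

            net = (ie->ie_hflags & IH_NET) != 0;
            if (net)
                    epoch_enter_preempt(net_epoch_preempt, &et);
            ithread_execute_handlers(p, ie);
            if (net)
                    epoch_exit_preempt(net_epoch_preempt, &et);
    }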
* Add ie_hflags to struct intr_event, which accumulates flags from all
  handlers on this event.
  Gleb Smirnoff, 2020-01-23 (1 file changed, -11/+4)

  For now handle only IH_ENTROPY in that manner.

  Notes: svn path=/head/; revision=357003
* schedlock 4/4
  Jeff Roberson, 2019-12-15 (1 file changed, -6/+7)

  Don't hold the scheduler lock while doing context switches. Instead we
  unlock after selecting the new thread and switch within a spinlock section,
  leaving interrupts and preemption disabled to prevent local concurrency.
  This means that mi_switch() is entered with the thread locked but returns
  without. This dramatically simplifies scheduler locking because we will not
  hold the schedlock while spinning on a blocked lock in switch.

  This change has not been made to 4BSD, but in principle it would be more
  straightforward there.

  Discussed with: markj
  Reviewed by: kib
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D22778
  Notes: svn path=/head/; revision=355784
* schedlock 1/4
  Jeff Roberson, 2019-12-15 (1 file changed, -3/+3)

  Eliminate recursion from most thread_lock consumers. Return from
  sched_add() without the thread_lock held. This eliminates unnecessary
  atomics and lock word loads as well as reducing the hold time for scheduler
  locks. This will eventually allow for lockless remote adds.

  Discussed with: kib
  Reviewed by: jhb
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D22626
  Notes: svn path=/head/; revision=355779
* Remove the deprecated timeout(9) interface.
  John Baldwin, 2019-12-13 (1 file changed, -1/+0)

  All in-tree consumers have been converted to callout(9).

  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D22602
  Notes: svn path=/head/; revision=355732
* Disable intr_storm_threshold mechanism by default
  Conrad Meyer, 2019-05-24 (1 file changed, -1/+1)

  The ixl.4 manual page has documented that the threshold falsely detects
  interrupt storms on 40Gbit NICs as long ago as 2015, and we have seen
  similar false positives with the ioat(4) DMA device (which can push GB/s).

  For example, synthetic load can be generated with tools/tools/ioat
  'ioatcontrol 0 200 8192 1 1000' (allocate 200x8kB buffers, generate an
  interrupt for each one, and do this for 1000 milliseconds). With
  storm-detection disabled, the Broadwell-EP version of this device is
  capable of generating ~350k real interrupts per second.

  The following historical context comes from jhb@: Originally, the threshold
  worked around incorrect routing of PCI INTx interrupts on single-CPU
  systems which would end up in a hard hang during boot. Since the threshold
  was added, our PCI interrupt routing was improved, most PCI interrupts use
  edge-triggered MSI instead of level-triggered INTx, and typical systems
  have multiple CPUs available to service interrupts.

  On the off chance that the threshold is useful in the future, it remains
  available as a tunable and sysctl.

  Reviewed by: jhb
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D20401
  Notes: svn path=/head/; revision=348255
* Bind TCP HPTS (pacer) threads to NUMA domains
  Andrew Gallatin, 2019-05-10 (1 file changed, -0/+19)

  Bind the TCP pacer threads to NUMA domains and build per-domain
  pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA
  domain information to match an inpcb with a pacer thread on the same
  domain. The motivation for this is to keep the TCP connection local to a
  NUMA domain as much as possible.

  Thanks to jhb for pre-reviewing an earlier version of the patch.

  Reviewed by: rrs
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D20134
  Notes: svn path=/head/; revision=347430
* Revert accidental change that should not have been included in r345475.
  Ian Lepore, 2019-03-24 (1 file changed, -1/+1)

  I had changed this value as part of a local experiment, and neglected to
  change it back before committing the other changes.

  Notes: svn path=/head/; revision=345476
* Truncate a too-long interrupt handler name when there is only one handler.
  Ian Lepore, 2019-03-24 (1 file changed, -4/+14)

  There are only 19 bytes available for the name of an interrupt plus the
  name(s) of handlers/drivers using it. There is a mechanism from the days of
  shared interrupts that replaces some of the handler names with '+' when
  they don't all fit into 19 bytes. In modern times there is typically only
  one device on an interrupt, but long device names are the norm, especially
  with embedded systems. Also, in systems with multiple interrupt
  controllers, the names of the interrupts themselves can be long. For
  example, 'gic0,s54: imx6_anatop0' doesn't fit, and replacing the device
  driver name with a '+' provides no useful info at all.

  When there is only one handler but its name was too long to fit, this
  change truncates enough leading chars of the handler name (replacing them
  with a '-' char to indicate that some chars are missing) to use all 19
  bytes, preserving the unit number typically on the end of the name. Using
  the prior example, this results in 'gic0,s54:-6_anatop0', which provides
  plenty of info to figure out which device is involved.

  PR: 211946
  Reviewed by: gonzo@ (prior version without the '-' char)
  Differential Revision: https://reviews.freebsd.org/D19675
  Notes: svn path=/head/; revision=345475
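An illustrative sketch of that truncation rule (not the committed code): keep
the tail of the handler name, where the unit number lives, and mark the cut
with a '-'. With a 20-byte buffer (19 characters plus NUL), "gic0,s54" and
"imx6_anatop0" come out as "gic0,s54:-6_anatop0", the same form as the
example above.

    #include <sys/param.h>
    #include <sys/systm.h>

    /* Hypothetical helper: compose "irqname:handlername" into a dstlen-byte buffer. */
    static void
    shorten_intr_name(char *dst, size_t dstlen, const char *irqname,
        const char *handlername)
    {
            size_t space, hlen;

            snprintf(dst, dstlen, "%s:", irqname);
            if (strlen(dst) + 1 >= dstlen)
                    return;                 /* no room left for any handler name */
            space = dstlen - strlen(dst) - 1;
            hlen = strlen(handlername);
            if (hlen <= space) {
                    strlcat(dst, handlername, dstlen);
            } else {
                    /* Drop leading characters, keep the tail with the unit number. */
                    strlcat(dst, "-", dstlen);
                    strlcat(dst, handlername + (hlen - space + 1), dstlen);
            }
    }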
* add support for marking interrupt handlers as suspended
  Andriy Gapon, 2018-12-17 (1 file changed, -9/+95)

  The goal of this change is to fix a problem with PCI shared interrupts
  during suspend and resume. I have observed a couple of variations of the
  following scenario. Devices A and B are on the same PCI bus and share the
  same interrupt. Device A's driver is suspended first and the device is
  powered down. Device B generates an interrupt. Interrupt handlers of both
  drivers are called. Device A's interrupt handler accesses registers of the
  powered down device and gets back bogus values (I assume all 0xff). That
  data is interpreted as interrupt status bits, etc. So, the interrupt
  handler gets confused and may produce some noise or enter an infinite
  loop, etc.

  This change affects only PCI devices. The pci(4) bus driver marks a child's
  interrupt handler as suspended after the child's suspend method is called
  and before the device is powered down. This is done only for traditional
  PCI interrupts, because only they can be shared. At the moment the change
  is only for x86.

  Notable changes in core subsystems / interfaces:
  - BUS_SUSPEND_INTR and BUS_RESUME_INTR methods are added to the bus
    interface along with convenience functions bus_suspend_intr and
    bus_resume_intr;
  - rman_set_irq_cookie and rman_get_irq_cookie functions are added to
    provide a way to associate an interrupt resource with an interrupt
    cookie;
  - intr_event_suspend_handler and intr_event_resume_handler functions are
    added to the MI interrupt handler interface.

  I added two new interrupt handler flags, IH_SUSP and IH_CHANGED, to
  implement the new intr_event functions. IH_SUSP marks a suspended interrupt
  handler. IH_CHANGED is used to implement a barrier that ensures that a
  change to the interrupt handler's state is visible to future interrupts.

  While there, I fixed some whitespace issues in comments and changed a
  couple of logically boolean variables to be bool.

  MFC after: 1 month (maybe)
  Differential Revision: https://reviews.freebsd.org/D15755
  Notes: svn path=/head/; revision=342170
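A rough sketch of how the new KPI could sit in a bus driver's suspend path.
The helper below is hypothetical and assumes bus_suspend_intr() takes the
child device and its IRQ resource and returns an errno value; the actual
pci(4) integration in this commit differs in detail, and the mirror-image
bus_resume_intr() call would re-enable the handler on resume.

    #include <sys/param.h>
    #include <sys/bus.h>

    /* Hypothetical helper; 'irq' is the child's allocated legacy INTx resource. */
    static int
    mybus_suspend_child(device_t bus, device_t child, struct resource *irq)
    {
            int error;

            error = DEVICE_SUSPEND(child);
            if (error != 0)
                    return (error);
            /* Keep the shared handler from running against a powered-down device. */
            error = bus_suspend_intr(child, irq);
            if (error != 0) {
                    DEVICE_RESUME(child);
                    return (error);
            }
            /* ... power the device down ... */
            return (0);
    }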
* Remove the Yarrow PRNG algorithm option in accordance with due notice
  given in random(4).
  Mark Murray, 2018-08-26 (1 file changed, -2/+2)

  This includes updating the relevant man pages and removing no-longer-used
  harvesting parameters.

  Ensure that the pseudo-unit-test still does something useful, now also with
  the "other" algorithm instead of Yarrow.

  PR: 230870
  Reviewed by: cem
  Approved by: so(delphij,gtetlow)
  Approved by: re(marius)
  Differential Revision: https://reviews.freebsd.org/D16898
  Notes: svn path=/head/; revision=338324
* safer wait-free iteration of shared interrupt handlers
  Andriy Gapon, 2018-08-03 (1 file changed, -28/+65)

  The code that iterates a list of interrupt handlers for a (shared)
  interrupt, whether in the ISR context or in the context of an interrupt
  thread, does so in a lock-free fashion. Thus, the routines that modify the
  list need to take special steps to ensure that the iterating code has a
  consistent view of the list. Previously, those routines tried to play nice
  only with the code running in the ithread context. The iteration in the
  ISR context was left to chance.

  After commit r336635, atomic operations and memory fences are used to
  ensure that the ie_handlers list is always safe to navigate with respect to
  insertion and removal of list elements. There is still a question of when
  it is safe to actually free a removed element.

  The idea of this change is somewhat similar to the idea of epoch-based
  reclamation, with some simplifications compared to the general scheme. All
  writers are serialized using a mutex, so we do not need to worry about
  concurrent modifications. Also, all read accesses from the open context are
  serialized too. So, we can get away with just two epochs / phases. When a
  thread removes an element it switches the global phase from the current
  phase to the other and then drains the previous phase. Only after the
  draining does the removed element actually get freed.

  The code that iterates the list in the ISR context takes a snapshot of the
  global phase and then increments the use count of that phase before
  iterating the list. The use count (in the same phase) is decremented after
  the iteration. This ensures that there is no iteration over the removed
  element when it gets freed.

  This commit also simplifies the coordination with the interrupt thread
  context. Now we always schedule the interrupt thread when removing one of
  the handlers for its interrupt. This makes the code both simpler and safer,
  as the interrupt thread masks the interrupt, thus ensuring that there is no
  interaction with the ISR context.

  P.S. This change matters only for shared interrupts and I realize that
  those are becoming a thing of the past (and quickly). I also understand
  that the problem that I am trying to solve is extremely rare.

  PR: 229106
  Reviewed by: cem
  Discussed with: Samy Al Bahra
  MFC after: 5 weeks
  Differential Revision: https://reviews.freebsd.org/D15905
  Notes: svn path=/head/; revision=337255
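A compressed sketch of that two-phase scheme; the identifiers below are
illustrative, not the kernel's actual ones, and locking and scheduling
details are omitted.

    static volatile int ih_phase;               /* current global phase: 0 or 1 */
    static volatile u_int ih_users[2];          /* ISR-context readers per phase */

    /* Reader, ISR context: iterate the handler list inside one phase. */
    static void
    isr_iterate(struct intr_event *ie)
    {
            int phase;

            phase = ih_phase;
            atomic_add_acq_int(&ih_users[phase], 1);
            /* ... walk ie->ie_handlers lock-free and run the filters ... */
            atomic_subtract_rel_int(&ih_users[phase], 1);
    }

    /* Writer, serialized by the event lock: free an already-unlinked handler. */
    static void
    handler_free_safe(struct intr_handler *ih)
    {
            int old;

            /* CK_SLIST removal left ih's linkage intact, so in-flight readers can still pass it. */
            old = ih_phase;
            ih_phase = !old;                    /* switch the global phase */
            while (ih_users[old] != 0)          /* drain readers still in the old phase */
                    cpu_spinwait();
            free(ih, M_ITHREAD);                /* no ISR-context reader can see it now */
    }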
* change interrupt event's list of handlers from TAILQ to CK_SLIST
  Andriy Gapon, 2018-07-23 (1 file changed, -45/+51)

  The primary reason for this commit is to separate mechanical and nearly
  mechanical code changes from an upcoming fix for unsafe teardown of shared
  interrupt handlers that have only filters (see D15905).

  The technical rationale is that SLIST is sufficient. The only operation
  that gets worse performance -- O(n) instead of O(1) -- is removal of a
  handler, but it is not a critical operation and the list is expected to be
  rather short. Additionally, it is easier to reason about SLIST when
  considering the concurrent lock-free access to the list from the interrupt
  context and the interrupt thread.

  CK_SLIST is used because the upcoming change depends on the memory order
  provided by CK_SLIST insert and the fact that CK_SLIST remove does not
  trash the linkage in a removed element.

  While here, I also fixed a couple of whitespace issues, made the code under
  'ifdef notyet' compilable, added a lock assertion to ithread_update() and
  made intr_event_execute_handlers() static as it had no external callers.

  Reviewed by: cem (earlier version)
  MFC after: 4 weeks
  Differential Revision: https://reviews.freebsd.org/D16016
  Notes: svn path=/head/; revision=336635
* Yank crufty INTR_FILTER option
  Conrad Meyer, 2018-05-24 (1 file changed, -540/+0)

  It was introduced to the tree in r169320 and r169321 in May 2007. It never
  got much use and never became a kernel default. The code duplicates the
  default path quite a bit, with slight modifications. Just yank out the
  cruft. Whatever goals were being aimed for can probably be met within the
  existing framework, without a flag day option.

  Mostly mechanical change: 'unifdef -m -UINTR_FILTER'.

  Reviewed by: mmacy
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D15546
  Notes: svn path=/head/; revision=334170
* intr unbreak KTR/LINT build
  Matt Macy, 2018-05-19 (1 file changed, -2/+2)

  Notes: svn path=/head/; revision=333873
* intr: eliminate / annotate unused stack locals
  Matt Macy, 2018-05-19 (1 file changed, -6/+6)

  Notes: svn path=/head/; revision=333861
* sys/kern: adoption of SPDX licensing ID tags.
  Pedro F. Giffuni, 2017-11-27 (1 file changed, -0/+2)

  Mainly focus on files that use the BSD 2-Clause license; however, the tool
  I was using misidentified many licenses, so this was mostly a manual,
  error-prone task.

  The Software Package Data Exchange (SPDX) group provides a specification to
  make it easier for automated tools to detect and summarize well-known open
  source licenses. We are gradually adopting the specification, noting that
  the tags are considered only advisory and do not, in any way, supersede or
  replace the license texts.

  Notes: svn path=/head/; revision=326271
* Extend cpuset_get/setaffinity() APIs
  Conrad Meyer, 2017-05-03 (1 file changed, -30/+104)

  Add IRQ placement-only and ithread-only API variants. intr_event_bind has
  been extended with sibling methods, as it has many more callsites in
  existing code.

  Reviewed by: kib@, adrian@ (earlier version)
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D10586
  Notes: svn path=/head/; revision=317756
* Remove 'struct vmmeter' from 'struct pcpu', leaving only the global
  vmmeter in place.
  Gleb Smirnoff, 2017-04-17 (1 file changed, -1/+1)

  To do per-cpu stats, convert all fields that previously were maintained in
  the vmmeters that sit in pcpus to counter(9).

  Since some vmmeter stats may be touched at very early stages of boot,
  before we have set up UMA and can do counter_u64_alloc(), provide an early
  counter mechanism:
  o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
  o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter, so
    that at early stages of boot, before counters are allocated, we already
    point to a counter that can be safely written to.
  o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

  Further related changes:
  - Don't include vmmeter.h into pcpu.h.
  - vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
    to match the kernel representation.
  - struct vmmeter hidden under _KERNEL, with vmstat(1) as the only
    exception.

  This is based on benno@'s 4-year old patch:
  https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

  Reviewed by: kib, gallatin, marius, lidl
  Differential Revision: https://reviews.freebsd.org/D10156
  Notes: svn path=/head/; revision=317061
* The part of r285680 which removed the release semantics for two stores to
  it_need was wrong [*].
  Konstantin Belousov, 2015-07-21 (1 file changed, -2/+14)

  Restore the releases and add a comment explaining why they are needed.

  Noted by: alc [*]
  Reviewed by: bde [*]
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=285751
* Restructure code for readability improvement. No functional change.
  Kirk McKusick, 2015-07-19 (1 file changed, -10/+7)

  Reviewed by: kib
  Notes: svn path=/head/; revision=285705
* Further cleanup after r285607.
  Konstantin Belousov, 2015-07-18 (1 file changed, -22/+29)

  Remove useless release semantics for some stores to it_need. For stores
  where the release is needed, add a comment explaining why.

  The fence after the atomic_cmpset() op on it_need should be acquire only;
  release is not needed (see above). The combination of atomic_cmpset() +
  fence_acq() is better expressed there as atomic_cmpset_acq().

  Use atomic_cmpset() for the swi ih_need read and clear.

  Discussed with: alc, bde
  Reviewed by: bde
  Comments wording provided by: bde
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=285680
* Do not use atomic_swap_int(9), it is not available on all architectures.
  Konstantin Belousov, 2015-07-15 (1 file changed, -2/+2)

  atomic_cmpset_int(9), used in a loop, is a direct replacement. The change
  fixes arm, arm64, mips and sparc64, which lack atomic_swap().

  Suggested and reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=285616
* Reset non-zero it_need indicator to zero atomically with fetching its
  current value.
  Konstantin Belousov, 2015-07-15 (1 file changed, -12/+10)

  It is believed that this change is the real fix for the issue which was
  papered over by r252683. With the previous code, if the interrupt handler
  set it_need between the read and the subsequent reset, the update could be
  lost and ithread_execute_handlers() would not be called in response to the
  lost update.

  r252683 could have hidden the issue since, at the moment of commit,
  atomic_load_acq_int() did a locked cmpxchg on the variable, which puts the
  cache line into the exclusively owned state and clears store buffers. The
  immediate store of zero then has a very high chance of reusing the
  exclusive state of the cache line, making the load and store sequence
  operate as an atomic swap.

  For now, add the acq+rel fence immediately after the swap, to not disturb
  the current (but excessive) ordering. Acquire is needed for the ih_need
  reads after the load, while release does not serve a useful purpose [*].

  Reviewed by: alc
  Noted by: alc [*]
  Discussed with: bde
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=285607
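The shape of the fetch-and-clear, as an illustrative fragment rather than the
committed code (the later entry above, r285616, replaced the swap with an
atomic_cmpset_int() loop because atomic_swap is not available everywhere):

    /* Atomically grab the pending value and reset it, so no update can be lost. */
    static u_int
    fetch_and_clear_need(volatile u_int *needp)
    {
            u_int need;

            need = atomic_swap_int(needp, 0);
            /* The committed change used an acq+rel fence here, as described above. */
            atomic_thread_fence_acq_rel();
            return (need);
    }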
* Style. Remove excessive brackets. Compare non-boolean with zero.
  Konstantin Belousov, 2015-07-15 (1 file changed, -4/+4)

  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=285606
* Huge cleanup of random(4) code.
  Mark Murray, 2015-06-30 (1 file changed, -3/+3)

  * GENERAL
  - Update copyright.
  - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to
    ON, which means we want Fortuna.
  - If there is no 'device random' in the kernel, there will be NO random(4)
    device in the kernel, and the KERN_ARND sysctl will return nothing. With
    RANDOM_DUMMY there will be a random(4) that always blocks.
  - Repair kern.arandom (KERN_ARND sysctl). The old version went through
    arc4random(9) and was a bit weird.
  - Adjust arc4random stirring a bit - the existing code looks a little
    suspect.
  - Fix the nasty pre- and post-read overloading by providing explicit
    functions to do these tasks.
  - Redo read_random(9) so as to duplicate random(4)'s read internals. This
    makes it a first-class citizen rather than a hack.
  - Move stuff out of locked regions when it does not need to be there.
  - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind
    boot verbose.
  - Use SYSINIT to sequence the startup.
  - Fix init/deinit sysctl stuff.
  - Make relevant sysctls also tunables.
  - Add different harvesting "styles" to allow for different requirements
    (direct, queue, fast).
  - Add harvesting of FFS atime events. This needs to be checked for
    weighing down the FS code.
  - Add harvesting of slab allocator events. This needs to be checked for
    weighing down the allocator code.
  - Fix the random(9) manpage.
  - Loadable modules are not present for now. These will be re-engineered
    when the dust settles.
  - Use macros for locks.
  - Fix comments.

  * src/share/man/...
  - Update the man pages.

  * src/etc/...
  - The startup/shutdown work is done in D2924.

  * src/UPDATING
  - Add UPDATING announcement.

  * src/sys/dev/random/build.sh
  - Add copyright.
  - Add libz for unit tests.

  * src/sys/dev/random/dummy.c
  - Remove; no longer needed. Functionality incorporated into randomdev.*.

  * live_entropy_sources.c live_entropy_sources.h
  - Remove; content moved.
  - Move content to randomdev.[ch] and optimise.

  * src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h
  - Remove; pluggability is no longer used. Compile-time algorithm selection
    is the way to go.

  * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h
  - Add early (re)boot-time randomness caching.

  * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h
  - Remove; no longer needed.

  * src/sys/dev/random/uint128.h
  - Provide a fake uint128_t; if a real one ever arrives, we can use that
    instead. All that is needed here is N=0, N++, N==0, and some localised
    trickery is used to manufacture a 128-bit 0ULLL.

  * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h
  - Improve unit tests; previously the testing human needed clairvoyance;
    now the test will do a basic check of compressibility. Clairvoyant
    talent is still a good idea.
  - This is still a long way off a proper unit test.

  * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h
  - Improve messy union to just uint128_t.
  - Remove unneeded 'static struct fortuna_start_cache'.
  - Tighten up arithmetic.
  - Provide a method to allow external junk to be introduced; harden it
    against blatant abuse by compressing/hashing.
  - Assert that locks are held correctly.
  - Fix the nasty pre- and post-read overloading by providing explicit
    functions to do these tasks.
  - Turn into a self-sufficient module (no longer requires
    randomdev_soft.[ch]).

  * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h
  - Improve messy union to just uint128_t.
  - Remove unneeded 'static struct start_cache'.
  - Tighten up arithmetic.
  - Provide a method to allow external junk to be introduced; harden it
    against blatant abuse by compressing/hashing.
  - Assert that locks are held correctly.
  - Fix the nasty pre- and post-read overloading by providing explicit
    functions to do these tasks.
  - Turn into a self-sufficient module (no longer requires
    randomdev_soft.[ch]).
  - Fix some magic numbers elsewhere used as FAST and SLOW.

  Differential Revision: https://reviews.freebsd.org/D2025
  Reviewed by: vsevolod, delphij, rwatson, trasz, jmg
  Approved by: so (delphij)
  Notes: svn path=/head/; revision=284959
* Remove support for Xen PV domU kernels. Support for HVM domU kernels
  remains.
  John Baldwin, 2015-04-30 (1 file changed, -5/+0)

  Xen is planning to phase out support for PV upstream since it is harder to
  maintain and has more overhead. Modern x86 CPUs include virtualization
  extensions that support HVM guests instead of PV guests. In addition, the
  PV code was i386 only and not as well maintained recently as the HVM code.
  - Remove the i386-only NATIVE option that was used to disable certain
    components for PV kernels. These components are now standard as they are
    on amd64.
  - Remove !XENHVM bits from PV drivers.
  - Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
    etc.)
  - Remove duplicate copy of <xen/features.h>.
  - Remove unused, i386-only xenstored.h.

  Differential Revision: https://reviews.freebsd.org/D2362
  Reviewed by: royger
  Tested by: royger (i386/amd64 HVM domU and amd64 PVH dom0)
  Relnotes: yes
  Notes: svn path=/head/; revision=282274
* This is the much-discussed major upgrade to the random(4) device, known to
  you all as /dev/random.
  Mark Murray, 2014-10-30 (1 file changed, -18/+7)

  This code has had an extensive rewrite and a good series of reviews, both
  by the author and other parties. This means a lot of code has been
  simplified. Pluggable structures for high-rate entropy generators are
  available, and it is most definitely not the case that /dev/random can be
  driven by only a hardware source any more. This has been designed out of
  the device. Hardware sources are stirred into the CSPRNG (Yarrow, Fortuna)
  like any other entropy source. Pluggable modules may be written by third
  parties for additional sources.

  The harvesting structures and consequently the locking have been
  simplified. Entropy harvesting is done in a more general way (the
  documentation for this will follow). There is some GREAT entropy to be had
  in the UMA allocator, but it is disabled for now as messing with that is
  likely to annoy many people.

  The venerable (but effective) Yarrow algorithm, which is no longer
  supported by its authors, now has an alternative, Fortuna. For now, Yarrow
  is retained as the default algorithm, but this may be changed using a
  kernel option. It is intended to make Fortuna the default algorithm for
  11.0. Interested parties are encouraged to read ISBN 978-0-470-47424-2
  "Cryptography Engineering" by Ferguson, Schneier and Kohno for Fortuna's
  gory details. Heck, read it anyway.

  Many thanks to Arthur Mesh who did early grunt work, and who got caught in
  the crossfire rather more than he deserved to.

  My thanks also to folks who helped me thresh this out on whiteboards and in
  the odd "Hallway track", or otherwise.

  My Nomex pants are on. Let the feedback commence!

  Reviewed by: trasz, des (partial), imp (partial?), rwatson (partial?)
  Approved by: so (des)
  Notes: svn path=/head/; revision=273872
* Convert a missed u_char cpu -> int cpu.
  Adrian Chadd, 2014-10-19 (1 file changed, -3/+2)

  This was caught by a gcc build.

  Reported by: luigi
  Sponsored by: Norse Corp, Inc.
  Notes: svn path=/head/; revision=273270
* Add kernel option KSTACK_USAGE_PROF to sample the stack depth on
  interrupts and report the largest value seen as sysctl
  debug.max_kstack_used.
  Konstantin Belousov, 2014-10-04 (1 file changed, -0/+5)

  Useful to estimate how close the kernel stack size is to overflow.

  In collaboration with: Larry Baird <lab@gta.com>
  Sponsored by: The FreeBSD Foundation (kib)
  MFC after: 1 week
  Notes: svn path=/head/; revision=272536
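A conceptual sketch of such sampling (hypothetical helper, not the committed
code): measure how far the current stack pointer sits below the top of the
thread's kernel stack and remember the maximum.

    #include <sys/param.h>
    #include <sys/proc.h>

    static int max_kstack_used;     /* would back the debug.max_kstack_used sysctl */

    static void
    kstack_usage_sample(struct thread *td)
    {
            uintptr_t sp, top;
            int used;

            sp = (uintptr_t)&sp;    /* address of a local: close to the live stack pointer */
            top = (uintptr_t)td->td_kstack + td->td_kstack_pages * PAGE_SIZE;
            used = (int)(top - sp);
            if (used > max_kstack_used)
                    max_kstack_used = used;
    }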
* Migrate ie->ie_assign_cpu and associated code to use an int for CPU rather
  than u_char.
  Adrian Chadd, 2014-09-17 (1 file changed, -3/+3)

  Migrate post_filter to use an int for a CPU rather than u_char.

  Change intr_event_bind() to use an int for CPU rather than u_char.

  It touches the ppc, sparc64, arm and mips machdep code but it should (hah!)
  be a no-op.

  Tested:
  * i386, AMD64 laptops

  Reviewed by: jhb
  Notes: svn path=/head/; revision=271712
* Pull in r267961 and r267973 again. Fix for issues reported will follow.
  Hans Petter Selasky, 2014-06-28 (1 file changed, -2/+1)

  Notes: svn path=/head/; revision=267992
* Revert r267961, r267973:
  Glen Barber, 2014-06-27 (1 file changed, -1/+2)

  These changes prevent sysctl(8) from returning proper output, such as:
  1) no output from sysctl(8)
  2) erroneously returning ENOMEM with tools like truss(1) or uname(1):
     truss: can not get etype: Cannot allocate memory

  Notes: svn path=/head/; revision=267985
* Extend the meaning of the CTLFLAG_TUN flag to automatically check if
  there is an environment variable which shall initialize the SYSCTL during
  early boot.
  Hans Petter Selasky, 2014-06-27 (1 file changed, -2/+1)

  This works for all SYSCTL types, both statically and dynamically created
  ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A
  new flag, CTLFLAG_NOFETCH, has been added for the case where a tunable
  sysctl has a custom initialisation function, allowing the sysctl to still
  be marked as a tunable.

  The kernel SYSCTL API is mostly the same, with a few exceptions for some
  special operations like iterating the children of a static/extern SYSCTL
  node. This operation should probably be made into a factored-out common
  macro, because some device drivers use it. The reason for changing the
  SYSCTL API was the need for a SYSCTL parent OID pointer, and not only the
  SYSCTL parent OID list pointer, in order to quickly generate the sysctl
  path.

  The motivation behind this patch is to avoid parameter-loading kludges
  inside the OFED driver subsystem. Instead of adding special code to the
  OFED driver subsystem to post-load tunables into dynamically created
  sysctls, we generalize this in the kernel.

  Other changes:
  - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to
    "hw.pcic.intr_mask".
  - Removed redundant TUNABLE statements throughout the kernel.
  - Some minor code rewrites in connection with removing no-longer-needed
    TUNABLE statements.
  - Added a missing SYSCTL_DECL().
  - Wrapped two very long lines.
  - Avoid malloc()/free() inside sysctl string handling, in case it is called
    to initialize a sysctl from a tunable, because malloc()/free() is not
    ready when sysctls from the sysctl dataset are registered.
  - Bumped FreeBSD version to indicate the SYSCTL API change.

  MFC after: 2 weeks
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=267961
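An illustration of what this enables; the variable and node names are made
up. A sysctl carrying CTLFLAG_TUN is now seeded automatically from a
like-named loader environment variable, so a separate TUNABLE_INT() line is
no longer needed.

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    static int example_limit = 16;

    /* Before: a second declaration was needed to also read the loader tunable. */
    /* TUNABLE_INT("kern.example_limit", &example_limit); */

    /* After: the TUN flag alone pulls kern.example_limit from the environment. */
    SYSCTL_INT(_kern, OID_AUTO, example_limit, CTLFLAG_RW | CTLFLAG_TUN,
        &example_limit, 0, "Example limit, settable in loader.conf and at runtime");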
* Permit changing the cpu mask for cpu set 1 in the presence of drivers
  binding their threads to a particular CPU.
  Alexander V. Chernikov, 2014-06-22 (1 file changed, -13/+3)

  Changing an ithread cpu mask is now performed by the special
  cpuset_setithread(). It creates an additional cpuset root group on the
  first bind invocation.

  No objection: jhb
  Tested by: hiren
  MFC after: 2 weeks
  Sponsored by: Yandex LLC
  Notes: svn path=/head/; revision=267716
* Snapshot. This passes the build test, but has not yet been finished or
  debugged.
  Mark Murray, 2013-10-04 (1 file changed, -3/+3)

  Contains:
  * Refactor the hardware RNG CPU instruction sources to feed into the
    software mixer. This is unfinished. The actual harvesting needs to be
    sorted out. Modified by me (see below).
  * Remove the 'frac' parameter from random_harvest(). This was never used
    and adds extra code for no good reason.
  * Remove device write entropy harvesting. This provided a weak attack
    vector and was not very good at bootstrapping the device. To follow will
    be a replacement explicit reseed knob.
  * Separate out all the RANDOM_PURE sources into separate harvest entities.
    This adds some security in the case where more than one is present.
  * Review all the code and fix anything obviously messy or inconsistent.
    Address some review concerns while I'm here, like renaming the pseudo-rng
    to 'dummy'.

  Submitted by: Arthur Mesh <arthurmesh@gmail.com> (the first item)
  Notes: svn path=/projects/random_number_generator/; revision=256042
* Snapshot; Do some running repairs on entropy harvesting. More needs to
  follow.
  Mark Murray, 2013-08-26 (1 file changed, -1/+1)

  Notes: svn path=/projects/random_number_generator/; revision=254928
* The change in r236456 (atomic_store_rel not locked) exposed a bug in the
  ithread code where we could lose ithread interrupts if
  intr_event_schedule_thread() is called while the ithread is already
  running.
  Alfred Perlstein, 2013-07-04 (1 file changed, -9/+11)

  Effectively, memory writes could be ordered incorrectly such that the
  non-atomic write of 0 to ithd->it_need (in ithread_loop) could be delayed
  until after it was set to be triggered again by
  intr_event_schedule_thread().

  This was a particularly big problem for CAM because CAM optimizes
  scheduling of ithreads such that it only signals camisr() when it queues to
  an empty queue. This means that additional completion events would not
  unstick CAM and quickly lead to a complete lockup of the CAM stack.

  To fix this, use atomics in all places where ithd->it_need is accessed.

  Submitted by: delphij, mav
  Obtained from: TrueOS, iXsystems
  MFC after: 1 week
  Notes: svn path=/head/; revision=252683
* If the filter of the interrupt event is not null, print it in addition to
  the handler address.
  Konstantin Belousov, 2013-04-05 (1 file changed, -1/+10)

  Add a mark to distinguish between filter and handler. Note that the
  arguments for both filter and handler are the same.

  Sponsored by: The FreeBSD Foundation
  Reviewed by: jhb
  MFC after: 1 week
  Notes: svn path=/head/; revision=249163
* MFcalloutng (r244355): Make the loadavg calculation callout direct.
  Davide Italiano, 2013-03-04 (1 file changed, -10/+1)

  There are several reasons for it:
  - it is very simple and is not worth a context switch to an SWI;
  - since an SWI is no longer used here, we can remove the twelve-year-old
    hack excluding this SWI from the loadavg statistics;
  - it fixes a problem where, when the eventtimer (HPET) shares an interrupt
    with some other device, that interrupt thread was counted as a permanent
    load average of 1; now the load average is accounted before that
    interrupt thread is scheduled.

  Sponsored by: Google Summer of Code 2012, iXsystems inc.
  Tested by: flo, marius, ian, Fabian Keil, markj
  Notes: svn path=/head/; revision=247778
* If an interrupt event's assign_cpu method fails, then restore the original
  cpuset mask for the associated interrupt thread.
  Neel Natu, 2013-02-07 (1 file changed, -1/+1)

  The text used above is verbatim from r195249 and the code should now be in
  line with the intent of that commit.

  Notes: svn path=/head/; revision=246452
* Add optional entropy harvesting for software interrupts in swi_sched(),
  as controlled by kern.random.sys.harvest.swi.
  John Baldwin, 2012-09-25 (1 file changed, -0/+10)

  SWI harvesting feeds into the interrupt FIFO and each event is estimated as
  providing a single bit of entropy.

  Reviewed by: markm, obrien
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=240921
* Do not add a handler to the event's handler list until the ithread is
  created.
  Alexander Kabaev, 2012-08-06 (1 file changed, -25/+27)

  In the rare event that fast and ithread interrupts share the same vector
  and the fast handler was registered first, we can end up trying to schedule
  an ithread that has not been created yet. A kernel built with INVARIANTS
  then triggers an assertion.

  Change the order to create the ithread first, and only then add the handler
  that needs it to the interrupt event's handler list.

  Reviewed by: jhb
  Notes: svn path=/head/; revision=239095
* Export intrcnt correctly when running under 32-bit compatibility.
  Juli Mallett, 2012-03-09 (1 file changed, -0/+18)

  Reviewed by: gonzo, nwhitehorn
  Notes: svn path=/head/; revision=232751
* Add a new sched_clear_name() method to the scheduler interface to clear
  the cached name used for KTR_SCHED traces when a thread's name changes.
  John Baldwin, 2012-03-08 (1 file changed, -0/+3)

  This way KTR_SCHED traces (and thus schedgraph) will notice when a thread's
  name changes, most commonly via execve().

  MFC after: 2 weeks
  Notes: svn path=/head/; revision=232700
* Be pedantic and change // comment to C-style one.
  Sergey Kandaurov, 2012-01-16 (1 file changed, -1/+1)

  Noticed by: Bruce Evans
  Notes: svn path=/head/; revision=230231
* Use strchr() and strrchr().
  Ed Schouten, 2012-01-02 (1 file changed, -4/+4)

  It seems strchr() and strrchr() are used more often than index() and
  rindex(), so simply migrate all kernel code to use them.

  For the XFS code, remove an empty line to make the code identical to the
  code in the Linux kernel.

  Notes: svn path=/head/; revision=229272
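The substitution is purely mechanical; a made-up kernel helper showing the
new spelling:

    #include <sys/param.h>
    #include <sys/libkern.h>

    /* Hypothetical helper: return the final component of a path. */
    static const char *
    last_component(const char *path)
    {
            const char *p;

            p = strrchr(path, '/');         /* previously spelled rindex(path, '/') */
            return (p != NULL ? p + 1 : path);
    }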