path: root/sys/x86/include/apicvar.h
Commit message (Author, Age, Files, Lines)
* x86: Allow sharing of performance counter interrupts (Bojan Novković, 2024-12-15, 1 file, -3/+3)
This patch refactors the Performance Counter interrupt setup code to allow sharing the interrupt line between multiple drivers. More specifically, Performance Counter interrupts are used by both hwpmc(4) and hwt(4)'s upcoming Intel Processor Trace backend.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D46420
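Sharing one interrupt line means every registered consumer gets a chance to look at each counter interrupt. The sketch below is a minimal, hypothetical model of that idea: a fixed table of handlers, each returning nonzero if it claimed the interrupt. The names `pcint_register()`, `pcint_dispatch()` and `demo_consumer()` are illustrative assumptions, not the kernel's actual KPI.

```c
#include <stddef.h>

/* Hypothetical handler type: returns nonzero if it claimed the interrupt. */
typedef int (*pcint_handler_t)(void *arg);

#define PCINT_MAX_HANDLERS	4

struct pcint_slot {
	pcint_handler_t	fn;
	void		*arg;
};

static struct pcint_slot pcint_slots[PCINT_MAX_HANDLERS];

/* Register a consumer; returns 0 on success, -1 if every slot is taken. */
static int
pcint_register(pcint_handler_t fn, void *arg)
{
	for (size_t i = 0; i < PCINT_MAX_HANDLERS; i++) {
		if (pcint_slots[i].fn == NULL) {
			pcint_slots[i].fn = fn;
			pcint_slots[i].arg = arg;
			return (0);
		}
	}
	return (-1);
}

/* Offer one counter interrupt to every consumer; return how many claimed it. */
static int
pcint_dispatch(void)
{
	int claimed = 0;

	for (size_t i = 0; i < PCINT_MAX_HANDLERS; i++)
		if (pcint_slots[i].fn != NULL &&
		    pcint_slots[i].fn(pcint_slots[i].arg))
			claimed++;
	return (claimed);
}

/* Illustrative consumer: claims the interrupt only if its flag is set. */
static int
demo_consumer(void *arg)
{
	const int *overflowed = arg;

	return (*overflowed != 0);
}
```

Offering the interrupt to all consumers, rather than stopping at the first claimant, matters when two drivers can overflow in the same window.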
* intr/x86: add ioapic_drv_t to reduce number of casts in IO-APIC implementation (Elliott Mitchell, 2024-12-11, 1 file, -11/+13)
void * is handy when you truly do not care about the type, yet there is so much casting back and forth in the IO-APIC code as to be hazardous. Achieve better static checking by the compiler by using a typedef.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1457
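The difference a dedicated pointer typedef makes can be sketched in a few lines. The `struct ioapic` layout and both accessor functions below are hypothetical stand-ins; only the idea of replacing `void *` with a typed handle comes from the commit.

```c
#include <stdint.h>

/* Hypothetical stand-in for an I/O APIC driver instance. */
struct ioapic {
	uint32_t	id;
	uint32_t	npins;
};

/* Typed handle: passing, say, an MSI or PIC cookie here is a compile error. */
typedef struct ioapic *ioapic_drv_t;

static uint32_t
ioapic_pin_count(ioapic_drv_t io)
{
	return (io->npins);	/* no cast needed */
}

/* The void * style: any pointer is accepted and must be cast back. */
static uint32_t
ioapic_pin_count_untyped(void *cookie)
{
	return (((struct ioapic *)cookie)->npins);
}
```

Both functions return the same value at run time; the typed version simply lets the compiler catch a wrong argument type instead of leaving it to a bad cast.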
* apic: add ioapic_get_dev() method (Konstantin Belousov, 2024-10-17, 1 file, -0/+1)
The method returns the APIC's device_t for a given apic_id, if a PCI device representing that APIC exists.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
* Increase IOAPIC_MAX_ID to 255 (from 254) (Ed Maste, 2024-05-10, 1 file, -1/+6)
A test system provided by AMD panicked with "madt_parse_apics: I/O APIC ID 255 too high". I/O APIC ID 255 is acceptable, so increase the limit.
Reviewed by: jhb, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45157
* sys: Remove $FreeBSD$: two-line .h pattern (Warner Losh, 2023-08-16, 1 file, -2/+0)
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
* spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD (Warner Losh, 2023-05-12, 1 file, -1/+1)
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
* Increase MAX_APIC_ID safeguard to 0x800 (Ed Maste, 2022-10-27, 1 file, -1/+1)
MAX_APIC_ID must be at least twice MAXCPU. Increase it to 0x800 so that it is possible to set MAXCPU to 512 or 1024 in a custom kernel config file.
Note that increasing this limit does not itself cause any allocations to be larger; it just allows madt_parse_cpu() to process higher APIC IDs. APIC IDs may be sparse, so memory can be wasted. This is independent of this change, but becomes more of an issue as the maximum APIC ID grows, and should be addressed in future work.
Reviewed by: royger
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D37067
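The "at least twice MAXCPU" rule can be enforced at compile time. The values below are illustrative (a custom kernel with `MAXCPU` set to 1024); the kernel's actual check, if any, may be phrased differently.

```c
/* Illustrative configuration: a custom kernel built with MAXCPU 1024. */
#define MAXCPU		1024
#define MAX_APIC_ID	0x800

/* The constraint from the commit message, checked by the compiler. */
_Static_assert(MAX_APIC_ID >= 2 * MAXCPU,
    "MAX_APIC_ID must be at least twice MAXCPU");
```

With `MAXCPU` 1024 the bound is met exactly (0x800 == 2048), which is why 0x800 is the smallest value that permits that configuration.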
* x86/apic: remove apic_ops (Roger Pau Monné, 2022-01-18, 1 file, -272/+37)
All Xen instances supported by FreeBSD provide a local APIC implementation, so there's no need to replace the native local APIC implementation anymore.
Leave just the ipi_vectored hook in order to be able to override it with an implementation based on event channels if the underlying local APIC is not virtualized by hardware. Note the hook cannot use ifuncs, because at the point where ifuncs are resolved the kernel doesn't yet know whether it will benefit from using the optimization.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D33917
* x86: Defer LAPIC calibration until after timecounters are available (Mark Johnston, 2021-12-06, 1 file, -0/+10)
This ensures that we have a good reference timecounter for performing calibration.
Change lapic_setup to avoid configuring the timer when booting, and move calibration and initial configuration to a new lapic routine, lapic_calibrate_timer. This calibration will be initiated from cpu_initclocks(), before an eventtimer is selected.
Reviewed by: kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33206
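Why a good reference timecounter matters is clearest from the calibration arithmetic itself. The sketch below assumes a one-shot measurement: the LAPIC timer counts down from `initial`, and after the reference clock reports `elapsed_ns` the counter reads `current`. The function name and the omission of the timer divide configuration are simplifying assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical calibration helper: convert ticks counted over a
 * reference-measured interval into a frequency in Hz.  Any error in
 * elapsed_ns translates directly into frequency error.
 */
static uint64_t
lapic_timer_hz(uint32_t initial, uint32_t current, uint64_t elapsed_ns)
{
	uint64_t ticks = (uint64_t)(initial - current);

	/* ticks per second = ticks * 1e9 / elapsed_ns */
	return (ticks * 1000000000ULL / elapsed_ns);
}
```

For example, 1,000,000 ticks over 10 ms gives 100 MHz.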
* x86: clean up empty lines in .c and .h files (Mateusz Guzik, 2020-09-01, 1 file, -1/+0)
Notes: svn path=/head/; revision=365079
* Allow swi_sched() to be called from NMI context. (Alexander Motin, 2020-07-25, 1 file, -1/+2)
For purposes of handling hardware errors reported via NMIs, I need a way to escape NMI context, which is too restrictive to do anything significant. To do it, this change introduces a new swi_sched() flag, SWI_FROMNMI, which makes it careful about the KPIs it uses. On platforms allowing IPI sending from NMI context (x86 for now), it immediately wakes clk_intr_event via the new IPI_SWI; otherwise it works just like SWI_DELAY. To handle the delayed SWIs, this patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
Notes: svn path=/head/; revision=363527
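The delayed path (the one used where an IPI cannot be sent from NMI context) can be modeled with a single atomic flag: the NMI side only stores, and the next clock tick drains it. The names below are hypothetical, and the immediate IPI_SWI wakeup on x86 is not modeled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical pending flag; a lock-free store is safe in NMI context. */
static atomic_bool swi_pending;
static int swi_runs;

/* NMI side: no locks, no scheduler calls, just record the request. */
static void
swi_sched_from_nmi(void)
{
	atomic_store(&swi_pending, true);
}

/* Called on every clock tick from normal interrupt context. */
static void
hardclock_tick(void)
{
	/* Consume the request exactly once; repeated NMIs coalesce. */
	if (atomic_exchange(&swi_pending, false))
		swi_runs++;	/* the real code would schedule the swi here */
}
```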
* amd64: allow parallel shootdown IPIs (Konstantin Belousov, 2020-07-14, 1 file, -1/+2)
Stop using smp_ipi_mtx to protect global shootdown state, and move/multiply the global state into pcpu. Now each CPU can initiate a shootdown IPI independently of other CPUs.
The initiator enters a critical section, fills in its local PCPU shootdown info (pc_smp_tlb_XXX), then clears the scoreboard generation at location (cpu, my_cpuid) for each target CPU. After that, an IPI is sent to all targets, which scan for zeroed scoreboard generation words. Upon finding such a word, the target reads the shootdown data from the corresponding CPU's pcpu and sets the generation. Meanwhile, the initiator loops, waiting for all zeroed generations in the scoreboard to be updated.
The initiator does not disable interrupts, which should prevent non-invalidation IPIs from deadlocking; it only needs to disable preemption to pin itself to its instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in the handler. This is safe because the target CPU cannot return to userspace before the handler finishes. In principle only an NMI can preempt the handler, but the NMI would see the kernel handler frame and not touch the not-yet-invalidated user page table.
Handlers loop until they no longer see zeroed scoreboard generations. This, together with the hardware keeping one pending IPI in the LAPIC IRR, should prevent lost shootdowns.
Notes:
1. The code does not protect writes to the LAPIC ICR with exclusion. I believe this is fine because we do not in fact send IPIs from interrupt handlers. Moreover, in !x2APIC mode, where a write to the ICR requires writing two registers, we disable interrupts around it. If this is considered incorrect, I can add a per-CPU spinlock around ipi_send().
2. Scoreboard lines owned by a given target CPU can be padded to the cache line to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
Notes: svn path=/head/; revision=363195
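The scoreboard protocol described above can be traced in a single-threaded model. This is a sketch only: `NCPU`, the struct and all function names are hypothetical, and the real code relies on atomic operations, memory barriers and actual IPIs, none of which are modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define NCPU	4

/* Per-initiator request, standing in for the pc_smp_tlb_* pcpu fields. */
struct tlb_req {
	uint64_t	addr;
	uint32_t	gen;
};

static struct tlb_req pcpu_req[NCPU];
/* scoreboard[target][initiator]: 0 means "request pending". */
static uint32_t scoreboard[NCPU][NCPU];
static int nflushed[NCPU];

static void
shootdown_init(void)
{
	for (int t = 0; t < NCPU; t++)
		for (int s = 0; s < NCPU; s++)
			scoreboard[t][s] = 1;	/* nothing pending */
}

/* Initiator: fill in its own request, then zero its word in the target's row. */
static void
shootdown_start(int me, int target, uint64_t addr)
{
	pcpu_req[me].addr = addr;
	if (++pcpu_req[me].gen == 0)
		pcpu_req[me].gen = 1;	/* 0 is reserved for "pending" */
	scoreboard[target][me] = 0;
	/* The real code sends the IPI to `target` at this point. */
}

/* Target's IPI handler: rescan until no zeroed words remain. */
static void
shootdown_handler(int target)
{
	bool found;

	do {
		found = false;
		for (int src = 0; src < NCPU; src++) {
			if (scoreboard[target][src] != 0)
				continue;
			uint64_t addr = pcpu_req[src].addr;
			/* Acknowledge first, then invalidate, as in the commit. */
			scoreboard[target][src] = pcpu_req[src].gen;
			(void)addr;	/* the TLB invalidation of addr goes here */
			nflushed[target]++;
			found = true;
		}
	} while (found);
}

/* The initiator spins on this until the target has acknowledged. */
static bool
shootdown_done(int me, int target)
{
	return (scoreboard[target][me] != 0);
}
```

Because each initiator owns its own column of the scoreboard and its own pcpu request, two CPUs can target the same CPU concurrently without a global lock, which is the point of the change.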
* Reimplement stack capture of running threads on i386 and amd64. (Mark Johnston, 2020-01-31, 1 file, -4/+4)
After r355784 the td_oncpu field is no longer synchronized by the thread lock, so the stack capture interrupt cannot be delivered precisely. Fix this using a loop which drops the thread lock and restarts if the wrong thread was sampled from the stack capture interrupt handler.
Change the implementation to use a regular interrupt instead of an NMI. Now that we drop the thread lock, there is no advantage to the latter.
Simplify the KPIs. Remove stack_save_td_running() and add a return value to stack_save_td(). On platforms that do not support stack capture of running threads, stack_save_td() returns EOPNOTSUPP. If the target thread is running in user mode, stack_save_td() returns EBUSY.
Reviewed by: kib
Reported by: mjg, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23355
Notes: svn path=/head/; revision=357334
* Drop "All rights reserved" from my copyright statements. (John Baldwin, 2019-03-06, 1 file, -1/+0)
Reviewed by: rgrimes
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D19485
Notes: svn path=/head/; revision=344855
* Dynamically allocate IRQ ranges on x86. (John Baldwin, 2018-08-28, 1 file, -4/+4)
Previously, x86 used static ranges of IRQ values for different types of I/O interrupts. Interrupt pins on I/O APICs and 8259A PICs used IRQ values from 0 to 254. MSI interrupts used a compile-time-defined range starting at 256, and Xen event channels used a compile-time-defined range after MSI. Some recent systems have more than 255 I/O APIC interrupt pins, which resulted in those IRQ values overflowing into the MSI range and triggering an assertion failure.
Replace statically assigned ranges with dynamic ranges. Do a single pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to determine the total number of IRQs required. Allocate the interrupt source and interrupt count arrays dynamically once this pass has completed. To minimize runtime complexity, these arrays are only sized once during bootup. The PIC range is determined by the PICs present in the system. The MSI and Xen ranges continue to use a fixed size, though this does make it possible to turn the MSI range size into a tunable in the future.
As a result, various places are updated to use dynamic limits instead of constants. In addition, the vmstat(8) utility has been taught to understand that some kernels may treat 'intrcnt' and 'intrnames' as pointers rather than arrays when extracting interrupt stats from a crashdump. This is determined by the presence (vs absence) of a global 'nintrcnt' symbol.
This change reverts r189404, which worked around a buggy BIOS which enumerated an I/O APIC twice (using the same memory-mapped address for both entries, but using an IRQ base of 256 for one entry and a valid IRQ base for the second entry). Making the "base" of MSI IRQ values dynamic avoids the panic that r189404 worked around, and there may now be valid I/O APICs with an IRQ base above 256 which this workaround would incorrectly skip. If in the future the issue reported in PR 130483 reoccurs, we will have to add a pass over the I/O APIC entries in the MADT to detect duplicates using the memory-mapped address and use some strategy to choose the "correct" one.
While here, reserve room in intrcnts for the Hyper-V counters.
PR: 229429, 130483
Reviewed by: kib, royger, cem
Tested by: royger (Xen), kib (DMAR)
Approved by: re (gjb)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16861
Notes: svn path=/head/; revision=338360
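The single sizing pass can be sketched as follows, assuming each PIC reports an IRQ base and a pin count and that the fixed-size MSI and Xen ranges follow the PIC range back to back. The struct and function names are hypothetical; the sizes in the usage check are illustrative, not the kernel's defaults.

```c
#include <stddef.h>
#include <stdlib.h>

/* One interrupt controller found at boot: its first IRQ and pin count. */
struct pic_desc {
	unsigned int	base;
	unsigned int	npins;
};

struct irq_layout {
	unsigned int	msi_first;
	unsigned int	xen_first;
	unsigned int	num_io_irqs;
};

/* The PIC range ends after the highest pin of any PIC; MSI and Xen follow. */
static struct irq_layout
irq_layout_compute(const struct pic_desc *pics, size_t npics,
    unsigned int msi_count, unsigned int xen_count)
{
	struct irq_layout l = { 0, 0, 0 };
	unsigned int pic_end = 0;

	for (size_t i = 0; i < npics; i++)
		if (pics[i].base + pics[i].npins > pic_end)
			pic_end = pics[i].base + pics[i].npins;
	l.msi_first = pic_end;
	l.xen_first = l.msi_first + msi_count;
	l.num_io_irqs = l.xen_first + xen_count;
	return (l);
}

/* The interrupt source array is then allocated exactly once. */
static void **
irq_sources_alloc(const struct irq_layout *l)
{
	return (calloc(l->num_io_irqs, sizeof(void *)));
}
```

With an I/O APIC exposing 256 pins, the MSI range simply starts later instead of being overrun, which is the failure the commit describes.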
* Remove some vestiges of IPI_LAZYPMAP on i386. (John Baldwin, 2018-08-19, 1 file, -5/+0)
The support for lazy pmap invalidations on i386 was removed in r281707. This removes the constant for the IPI and stops accounting for it when sizing the interrupt count arrays.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16801
Notes: svn path=/head/; revision=338055
* Correct pseudo misspelling in sys/ comments (Ed Maste, 2018-02-23, 1 file, -1/+1)
Contrib code and the #define in intel_ata.h are unchanged.
Notes: svn path=/head/; revision=329873
* PTI for amd64. (Konstantin Belousov, 2018-01-17, 1 file, -1/+5)
The implementation of Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now; enable it with the loader tunable vm.pmap.pti=1.
The pmap page table is split into a kernel-mode table and a user-mode table. The kernel-mode table is identical to the non-PTI table, while the user-mode table is obtained from the kernel table by leaving userspace mappings intact, but leaving only the following parts of the kernel mapped:
- kernel text (but not modules text)
- PCPU
- GDT/IDT/user LDT/task structures
- IST stacks for NMI and doublefault handlers.
The kernel switches to the user page table before returning to usermode, and restores the full kernel page table on entry. The initial kernel-mode stack for the PTI trampoline is allocated in PCPU; it is only 16 qwords. The kernel entry trampoline switches page tables, then the hardware trap frame is copied to the normal kstack, and execution continues. IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course a page table switch is performed.
On return to usermode, the trampoline is used again: the iret frame is copied to the trampoline stack, page tables are switched, and iretq is executed. The case of iretq faulting due to an invalid usermode context is tricky, since the frame for the fault is appended to the trampoline frame. Besides copying the fault frame and the original (corrupted) frame to the kstack, the fault frame must be patched to make it look as if the fault occurred on the kstack; see the comment in the doret_iret detection code in trap().
Currently, the kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides the initial registrations done during boot, LDT and non-common TSS segments are registered if the user requested their use. In principle, they could be installed into the kernel page table per pmap with some work. Similarly, PCPU could be hidden from the userspace mapping using a trampoline PCPU page, but again I do not see much benefit besides complexity.
PDPE pages for the kernel half of the user page tables are pre-allocated during boot, because we need to know in advance the pml4 entries which are copied to the top-level paging structure page when a new pmap is created. I enforce this to avoid iterating over all existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386.
The need to flush hidden kernel translations on the switch to user mode makes global mappings (PG_G) meaningless and even harmful, so PG_G use is disabled for the PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developers' benefit.
MCE is known to be broken: it requires an IST stack to operate completely correctly even for the non-PTI case, and absolutely needs a dedicated IST stack because MCE delivery while the trampoline has not yet switched from the PTI stack is fatal. The fix is pending.
Reviewed by: markj (partially)
Tested by: pho (previous version)
Discussed with: jeff, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Notes: svn path=/head/; revision=328083
* sys/x86: further adoption of SPDX licensing ID tags. (Pedro F. Giffuni, 2017-11-27, 1 file, -0/+2)
Mainly focus on files that use the BSD 2-Clause license; however, the tool I was using misidentified many licenses, so this was mostly a manual, error-prone task.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well-known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
Notes: svn path=/head/; revision=326263
* Add an ioapic_get_rid() function to obtain the PCIe TLP requester ID for the interrupt messages from a given IOAPIC, if the IOAPIC can be enumerated on the PCI bus. (Konstantin Belousov, 2017-09-08, 1 file, -0/+2)
If the IOAPIC has a PCI binding, match the PCI device against the MADT-enumerated IOAPIC. Matching is done first by the register window physical address, then by the IOAPIC ID as read from the APIC ID register. The PCI BSF (bus/slot/function) address of the matched PCI device is the RID.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Hardware provided by: Intel
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12205
Notes: svn path=/head/; revision=323325
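A PCIe requester ID packs the bus, slot (device) and function numbers into 16 bits: bus in bits 15:8, slot in bits 7:3 and function in bits 2:0. The helper below is a hypothetical illustration of that encoding, not the kernel's function.

```c
#include <stdint.h>

/* Pack a PCI bus/slot/function triple into a 16-bit requester ID. */
static uint16_t
pci_rid(uint8_t bus, uint8_t slot, uint8_t func)
{
	return ((uint16_t)((bus << 8) | ((slot & 0x1f) << 3) | (func & 0x7)));
}
```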
* x86: bump MAX_APIC_ID to 512 (Roger Pau Monné, 2017-08-10, 1 file, -2/+6)
Introduce a new define to take into account the xAPIC ID limit, for systems where x2APIC is not available/reliable. Also change some of the usages of the APIC ID to use an unsigned int (which is the correct storage type to deal with x2APIC IDs as found in x2APIC MADT entries).
This allows booting FreeBSD on a box with 256 CPUs and APIC IDs up to 295:
    FreeBSD/SMP: Multiprocessor System Detected: 256 CPUs
    FreeBSD/SMP: 1 package(s) x 64 core(s) x 4 hardware threads
    Package HW ID = 0
      Core HW ID = 0
        CPU0 (BSP): APIC ID: 0
        CPU1 (AP/HT): APIC ID: 1
        CPU2 (AP/HT): APIC ID: 2
        CPU3 (AP/HT): APIC ID: 3
    [...]
      Core HW ID = 73
        CPU252 (AP): APIC ID: 292
        CPU253 (AP/HT): APIC ID: 293
        CPU254 (AP/HT): APIC ID: 294
        CPU255 (AP/HT): APIC ID: 295
Submitted by: kib (previous version)
Relnotes: yes
MFC after: 1 month
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D11913
Notes: svn path=/head/; revision=322349
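The split between an xAPIC limit and the larger MAX_APIC_ID can be sketched as a mode-dependent bound. The xAPIC value of 0xfe is an assumption here (8-bit IDs with 0xff reserved as the broadcast destination), as are the macro and function names.

```c
#include <stdbool.h>

/* Assumed limits: 8-bit xAPIC IDs (0xff is broadcast) vs. this commit's bump. */
#define XAPIC_MAX_ID	0xfe
#define MAX_APIC_ID	0x200

/* Hypothetical check: is this CPU addressable in the current APIC mode? */
static bool
apic_id_usable(unsigned int apic_id, bool x2apic_mode)
{
	unsigned int limit = x2apic_mode ? MAX_APIC_ID : XAPIC_MAX_ID;

	return (apic_id <= limit);
}
```

The boot log above (APIC IDs up to 295) is only reachable in x2APIC mode under this model, which matches the commit's point that such IDs need unsigned storage wider than 8 bits.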
* x86: make the arrays that depend on MAX_APIC_ID dynamic (Roger Pau Monné, 2017-08-10, 1 file, -1/+1)
So that MAX_APIC_ID can be bumped without wasting memory.
Note that the usage of MAX_APIC_ID in the SRAT parsing forces the parser to allocate memory directly from the phys_avail physical memory array, which is probably not the best approach, but I haven't found any other way to allocate memory so early in boot. This memory is not returned to the system afterwards, but at least it's sized according to the maximum APIC ID found in the MADT table.
Sponsored by: Citrix Systems R&D
MFC after: 1 month
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D11912
Notes: svn path=/head/; revision=322348
* revert r315959 because it causes build problems (Andriy Gapon, 2017-03-27, 1 file, -4/+3)
The change introduced a dependency between genassym.c and header files generated from .m files, but that dependency is not specified in the make files.
Also, the change may not be as useful as I thought it was.
Reported by: dchagin, Manfred Antar <null@pozo.com>, and many others
Notes: svn path=/head/; revision=316021
* specific end of interrupt implementation for AMD Local APIC (Andriy Gapon, 2017-03-25, 1 file, -3/+4)
The change is more intrusive than I would like because the feature requires that a vector number be written to a special register. Thus, the vector number now has to be provided to lapic_eoi(). It was readily available in the IO-APIC and MSI cases, but the IPI handlers required more work. Also, we now store the VMM IPI number in a global variable, so that it is available to the justreturn handler for the same reason.
Reviewed by: kib
MFC after: 6 weeks
Differential Revision: https://reviews.freebsd.org/D9880
Notes: svn path=/head/; revision=315959
* Local APIC: add support for extended LVT entries found in AMD processors (Andriy Gapon, 2017-02-28, 1 file, -0/+10)
The extended LVT entries can be used to configure interrupt delivery for various events that are internal to a processor and can use this feature. All current processors that support the feature have four such entries. The entries are all masked upon processor reset, but it's possible that firmware may use some of them.
The BIOS and Kernel Developer's Guides for some processor models do not assign any particular names to the extended LVTs, while other BKDGs provide names and suggested usage for them. However, there is no fixed mapping between the LVTs and the processor events in any processor model that supports the feature. Any entry can be assigned to any event. The assignment is done by programming the offset of an entry into the configuration bits corresponding to an event.
This change does not expose the flexibility that the feature offers. It adds just a single method to configure a hardcoded extended LVT entry to deliver APIC_CMC_INT. The method is designed to be used with the Machine Check Error Thresholding mechanism on supported processor models.
For references, please see the BKDGs for families 10h - 16h, and specifically the descriptions of the APIC30, APIC400 and APIC[530:500] registers. For a description of the Error Thresholding mechanism see, for example, the BKDG for family 10h, section 2.12.1.6. http://developer.amd.com/resources/developer-guides-manuals/
Thanks to jhb and kib for their suggestions.
Reviewed by: kib
Discussed with: jhb
MFC after: 5 weeks
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D9612
Notes: svn path=/head/; revision=314398
* Detect x2APIC mode on boot and obey it. (Konstantin Belousov, 2016-09-19, 1 file, -0/+8)
If the BIOS performed the hand-off to the OS with the BSP LAPIC in x2APIC mode, the system usually consumes such a configuration without notice, since x2APIC is turned on by the OS if possible (a nop). But if the BIOS simultaneously requested the OS not to use x2APIC, the code's assumption that xAPIC is active breaks.
In my opinion, we cannot safely turn off x2APIC if control is passed in this mode. Make madt.c ignore user or BIOS requests to turn x2APIC off, and do not check the x2APIC blacklist. Just trust the config and try to continue, giving a warning in dmesg.
Reported and tested by: Slawa Olhovchenkov <slw@zxy.spb.ru> (previous version)
Diagnosed by and discussed with: avg
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Notes: svn path=/head/; revision=305978
* hyperv/vmbus: Rename ISR functions (Sepherosa Ziehau, 2016-05-31, 1 file, -1/+0)
MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6601
Notes: svn path=/head/; revision=301015
* xen: Code cleanup and small bug fixes (Roger Pau Monné, 2015-10-21, 1 file, -1/+0)
xen/hypervisor.h:
- Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain, is_running_on_xen
- Remove unused define CONFIG_X86_PAE
- Remove unused variable xen_start_info: note that it's used in pcifront, which is not built at all
- Remove forward declaration of HYPERVISOR_crash
xen/xen-os.h:
- Remove unused define CONFIG_X86_PAE
- Drop unused helpers: test_and_clear_bit, clear_bit, force_evtchn_callback
- Implement a generic version (based on ofed/include/linux/bitops.h) of set_bit and test_bit and prefix them with xen_ to avoid any use by code other than Xen. Note that it would be worth investigating a generic implementation in FreeBSD.
- Replace barrier() by __compiler_membar()
- Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop = pause
xen/xen_intr.h:
- Move the prototype of xen_intr_handle_upcall into it: used by all the platforms
x86/xen/xen_intr.c:
- Use BITSET* for the enabledbits: avoid using custom helpers
- test_bit/set_bit have been renamed to xen_test_bit/xen_set_bit
- Don't export the variable xen_intr_pcpu
dev/xen/blkback/blkback.c:
- Fix the string format when XBB_DEBUG is enabled: host_addr is typed uint64_t
dev/xen/balloon/balloon.c:
- Remove set but not used variable
- Use the correct type for frame_list: xen_pfn_t represents the frame number on any architecture
dev/xen/control/control.c:
- Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback means the driver can handle this device. If by any chance xenstore is the first driver, every new device whose driver is unset will use xenstore.
dev/xen/grant-table/grant_table.c:
- Remove unused cmpxchg
- Drop unused include opt_pmap.h: it doesn't exist on ARM64 and it doesn't contain anything required for the code on x86
dev/xen/netfront/netfront.c:
- Use the correct type for rx_pfn_array: xen_pfn_t represents the frame number on any architecture
dev/xen/netback/netback.c:
- Use the correct type for gmfn: xen_pfn_t represents the frame number on any architecture
dev/xen/xenstore/xenstore.c:
- Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback means the driver can handle this device. If by any chance xenstore is the first driver, every new device whose driver is unset will use xenstore.
Note that with these changes, x86/include/xen/xen-os.h no longer contains arch-specific code. However, a new series will add some helpers that differ between x86 and ARM64, so I've kept the headers for now.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3921
Sponsored by: Citrix Systems R&D
Notes: svn path=/head/; revision=289686
* Add stack_save_td_running(), a function to trace the kernel stack of a running thread. (Mark Johnston, 2015-09-11, 1 file, -1/+3)
It is currently implemented only on amd64 and i386; on these architectures, it is implemented by raising an NMI on the CPU on which the target thread is currently running. Unlike stack_save_td(), it may fail, for example if the thread is running in user mode.
This change also modifies the kern.proc.kstack sysctl to use this function, so that stacks of running threads are shown in the output of "procstat -kk". This is handy for debugging threads that are stuck in a busy loop.
Reviewed by: bdrewery, jhb, kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3256
Notes: svn path=/head/; revision=287645
* Microsoft vmbus, storage and other related driver enhancements for HyperV. (Wei Hu, 2015-04-29, 1 file, -0/+1)
- Vmbus multi channel support.
- Vector interrupt support.
- Signal optimization.
- Storvsc driver performance improvement.
- Scatter and gather support for storvsc driver.
- Minor bug fix for KVP driver.
Thanks royger, jhb and delphij from the FreeBSD community for the reviews and comments. Also thanks Hovy Xu from NetApp for the contributions to the storvsc driver.
PR: 195238
Submitted by: whu
Reviewed by: royger, jhb, delphij
Approved by: royger
MFC after: 2 weeks
Relnotes: yes
Sponsored by: Microsoft OSTC
Notes: svn path=/head/; revision=282212
* Use VT-d interrupt remapping block (IR) to perform FSB message translation. (Konstantin Belousov, 2015-03-19, 1 file, -0/+5)
In particular, although IO-APICs only take an 8-bit APIC ID, IR translation structures accept a 32-bit APIC ID, which allows x2APIC mode to function properly. Extend msi_cpu of struct msi_intrsrc and io_cpu of ioapic_intsrc from one byte to a full int.
The KPI of IR is isolated into x86/iommu/iommu_intrmap.h, to avoid bringing all dmar headers into the interrupt code.
Non-PCI(e) devices which generate message interrupts on the FSB require special handling. HPET FSB interrupts are remapped, while DMAR interrupts are not.
For each MSI and IO-APIC interrupt source, an IOMMU cookie is added, which is in fact the index of the IRE (interrupt remap entry) in the IR table. The cookie is made at source allocation time, and then used at map time to fill both the IRE and the device registers. The MSI address/data registers and IO-APIC redirection registers are programmed with special values which are recognized by IR and used to restore the IRE index, to find the proper delivery mode and target. Map all MSI interrupts in the block when msi_map() is called.
Since interrupt source setup and teardown run in a non-sleepable context, flushing the interrupt entry cache in the IR hardware, which is done asynchronously and ideally waits for an interrupt, requires busy-waiting for the queue to drain. dmar_qi_wait_for_seq() is modified to take a boolean argument requesting a busy-wait for the written sequence number instead of waiting for the interrupt.
Some interrupts are configured before IR is initialized, e.g. ACPI SCI. Add an intr_reprogram() function to reprogram all already configured interrupts, and call it immediately before an IR unit is enabled. There is still a small window after the IO-APIC redirection entry is reprogrammed with the cookie but before the unit is enabled; to fix this properly, IR must be started much earlier.
Add workarounds for the 5500 and X58 northbridges, some revisions of which have severe flaws in handling IR. Use the same identification methods as employed by Linux.
Review: https://reviews.freebsd.org/D1892
Reviewed by: neel
Discussed with: jhb
Tested by: glebius, pho (previous versions)
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Notes: svn path=/head/; revision=280260
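How an IRE index can travel inside an MSI address is easiest to see in code. The sketch assumes the "remappable format" layout from Intel's VT-d specification (bits 31:20 = 0xFEE, bits 19:5 = index[14:0], bit 4 = format (1 = remappable), bit 3 = SHV, bit 2 = index[15]) with SHV left clear. The macro and function names are hypothetical, and this is not FreeBSD's implementation.

```c
#include <stdint.h>

#define MSI_ADDR_BASE		0xfee00000u
#define MSI_ADDR_IR_FORMAT	(1u << 4)	/* remappable-format bit */

/* Encode an interrupt-remap-entry index (the "cookie") into an MSI address. */
static uint32_t
ir_msi_addr(uint16_t ire_index)
{
	return (MSI_ADDR_BASE |
	    ((uint32_t)(ire_index & 0x7fff) << 5) |	/* index[14:0] */
	    MSI_ADDR_IR_FORMAT |
	    ((uint32_t)(ire_index >> 15) << 2));	/* index[15] */
}

/* Recover the index, as the remapping hardware would. */
static uint16_t
ir_msi_index(uint32_t addr)
{
	return ((uint16_t)(((addr >> 5) & 0x7fff) |
	    (((addr >> 2) & 1) << 15)));
}
```

Because the address no longer names a destination APIC directly, the 32-bit x2APIC ID lives in the IRE instead, which is how IR lifts the 8-bit limit mentioned above.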
* Add x86 specific APIs 'lapic_ipi_alloc()' and 'lapic_ipi_free()' to allow IPI vectors to be dynamically allocated. (Neel Natu, 2015-03-14, 1 file, -6/+26)
This allows kernel modules like vmm.ko to allocate unique IPI slots when loaded (as opposed to hard-allocating one or more vectors).
Also, reorganize the fixed IPI vectors to create a contiguous space for dynamic IPI allocation.
Reviewed by: kib, jhb
Differential Revision: https://reviews.freebsd.org/D2042
Notes: svn path=/head/; revision=279970
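A contiguous block of dynamic vectors makes the allocator a simple scan. The bounds and names below are hypothetical (the real range is defined in apicvar.h), and the real allocator also needs locking and must install an IDT handler for the vector; neither is modeled.

```c
#include <stdint.h>

/* Illustrative bounds for a contiguous block of dynamically assignable IPIs. */
#define DYN_IPI_FIRST	0xf0
#define DYN_IPI_LAST	0xf7

static uint8_t dyn_ipi_used[DYN_IPI_LAST - DYN_IPI_FIRST + 1];

/* Return the lowest free vector, or -1 if the dynamic range is exhausted. */
static int
dyn_ipi_alloc(void)
{
	for (int v = DYN_IPI_FIRST; v <= DYN_IPI_LAST; v++) {
		if (!dyn_ipi_used[v - DYN_IPI_FIRST]) {
			dyn_ipi_used[v - DYN_IPI_FIRST] = 1;
			return (v);
		}
	}
	return (-1);
}

/* Release a vector; out-of-range values are ignored. */
static void
dyn_ipi_free(int v)
{
	if (v >= DYN_IPI_FIRST && v <= DYN_IPI_LAST)
		dyn_ipi_used[v - DYN_IPI_FIRST] = 0;
}
```

A module such as vmm.ko would allocate on load and free on unload, so repeated load/unload cycles do not leak vectors.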
* Free up the IPI slot used by IPI_STOP_HARD. (Neel Natu, 2015-03-01, 1 file, -1/+6)
Change the numeric value of IPI_STOP_HARD so it doesn't occupy a valid IPI slot. This can be done because IPI_STOP_HARD is actually delivered via NMI.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D1983
Notes: svn path=/head/; revision=279468
* Implement EOI suppression mode, where the LAPIC, on an EOI command for a level-triggered interrupt, does not broadcast the EOI message to all APICs in the system. (Konstantin Belousov, 2015-02-26, 1 file, -0/+1)
Instead, the interrupt handler must follow the LAPIC EOI with an IOAPIC EOI. For modern IOAPICs, the latter is done by writing to the EOIR register. Otherwise, Intel provided Linux with a trick of temporarily switching the pin config to edge and then back to level.
Detect the presence of the EOIR register by reading the IO-APIC version. The summary table in the comments was taken from the Linux kernel. For Intel, newer IO-APICs are only briefly documented as part of the ICH/PCH datasheet. According to the BKDG and chipset documentation, AMD LAPICs do not provide EOI suppression, although their IO-APICs do declare version 0x21 and implement EOIR.
The trick of temporarily switching the pin to edge mode to clear IRR was tested on a modern chipset, by pretending that EOIR is not present, i.e. by forcing io_haseoi to zero.
The tunable hw.lapic_eoi_suppression disables the optimization.
Reviewed by: neel
Tested by: pho
Review: https://reviews.freebsd.org/D1943
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
Notes: svn path=/head/; revision=279319
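The two ways of EOIing an I/O APIC pin can be modeled with a fake redirection entry. Two assumptions here: the version threshold of 0x20 for EOIR presence (the heuristic Linux uses), and redirection-entry bit 15 = level trigger, bit 14 = remote IRR. All names are hypothetical, and register writes are simulated in memory rather than performed on hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOART_TRGRLVL	(1u << 15)	/* trigger mode: 1 = level */
#define IOART_REM_IRR	(1u << 14)	/* remote IRR */

/* Hypothetical model of one I/O APIC pin. */
struct ioapic_pin {
	uint32_t	version;	/* low byte of the version register */
	uint32_t	redir_lo;	/* low word of the redirection entry */
};

/* Heuristic: version 0x20 and later provide a directed EOI register. */
static bool
ioapic_has_eoir(const struct ioapic_pin *p)
{
	return ((p->version & 0xff) >= 0x20);
}

/* Simulated entry write: switching to edge mode clears remote IRR. */
static void
redir_write(struct ioapic_pin *p, uint32_t v)
{
	if ((v & IOART_TRGRLVL) == 0)
		v &= ~IOART_REM_IRR;
	p->redir_lo = v;
}

/* Simulated EOIR write: clears remote IRR on the pin using this vector. */
static void
eoir_write(struct ioapic_pin *p, uint8_t vector)
{
	if ((p->redir_lo & 0xff) == vector)
		p->redir_lo &= ~IOART_REM_IRR;
}

/* I/O APIC EOI, required once the LAPIC no longer broadcasts EOIs. */
static void
ioapic_pin_eoi(struct ioapic_pin *p, uint8_t vector)
{
	if (ioapic_has_eoir(p)) {
		eoir_write(p, vector);
	} else {
		redir_write(p, p->redir_lo & ~IOART_TRGRLVL);	/* to edge */
		redir_write(p, p->redir_lo | IOART_TRGRLVL);	/* back to level */
	}
}
```

Either path ends with remote IRR clear and the pin still level-triggered; the EOIR path just needs one register write instead of two.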
* Add x2APIC support. Enable it by default if the CPU is capable. (Konstantin Belousov, 2015-02-09, 1 file, -0/+14)
The hw.x2apic_enable tunable allows disabling it from the loader prompt.
To closely reproduce the effects of uncached memory ops when accessing registers in xAPIC mode, x2APIC writes to MSRs are preceded by mfence, except for EOI notifications. This is probably too strict; only ICR writes to send IPIs require serialization, to ensure that other CPUs see the previous actions when the IPI is delivered. This may be changed later.
In the vmm justreturn IPI handler, call doreti_iret instead of doing iretd inline, to handle corner conditions.
Note that the patch only switches LAPICs into x2APIC mode. It does not enable FreeBSD to support > 255 CPUs, which requires parsing x2APIC MADT entries and doing interrupt remapping, but it is a required step on the way.
Reviewed by: neel
Tested by: pho (real hardware), neel (on bhyve)
Discussed with: jhb, grehan
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
Notes: svn path=/head/; revision=278473
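In x2APIC mode each 16-byte-aligned xAPIC MMIO register becomes one MSR at 0x800 + (offset >> 4), and the ICR becomes a single 64-bit MSR with a 32-bit destination in bits 63:32. The helpers below are an illustrative sketch of that mapping; their names are not the kernel's.

```c
#include <stdint.h>

#define X2APIC_MSR_BASE		0x800u

/* xAPIC MMIO register offsets used in the usage check. */
#define LAPIC_ID_OFF		0x020u
#define LAPIC_EOI_OFF		0x0b0u
#define LAPIC_ICR_LO_OFF	0x300u

/* Map an xAPIC register offset to its x2APIC MSR number. */
static uint32_t
x2apic_msr(uint32_t xapic_offset)
{
	return (X2APIC_MSR_BASE + (xapic_offset >> 4));
}

/*
 * Build the 64-bit x2APIC ICR value: one MSR write replaces the xAPIC
 * pair of ICR_HI (8-bit destination) and ICR_LO writes.
 */
static uint64_t
x2apic_icr_value(uint32_t dest_apic_id, uint32_t icr_lo)
{
	return (((uint64_t)dest_apic_id << 32) | icr_lo);
}
```

The single ICR write is also why the xAPIC-only caveat in the shootdown commit above (two register writes, so interrupts are disabled around them) does not arise in x2APIC mode.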
* amd64/i386: introduce APIC hooks for different APIC implementations. (Roger Pau Monné, 2014-06-16, 1 file, -31/+230)
This is needed for Xen PV(H) guests, since there's no hardware LAPIC available on this kind of domain. This commit should not change functionality.
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Approved by: gibbs
amd64/include/cpu.h, amd64/amd64/mp_machdep.c, i386/include/cpu.h, i386/i386/mp_machdep.c:
- Remove the lapic_ipi_vectored hook from cpu_ops, since it's now implemented in the lapic hooks.
amd64/amd64/mp_machdep.c, i386/i386/mp_machdep.c:
- Use lapic_ipi_vectored directly, since it's now an inline function that will call the appropriate hook.
x86/x86/local_apic.c:
- Prefix bare metal public lapic functions with native_ and mark them as static.
- Define the default implementation of apic_ops.
x86/include/apicvar.h:
- Declare the apic_ops structure and create inline functions to access the hooks, so the change is transparent to existing users of the lapic_ functions.
x86/xen/hvm.c:
- Switch to use the new apic_ops.
Notes: svn path=/head/; revision=267526
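The hook pattern described for apicvar.h can be sketched with a trimmed-down ops table. Only two hooks are shown, their bodies are stubs, and `pv_id()` is a hypothetical paravirtual replacement; the real structure has many more members.

```c
/* Hypothetical, trimmed-down ops table in the style the commit describes. */
struct apic_ops {
	int	(*id)(void);
	void	(*ipi_vectored)(unsigned int vector, int dest);
};

static unsigned int last_vector;
static int last_dest;

/* Bare-metal implementations, prefixed native_ and kept static. */
static int
native_id(void)
{
	return (0);
}

static void
native_ipi_vectored(unsigned int vector, int dest)
{
	last_vector = vector;	/* real code would program the ICR */
	last_dest = dest;
}

/* A replacement a PV guest without a usable LAPIC might install. */
static int
pv_id(void)
{
	return (7);
}

static struct apic_ops apic_ops = {
	.id = native_id,
	.ipi_vectored = native_ipi_vectored,
};

/* Inline wrappers keep existing lapic_*() call sites unchanged. */
static inline int
lapic_id(void)
{
	return (apic_ops.id());
}

static inline void
lapic_ipi_vectored(unsigned int vector, int dest)
{
	apic_ops.ipi_vectored(vector, dest);
}
```

Callers keep calling `lapic_id()`; swapping `apic_ops.id` redirects every call site at once, which is why the change can be transparent to existing users.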
* Drop the 3rd clause from all 3 clause BSD licenses where I am the sole holder to convert them to 2 clause BSD licenses. (John Baldwin, 2014-02-05, 1 file, -3/+0)
MFC after: 1 week
Notes: svn path=/head/; revision=261520
* Move <machine/apicvar.h> to <x86/apicvar.h>. (John Baldwin, 2014-01-23, 1 file, -0/+225)
Notes: svn path=/head/; revision=261087