path: root/sys
Commit message | Author | Age | Files | Lines
* Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain | Andrew Gallatin | 2020-12-19 | 6 | -34/+134
    In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, it will likely be reading memory allocated on the NUMA domain associated with the TCP connection.

    This change provides a new socket option, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain). This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases.

    This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA, allows us to serve 190Gb/s of kTLS encrypted https media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross domain traffic on the memory controller.

    Reviewed by: jhb, bz (earlier version), bcr (man page)
    Tested by: gonzo
    Sponsored by: Netflix
    Differential Revision: https://reviews.freebsd.org/D21636
    Notes: svn path=/head/; revision=368819
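The selection rule described in this commit message can be modeled in user space roughly as follows. This is only a sketch: `lb_worker` and `lb_pick_worker` are hypothetical names chosen for illustration, and the kernel's actual lookup code is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one worker sharing a load-balanced listen socket. */
struct lb_worker {
	int id;		/* illustrative worker identifier */
	int domain;	/* NUMA domain the worker asked to serve */
};

/*
 * Pick a worker for a connection that arrived on conn_domain.
 * Workers registered for that domain are preferred; if there are none,
 * fall back to hashing across every worker regardless of domain.
 * Returns NULL only when the group is empty.
 */
static const struct lb_worker *
lb_pick_worker(const struct lb_worker *w, size_t n, int conn_domain,
    uint32_t hash)
{
	size_t i, local = 0;

	assert(w != NULL || n == 0);
	if (n == 0)
		return (NULL);
	for (i = 0; i < n; i++)
		if (w[i].domain == conn_domain)
			local++;
	if (local == 0)
		return (&w[hash % n]);	/* fallback: any domain */
	local = hash % local;		/* index among the local workers */
	for (i = 0; i < n; i++) {
		if (w[i].domain != conn_domain)
			continue;
		if (local == 0)
			return (&w[i]);
		local--;
	}
	return (NULL);			/* not reached */
}
```

With workers on domains 0, 0 and 1, a connection on domain 1 always lands on the third worker, while a connection on a domain nobody serves is spread over all three.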
* Optionally bind ktls threads to NUMA domains | Andrew Gallatin | 2020-12-19 | 1 | -3/+39
    When ktls_bind_thread is 2, we pick a ktls worker thread that is bound to the same domain as the TCP connection associated with the socket. We use roughly the same code as netinet/tcp_hpts.c to do this. This allows crypto to run on the same domain as the TCP connection is associated with. Assuming TCP_REUSPORT_LB_NUMA (D21636) is in place & in use, this ensures that the crypto source and destination buffers are local to the same NUMA domain as we're running crypto on.

    This change (when TCP_REUSPORT_LB_NUMA, D21636, is used) reduces cross-domain traffic from over 37% down to about 13% as measured by pcm.x on a dual-socket Xeon using nginx and a Netflix workload.

    Reviewed by: jhb
    Sponsored by: Netflix
    Differential Revision: https://reviews.freebsd.org/D21648
    Notes: svn path=/head/; revision=368818
* Ensure a minimum packet length before creating a mbuf in if_ure. | Hans Petter Selasky | 2020-12-19 | 1 | -1/+1
    Sponsored by: Mellanox Technologies // NVIDIA Networking
    Notes: svn path=/head/; revision=368801
* Move SYSCTL_ADD_PROC() to unlocked context in if_ure to avoid lock order reversal. | Hans Petter Selasky | 2020-12-19 | 1 | -9/+9
    MFC after: 1 week
    Reported by: Mark Millard <marklmi@yahoo.com>
    Sponsored by: Mellanox Technologies // NVIDIA Networking
    Notes: svn path=/head/; revision=368799
* kern: cpuset: allow jails to modify child jails' roots | Kyle Evans | 2020-12-19 | 1 | -5/+20
    This partially lifts a restriction imposed by r191639 ("Prevent a superuser inside a jail from modifying the dedicated root cpuset of that jail") that's perhaps beneficial after r192895 ("Add hierarchical jails."). Jails still cannot modify their own cpuset, but they can modify child jails' roots to further restrict them or widen them back to the modifying jails' own mask.

    As a side effect of this, the system root may once again widen the mask of jails as long as they're still using a subset of the parent jails' mask. This was previously prevented by the fact that cpuset_getroot of a root set will return that root, rather than the root's parent -- cpuset_modify uses cpuset_getroot since it was introduced in r327895, previously it was just validating against set->cs_parent which allowed the system root to widen jail masks.

    Reviewed by: jamie
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D27352
    Notes: svn path=/head/; revision=368779
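The subset rule mentioned above (a child jail's mask may be narrowed or widened, but only within the modifying jail's own mask) can be illustrated with a plain bitmask. This is a sketch under assumptions: `sk_child_mask_ok` is a hypothetical helper, a 64-bit integer stands in for the kernel's cpuset type, and rejecting an empty mask is an assumption rather than something stated in the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the proposed child mask only uses CPUs that the
 * parent (modifying) jail is itself allowed to use.
 * Assumption: an empty mask is rejected.
 */
static bool
sk_child_mask_ok(uint64_t parent_mask, uint64_t child_mask)
{
	if (child_mask == 0)
		return (false);
	return ((child_mask & ~parent_mask) == 0);
}
```

For a parent limited to CPUs 0-3 (mask 0x0f), a child mask of 0x03 or 0x0f is acceptable, while 0x10 is not.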
* usb: Replace ITUNERNET vendor with MICROCHIP and improve product names | Jessica Clarke | 2020-12-18 | 2 | -6/+6
    These Mini-Box LCDs are using Microchip components and sub-licensed product IDs. Whilst here, update the constant names and descriptions for the products to use the names listed on the manufacturer's website rather than vague ones. The picoLCD 4x20 is named that on the manufacturer's website so prefer that name, even though linux-usb.org lists it with the numbers reversed as one might expect.

    Reviewed by: hselasky
    Differential Revision: https://reviews.freebsd.org/D27670
    Notes: svn path=/head/; revision=368774
* Add ELF flag to disable ASLR stack gap. | Konstantin Belousov | 2020-12-18 | 5 | -6/+15
    Also centralize and unify checks to enable ASLR stack gap in a new helper exec_stackgap().

    PR: 239873
    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Notes: svn path=/head/; revision=368772
* proc.h: Reformat P_ and P2_ definitions. | Konstantin Belousov | 2020-12-18 | 1 | -45/+66
    Use traditional explicit leading zero format for hex numbers. Align P2_ hex values. Wrap long lines by splitting comments.

    Sponsored by: The FreeBSD Foundation
    MFC after: 1 week
    Notes: svn path=/head/; revision=368771
* Switch direct rt field access in rtsock.c to newly-created field accessors. | Alexander V. Chernikov | 2020-12-18 | 1 | -38/+121
    The rtsock code was built around the assumption that each rtentry record in the system radix tree is a ready-to-use sockaddr. This assumption turned out to be not quite true:
    * masks have their length tweaked, so we have the rtsock_fix_netmask() hack
    * IPv6 addresses have their scope embedded, so we have another explicit deembedding hack.

    Change the code to decouple rtentry internals from rtsock code using newly-created rtentry accessors. This will eventually allow eliminating both of the hacks and changing the rtentry dst/mask format.

    Differential Revision: https://reviews.freebsd.org/D27451
    Notes: svn path=/head/; revision=368769
* Skip the vm.pmap.kernel_maps sysctl by default. | John Baldwin | 2020-12-18 | 4 | -4/+4
    This sysctl node can generate very verbose output, so don't trigger it for sysctl -a or sysctl vm.pmap.

    Reviewed by: markj, kib
    Differential Revision: https://reviews.freebsd.org/D27504
    Notes: svn path=/head/; revision=368768
* riscv: report additional known SBI implementations | Mitchell Horne | 2020-12-18 | 2 | -0/+16
    These implementation IDs are defined in the SBI spec, so we should print their name if detected.

    Submitted by: Danjel Qyteza <danq1222@gmail.com>
    Reviewed by: jhb, kp
    Differential Revision: https://reviews.freebsd.org/D27660
    Notes: svn path=/head/; revision=368767
* arm64: rk3399: Export the watchdog clock | Emmanuel Vadot | 2020-12-18 | 1 | -0/+5
    This clock is used by the watchdog IP and can be controlled only in the secure world.

    Notes: svn path=/head/; revision=368766
* amd64: use register macros for gdb_cpu_getreg() | Mitchell Horne | 2020-12-18 | 2 | -22/+25
    Prefer these newly-added definitions to bare values.

    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Sponsored by: Klara, Inc.
    Notes: svn path=/head/; revision=368765
* amd64: allow gdb(4) to write to most registers | Mitchell Horne | 2020-12-18 | 2 | -4/+50
    Similar to the recent patch to arm's gdb stub in r368414, allow GDB to update the contents of most general purpose registers.

    Reviewed by: cem, jhb, markj
    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Sponsored by: Klara, Inc.
    NetApp PR: 44
    Differential Revision: https://reviews.freebsd.org/D27642
    Notes: svn path=/head/; revision=368764
* acpi: Ensure that adjacent memory affinity table entries are coalesced | Mark Johnston | 2020-12-18 | 1 | -12/+25
    The SRAT may contain multiple distinct entries that together describe a contiguous region of physical memory. In this case we were not coalescing the corresponding entries in the memory affinity table, which led to fragmented phys_avail[] entries. Since r338431 the vm_phys_segs[] entries derived from phys_avail[] will be coalesced, resulting in a situation where vm_phys_segs[] entries do not have a covering phys_avail[] entry. vm_page_startup() will not add such segments to the physical memory allocator, leaving them unused.

    Reported by: Don Morris <dgmorris@earthlink.net>
    Reviewed by: kib, vangyzen
    MFC after: 2 weeks
    Sponsored by: The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D27620
    Notes: svn path=/head/; revision=368763
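Coalescing of adjacent entries, as described in this commit message, can be sketched in isolation like this. The `mem_range` type and `coalesce_ranges` function are hypothetical and only model the idea; the sketch assumes the input is sorted by start address and free of overlaps, which the real ACPI code may or may not guarantee at this point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory affinity entry: [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
	int domain;
};

/*
 * Merge neighbouring ranges that touch and belong to the same domain.
 * The array is compacted in place and the new entry count is returned.
 */
static size_t
coalesce_ranges(struct mem_range *r, size_t n)
{
	size_t in, out;

	if (n == 0)
		return (0);
	out = 0;
	for (in = 1; in < n; in++) {
		assert(r[in].start >= r[out].end);	/* sorted input */
		if (r[in].start == r[out].end &&
		    r[in].domain == r[out].domain)
			r[out].end = r[in].end;		/* extend previous */
		else
			r[++out] = r[in];		/* keep as new entry */
	}
	return (out + 1);
}
```

Two touching entries on the same domain collapse into one, while a touching entry on a different domain or an entry after a hole stays separate.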
* virtio_mmio: Fix feature negotiation copy-paste issue in r361943 | Jessica Clarke | 2020-12-18 | 1 | -2/+2
    This caused us to write to the low half of the feature word twice, once with the high bits and once with the low bits. Common legacy device implementations seem to be fairly lenient about being able to write to the feature bits multiple times, but Arm's models use a stricter implementation that will ignore the second write. This fixes using vtnet(4) on those models.

    Reported by: Jean-Philippe Brucker <jean-philippe@linaro.org>
    Pointy hat: jrtc27
    Notes: svn path=/head/; revision=368761
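To make the intended write sequence concrete, here is a small user-space model of a 64-bit feature word split into two 32-bit halves behind a select register. Everything in it (`feat_model`, `model_write_sel`, `model_write_features`, `negotiate_features`) is hypothetical and is not the driver's code; it only shows each half being written exactly once, to its own slot.

```c
#include <stdint.h>

/* Hypothetical model of a device with a feature-select register. */
struct feat_model {
	uint32_t sel;		/* which 32-bit half is addressed */
	uint32_t word[2];	/* [0] = bits 0-31, [1] = bits 32-63 */
	int writes[2];		/* number of writes seen per half */
};

static void
model_write_sel(struct feat_model *m, uint32_t sel)
{
	m->sel = sel & 1;
}

static void
model_write_features(struct feat_model *m, uint32_t val)
{
	m->word[m->sel] = val;
	m->writes[m->sel]++;
}

/* Select the high half and write it, then do the same for the low half. */
static void
negotiate_features(struct feat_model *m, uint64_t features)
{
	model_write_sel(m, 1);
	model_write_features(m, (uint32_t)(features >> 32));
	model_write_sel(m, 0);
	model_write_features(m, (uint32_t)features);
}
```

The bug described above corresponds to selecting half 0 both times, so that a strict device would keep only the first value written to the low half.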
* pci_iov: When pci_iov_detach(9) is called, destroy VF children | Konstantin Belousov | 2020-12-18 | 1 | -15/+38
    instead of bailing out with EBUSY if there are any. If the driver module is unloaded, or the device is just forcibly detached from the driver, there is otherwise no way for the driver to unload correctly, especially if there are resources dedicated to the VFs which prevent turning down other resources.

    Reviewed by: jhb
    Sponsored by: Mellanox Technologies / NVidia Networking
    MFC after: 1 week
    Differential revision: https://reviews.freebsd.org/D27615
    Notes: svn path=/head/; revision=368749
* ice: quiet -Wredundant-decls | Ryan Libby | 2020-12-17 | 1 | -9/+0
    Reapply r364240 after driver update in r365617.

    Reviewed by: lwhsu
    Sponsored by: Dell EMC Isilon
    Differential Revision: https://reviews.freebsd.org/D27561
    Notes: svn path=/head/; revision=368745
* VFS_QUOTACTL: Remove needless casts of arg | Brooks Davis | 2020-12-17 | 1 | -7/+7
    The argument is a void * so there's no need to cast it to caddr_t. Update documentation to match function declaration.

    Reviewed by: freqlabs
    Obtained from: CheriBSD
    MFC after: 1 week
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27093
    Notes: svn path=/head/; revision=368744
* Use __containerof() instead of home-rolled versions. | John Baldwin | 2020-12-17 | 5 | -11/+5
    Reviewed by: imp, hselasky
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27582
    Notes: svn path=/head/; revision=368741
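For readers unfamiliar with the idiom, a portable approximation of what a "container of" macro does is sketched below. `container_of_sketch`, `list_node`, `softc_example` and `softc_from_link` are hypothetical names for illustration; FreeBSD's own `__containerof()` is the macro this commit switches to, and its exact definition is not reproduced here.

```c
#include <stddef.h>

/*
 * Recover a pointer to the enclosing structure from a pointer to one
 * of its members by subtracting the member's offset.
 */
#define	container_of_sketch(ptr, type, member)				\
	((type *)((char *)(ptr) - offsetof(type, member)))

struct list_node {
	struct list_node *next;
};

struct softc_example {
	int unit;
	struct list_node link;	/* embedded member handed to list code */
};

/* Map a list node back to the softc that embeds it. */
static struct softc_example *
softc_from_link(struct list_node *n)
{
	return (container_of_sketch(n, struct softc_example, link));
}
```

Using one shared macro instead of per-driver copies keeps the pointer arithmetic in a single, reviewable place.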
* Use a template assembly file for firmware object files. | John Baldwin | 2020-12-17 | 4 | -16/+60
    Similar to r366897, this uses the .incbin directive to pull in a firmware file's contents into a .fwo file. The same scheme for computing symbol names from the filename is used as before to maximize compatibility and not require rebuilding existing .fwo files for NO_CLEAN builds.

    Using ld -b binary requires extra hacks in linkers to either specify ABI options (e.g. soft- vs hard-float) or to ignore ABI incompatibilities when linking certain objects (e.g. object files with only data). Using the compiler driver avoids the need for these hacks as the compiler driver is able to set all the appropriate ABI options.

    Reviewed by: imp, markj
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27579
    Notes: svn path=/head/; revision=368739
* Cleanups to *ERR* compat shims. | John Baldwin | 2020-12-17 | 1 | -7/+7
    - Use [u]intptr_t casts to convert pointers to integers.
    - Change IS_ERR* to return bool instead of long.

    Reviewed by: manu
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27577
    Notes: svn path=/head/; revision=368738
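The two cleanups listed above can be illustrated with a minimal error-pointer shim. This is a sketch, not the shim touched by the commit: the `sk_` names and `SK_MAX_ERRNO` are hypothetical, and the convention of encoding small negative errno values in the top of the address space is assumed from the Linux API these shims emulate.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SK_MAX_ERRNO	4095	/* assumed upper bound on errno values */

/* Encode a negative errno value as a pointer. */
static inline void *
sk_err_ptr(long error)
{
	return ((void *)(intptr_t)error);
}

/* Decode the errno value stored in an error pointer. */
static inline long
sk_ptr_err(const void *ptr)
{
	return ((long)(intptr_t)ptr);
}

/* A pointer in the last SK_MAX_ERRNO addresses is treated as an error. */
static inline bool
sk_is_err(const void *ptr)
{
	return ((uintptr_t)ptr >= (uintptr_t)-SK_MAX_ERRNO);
}
```

Going through `intptr_t` and `uintptr_t` avoids assuming that `long` is wide enough to hold a pointer, and a `bool` return type states the predicate's intent directly.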
* Fix a race in tty_signal_sessleader() with unlocked read of s_leader. | Konstantin Belousov | 2020-12-17 | 1 | -2/+9
    Since we do not own the session lock, a parallel killjobc() might reset s_leader to NULL after we checked it. Read s_leader only once and ensure that the compiler is not allowed to reload it.

    While there, make access to t_session somewhat prettier by using a local variable.

    PR: 251915
    Submitted by: Jakub Piecuch <j.piecuch96@gmail.com>
    MFC after: 1 week
    Notes: svn path=/head/; revision=368735
* fd: reimplement close_range to avoid spurious relocking | Mateusz Guzik | 2020-12-17 | 1 | -25/+30
    Notes: svn path=/head/; revision=368732
* audit: rework AUDIT_SYSCLOSE | Mateusz Guzik | 2020-12-17 | 3 | -20/+19
    This in particular avoids spurious lookups on close.

    Notes: svn path=/head/; revision=368731
* fd: refactor closefp in preparation for close_range rework | Mateusz Guzik | 2020-12-17 | 1 | -21/+43
    Notes: svn path=/head/; revision=368730
* [ng_socket] Don't take the SOCKBUF_LOCK() twice in the RX data path. | Aleksandr Fedorov | 2020-12-17 | 1 | -2/+9
    This is just a minor optimization, but it's sensitive. This gives an improvement of 30-50 kpps.

    Reviewed by: kp, markj, glebius, lutz_donnerhacke.de
    Approved by: vmaffione (mentor)
    Sponsored by: vstack.com
    Differential Revision: https://reviews.freebsd.org/D27382
    Notes: svn path=/head/; revision=368727
* Add IRQ resource to SPIBUS | Emmanuel Vadot | 2020-12-17 | 3 | -0/+44
    Add the capability for SPIBUS to have child devices with an IRQ. For example, many ADC chips have a dedicated pin to signal "data ready", and the host can just wait for an interrupt before going out to read the result.

    It is the same code as in r282674 and r282702 for IICBUS by Michal Meloun.

    Submitted by: Oskar Holmund <oskar.holmlund@ohdata.se>
    Differential Revision: https://reviews.freebsd.org/D27396
    Notes: svn path=/head/; revision=368725
* arm: Remove Samsung Exynos port | Emmanuel Vadot | 2020-12-17 | 27 | -6236/+0
    Remove the Exynos SoC support. It hasn't been updated in a while, isn't present in GENERIC, and nobody is motivated to resurrect it.

    Differential Revision: https://reviews.freebsd.org/D24444
    Notes: svn path=/head/; revision=368724
* Make non-debug kernels installable. | Nathan Whitehorn | 2020-12-17 | 1 | -2/+0
    Setting DEBUG_FLAGS results in make installkernel trying to install debug information that doesn't exist if the kernel was built without it.

    Notes: svn path=/head/; revision=368718
* newvers.sh: Speed up git_tree_modified | Brooks Davis | 2020-12-17 | 1 | -23/+1
    We're looking for file content differences, so ask the question of git more directly. This helps a lot, saving tens of thousands of fork()s, when the builder and editor see different stat() results (e.g., UIDs), as they might with containers.

    Submitted by: Nathaniel Wesley Filardo <nwf20@cl.cam.ac.uk>
    Reviewed by: bdrewery, emaste, imp
    Obtained from: CheriBSD
    MFC after: 3 days
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27646
    Notes: svn path=/head/; revision=368709
* fd: remove redundant saturation check from fget_unlocked_seq | Mateusz Guzik | 2020-12-16 | 1 | -7/+0
    refcount_acquire_if_not_zero returns true on saturation. The case of 0 is handled by looping again, after which the originally found pointer will no longer be there.

    Noted by: kib
    Notes: svn path=/head/; revision=368703
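The behaviour relied on here (success on a saturated counter, failure only on zero) can be sketched with C11 atomics. This is an illustrative model, not FreeBSD's refcount(9) implementation: `sk_ref_acquire_if_not_zero` and `SK_REF_SATURATED` are hypothetical, and using `UINT_MAX` as the saturation marker is an assumption made for the sketch.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturation marker; the real encoding may differ. */
#define	SK_REF_SATURATED	UINT_MAX

/*
 * Take a reference unless the count has already dropped to zero.
 * A saturated counter is left untouched and reported as success,
 * so callers need no separate saturation check.
 */
static bool
sk_ref_acquire_if_not_zero(atomic_uint *count)
{
	unsigned int old;

	old = atomic_load_explicit(count, memory_order_relaxed);
	for (;;) {
		if (old == 0)
			return (false);
		if (old == SK_REF_SATURATED)
			return (true);
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(count, &old,
		    old + 1, memory_order_relaxed, memory_order_relaxed))
			return (true);
	}
}
```

A caller that gets `false` simply retries its lookup, which matches the "looping again" described in the commit message.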
* Fix whitespace in r368698 | Jessica Clarke | 2020-12-16 | 1 | -1/+1
    MFC with: r368698
    Notes: svn path=/head/; revision=368700
* Fix whitespace in comment modified by r368697 | Jessica Clarke | 2020-12-16 | 1 | -1/+1
    Notes: svn path=/head/; revision=368699
* Use the standard method for locating the MSI-X table BAR. | Michal Meloun | 2020-12-16 | 2 | -5/+2
    The current approach, a hardcoded value plus a heuristic, does not conform to the PCI(e) specification and fails on systems where the MSI-X BAR is not initialized by BIOS/ACPI (many arm or arm64 systems, for example). Instead, use the standard PCI(e) capability to determine the MSI-X table BAR address.

    MFC after: 3 weeks
    Differential Revision: https://reviews.freebsd.org/D27265
    Notes: svn path=/head/; revision=368698
* Allocate the right number of pages for bounced buffers crossing a page boundary. | Michal Meloun | 2020-12-16 | 1 | -5/+13
    One of the disadvantages of our current busdma code is the fact that we process the bounced buffer in a page-by-page manner. This means that a short (subpage) buffer allocated across a page boundary is bounced to 2 separate pages. This suboptimal behavior is consistent across all platforms and can be related to the (probably unimplementable or incompatible with bouncing) BUS_DMA_KEEP_PG_OFFSET flag. Therefore, allocate one additional page to fully comply with this requirement.

    Discussed with: markj
    PR: 251018
    Notes: svn path=/head/; revision=368697
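The page-count arithmetic behind this fix can be worked through with a small helper. It is a sketch only: `pages_spanned` is a hypothetical function, not part of busdma, and it assumes a power-of-two page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of pages touched by a buffer of len bytes starting at addr.
 * A sub-page buffer that straddles a page boundary yields 2, which is
 * why sizing by length alone under-allocates bounce pages.
 */
static size_t
pages_spanned(uintptr_t addr, size_t len, size_t page_size)
{
	uintptr_t off;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	if (len == 0)
		return (0);
	off = addr & (page_size - 1);	/* offset within the first page */
	return ((off + len + page_size - 1) / page_size);
}
```

For example, with 4096-byte pages a 32-byte buffer starting 16 bytes before a page boundary spans two pages, even though its length alone rounds up to one.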
* Use more standard types for manipulating pointers. | John Baldwin | 2020-12-16 | 1 | -2/+2
    - Use a uintptr_t cast to get the virtual address of a pointer in USB_P2U() instead of a ptrdiff_t.
    - Add offsets to a char * pointer directly without roundtripping the pointer through a ptrdiff_t in USB_ADD_BYTES().

    Reviewed by: imp, hselasky
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27581
    Notes: svn path=/head/; revision=368688
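The two points above can be shown side by side in a short sketch. `sk_p2u` and `sk_add_bytes` are hypothetical stand-ins written for illustration; they are not the definitions of USB_P2U() and USB_ADD_BYTES().

```c
#include <stddef.h>
#include <stdint.h>

/* Obtain the address of a pointer as an integer via uintptr_t. */
static inline uintptr_t
sk_p2u(const void *ptr)
{
	return ((uintptr_t)ptr);
}

/*
 * Advance a pointer by a byte offset using char * arithmetic, so the
 * pointer never takes a detour through an integer type.
 */
static inline void *
sk_add_bytes(void *ptr, size_t offset)
{
	return ((char *)ptr + offset);
}
```

Keeping the value a pointer throughout matters on architectures such as CHERI, where a pointer carries more than its address and an integer round trip would lose that information.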
* Use uintptr_t instead of unsigned long for integers holding pointers. | John Baldwin | 2020-12-16 | 1 | -4/+4
    Reviewed by: imp, gallatin
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27580
    Notes: svn path=/head/; revision=368687
* Use uintptr_t instead of unsigned long for pointers. | John Baldwin | 2020-12-16 | 1 | -4/+4
    The sense_ptr thing is quite broken. As near as I can tell, the driver tries to copyout to a physical address rather than whatever user address the sense buffer should be copied to. It is not immediately obvious what user address the sense buffer should be copied to.

    Reviewed by: imp
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27578
    Notes: svn path=/head/; revision=368686
* Use the 't' modifier to print a ptrdiff_t. | John Baldwin | 2020-12-16 | 1 | -1/+1
    Reviewed by: imp
    Obtained from: CheriBSD
    Sponsored by: DARPA
    Differential Revision: https://reviews.freebsd.org/D27576
    Notes: svn path=/head/; revision=368685
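As a reminder of what the 't' length modifier does, the standard C example below formats a pointer difference with `%td`. The helper `format_offset` is hypothetical and unrelated to the file changed by this commit.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the distance between two pointers into buf.
 * "%td" is the standard conversion for ptrdiff_t, so no cast to a
 * guessed integer width is needed.
 */
static int
format_offset(char *buf, size_t buflen, const char *base, const char *p)
{
	ptrdiff_t off = p - base;

	return (snprintf(buf, buflen, "offset %td", off));
}
```

Casting the difference to `int` or `long` and printing with `%d` or `%ld` only happens to work where those types match the width of `ptrdiff_t`.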
* Revert r368523 which fixed contig allocs waiting forever. | Bryan Drewery | 2020-12-15 | 1 | -16/+4
    This needs to account for empty NUMA domains or domains which do not satisfy the requested range.

    Discussed with: markj
    Notes: svn path=/head/; revision=368673
* Improve handling of alternate settings in the USB stack. | Hans Petter Selasky | 2020-12-15 | 1 | -3/+3
    Move initialization of num_altsetting under USB_CFG_INIT, else there will be a page fault when enumerating USB devices.

    PR: 251856
    MFC after: 1 week
    Submitted by: Ma, Horse <Shichun.Ma@dell.com>
    Sponsored by: Mellanox Technologies // NVIDIA Networking
    Notes: svn path=/head/; revision=368664
* Improve handling of alternate settings in the USB stack. | Hans Petter Selasky | 2020-12-15 | 5 | -10/+22
    Allow setting the alternate interface number to fail when there is only one alternate setting present, to comply with the USB specification.

    Refactor how iface->num_altsetting is computed.

    Bump the __FreeBSD_version due to change of core USB structure.

    PR: 251856
    MFC after: 1 week
    Submitted by: Ma, Horse <Shichun.Ma@dell.com>
    Sponsored by: Mellanox Technologies // NVIDIA Networking
    Notes: svn path=/head/; revision=368659
* Improve handling of alternate settings in the USB stack. | Hans Petter Selasky | 2020-12-15 | 1 | -2/+14
    Limit the number of alternate settings to 256. Else the alternate index variable may wrap around.

    PR: 251856
    MFC after: 1 week
    Submitted by: Ma, Horse <Shichun.Ma@dell.com>
    Sponsored by: Mellanox Technologies // NVIDIA Networking
    Notes: svn path=/head/; revision=368658
* Fix LINT-NOINET6 build after r368571. | Alexander V. Chernikov | 2020-12-14 | 1 | -3/+11
    Reported by: mjg
    Notes: svn path=/head/; revision=368651
* amd64 pmap: fix PCID mode invalidations | Konstantin Belousov | 2020-12-14 | 2 | -159/+154
    When r362031 moved local TLB invalidation after shootdown IPI send, it moved too much. In particular, PCID-mode clearing of the pm_gen generation counters must occur before IPIs are sent, which is in fact described by the comment before the seq_cst fence in the invalidation functions.

    Fix it by extracting pm_gen clearing into a new helper, pmap_invalidate_preipi(), which is executed before a call to smp_masked_tlb_shootdown(). The rest of the local invalidation callbacks are simplified as a result, and become very similar to the remote shootdown handlers (to be merged in some future).

    Move the pin of the thread to pmap_invalidate_preipi(), and do the unpin in smp_masked_tlb_shootdown().

    Reported and tested by: mjg (previous version)
    Reviewed by: alc, cem (previous version), markj
    Sponsored by: The FreeBSD Foundation
    Differential revision: https://reviews.freebsd.org/D227588
    Notes: svn path=/head/; revision=368649
* Enable ROUTE_MPATH support in GENERIC kernels. | Alexander V. Chernikov | 2020-12-14 | 5 | -0/+5
    The ability to load-balance traffic over multiple paths is a must-have for routers. It may also be used by servers to balance outgoing traffic over multiple default gateways.

    The previous implementation, RADIX_MPATH, stayed in the shadow for too long. It was not well maintained, which led us to a vicious circle: people were using non-contiguous masks or firewalls to achieve similar goals. As a result, some routing daemon implementations still don't have multipath support enabled for FreeBSD.

    Turning on ROUTE_MPATH by default would fix it. It will help reduce the networking feature gap with other operating systems. Linux and OpenBSD enabled similar support at least 5 years ago.

    ROUTE_MPATH does not consume memory unless actually used. It enables around 1k LOC. It does not bring any behaviour changes for userland. Additionally, the feature is (temporarily) turned off by the net.route.multipath sysctl defaulting to 0.

    Differential Revision: https://reviews.freebsd.org/D27428
    Notes: svn path=/head/; revision=368648
* Remove unused functions and variables in cpufunc.[ch]. | Michal Meloun | 2020-12-14 | 4 | -125/+3
    Notes: svn path=/head/; revision=368635
* Finish implementation of ARM PMU interrupts. | Michal Meloun | 2020-12-14 | 1 | -35/+171
    The ARM PMU may use a single per-core interrupt or may use multiple generic interrupts, one per core. In the latter case, special attention must be paid to the correct identification of the physical location of the core, its order in the external database (FDT) and the associated cpuid. Also keep in mind that a SoC can have multiple different PMUs (usually one per cluster).

    Notes: svn path=/head/; revision=368634
* Verify (and fix) the context_id argument passed to mpentry() by PSCI. | Michal Meloun | 2020-12-14 | 1 | -0/+15
    Some older PSCI implementations corrupt (or do not pass) the context_id argument to newly started secondary cores. Although the ideal solution to this problem is a u-boot update, we can find the correct value for the argument (cpuid) by comparing the core's real mpidr register with the value stored in pcu->mpidr.

    MFC after: 2 weeks
    Notes: svn path=/head/; revision=368633
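The recovery strategy described above, finding the cpuid by matching the core's own MPIDR against stored per-CPU values, might look roughly like the following. This is a sketch under assumptions: `sk_cpu`, `sk_cpuid_from_mpidr` and `SK_MPIDR_AFF_MASK` are hypothetical names, and the mask assumes the affinity field layout of the 64-bit MPIDR_EL1 register, which may not match the platform this commit targets.

```c
#include <stdint.h>

/* Assumed affinity bits: Aff3 in [39:32], Aff2..Aff0 in [23:0]. */
#define	SK_MPIDR_AFF_MASK	0x000000ff00ffffffULL

/* Hypothetical per-CPU record holding the expected MPIDR value. */
struct sk_cpu {
	uint64_t mpidr;
};

/*
 * Return the index of the CPU whose stored affinity matches the value
 * read from the running core's MPIDR register, or -1 if none matches.
 */
static int
sk_cpuid_from_mpidr(const struct sk_cpu *cpus, int ncpus, uint64_t mpidr_reg)
{
	int i;

	for (i = 0; i < ncpus; i++) {
		if ((cpus[i].mpidr & SK_MPIDR_AFF_MASK) ==
		    (mpidr_reg & SK_MPIDR_AFF_MASK))
			return (i);
	}
	return (-1);
}
```

Masking both sides ignores the non-affinity flag bits of the register, so a value such as 0x80000100 still matches a stored 0x100.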