path: root/sys/netinet/in_rss.c
Commit message | Author | Age | Files | Lines
* sys: Remove $FreeBSD$: two-line .c pattern | Warner Losh | 2023-08-16 | 1 | -3/+0
  Remove /^#include\s+<sys/cdefs.h>.*$\n\s+__FBSDID\("\$FreeBSD\$"\);\n/
* in_rss: fix set but not used warning | Kristof Provost | 2022-05-07 | 1 | -2/+0
  The warning appears when 'options RSS' is set.
  MFC after: 1 week
  Sponsored by: Orange Business Services
* Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" | Cy Schubert | 2021-12-02 | 1 | -5/+0
  This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
  A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
* wpa: Import wpa_supplicant/hostapd commit 14ab4a816 | Cy Schubert | 2021-12-02 | 1 | -0/+5
  This is the November update to vendor/wpa committed upstream 2021-11-26.
  MFC after: 1 month
* Remove "options PCBGROUP" | Gleb Smirnoff | 2021-12-02 | 1 | -1/+0
  With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable.
  This experimental feature was sponsored by Juniper but ended up never being used at Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].
  I'm open to resurrecting it if there is any interest from anybody.
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33020
* Allow to compile RSS without PCBGROUP. | Gleb Smirnoff | 2021-12-02 | 1 | -4/+0
  Reviewed by: rrs
  Differential revision: https://reviews.freebsd.org/D33019
* Implement flowid calculation for outbound connections to balance connections over multiple paths. | Alexander V. Chernikov | 2020-10-18 | 1 | -0/+42
  Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections.
  This change creates simple hashing functions and starts calculating hashes for TCP, UDP/UDP-Lite and raw IP if multipath routes are present in the system.
  Reviewed by: glebius (previous version), ae
  Differential Revision: https://reviews.freebsd.org/D26523
  Notes: svn path=/head/; revision=366813
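The commit's own hash functions are not shown in this log, so the following is only a userland sketch of the idea: hash an outbound connection's addresses and ports once into a flowid, then use that flowid to pick among equal-cost paths so every packet of the connection follows the same path. FNV-1a is swapped in as the hash purely for illustration, and `struct flow4`, `flow4_hash()` and `flow4_select_path()` are hypothetical names, not FreeBSD interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-tuple; fields are hashed in whatever byte order they hold. */
struct flow4 {
	uint32_t src_addr;
	uint32_t dst_addr;
	uint16_t src_port;
	uint16_t dst_port;
};

/* FNV-1a over 'len' bytes, continuing from hash state 'h' (illustrative hash). */
static uint32_t
fnv1a32(const void *buf, size_t len, uint32_t h)
{
	const uint8_t *p = buf;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;		/* FNV 32-bit prime */
	}
	return (h);
}

/* Flowid for an outbound connection; fields hashed one by one to skip padding. */
static uint32_t
flow4_hash(const struct flow4 *f)
{
	uint32_t h = 2166136261u;	/* FNV 32-bit offset basis */

	h = fnv1a32(&f->src_addr, sizeof(f->src_addr), h);
	h = fnv1a32(&f->dst_addr, sizeof(f->dst_addr), h);
	h = fnv1a32(&f->src_port, sizeof(f->src_port), h);
	h = fnv1a32(&f->dst_port, sizeof(f->dst_port), h);
	return (h);
}

/* Choose one of 'npaths' next hops: equal flowids always pick the same path. */
static unsigned
flow4_select_path(uint32_t flowid, unsigned npaths)
{
	assert(npaths > 0);
	return (flowid % npaths);
}
```

For a given `f`, `flow4_select_path(flow4_hash(&f), 4)` always returns the same index in [0, 4). Modulo is the simplest mapping; note that it remaps most flows whenever the number of paths changes.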
* Rename rss_soft_m2cpuid() -> rss_soft_m2cpuid_v4() in preparation for an IPv6 version to show up. | Adrian Chadd | 2015-08-29 | 1 | -1/+1
  Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn>
  Differential Revision: https://reviews.freebsd.org/D3504
  Notes: svn path=/head/; revision=287277
* Replace the printf()s with optional rate limited debugging for RSS. | Adrian Chadd | 2015-08-28 | 1 | -7/+7
  Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn>
  Differential Revision: https://reviews.freebsd.org/D3471
  Notes: svn path=/head/; revision=287245
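A userland sketch of "optional rate limited debugging": a message is emitted only when a debug switch is on and a per-second budget is not yet used up. The time is passed in as `now` so the logic is testable; `struct ratelimit`, `ratelimit_ok()`, `rss_debug_sketch` and `rss_debug_log_sketch()` are hypothetical. The kernel's general facility for this pattern is ppsratecheck(9); how in_rss.c actually gates its messages is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Per-second message budget (hypothetical; loosely modelled on ppsratecheck(9)). */
struct ratelimit {
	int64_t	window;		/* second that 'count' refers to */
	int	count;		/* messages emitted during 'window' */
	int	max_per_sec;	/* budget per second */
};

/* Return true if one more message may be emitted at time 'now' (seconds). */
static bool
ratelimit_ok(struct ratelimit *rl, int64_t now)
{
	assert(rl->max_per_sec > 0);
	if (now != rl->window) {	/* new second: reset the budget */
		rl->window = now;
		rl->count = 0;
	}
	if (rl->count >= rl->max_per_sec)
		return (false);
	rl->count++;
	return (true);
}

/* Debug switch, off by default so the normal path pays only one branch. */
int rss_debug_sketch = 0;

/* Stand-in for an unconditional printf(): optional and rate limited. */
void
rss_debug_log_sketch(struct ratelimit *rl, int64_t now, const char *msg)
{
	if (rss_debug_sketch && ratelimit_ok(rl, now))
		printf("RSS: %s\n", msg);
}
```

Checking the switch before the limiter means a disabled debug path never touches the limiter state.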
* Correctly const-ify things. | Adrian Chadd | 2015-03-18 | 1 | -2/+2
  Found by: clang 3.6
  Notes: svn path=/head/; revision=280202
* Refactor / restructure the RSS code into generic, IPv4 and IPv6 specific bits. | Adrian Chadd | 2015-01-18 | 1 | -556/+14
  The motivation here is to eventually teach netisr and potentially other networking subsystems a bit more about how RSS work queues / buckets are configured so things have a hope of auto-configuring in the future.
  * net/rss_config.[ch] takes care of the generic bits for doing configuration, hash function selection, etc;
  * toeplitz.[ch] is now in net/ rather than netinet/;
  * (and would be in libkern if it didn't directly include RSS_KEYSIZE; that's a later thing to fix up.)
  * netinet/in_rss.[ch] now just contains the IPv4 specific methods;
  * and netinet/in6_rss.[ch] now just contains the IPv6 specific methods.
  This should have no functional impact on anyone currently using the RSS support.
  Differential Revision: D1383
  Reviewed by: gnn, jfv (intel driver bits)
  Notes: svn path=/head/; revision=277331
* Migrate the RSS IPv6 hash code to use pointers to the v6 addresses rather than passing them in by value. | Adrian Chadd | 2014-12-31 | 1 | -13/+13
  The eventual aim is to do incremental hash construction rather than all of the memcpy()'ing into a contiguous buffer for the hash function, which does show up as taking quite a bit of CPU during profiling.
  Tested:
  * a variety of laptops/desktop setups I have, with v6 connectivity
  Differential Revision: D1404
  Reviewed by: bz, rpaulo
  Notes: svn path=/head/; revision=276484
* Start process of removing the use of the deprecated "M_FLOWID" flag from the FreeBSD network code. | Hans Petter Selasky | 2014-12-01 | 1 | -6/+5
  The flag is still kept around in the "sys/mbuf.h" header file, but no longer has any users. Instead the "m_pkthdr.rsstype" field in the mbuf structure is now used to decide the meaning of the "m_pkthdr.flowid" field. To modify the "m_pkthdr.rsstype" field please use the existing "M_HASHTYPE_XXX" macros as defined in the "sys/mbuf.h" header file.
  This patch introduces new behaviour in the transmit direction. Previously network drivers checked if "M_FLOWID" was set in "m_flags" before using the "m_pkthdr.flowid" field. This check has now been replaced by checking if "M_HASHTYPE_GET(m)" is different from "M_HASHTYPE_NONE". In the future more hashtypes will be added, for example hashtypes for hardware dedicated flows.
  "M_HASHTYPE_OPAQUE" indicates that the "m_pkthdr.flowid" value is valid and has no particular type. This change removes the need for an "if" statement in TCP transmit code checking for the presence of a valid flowid value. The "if" statement mentioned above is now a direct variable assignment which is then later checked by the respective network drivers like before.
  Additional notes:
  - The SCTP code changes will be committed as a separate patch.
  - Removal of the "M_FLOWID" flag will also be done separately.
  - The FreeBSD version has been bumped.
  MFC after: 1 month
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=275358
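A userland model of the transmit-side change, using simplified stand-ins for the mbuf header: TCP copies the connection's flowid and hash type unconditionally, and the driver, not TCP, decides whether the flowid is usable by comparing the type against NONE. `enum hashtype_sketch`, `struct pkthdr_sketch`, `tcp_tag_sketch()` and `driver_txq_sketch()` are hypothetical, as is the "queue 0" fallback; in the kernel the check is written with the real `M_HASHTYPE_GET(m) != M_HASHTYPE_NONE` macros from sys/mbuf.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for M_HASHTYPE_NONE / M_HASHTYPE_OPAQUE. */
enum hashtype_sketch {
	HASHTYPE_NONE = 0,	/* flowid carries no meaning */
	HASHTYPE_OPAQUE = 1,	/* flowid valid, no particular hash type */
};

/* Hypothetical subset of the mbuf packet header. */
struct pkthdr_sketch {
	uint32_t flowid;
	uint8_t	 rsstype;
};

/* TCP transmit: a plain assignment replaces the old "if (valid) set M_FLOWID". */
static void
tcp_tag_sketch(struct pkthdr_sketch *ph, uint32_t inp_flowid, uint8_t inp_type)
{
	ph->flowid = inp_flowid;
	ph->rsstype = inp_type;		/* may be HASHTYPE_NONE */
}

/* Driver: use flowid only if a hash type is set; queue 0 otherwise (assumed policy). */
static unsigned
driver_txq_sketch(const struct pkthdr_sketch *ph, unsigned ntxq)
{
	assert(ntxq > 0);
	if (ph->rsstype != HASHTYPE_NONE)
		return (ph->flowid % ntxq);
	return (0);
}
```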
* Ensure the correct software IPv4 hash is done based on the configured RSS parameters, rather than assuming we're hashing IPv4+UDP and IPv4+TCP. | Adrian Chadd | 2014-09-16 | 1 | -3/+10
  Notes: svn path=/head/; revision=271660
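A sketch of choosing the software hash input from the configured hash types instead of always assuming TCP and UDP 4-tuples. The `CFG_*` bits, `enum tuple_kind` and `v4_tuple_for()` are hypothetical and do not reproduce the kernel's actual hash-type constants or functions.

```c
#include <assert.h>
#include <netinet/in.h>		/* IPPROTO_TCP, IPPROTO_UDP */
#include <stdint.h>

/* Hypothetical configuration bits, not the kernel's real values. */
#define CFG_IPV4	0x01u	/* 2-tuple: source/destination address */
#define CFG_TCP_IPV4	0x02u	/* 4-tuple for TCP */
#define CFG_UDP_IPV4	0x04u	/* 4-tuple for UDP */

enum tuple_kind { TUPLE_NONE, TUPLE_2, TUPLE_4 };

/* Pick the tuple to hash from the configured set, not from a fixed assumption. */
static enum tuple_kind
v4_tuple_for(uint32_t cfg, int proto)
{
	if (proto == IPPROTO_TCP && (cfg & CFG_TCP_IPV4))
		return (TUPLE_4);
	if (proto == IPPROTO_UDP && (cfg & CFG_UDP_IPV4))
		return (TUPLE_4);
	if (cfg & CFG_IPV4)		/* fall back to addresses only */
		return (TUPLE_2);
	return (TUPLE_NONE);		/* nothing configured for this packet */
}
```

With only the TCP 4-tuple and the IPv4 2-tuple enabled, a UDP packet is hashed on addresses only, matching what a NIC configured the same way would do.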
* Implement IPv4 RSS software hash functions to use during packet ingress and egress. | Adrian Chadd | 2014-09-09 | 1 | -8/+271
  * rss_mbuf_software_hash_v4 - look at the IPv4 mbuf to fetch the IPv4 details + direction to calculate a hash.
  * rss_proto_software_hash_v4 - hash the given source/destination IPv4 address, port and direction.
  * rss_soft_m2cpuid - map the given mbuf to an RSS CPU ("bucket" for now)
  These functions are intended to be used by the stack to support the following:
  * Not all NICs do RSS hashing, so we should support some way of doing a hash in software;
  * The NIC / driver may not hash frames the way we want (e.g. UDP 4-tuple hashing when the stack is only doing 2-tuple hashing for UDP); so we may need to re-hash frames;
  * .. same with IPv4 fragments - they will need to be re-hashed after reassembly;
  * .. and same with things like IP tunneling and such;
  * The transmit path for things like UDP, RAW and ICMP doesn't currently have any RSS information attached to it - so those packets will need an RSS calculation performed before transmit.
  TODO:
  * Counters! Everywhere!
  * Add a debug mode that software hashes received frames and compares them to the hardware hash provided by the hardware to ensure they match.
  The IPv6 part of this is missing - I'm going to do some re-juggling of where various parts of the RSS framework live before I add the IPv6 code (read: the IPv6 code is going to go into netinet6/in6_rss.[ch], rather than living here.)
  Note: This API is still fluid. Please keep that in mind.
  Differential Revision: https://reviews.freebsd.org/D527
  Reviewed by: grehan
  Notes: svn path=/head/; revision=271297
* Fix byte ordering in default RSS key. | Peter Grehan | 2014-08-01 | 1 | -5/+5
  The rss_key[] array in netinet/in_rss.c has the bytes in incorrect order. This results in the RSS test vectors in the Microsoft RSS spec and Intel NIC specs giving incorrect results, and makes it difficult to verify correct hash operation when RSS functionality is added to new NICs.
  CR: https://phabric.freebsd.org/D516
  Reviewed by: adrian
  Notes: svn path=/head/; revision=269391
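To check a key's byte order against published vectors, a plain-C Toeplitz implementation can be fed the first row of the Microsoft RSS verification table: source 66.9.149.187:2794, destination 161.142.100.80:1766, which should hash to 0x323e8fc2 over the addresses only and to 0x51ccc178 with the TCP ports appended. The key below is the Microsoft verification key, not necessarily FreeBSD's default key, and `toeplitz_hash()`, `put_be32()` and `rss_input_v4()` are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Key from the Microsoft RSS verification suite, in wire (byte) order. */
static const uint8_t ms_rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz: XOR in the 32-bit key window for every set input bit, MSB first. */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen, const uint8_t *data,
    size_t len)
{
	uint32_t hash = 0, v;

	assert(len + 4 <= keylen);
	v = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
	    (uint32_t)key[2] << 8 | key[3];
	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= v;
			v <<= 1;		/* slide the window along the key */
			if (key[i + 4] & (1u << b))
				v |= 1;
		}
	}
	return (hash);
}

static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Hash input: src addr, dst addr, src port, dst port, all big-endian. */
static void
rss_input_v4(uint8_t out[12], uint32_t src, uint32_t dst, uint16_t sport,
    uint16_t dport)
{
	put_be32(out, src);
	put_be32(out + 4, dst);
	out[8] = (uint8_t)(sport >> 8);
	out[9] = (uint8_t)sport;
	out[10] = (uint8_t)(dport >> 8);
	out[11] = (uint8_t)dport;
}
```

If the key bytes are stored in the wrong order, these vectors no longer match, which is the symptom the commit describes.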
* Add hash awareness of the IPv4 and IPv6 UDP 4-tuple. | Adrian Chadd | 2014-07-20 | 1 | -0/+4
  Note: it would be nice if the supported hash check were used here!
  Notes: svn path=/head/; revision=268912
* Implement rss_gethashconfig() - return the hash methods currently supported by the stack. | Adrian Chadd | 2014-07-20 | 1 | -0/+34
  Right now the stack isn't really set up for RSS with 4-tuple UDP hashing for either IPv4 or IPv6.
  The specifics:
  * The UDP init path udp_init() and udplite_init() specify the hash as 2-tuple, so the PCBGROUPS code only tries a 2-tuple check;
  * The PCBGROUPS and RSS code doesn't know about the UDP hash types just yet, so they're never treated as valid hashes.
  * For correctness, 4-tuple can't be enabled in the general case because UDP datagrams can be more fragmented than IP datagrams may be. Strictly speaking, TCP datagrams may also be fragmented and this could cause issues with PCBGROUPS/RSS until the IP defragment path grows some code to re-calculate the RSS hash.
  I'll follow this commit up with awareness of the UDP 4-tuple for those who wish to configure it, but for now it'll stay disabled.
  No drivers (yet) know to use this function when RSS is enabled.
  Notes: svn path=/head/; revision=268911
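The fragmentation constraint can be made concrete: only the first fragment of a datagram carries the UDP ports, so a 4-tuple hash can only be applied consistently to unfragmented packets. A sketch, assuming the IPv4 flags/offset field has already been converted to host byte order; `ip4_is_fragment()` and `tuple_size_for()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as IP_MF and IP_OFFMASK in BSD's <netinet/ip.h>. */
#define IP4_MF		0x2000u		/* more fragments follow */
#define IP4_OFFMASK	0x1fffu		/* fragment offset, in 8-byte units */

/* True if the header's flags/offset field (host order) marks a fragment. */
static bool
ip4_is_fragment(uint16_t ip_off)
{
	return ((ip_off & (IP4_MF | IP4_OFFMASK)) != 0);
}

/*
 * Ports are present only in the first fragment, so hashing that fragment
 * on the 4-tuple and the rest on the 2-tuple would scatter one datagram
 * over several buckets. Every fragment therefore uses the 2-tuple.
 */
static int
tuple_size_for(uint16_t ip_off, bool udp_4tuple_enabled)
{
	if (udp_4tuple_enabled && !ip4_is_fragment(ip_off))
		return (4);
	return (2);
}
```

The DF bit (0x4000) is deliberately ignored: a packet with only DF set is not a fragment.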
* Update the comment to be more concise. | Adrian Chadd | 2014-07-20 | 1 | -3/+2
  Notes: svn path=/head/; revision=268909
* Update the default RSS hash to the Chelsio T5 firmware one - it provides markedly better distribution of IPv6 address/ports than the previous key. | Adrian Chadd | 2014-07-18 | 1 | -5/+5
  The previous key would hash large swaths of the port space for a given source/destination IP address to the same low handful of bits, effectively mapping them to the same queue.
  This made testing very .. special.
  Notes: svn path=/head/; revision=268837
* Add RSS hashing awareness for IPv6 and TCP IPv6 hash types. | Adrian Chadd | 2014-07-12 | 1 | -0/+4
  Notes: svn path=/head/; revision=268559
* Pull in r267961 and r267973 again. Fix for issues reported will follow. | Hans Petter Selasky | 2014-06-28 | 1 | -4/+2
  Notes: svn path=/head/; revision=267992
* Revert r267961, r267973: | Glen Barber | 2014-06-27 | 1 | -2/+4
  These changes prevent sysctl(8) from returning proper output, such as:
  1) no output from sysctl(8)
  2) erroneously returning ENOMEM with tools like truss(1) or uname(1)
     truss: can not get etype: Cannot allocate memory
  Notes: svn path=/head/; revision=267985
* Extend the meaning of the CTLFLAG_TUN flag to automatically check if there is an environment variable which shall initialize the SYSCTL during early boot. | Hans Petter Selasky | 2014-06-27 | 1 | -4/+2
  This works for all SYSCTL types both statically and dynamically created ones, except for the SYSCTL NODE type and SYSCTLs which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to be used in the case a tunable sysctl has a custom initialisation function allowing the sysctl to still be marked as a tunable.
  The kernel SYSCTL API is mostly the same, with a few exceptions for some special operations like iterating children of a static/extern SYSCTL node. This operation should probably be made into a factored out common macro, since some device drivers use this. The reason for changing the SYSCTL API was the need for a SYSCTL parent OID pointer and not only the SYSCTL parent OID list pointer in order to quickly generate the sysctl path.
  The motivation behind this patch is to avoid parameter loading kludges inside the OFED driver subsystem. Instead of adding special code to the OFED driver subsystem to post-load tunables into dynamically created sysctls, we generalize this in the kernel.
  Other changes:
  - Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask" to "hw.pcic.intr_mask".
  - Removed redundant TUNABLE statements throughout the kernel.
  - Some minor code rewrites in connection with removing unneeded TUNABLE statements.
  - Added a missing SYSCTL_DECL().
  - Wrapped two very long lines.
  - Avoid malloc()/free() inside sysctl string handling, in case it is called to initialize a sysctl from a tunable, because malloc()/free() is not ready when sysctls from the sysctl dataset are registered.
  - Bumped FreeBSD version to indicate SYSCTL API change.
  MFC after: 2 weeks
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=267961
* Add another RSS method to query the indirection table entries. | Adrian Chadd | 2014-06-26 | 1 | -0/+20
  There are 128 indirection table entries, which correspond to the low 7 bits of the 32 bit RSS hash. Each value will correspond to an RSS bucket. (Then each RSS bucket currently will map to a CPU.)
  This is a more explicit way of figuring out which RSS bucket is in each RSS indirection slot. It can be inferred by the other methods but I'd rather drivers use something more simplified and explicit.
  Notes: svn path=/head/; revision=267891
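A sketch of the lookup described above: the low 7 bits of the hash select one of 128 indirection slots, and each slot names a bucket. The round-robin fill policy and the names `rss_ind_table`, `rss_ind_init()` and `rss_hash2bucket_sketch()` are assumptions for illustration, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define RSS_IND_ENTRIES	128			/* 2^7 slots */
#define RSS_IND_MASK	(RSS_IND_ENTRIES - 1)	/* low 7 bits of the hash */

/* Hypothetical indirection table: each slot holds an RSS bucket number. */
static unsigned rss_ind_table[RSS_IND_ENTRIES];

/* Fill the slots round-robin across 'nbuckets' (assumed default policy). */
static void
rss_ind_init(unsigned nbuckets)
{
	assert(nbuckets > 0);
	for (unsigned i = 0; i < RSS_IND_ENTRIES; i++)
		rss_ind_table[i] = i % nbuckets;
}

/* Map a 32-bit RSS hash to its bucket through the indirection table. */
static unsigned
rss_hash2bucket_sketch(uint32_t hash)
{
	return (rss_ind_table[hash & RSS_IND_MASK]);
}
```

Exposing the table explicitly lets a driver copy the same slot-to-bucket layout into the NIC's own indirection table, so hardware and software agree on the bucket for every hash.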
* The users of RSS shouldn't be directly concerned about hash -> CPU ID mappings. | Adrian Chadd | 2014-05-27 | 1 | -0/+57
  Instead, they should be first mapping to an RSS bucket and then querying the RSS bucket -> CPU ID mapping to figure out the target CPU.
  When (if?) RSS rebalancing is implemented or some other (non round-robin) distribution of work from buckets to CPU IDs, various bits of code - both userland and kernel - will need to know how this mapping works.
  So, to support this:
  * Add a new function rss_m2bucket() - this maps an mbuf to a given bucket. Anything which is currently doing hash -> CPU work may instead wish to do hash -> bucket, and then query the bucket->cpuid map for which CPU it belongs on. Or, map it to a bucket, then re-pin that bucket -> CPU during a rebalance operation.
  * For userland applications which wish to exploit affinity to RSS buckets, the bucket -> CPU ID mapping is now available via a sysctl. net.inet.rss.bucket_mapping lists the bucket to CPU ID mapping via a list of bucket:cpu pairs.
  Notes: svn path=/head/; revision=266737
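A sketch of producing a "bucket:cpu" list of the kind net.inet.rss.bucket_mapping exposes. The space separator and the helper `format_bucket_mapping()` are assumptions; check the sysctl's real output on a kernel built with `options RSS` before relying on its exact format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "b:cpu" pairs separated by spaces; return -1 if 'buf' is too small. */
static int
format_bucket_mapping(char *buf, size_t len, const int *bucket_cpu,
    int nbuckets)
{
	size_t off = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (int b = 0; b < nbuckets; b++) {
		int n = snprintf(buf + off, len - off, "%s%d:%d",
		    b == 0 ? "" : " ", b, bucket_cpu[b]);
		if (n < 0 || (size_t)n >= len - off)
			return (-1);	/* output would be truncated */
		off += (size_t)n;
	}
	return (0);
}
```

A userland program could fetch the real string with sysctlbyname(3) and split it on the same delimiters to pin one worker thread per bucket.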
* Use CPU_FIRST() / CPU_NEXT() to iterate over the valid CPU IDs. | Adrian Chadd | 2014-05-22 | 1 | -4/+6
  Notes: svn path=/head/; revision=266537
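CPU IDs need not be contiguous, so buckets are spread by walking only the CPUs that are present and wrapping around at the end, which is the behaviour CPU_FIRST() and CPU_NEXT() provide in the kernel. The sketch below models the CPU set as a hypothetical 64-bit presence mask; `cpu_first()`, `cpu_next()` and `rss_assign_buckets()` are illustrative, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* First present CPU in the mask, or -1 if none. */
static int
cpu_first(uint64_t present)
{
	for (int c = 0; c < 64; c++)
		if (present & (UINT64_C(1) << c))
			return (c);
	return (-1);
}

/* Next present CPU after 'cur', wrapping to the first one. */
static int
cpu_next(uint64_t present, int cur)
{
	for (int c = cur + 1; c < 64; c++)
		if (present & (UINT64_C(1) << c))
			return (c);
	return (cpu_first(present));
}

/* Assign buckets to present CPUs round-robin, skipping absent IDs. */
static void
rss_assign_buckets(uint64_t present, int *bucket_cpu, int nbuckets)
{
	int cpu = cpu_first(present);

	assert(cpu >= 0);
	for (int b = 0; b < nbuckets; b++) {
		bucket_cpu[b] = cpu;
		cpu = cpu_next(present, cpu);
	}
}
```

With CPUs 0, 2, 5 and 7 present, six buckets land on 0, 2, 5, 7, 0, 2; a naive `b % ncpus` loop would instead hand buckets to the absent CPU IDs 1 and 3.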
* Add a new function to do a CPU ID lookup based on RSS hash information. | Adrian Chadd | 2014-05-18 | 1 | -11/+18
  This is intended to be used by various places that wish to hash some information about a TCP/UDP/IP flow but don't necessarily have a live mbuf to do it with.
  Refactor rss_m2cpuid() to use the refactored function.
  Notes: svn path=/head/; revision=266419
* Several years after initial development, merge prototype support for linking NIC Receive Side Scaling (RSS) to the network stack's connection-group implementation. | Robert Watson | 2014-03-15 | 1 | -0/+505
  This prototype (and derived patches) are in use at Juniper and several other FreeBSD-using companies, so despite some reservations about its maturity, merge the patch to the base tree so that it can be iteratively refined in collaboration rather than maintained as a set of gradually diverging patch sets.
  (1) Merge a software implementation of the Toeplitz hash specified in RSS implemented by David Malone. This is used to allow suitable pcbgroup placement of connections before the first packet is received from the NIC. Software hashing is generally avoided, however, due to high cost of the hash on general-purpose CPUs.
  (2) In in_rss.c, maintain authoritative versions of RSS state intended to be pushed to each NIC, including keying material, hash algorithm/configuration, and buckets. Provide software-facing interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both the RSS standardised Toeplitz and a 'naive' variation with a hash efficient in software but with poor distribution properties. Implement rss_m2cpuid() to be used by netisr and other load balancing code to look up the CPU on which an mbuf should be processed.
  (3) In the Ethernet link layer, allow netisr distribution using RSS as a source of policy as an alternative to source ordering; continue to default to direct dispatch (i.e., don't try and requeue packets for processing on the 'right' CPU if they arrive in a directly dispatchable context).
  (4) Allow RSS to control tuning of connection groups in order to align groups with RSS buckets. If a packet arrives on a protocol using connection groups, and contains a suitable hardware-generated hash, use that hash value to select the connection group for pcb lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz hash is available, we fall back on regular PCB lookup risking contention rather than pay the cost of Toeplitz in software -- this is a less scalable but, at my last measurement, faster approach. As core counts go up, we may want to revise this strategy despite CPU overhead.
  Where device drivers suitably configure NICs, and connection groups / RSS are enabled, this should avoid both lock and line contention during connection lookup for TCP. This commit does not modify any device drivers to tune device RSS configuration to the global RSS configuration; patches are in circulation to do this for at least Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device drivers is not particularly robust, nor aware of more advanced features such as runtime reconfiguration/rebalancing. This will hopefully prove a useful starting point for refinement.
  No MFC is scheduled as we will first want to nail down a more mature and maintainable KPI/KBI for device drivers.
  Sponsored by: Juniper Networks (original work)
  Sponsored by: EMC/Isilon (patch update and merge)
  Notes: svn path=/head/; revision=263198