path: root/sys/netinet
Commit message | Author | Age | Files | Lines
* tcp lro: use the flowid only when it has hash properties | Michael Tuexen | 11 hours | 1 | -4/+5
  When a packet is provided to LRO using tcp_lro_queue_mbuf(), a sequence number is computed based on the m_pkthdr.flowid provided by the driver. The implicit assumption is that m_pkthdr.flowid has hash properties. The recent use of tcp_lro_queue_mbuf() in iflib exposed a bug in at least one driver (igc), which
  * always reports that it uses M_HASHTYPE_OPAQUE, and
  * in some cases does not set m_pkthdr.flowid consistently for packets belonging to the same TCP connection.
  This results in severe performance problems for the base TCP stack, since it handles the packets in the wrong sequence, although they were received in the correct sequence. To protect against such misbehaving drivers, take m_pkthdr.flowid into account only if it has hash properties.
  The performance problems were observed by gallatin@ and analyzed together with rrs@.
  Reviewed by: gallatin
  Tested by: gallatin
  MFC after: 5 Minutes
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52989
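To make the ordering argument concrete, here is a userspace sketch. It is a simplified model, not tcp_lro.c: the SK_HASHTYPE_* constants (loosely modeled on the mbuf(9) hash types), the struct and the key layout are assumptions for illustration. Packets whose flowid lacks hash properties all share the same high key bits, so sorting by key preserves their arrival order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins modeled on mbuf(9) hash types; values are assumed. */
#define SK_HASHTYPE_OPAQUE      0x3fu   /* driver-private value, no hash properties */
#define SK_HASHTYPE_HASHPROP    0x80u   /* set when the flowid behaves like a hash */
#define SK_HASHTYPE_OPAQUE_HASH (SK_HASHTYPE_OPAQUE | SK_HASHTYPE_HASHPROP)

struct sk_pkthdr {
	uint32_t flowid;
	uint8_t  hashtype;
};

static bool
sk_has_hash_props(const struct sk_pkthdr *ph)
{
	return ((ph->hashtype & SK_HASHTYPE_HASHPROP) != 0);
}

/*
 * Sort key for queued packets: the flowid only groups packets when it is a
 * real hash; otherwise the high bits stay zero and the low bits (arrival
 * index) decide, so an unreliable flowid cannot reorder a connection.
 */
static uint64_t
sk_lro_sort_key(const struct sk_pkthdr *ph, uint32_t arrival_idx)
{
	uint64_t key = arrival_idx;

	if (sk_has_hash_props(ph))
		key |= (uint64_t)ph->flowid << 32;
	return (key);
}
```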
* tcp: remove stray ; | Michael Tuexen | 3 days | 1 | -1/+1
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* tcp: improve SEG.ACK validation in SYN-RECEIVED | Michael Tuexen | 3 days | 1 | -1/+2
  According to the fifth step in SEGMENT ARRIVES (RFC 9293), send a RST segment in response to an ACK segment that fails the SEG.ACK check, but leave the endpoint state unchanged. FreeBSD handles this correctly when entering the SYN-RECEIVED state via the SYN-SENT state, but not in the SYN-cache code, which handles the SYN-RECEIVED state entered via the LISTEN state.
  This also fixes a panic reported by Alexander Leidinger.
  Reviewed by: jtl, glebius
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52934
* tcp: remove stray ; | Michael Tuexen | 3 days | 1 | -1/+1
  No functional change intended.
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* mtx: rename MTX_CONTESTED to MTX_WAITERS | Mateusz Guzik | 4 days | 1 | -3/+3
  Using the word "contested" for the case where there are threads blocked on the lock is misleading at best (the lock is already contested if it is being held by one thread and wanted by another). It also diverges from the naming used in other primitives (which refer to them as "waiters"). Rename it for some consistency.
  There were uses of the flag outside of the mutex code itself. This is an abuse of the interface. The netgraph use looks suspicious at best; the sctp use is fundamentally wrong. Fixing those up is left as an exercise for the reader.
  While here, touch up stale commentary.
  Sponsored by: Rubicon Communications, LLC ("Netgate")
* carp6: revise the generation of ND6 NA | Andrey V. Elsukov | 7 days | 1 | -7/+20
  * use the ND_NA_FLAG_ROUTER flag in carp_send_na() when we work as a router.
  * use in6addr_any as the destination address for nd6_na_output(); it will then use the IPv6 all-nodes multicast address.
  * add an in6_selectsrc_nbr() function that accepts an additional ip6_moptions argument. Use this function from the ND6 code to avoid cases where nd6_na_output()/nd6_ns_output() cannot find a source address for multicast destinations.
  * add some comments from RFC 2461 for better understanding.
  * use the tlladdr argument as flags: ND6_NA_OPT_LLA when we need to add the target link-layer address option, and ND6_NA_CARP_MASTER when we know that the target address is the CARP master. Then we can prepare the correct CARP MAC address if the target address is the CARP master.
  * move the blocks of code where multicast options are initialized, and use them when the destination address is multicast.
  Reviewed by: kp
  Obtained from: Yandex LLC
  MFC after: 2 weeks
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D52825
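A minimal sketch of how the NA flags and the reused tlladdr flag bits might be combined. The R and S bit positions follow the RFC 4861 NA header layout, but all SK_* names, the option-flag values and sk_plan_unsolicited_na() are hypothetical and are not the nd6 API.

```c
#include <stdbool.h>
#include <stdint.h>

/* R and S bits of the NA flags word (RFC 4861 section 4.4), host byte order. */
#define SK_NA_FLAG_ROUTER     0x80000000u
#define SK_NA_FLAG_SOLICITED  0x40000000u

/* Hypothetical values for the two option flags named in the commit message. */
#define SK_NA_OPT_LLA         0x1
#define SK_NA_CARP_MASTER     0x2

struct sk_na_plan {
	uint32_t flags;         /* flags word of the NA header */
	bool     add_tlladdr;   /* include the Target Link-Layer Address option */
	bool     use_carp_mac;  /* fill that option with the CARP virtual MAC */
};

/* Plan an unsolicited NA sent to the all-nodes multicast address. */
static struct sk_na_plan
sk_plan_unsolicited_na(bool is_router, int optflags)
{
	struct sk_na_plan p = { 0, false, false };

	/* S stays clear: unsolicited NAs must not set it (RFC 4861 7.2.6). */
	if (is_router)
		p.flags |= SK_NA_FLAG_ROUTER;
	p.add_tlladdr = (optflags & SK_NA_OPT_LLA) != 0;
	p.use_carp_mac = p.add_tlladdr && (optflags & SK_NA_CARP_MASTER) != 0;
	return (p);
}
```

Passing the option bits as flags (rather than a separate link-layer address pointer) lets one argument express both "add the option" and "use the CARP MAC in it".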
* tcp: close two minor races with debug messages | Jonathan T. Looney | 7 days | 1 | -2/+2
  The syncache entry is locked by the hash bucket lock. After running SCH_UNLOCK(), we have no guarantee that the syncache entry still exists. Resolve the race by moving SCH_UNLOCK() after the log() call which reads variables from the syncache entry.
  Reviewed by: rrs, tuexen, Nick Banks
  Sponsored by: Netflix
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D52868
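The locking rule behind this fix can be illustrated with a userspace sketch using POSIX mutexes. The bucket and entry types and sk_log_entry() are hypothetical; the point is only that every read of the entry, including the one feeding the log message, happens before the bucket lock is released.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct sk_entry {
	unsigned short port;		/* hypothetical field read by the log message */
};

struct sk_bucket {
	pthread_mutex_t lock;		/* protects the bucket and its entries */
	struct sk_entry *entry;		/* single-entry bucket, for brevity */
};

/*
 * Format a debug message about the bucket's entry.  Returns 0 on success and
 * -1 if the bucket is empty.  The message is built while the lock is held;
 * once it is dropped, another thread may free the entry at any time.
 */
static int
sk_log_entry(struct sk_bucket *b, char *msg, size_t len)
{
	int error = -1;

	pthread_mutex_lock(&b->lock);
	if (b->entry != NULL) {
		snprintf(msg, len, "syncache entry for port %u",
		    (unsigned int)b->entry->port);
		error = 0;
	}
	pthread_mutex_unlock(&b->lock);
	/* b->entry must not be dereferenced past this point. */
	return (error);
}
```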
* tcp: improve segment validation in SYN-RECEIVED | Michael Tuexen | 7 days | 1 | -15/+17
  The validation of SEG.SEQ (first step in SEGMENT ARRIVES of RFC 9293) should be done before the validation of SEG.ACK (fifth step in SEGMENT ARRIVES of RFC 9293). Furthermore, when the SEG.SEQ validation fails, a challenge ACK should be sent instead of sending a RST segment and moving the endpoint to CLOSED.
  Reported by: Tilnel on freebsd-net
  Reviewed by: Nick Banks
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52849
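The order of checks described above can be sketched as follows, assuming the four-case segment acceptability test from RFC 9293 and the usual modular sequence comparisons. The function names and the verdict enum are hypothetical, and the RST, SYN and other intermediate steps of SEGMENT ARRIVES are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tcp_seq;

/* Modular sequence-number comparisons, as in tcp_seq.h. */
#define SEQ_LT(a, b)  ((int32_t)((a) - (b)) < 0)
#define SEQ_LEQ(a, b) ((int32_t)((a) - (b)) <= 0)

static bool
sk_in_window(tcp_seq s, tcp_seq rcv_nxt, uint32_t rcv_wnd)
{
	return (SEQ_LEQ(rcv_nxt, s) && SEQ_LT(s, rcv_nxt + rcv_wnd));
}

/* RFC 9293, SEGMENT ARRIVES, first step: the four-case acceptability test. */
static bool
sk_seq_acceptable(tcp_seq seq, uint32_t len, tcp_seq rcv_nxt, uint32_t rcv_wnd)
{
	if (len == 0 && rcv_wnd == 0)
		return (seq == rcv_nxt);
	if (len == 0)
		return (sk_in_window(seq, rcv_nxt, rcv_wnd));
	if (rcv_wnd == 0)
		return (false);
	return (sk_in_window(seq, rcv_nxt, rcv_wnd) ||
	    sk_in_window(seq + len - 1, rcv_nxt, rcv_wnd));
}

enum sk_verdict {
	SK_CHALLENGE_ACK,	/* SEG.SEQ failed: challenge ACK, state unchanged */
	SK_SEND_RST,		/* SEG.ACK failed: RST with SEQ=SEG.ACK, state unchanged */
	SK_ESTABLISH		/* both checks passed */
};

static enum sk_verdict
sk_synrcvd_validate(tcp_seq seq, uint32_t len, tcp_seq ack, tcp_seq rcv_nxt,
    uint32_t rcv_wnd, tcp_seq snd_una, tcp_seq snd_nxt)
{
	if (!sk_seq_acceptable(seq, len, rcv_nxt, rcv_wnd))	/* step 1 first */
		return (SK_CHALLENGE_ACK);
	if (!(SEQ_LT(snd_una, ack) && SEQ_LEQ(ack, snd_nxt)))	/* then step 5 */
		return (SK_SEND_RST);
	return (SK_ESTABLISH);
}
```

Because step 1 runs first, a segment that is out of window and also carries a bad ACK only triggers a challenge ACK, never a RST.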
* tcp: keep SYN-cache entry when sending of challenge ACK fails | Michael Tuexen | 8 days | 1 | -13/+4
  Don't drop a SYN-cache entry just because a challenge ACK couldn't be sent. This might only be a temporary failure.
  Reviewed by: Nick Banks, glebius, jtl
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52840
* tcp: cleanup syncache_expand() | Michael Tuexen | 8 days | 1 | -23/+29
  Only validate SEG.SEQ and SEG.ACK when processing a real SYN-cache entry. In the SYN-cookie case, these conditions are always true, since the SYN-cache entry on the stack is constructed from the incoming TCP segment.
  While there, fix the logging messages.
  Reviewed by: Nick Banks
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52816
* tcp: apply rate limits to challenge ACKs | Michael Tuexen | 9 days | 2 | -4/+26
  When sending challenge ACKs from the SYN-cache, apply the same rate limiting as in other states.
  Reviewed by: cc, rrs
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52754
* tcp: refactor tcp_send_challenge_ack() | Michael Tuexen | 2025-09-25 | 2 | -15/+28
  Refactor tcp_send_challenge_ack() so that the logic deciding whether a challenge ACK is sent is available in the separate function tcp_challenge_ack_check(). This new function will also be used for sending challenge ACKs in the SYN-cache code, which will be added in upcoming commits.
  No functional change intended.
  Reviewed by: cc, Nick Banks, Peter Lei
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52717
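One plausible shape for such a check is a fixed-window counter, sketched below. The struct, sk_challenge_ack_check() and its window and limit parameters are hypothetical and need not match the actual signature or tunables of tcp_challenge_ack_check(); the sketch assumes max_per_window is at least 1.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_ack_limit {
	uint64_t window_start_ms;  /* start of the current rate-limit window */
	uint32_t sent_in_window;   /* challenge ACKs sent in that window */
};

/*
 * Return true if a challenge ACK may be sent now and account for it.
 * window_ms == 0 disables rate limiting.
 */
static bool
sk_challenge_ack_check(struct sk_ack_limit *l, uint64_t now_ms,
    uint64_t window_ms, uint32_t max_per_window)
{
	if (window_ms == 0)
		return (true);
	if (now_ms - l->window_start_ms >= window_ms) {
		/* Window expired: start a new one and count this ACK. */
		l->window_start_ms = now_ms;
		l->sent_in_window = 1;
		return (true);
	}
	if (l->sent_in_window < max_per_window) {
		l->sent_in_window++;
		return (true);
	}
	return (false);
}
```

Keeping the decision in a function that only updates caller-supplied state is what allows both an established connection and a SYN-cache entry to share it, each passing its own counters.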
* tcp: whitespace cleanup | Michael Tuexen | 2025-09-25 | 1 | -2/+2
  No functional change intended.
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* tcp: Add CTLFLAG_VNET flag to some sysctls | Zhenlei Huang | 2025-09-24 | 1 | -2/+2
  The two sysctls net.inet.tcp.hostcache.list and net.inet.tcp.hostcache.histo are read-only and operate on the hostcache of vnet jails. Add the CTLFLAG_VNET flag to them, since they are per-vnet sysctls.
  This change does not have any impact on reading the two sysctls, but `sysctl -ANV net.inet.tcp.hostcache` will report them correctly.
  Reviewed by: tuexen, #transport, #network
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D52693
* tcp: Fix expiring and purging hostcache entries of vnet jails | Zhenlei Huang | 2025-09-24 | 1 | -1/+1
  A jailed process (`sysctl -j foo` or `jexec foo sysctl`) does not have the privilege to write to non-vnet sysctls, only to those marked as jail-writable, i.e. sysctls marked with the CTLFLAG_VNET flag. Without this change, we get EPERM when trying to expire and purge hostcache entries of vnet jails via the net.inet.tcp.hostcache.purgenow sysctl. Fix that by adding the CTLFLAG_VNET flag.
  Reviewed by: tuexen, #transport, #network
  Fixes: 264563806496 Add a new sysctl net.inet.tcp.hostcache.purgenow=1 to expire ...
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D52692
* tcp lro: remove redundant check | Michael Tuexen | 2025-09-23 | 1 | -11/+0
  Remove a check which is also done in tcp_lro_rx_common().
  Reviewed by: gallatin
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52683
* tcp: fix sending of RST segments | Michael Tuexen | 2025-09-23 | 1 | -1/+1
  Take endpoint parameters into account when available.
  Fixes: 463b5aed0d62 ("tcp: retire rstreason")
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* tcp: minor cleanup | Michael Tuexen | 2025-09-10 | 1 | -3/+3
  No functional change intended.
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* tcp: improve compilability | Michael Tuexen | 2025-09-05 | 1 | -0/+3
  When building with DDB support, the inclusion of in_kdtrace.h is needed. Make this explicit and don't rely on tcp_var.h to do this. This is required for stable/14.
  Fixes: a62c6b0de48a ("ddb: add optional printing of BBLog entries")
  MFC after: immediately
  Sponsored by: Netflix, Inc.
* tcp: add gone_in note for net.inet.tcp.sack.revised for fbsd16 | Richard Scheffenegger | 2025-09-04 | 1 | -2/+19
  Deprecate the support for the old RFC 3517 behavior of SACK loss recovery, and simplify the code to always adhere to RFC 6675.
  Reviewed By: tuexen, cc, #transport
  Sponsored by: NetApp, Inc.
  Differential Revision: https://reviews.freebsd.org/D52383
* bridge: Print a warning if member_ifaddrs=1 | Lexi Winter | 2025-09-04 | 1 | -2/+9
  When adding an interface with an IP address to a bridge, or assigning an IP address to an interface which is in a bridge, while member_ifaddrs=1, print a warning so that users are informed that this is deprecated.
  Also add "(deprecated)" to the sysctl description.
  MFC after: 9 hours
  Reviewed by: markj
  Differential Revision: https://reviews.freebsd.org/D52335
* ifnet: Defer detaching address family dependent data | Zhenlei Huang | 2025-09-03 | 1 | -0/+2
  While diagnosing PR 279653 and PR 285129, I observed that a thread may write to freed memory without the system crashing. This hides the real problem. A clear NULL pointer dereference is much better than writing to freed memory.
  PR: 279653
  PR: 285129
  Reviewed by: glebius
  MFC after: 3 weeks
  Differential Revision: https://reviews.freebsd.org/D49444
* tcp: micro-optimize SYN-cookie expansion | Michael Tuexen | 2025-09-01 | 1 | -6/+8
  Only compute wscale when it is actually used. While there, change the type of wscale to u_int as suggested by glebius.
  No functional change intended.
  Reviewed by: glebius, rscheff (older version)
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52296
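A sketch of the window-scale computation and of computing it only when it is needed. It assumes the classic BSD loop that picks the smallest shift for which TCP_MAXWIN << wscale covers the socket buffer limit, capped at TCP_MAX_WINSHIFT (14, per RFC 7323); the sk_* helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCP_MAXWIN        65535u  /* largest unscaled window */
#define TCP_MAX_WINSHIFT  14u     /* largest shift allowed by RFC 7323 */

/* Smallest shift such that TCP_MAXWIN << shift reaches the buffer limit. */
static unsigned int
sk_compute_wscale(uint64_t sb_max)
{
	unsigned int wscale = 0;

	while (wscale < TCP_MAX_WINSHIFT &&
	    ((uint64_t)TCP_MAXWIN << wscale) < sb_max)
		wscale++;
	return (wscale);
}

/* Only run the loop when window scaling will actually be negotiated. */
static unsigned int
sk_cookie_wscale(bool peer_sent_wscale, uint64_t sb_max)
{
	return (peer_sent_wscale ? sk_compute_wscale(sb_max) : 0);
}
```

For a 2 MiB buffer limit the loop stops at 6, since 65535 << 5 is still just below 2097152.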
* udp: slightly refactor udp_append() | Gleb Smirnoff | 2025-09-01 | 1 | -21/+21
  Make it return bool. Reword the comment and add a note that the mbuf is always consumed. In case the tunnel consumed the mbuf, don't INP_RUNLOCK(); behave just like all the other normal exits from the function.
  Reviewed by: tuexen, kp, markj
  Differential Revision: https://reviews.freebsd.org/D52171
* udp: don't leak mbuf if tunnel didn't consume and inpcb is gone | Gleb Smirnoff | 2025-09-01 | 1 | -1/+4
  Fixes: e1751ef896119d7372035b1b60f18a6342bd0e3b
  Reviewed by: tuexen, kp, markj
  Differential Revision: https://reviews.freebsd.org/D52170
* bridge: Fix adding gif(4) interface assigned with IP addresses as bridge member | Zhenlei Huang | 2025-09-01 | 1 | -2/+2
  Also fix assigning IP addresses to the gif(4) interface when it is a member of an if_bridge(4) interface.
  When the sysctl net.link.bridge.member_ifaddrs is set to 1, if_bridge(4) can eliminate an unnecessary walk of the member list to determine whether inbound unicast packets are for us. However, when a gif(4) interface is a member of an if_bridge(4) interface, it acts as the tunnel endpoint for tunneling Ethernet frames over an IP network (the EtherIP protocol), so the IP addresses configured on it are independent of the if_bridge(4) interface and the other if_bridge(4) members. Hence the sysctl net.link.bridge.member_ifaddrs should not have any influence over the gif(4) interface's behavior of assigning IP addresses.
  PR: 227450
  Reported by: Siva Mahadevan <me@svmhdvn.name>
  Reviewed by: ivy, #bridge
  MFC after: 1 week
  Fixes: 0a1294f6c610 bridge: allow IP addresses on members to be disabled
  Differential Revision: https://reviews.freebsd.org/D52200
* tcp: improve sending of SYN-cookies | Michael Tuexen | 2025-08-30 | 1 | -41/+48
  Ensure that when the sysctl variable net.inet.tcp.syncookies_only is non-zero, SYN-cookies are sent and no SYN-cache entry is added to the SYN-cache. In particular, this behavior should not depend on the value of the sysctl variable net.inet.tcp.syncookies, which controls whether SYN-cookies are used in combination with the SYN-cache to deal with bucket overflows.
  Also ensure that tcps_sc_completed does not include TCP connections established via a SYN-cookie.
  While there, make V_tcp_syncookies and V_tcp_syncookiesonly bool instead of int, since they are used as boolean variables.
  Reviewed by: rscheff, cc, Peter Lei, Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52225
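The intended decision logic can be summarized in a small, simplified model. The enum, sk_syn_decide() and the counters are hypothetical; in particular, the real bucket-overflow handling is more involved than the single cache_ok input suggests.

```c
#include <stdbool.h>

enum sk_syn_action {
	SK_SYN_ADD_TO_CACHE,   /* create a SYN-cache entry and send a SYN-ACK */
	SK_SYN_SEND_COOKIE,    /* send a SYN-ACK carrying a cookie, no cache entry */
	SK_SYN_DROP            /* neither is possible: drop the SYN */
};

/*
 * syncookies:      cookies may be used when the cache cannot take the entry
 * syncookies_only: never use the cache, always answer with a cookie
 * cache_ok:        a SYN-cache entry could be allocated and inserted
 */
static enum sk_syn_action
sk_syn_decide(bool syncookies, bool syncookies_only, bool cache_ok)
{
	if (syncookies_only)
		return (SK_SYN_SEND_COOKIE);  /* independent of 'syncookies' */
	if (cache_ok)
		return (SK_SYN_ADD_TO_CACHE);
	return (syncookies ? SK_SYN_SEND_COOKIE : SK_SYN_DROP);
}

struct sk_syn_stats {
	unsigned long sc_completed;       /* handshakes completed via the cache */
	unsigned long cookie_completed;   /* handshakes completed via a cookie */
};

/* Count a completed handshake in exactly one of the two counters. */
static void
sk_count_established(struct sk_syn_stats *st, bool via_cookie)
{
	if (via_cookie)
		st->cookie_completed++;
	else
		st->sc_completed++;
}
```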
* tcp: remove stale comment | Michael Tuexen | 2025-08-28 | 1 | -1/+0
  MFC after: 3 days
  Sponsored by: Netflix, Inc.
* netinet: provide "at offset" variant of the in_delayed_cksum() API | Maxim Sobolev | 2025-08-26 | 2 | -3/+11
  The need for such a variant comes from the fact that we need to re-calculate the checksum after ng_nat(4) transformations when getting mbufs directly from layer 2 (Ethernet).
  Reviewed by: markj, tuexen
  Approved by: tuexen
  Sponsored by: Sippy Software, Inc.
  Differential Revision: https://reviews.freebsd.org/D49677
  MFC After: 2 weeks
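For illustration, here is a self-contained sketch of an RFC 1071 checksum applied to a transport segment that starts at an arbitrary offset in a buffer, such as after a 14-byte Ethernet header. sk_delayed_cksum_at() and its parameters are hypothetical and do not reproduce the kernel's mbuf-based signature; as in a delayed-checksum path, the checksum field is assumed to be pre-seeded by the caller (for example with the pseudo-header sum), and UDP's mapping of 0 to 0xffff is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 one's-complement checksum over buf[start, end), 16-bit big-endian words. */
static uint16_t
sk_cksum_range(const uint8_t *buf, size_t start, size_t end)
{
	uint32_t sum = 0;
	size_t i;

	for (i = start; i + 1 < end; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < end)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back into 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * The IP packet starts at l3_off inside buf; ip_hlen and ip_len are the IP
 * header length and total length; csum_field is the checksum's offset
 * inside the transport header.
 */
static void
sk_delayed_cksum_at(uint8_t *buf, size_t l3_off, size_t ip_hlen,
    size_t ip_len, size_t csum_field)
{
	size_t l4 = l3_off + ip_hlen;
	uint16_t c = sk_cksum_range(buf, l4, l3_off + ip_len);

	buf[l4 + csum_field] = (uint8_t)(c >> 8);
	buf[l4 + csum_field + 1] = (uint8_t)(c & 0xff);
}
```

A quick sanity check: after the checksum is stored, summing the same transport range again yields 0.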
* tcp: remove now unneeded icmp includes | Gleb Smirnoff | 2025-08-25 | 6 | -12/+0
* mod_cc(4): Fix a typo in a source code comment | Gordon Bergling | 2025-08-25 | 1 | -1/+1
  - s/assigments/assignments/
  MFC after: 3 days
* tcp: improve inflating cwnd in limited transmit | Michael Tuexen | 2025-08-25 | 1 | -5/+3
  Don't subtract tcp_sack_adjust() twice in some cases; subtract it just once in all cases.
  Reviewed by: rscheff
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D52140
* tcp: improve the condition for detecting dup ACKs | Michael Tuexen | 2025-08-24 | 1 | -263/+236
  Take the condition of RFC 6675 into account. While there, remove stale comments.
  PR: 282605
  Reviewed by: cc (earlier version)
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51426
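A strict reading of the two definitions can be sketched as below. RFC 5681 counts an ACK as a duplicate only if there is outstanding data, it carries no data and no SYN/FIN, it does not advance the cumulative ACK, and it leaves the advertised window unchanged. RFC 6675 instead counts any ACK that SACKs previously un-SACKed bytes, even if it carries data, changes the window or moves the cumulative ACK. The struct and sk_is_dupack() are hypothetical, and FreeBSD's actual condition may combine these differently.

```c
#include <stdbool.h>
#include <stdint.h>

struct sk_ack {
	uint32_t th_ack;        /* cumulative ACK carried by the segment */
	uint32_t tlen;          /* payload length */
	bool     syn_or_fin;    /* SYN or FIN bit set */
	uint32_t wnd;           /* advertised window */
	uint32_t newly_sacked;  /* previously un-SACKed bytes this ACK SACKs */
};

static bool
sk_is_dupack(const struct sk_ack *a, uint32_t snd_una, uint32_t snd_max,
    uint32_t last_wnd, bool sack_in_use)
{
	if (snd_una == snd_max)
		return (false);		/* nothing outstanding */
	if (sack_in_use)
		/* RFC 6675 "DupAck": new SACK information is what counts. */
		return (a->newly_sacked > 0);
	/* RFC 5681 section 2, conditions (b) through (e). */
	return (a->th_ack == snd_una && a->tlen == 0 && !a->syn_or_fin &&
	    a->wnd == last_wnd);
}
```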
* icmp: clear offset and flags when reflecting a packet | Michael Tuexen | 2025-08-18 | 1 | -1/+2
  When reflecting a packet, use an offset of 0 and clear all three flag bits, in particular the DF bit.
  PR: 288558
  Reviewed by: markj, zlei
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51991
* udp: Fix a typo in a source code comment | Gordon Bergling | 2025-08-17 | 1 | -1/+1
  - s/datgram/datagram/
  MFC after: 3 days
* tcp: fix sysctl name in the gone_in() printf | Gleb Smirnoff | 2025-08-13 | 1 | -1/+1
  Fixes: c3fc0db3bc50df18a724e6e6b12ea4e060fd9255
* IPv6: Ignore PTB packets with an MTU < 1280 | Eric van Gyzen | 2025-08-12 | 2 | -4/+1
  RFC 2460 section 5 paragraph 7 allowed a Packet Too Big message to report a Next-Hop MTU less than 1280 in support of 6-to-4 routers. A node receiving such a message was required to add a Fragment Header to outgoing packets, even though they were not fragmented.
  Almost 20 years later, RFC 8200 was published. It obsoletes RFC 2460 and removes that paragraph. UNH IOL Intact was updated to test for compliance with the new standard.
  Remove code supporting that obsolete paragraph.
  Test cases v6LC_4_1_06a and 06b failed before this change, saying:
    DUT processed PTB and sent a fragmented echo reply
  Those two test cases now pass:
    DUT did not process PTB and sent un-fragmented echo reply
  All PMTU test cases pass except v6LC_4_1_08. It fails because we ignore the MTU in RAs.
  Reviewed by: tuexen
  MFC After: 1 month
  Sponsored by: Dell Inc.
  Differential Revision: https://reviews.freebsd.org/D51835
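The receive-side rule can be sketched as follows. IPV6_MMTU mirrors the 1280-byte minimum link MTU; sk_handle_ptb() is hypothetical, and the "only ever lower the estimate" behavior reflects RFC 8201 rather than anything stated in this commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPV6_MMTU 1280u  /* minimum IPv6 link MTU (RFC 8200, section 5) */

/*
 * Process the MTU reported in an ICMPv6 Packet Too Big message.
 * Returns false if the message is ignored.
 */
static bool
sk_handle_ptb(uint32_t reported_mtu, uint32_t *path_mtu)
{
	if (reported_mtu < IPV6_MMTU)
		return (false);			/* no fragment-header fallback any more */
	if (reported_mtu < *path_mtu)
		*path_mtu = reported_mtu;	/* PTB may only lower the estimate */
	return (true);
}
```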
* tcp: retire rstreason | Michael Tuexen | 2025-08-12 | 6 | -58/+41
  With the latest changes, this variable and the corresponding parameter of tcp_dropwithreset() are not needed anymore. Retiring them also makes it harder to reintroduce the use of multiple counters for TCP, which might open side channel attacks.
  No functional changes intended.
  Reviewed by: rrs
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51872
* tcp: minor cleanup | Michael Tuexen | 2025-08-12 | 1 | -14/+14
  Don't use the rstreason variable as a hint that a second lookup is performed, since the rstreason variable will be removed. Use the INPLOOKUP_WILDCARD flag in the lookupflag variable instead.
  No functional change intended.
  Reviewed by: rrs
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51847
* udp: use appropriate error counters | Michael Tuexen | 2025-08-12 | 1 | -1/+5
  Since there are multicast- and broadcast-specific error counters, use them.
  Reviewed by: rrs
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51869
* icmp: remove unused BANDLIM_UNLIMITED | Michael Tuexen | 2025-08-11 | 2 | -2/+1
  Reviewed by: Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51849
* tcp: mitigate a side channel for detection of TCP connections | Michael Tuexen | 2025-08-09 | 1 | -0/+8
  If a blind attacker tries to determine whether a TCP connection exists by sending ACK segments, this might trigger a challenge ACK on an existing TCP connection. To make such a hit unobservable to the attacker, also increment the global counter that would have been incremented in the case of a non-hit.
  This issue was reported as issue number 11 in Keyu Man et al.: SCAD: Towards a Universal and Automated Network Side-Channel Vulnerability Detection.
  Reviewed by: Nick Banks, Peter Lei
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51724
* tcp: rate limit the sending of all RST segments | Michael Tuexen | 2025-08-07 | 3 | -7/+7
  Also rate limit the sending of RST segments in the following cases:
  * when receiving data on a closed socket.
  * when a socket cannot be created at the end of the handshake and the sysctl variable net.inet.tcp.syncache.rst_on_sock_fail is 1.
  * when an ACK segment is received in the SYN-SENT state and it does not acknowledge the SYN segment.
  After this change, there is no longer any need to provide a rstreason to tcp_dropwithreset(), since it is always BANDLIM_TCP_RST. This will be done in a follow-up commit, since it will change the code in a couple of places, but will not change the functionality.
  Reviewed by: rrs, Nick Banks, Peter Lei
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51815
* tcp: remove assignment without effect | Michael Tuexen | 2025-08-07 | 1 | -1/+0
  rstreason is only relevant in the code paths with the label 'dropwithreset', not in the one with the label 'drop'.
  No functional change intended.
  Reviewed by: Nick Banks, rrs, Peter Lei, imp
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51814
* inet: fix typo | Warner Losh | 2025-08-08 | 1 | -1/+1
  Note: btw submitted a number of other things in this area that haven't made it into the tree, so I'm making an exception to the no typo rule since it was done in that context.
  Submitted by: btw (Tiwei Bie GSOC 2015 so unsure what to use for author)
  Differential Revision: https://reviews.freebsd.org/D3510
* tcp: ensure SACK rxmit never ends up left of its hole | Richard Scheffenegger | 2025-08-06 | 2 | -3/+3
  When an RTO happens during SACK loss recovery, snd_recover can possibly be pulled left. With Lost Retransmission Detection (LRD), this can lead to the rxmit of a hole ending up pointing to the left of the hole, which is unexpected and leads to complications.
  Reviewed By: tuexen, #transport
  Sponsored by: NetApp, Inc.
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D51725
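A minimal sketch of the invariant being protected, assuming a simplified hole structure with hypothetical names: whatever moves snd_recover, a hole's rxmit pointer is kept within [start, end], using modular sequence comparisons so that holes spanning the 2^32 wrap are handled. This illustrates the invariant, not the code change itself.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

#define SEQ_LT(a, b) ((int32_t)((a) - (b)) < 0)
#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

struct sk_hole {
	tcp_seq start;	/* first missing byte */
	tcp_seq end;	/* one past the last missing byte */
	tcp_seq rxmit;	/* next byte of the hole to retransmit */
};

/* Keep rxmit inside [start, end]; rxmit == end means fully retransmitted. */
static void
sk_hole_clamp_rxmit(struct sk_hole *h)
{
	if (SEQ_LT(h->rxmit, h->start))
		h->rxmit = h->start;
	else if (SEQ_GT(h->rxmit, h->end))
		h->rxmit = h->end;
}
```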
* tcp sack: improve computation of delivered_data | Michael Tuexen | 2025-08-06 | 1 | -1/+1
  delivered_data is the number of bytes that have newly been delivered to the peer. This includes the bytes cumulatively acknowledged and those selectively acknowledged.
  Reviewed by: rscheff
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51718
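This definition corresponds to the DeliveredData quantity of RFC 6937 (PRR). A sketch, assuming the caller tracks the total number of SACKed bytes on the scoreboard before and after processing the ACK; the function name and parameters are hypothetical and may differ from tcp_sack.c. Using the change in the SACKed total (which can be negative) avoids counting bytes twice when the cumulative ACK moves over data that was already SACKed.

```c
#include <stdint.h>

typedef uint32_t tcp_seq;

#define SEQ_GT(a, b) ((int32_t)((a) - (b)) > 0)

static uint32_t
sk_delivered_data(tcp_seq snd_una, tcp_seq th_ack,
    uint32_t sacked_before, uint32_t sacked_after)
{
	int64_t d = 0;

	/* Bytes newly covered by the cumulative ACK (wrap-safe). */
	if (SEQ_GT(th_ack, snd_una))
		d += (uint32_t)(th_ack - snd_una);
	/* Change in the scoreboard's SACKed total; negative if cum-ACKed. */
	d += (int64_t)sacked_after - (int64_t)sacked_before;
	return (d > 0 ? (uint32_t)d : 0);
}
```

For example, if bytes 1500-2000 were already SACKed and an ACK moves snd_una from 1000 to 2000, only 500 bytes are newly delivered.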
* tcp: improve consistency of KASSERTs in tcp_sack.c | Michael Tuexen | 2025-08-06 | 1 | -13/+17
  When panicking, don't print the condition which was violated, but the condition which holds at the time of the panic.
  Reviewed by: Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51726
* rack, bbr: minor cleanup | Michael Tuexen | 2025-08-06 | 1 | -4/+2
  No functional change intended.
  Reviewed by: Nick Banks
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D51734
* ipfw: add numeric initializers to enum ipfw_opcodes | Andrey V. Elsukov | 2025-08-03 | 1 | -110/+110
  This is mostly for better readability when we need to resolve which opcode corresponds to a specific number.
  Obtained from: Yandex LLC
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D51457
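An illustration of the readability gain, using a hypothetical enum rather than the real ipfw opcode list: with explicit initializers, a number seen in a rule dump or debugger can be matched to its name by reading the source, and designated array initializers keep a name table aligned with the values.

```c
#include <stddef.h>

/* Hypothetical opcodes: explicit initializers document each numeric value. */
enum sk_opcodes {
	SK_O_NOP    = 0,
	SK_O_IP_SRC = 1,
	SK_O_IP_DST = 2,
	SK_O_PROTO  = 3,
	SK_O_LAST_OPCODE	/* one past the last opcode (4 here) */
};

/* Map a raw opcode number, e.g. from a rule dump, back to a name. */
static const char *
sk_opcode_name(int op)
{
	static const char *const names[SK_O_LAST_OPCODE] = {
		[SK_O_NOP]    = "nop",
		[SK_O_IP_SRC] = "ip-src",
		[SK_O_IP_DST] = "ip-dst",
		[SK_O_PROTO]  = "proto",
	};

	if (op < 0 || op >= SK_O_LAST_OPCODE)
		return (NULL);
	return (names[op]);
}
```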