path: root/sys/netinet/tcp_var.h
Commit message | Author | Age | Files | Lines
* tcp: Make tcp_var.h more self-contained | Mark Johnston | 2024-04-10 | 1 | -0/+3
    struct tcpcb embeds a struct osd and a struct callout. Rather than forcing all consumers to pull in the same headers, include the headers directly.
    No functional change intended.
    Reviewed by: glebius
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D44685
* netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters | Kristof Provost | 2024-04-08 | 1 | -4/+13
    When debugging network issues one common clue is an unexpectedly incrementing error counter. This is helpful, in that it gives us an idea of what might be going wrong, but often these counters may be incremented in different functions.
    Add a static probe point for them so that we can use dtrace to get further information (e.g. a stack trace).
    For example:
        dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'
    This can be disabled by setting the following kernel option:
        options KDTRACE_NO_MIB_SDT
    Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
    Sponsored by: Rubicon Communications, LLC ("Netgate")
    Differential Revision: https://reviews.freebsd.org/D43504
* tcp: fix comment | Michael Tuexen | 2024-04-03 | 1 | -4/+4
    Make the comment consistent with the code.
    Reviewed by: rscheff
    MFC after: 1 week
    Sponsored by: Netflix, Inc.
    Differential Revision: https://reviews.freebsd.org/D44611
* tcp: remove IS_FASTOPEN() macro | Gleb Smirnoff | 2024-03-18 | 1 | -8/+0
    The macro is more obfuscating than helping as it just checks a single flag of t_flags. All other t_flags bits are checked without a macro.
    A bigger problem was that declaration of the macro in tcp_var.h depended on a kernel option. It is a bad practice to create such definitions in installable headers.
    Reviewed by: rscheff, tuexen, kib
    Differential Revision: https://reviews.freebsd.org/D44362
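The point about checking a single t_flags bit directly can be illustrated with a minimal sketch. The struct below is a hypothetical stand-in for struct tcpcb, and the numeric value given to TF_FASTOPEN is an assumption made only so the example compiles; neither is copied from tcp_var.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit value, for illustration only. */
#define TF_FASTOPEN 0x00800000u

/* Hypothetical stand-in for the t_flags member of struct tcpcb. */
struct tcpcb_sketch {
    uint32_t t_flags;
};

/*
 * Test the flag directly, the same way any other t_flags bit is tested.
 * Nothing here depends on a kernel option, so the check means the same
 * thing in kernel and userspace builds.
 */
static bool
uses_fastopen(const struct tcpcb_sketch *tp)
{
    assert(tp != NULL);
    return ((tp->t_flags & TF_FASTOPEN) != 0);
}
```

A caller would simply write `if (uses_fastopen(tp))`, or inline the mask test, with no wrapper macro whose definition varies with the build configuration.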
* netinet/tcp_var.h: always define IS_FASTOPEN() for kernel compilation env | Konstantin Belousov | 2024-03-13 | 1 | -1/+3
    and drop the definition for userspace (which matched TCP_RFC7413) since it depends on presence of the kernel option.
    Reviewed by: glebius, rscheff
    Sponsored by: NVIDIA networking
    MFC after: 1 week
    Differential revision: https://reviews.freebsd.org/D44349
* tcp: move struct tcp_ifcap declaration under _KERNEL | Gleb Smirnoff | 2024-03-13 | 1 | -11/+11
    Reviewed by: rscheff, tuexen, kib
    Differential Revision: https://reviews.freebsd.org/D44340
* Update to bring the rack stack with all its fixes in. | Randall Stewart | 2024-03-12 | 1 | -2/+10
    This brings the rack stack up to the current level used at NF. Many fixes and improvements have been added. I also add in a fix to BBR to deal with the changes that have been in hpts for a while i.e. only one call no matter if mbuf queue or tcp_output.
    It basically does little except BBlogs and is a placemark for future work on doing path capacity measurements.
    With a bit of a struggle with git I finally got rack_pcm.c into place (apologies for not noticing this error). The LINT kernel is running on my box now .. sigh.
    Reviewed by: tuexen, glebius
    Sponsored by: Netflix Inc.
    Differential Revision: https://reviews.freebsd.org/D43986
* Revert "Update to bring the rack stack with all its fixes in." | Brooks Davis | 2024-03-11 | 1 | -10/+2
    This commit was incomplete and breaks LINT kernels. The tree has been broken for 8+ hours.
    This reverts commit f6d489f402c320f1a6eaa473491a0b8c3878113e.
* Update to bring the rack stack with all its fixes in. | Randall Stewart | 2024-03-11 | 1 | -2/+10
    This brings the rack stack up to the current level used at NF. Many fixes and improvements have been added. I also add in a fix to BBR to deal with the changes that have been in hpts for a while i.e. only one call no matter if mbuf queue or tcp_output.
    Note there is a new file that I can't figure out how to get in rack_pcm.c
    It basically does little except BBlogs and is a placemark for future work on doing path capacity measurements.
    Reviewed by: tuexen, glebius
    Sponsored by: Netflix Inc.
    Differential Revision: https://reviews.freebsd.org/D43986
* tcp: pass maxseg around instead of calculating locally | Richard Scheffenegger | 2024-01-24 | 1 | -3/+5
    Improve slowpath processing (reordering, retransmissions) slightly by calculating maxseg only once. This typically saves one of two calls to tcp_maxseg().
    Reviewed By: glebius, tuexen, cc, #transport
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D43536
* tcp: remove outdated comment | Gleb Smirnoff | 2024-01-22 | 1 | -10/+0
    This paragraph should have been removed in 446ccdd08e2a.
* tcp: do not purge SACK scoreboard on first RTO | Richard Scheffenegger | 2024-01-06 | 1 | -0/+1
    Keeping the SACK scoreboard intact after the first RTO and retransmitting all data anew only on subsequent RTOs allows a more timely and efficient loss recovery under many adverse circumstances.
    Reviewed By: tuexen, #transport
    MFC after: 10 weeks
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D42906
* netpfil: Use accessor functions and named constants for all tcphdr flags | Richard Scheffenegger | 2023-12-25 | 1 | -12/+0
    Update all remaining references to the struct tcphdr th_x2 field. This completes the compatibility of various aspects with AccECN (TH_AE), after the internal ipfw "re-checksum required" was moved to use the TH_RES1 flag.
    No functional change.
    Reviewed By: tuexen, #transport, glebius
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D43172
* tcp: allow userspace use of tcp header flags accessor functions | Richard Scheffenegger | 2023-12-22 | 1 | -2/+1
    Provide accessor functions to all 12 possible TCP header flags for userspace too.
    Reviewed By: zlei
    MFC after: 2 weeks
    Sponsored by: Netapp, Inc.
    Differential Revision: https://reviews.freebsd.org/D43152
* sys: Remove ancient SCCS tags. | Warner Losh | 2023-11-27 | 1 | -2/+0
    Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
    Sponsored by: Netflix
* tcp: uninline tcp_account_for_send() | Gleb Smirnoff | 2023-11-21 | 1 | -25/+1
    This allows to clear inclusion of "opt_kern_tls.h" from a system header.
    Reviewed by: rscheff, tuexen
    Differential Revision: https://reviews.freebsd.org/D42696
* [tcp] add PRR 6937bis heuristic and retire prr_conservative sysctl | Richard Scheffenegger | 2023-11-15 | 1 | -3/+10
    Improve Proportional Rate Reduction (RFC6937) by using a heuristic, which automatically chooses between conservative CRB and more aggressive SSRB modes.
    Only when snd_una advances (a partial ACK), SSRB may be used. Also, that ACK must not have any indication of ongoing loss - using the addition of new holes into the scoreboard as proxy for such an event.
    MFC after: 4 weeks
    Reviewed By: #transport, kbowling, rrs
    Sponsored By: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D28822
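The mode selection described above can be sketched roughly as follows. This is not the code from the commit: the function and parameter names are hypothetical, and the two limit formulas are taken from the CRB and SSRB reduction bounds in RFC 6937 rather than from the FreeBSD sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Heuristic from the commit message: SSRB may be used only when the ACK
 * advanced snd_una and added no new holes to the SACK scoreboard.
 */
static bool
prr_may_use_ssrb(bool snd_una_advanced, bool new_holes_added)
{
    return (snd_una_advanced && !new_holes_added);
}

/*
 * Send limit while pipe <= ssthresh, per RFC 6937:
 *   CRB:  limit = prr_delivered - prr_out
 *   SSRB: limit = MAX(prr_delivered - prr_out, DeliveredData) + MSS
 */
static int64_t
prr_limit(int64_t prr_delivered, int64_t prr_out, int64_t delivered_data,
    int64_t mss, bool ssrb)
{
    int64_t crb = prr_delivered - prr_out;

    assert(mss > 0);
    if (!ssrb)
        return (crb);
    return ((crb > delivered_data ? crb : delivered_data) + mss);
}
```

Under these assumptions, an ACK that reports new holes always falls back to the conservative bound, so the sender never grows the amount of data in flight while loss still appears to be ongoing.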
* tcp: use signed IsLost() related accounting variables | Richard Scheffenegger | 2023-10-17 | 1 | -2/+2
    Coverity found that one safety check (kassert) was not functional, as possible incorrect subtractions during the accounting wouldn't show up as (invalid) negative values.
    Reported by: gallatin
    Reviewed By: cc, #transport
    Sponsored By: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D42180
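Why an unsigned counter defeats a "must not be negative" check can be shown with a small sketch. The function names are hypothetical and plain assert() stands in for the kernel's KASSERT; the actual variables touched by the commit are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned accounting: subtracting too much wraps to a huge positive value. */
static uint32_t
account_unsigned(uint32_t bytes, uint32_t delta)
{
    return (bytes - delta);
}

/* Signed accounting: the same mistake produces a visible negative value. */
static int32_t
account_signed(int32_t bytes, int32_t delta)
{
    return (bytes - delta);
}

/* With a signed counter, a sanity assertion can actually fire. */
static int32_t
account_checked(int32_t bytes, int32_t delta)
{
    int32_t result = bytes - delta;

    assert(result >= 0);
    return (result);
}
```

With the unsigned variant, a comparison such as `result >= 0` is always true, so the assertion could never report the bad subtraction.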
* tcp: include RFC6675 IsLost() in pipe calculation | Richard Scheffenegger | 2023-10-09 | 1 | -0/+2
    Add more accounting while processing SACK data, to keep track of when a packet is deemed lost using the RFC6675 guidance.
    Together with PRR (RFC6937) this allows a sender to retransmit presumed lost packets faster, and loss recovery to complete earlier.
    Reviewed By: cc, rrs, #transport
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D39299
* sys: Remove $FreeBSD$: one-line .h pattern | Warner Losh | 2023-08-16 | 1 | -1/+0
    Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
* tcp: document that conditional fields in tcpcb should be at the end | Michael Tuexen | 2023-07-27 | 1 | -0/+5
    Reviewed by: rscheff, Peter Lei
    Sponsored by: Netflix, Inc.
* tcp: improve layout of struct tcpcb | Michael Tuexen | 2023-07-19 | 1 | -13/+11
    Put optional fields at the end to minimize run time problems in case CC modules are built from within its directory.
    Reviewed by: cc, gallatin, glebius, imp
    Sponsored by: Netflix, Inc.
    Differential Revision: https://reviews.freebsd.org/D41059
* tcbpcb: Always define t_osd | Warner Losh | 2023-07-17 | 1 | -2/+0
    Always define t_osd. Congestion control modules access it unconditionally. This fixes the build. However, this is, at best, a temporary band-aid until the larger issues are sorted.
    Sponsored by: Netflix
* tcp: make the maximum number of retransmissions tunable per VNET | Richard Scheffenegger | 2023-06-06 | 1 | -0/+2
    Both Windows (TcpMaxDataRetransmissions) and Linux (tcp_retries2) allow to restrict the maximum number of consecutive timer based retransmissions.
    Add that same capability on a per-VNet basis to FreeBSD.
    Reviewed By: cc, tuexen, #transport
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D40424
* tcp: request tracking is not http specific. | Randall Stewart | 2023-05-24 | 1 | -34/+32
    This change is a name change only. TCP Request tracking can track sendfile and even non-sendfile requests. The names however in the current code use http, and they should not. The feature is not http specific. Let's change the names so they more properly reflect what's going on. This also fixes conflicts with http_req which caused application pain.
    Reviewed by: tuexen
    Sponsored by: Netflix Inc
    Differential Revision: https://reviews.freebsd.org/D40229
* There are congestion control algorithms that will pull in srtt, and this can cause issues with rack. | Randall Stewart | 2023-05-19 | 1 | -0/+1
    When using rack, cubic and htcp will grab the srtt, but they think it is in ticks. For rack it is in micro-seconds (which we should probably move all stacks to actually). This causes issues so instead let's make a new interface so that any CC module can pull the srtt in whatever granularity they want.
    Reviewed by: tuexen
    Sponsored by: Netflix Inc
    Differential Revision: https://reviews.freebsd.org/D40146
* tcp: move HPTS/LRO flags out of inpcb to tcpcb | Gleb Smirnoff | 2023-04-25 | 1 | -10/+15
    These flags are TCP specific.
    While here, make also several LRO internal functions to pass tcpcb pointer instead of inpcb one.
    Reviewed by: rrs
    Differential Revision: https://reviews.freebsd.org/D39698
* tcp_hpts: move HPTS related fields from inpcb to tcpcb | Gleb Smirnoff | 2023-04-25 | 1 | -1/+19
    This makes inpcb lighter and allows future cache line optimizations of tcpcb. The reason why HPTS originally used inpcb is the compressed TIME-WAIT state (see 0d7445193ab), that used to free a tcpcb, while the associated connection is still on the HPTS ring.
    Reviewed by: rrs
    Differential Revision: https://reviews.freebsd.org/D39697
* Correct the value of macro TF2_TCP_ACCOUNTING. | Cheng Cui | 2023-04-24 | 1 | -1/+1
    Summary: Make sure the values are in order.
    Reviewers: rscheff, tuexen, #transport!
    Approved by: rscheff, tuexen, glebius
    Subscribers: imp, melifaro, glebius
    Differential Revision: https://reviews.freebsd.org/D39716
* tcp_hpts: use queue(9) STAILQ for the input queue | Gleb Smirnoff | 2023-04-17 | 1 | -2/+1
    Reviewed by: rrs
    Differential Revision: https://reviews.freebsd.org/D39574
* tcp: pass tcpcb in the tfb_tcp_ctloutput() method instead of inpcb | Gleb Smirnoff | 2023-04-07 | 1 | -2/+2
    Just matches rest of the KPI.
    Reviewed by: rrs
    Differential Revision: https://reviews.freebsd.org/D39435
* tcp: reduce argument list to functions that pass a segment | Gleb Smirnoff | 2023-04-07 | 1 | -10/+7
    The socket argument is superfluous, as a tcpcb always has one and only one socket.
    Reviewed by: rrs
    Differential Revision: https://reviews.freebsd.org/D39434
* tcp: retire tfb_tcp_hpts_do_segment() | Gleb Smirnoff | 2023-04-07 | 1 | -4/+0
    Isn't in use anymore. Correct comments that mention it.
    Reviewed by: rrs
    Differential Revision: https://reviews.freebsd.org/D39433
* tcp: misc cleanup of options for rack as well as socket option logging. | Randall Stewart | 2023-04-07 | 1 | -2/+14
    Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also there is a t_maxpeak_rate that needs to be removed (it's unused).
    Reviewed by: tuexen, cc
    Sponsored by: Netflix Inc
    Differential Revision: https://reviews.freebsd.org/D39427
* Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features. | Randall Stewart | 2023-04-01 | 1 | -5/+269
    So stack switching has always been a bit of an issue. We currently use a break before make setup which means that if something goes wrong you have to try to get back to a stack. This patch among a lot of other things changes that so that it is a make before break. We also expand some of the function blocks in prep for new features in rack that will allow more controlled pacing. We also add other abilities such as the pathway for a stack to query a previous stack to acquire from it critical state information so things in flight don't get dropped or mis-handled when switching stacks. We also add the concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds and of course this even gives us a pathway to go to nanosecond timekeeping if we need to (something for the data center to consider for sure).
    Once all this lands I will then update rack to begin using all these new features.
    Reviewed by: tuexen
    Sponsored by: Netflix Inc
    Differential Revision: https://reviews.freebsd.org/D39210
* Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities. | Randall Stewart | 2023-03-16 | 1 | -2/+12
    The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints we need to move to using the new inline functions. This adds them and moves rack to now use the tcp_tracepoints.
    Reviewed by: tuexen, gallatin
    Sponsored by: Netflix Inc
    Differential Revision: https://reviews.freebsd.org/D38831
* Change hw_tls to a bool | Alfonso | 2023-02-25 | 1 | -1/+1
    Reviewed by: imp
    Pull Request: https://github.com/freebsd/freebsd-src/pull/512
* tcp: remove unused function prototype | Michael Tuexen | 2023-02-22 | 1 | -1/+0
    tcp_trace was implemented in tcp_debug.c, which was removed recently.
    Reviewed by: rscheff@, zlei@
    Sponsored by: Netflix, Inc.
    Differential Revision: https://reviews.freebsd.org/D38712
* bblog: improve timeout event handling | Michael Tuexen | 2023-02-21 | 1 | -0/+7
    Extend the BBLog RTO event to deal with all timers of the base stack. Also provide information about starting, stopping, and running off. The expiration of the retransmission timer is reported as it was done before.
    Reviewed by: rscheff@
    Sponsored by: Netflix, Inc.
    Differential Revision: https://reviews.freebsd.org/D38710
* tcp: rearrange enum and remove unused variable | Michael Tuexen | 2023-02-21 | 1 | -2/+2
    Rearrange the enum tt_which such that TT_REXMIT is 0. This allows an extension of the BBLog event RTO in a backwards compatible way.
    Remove tcptimers, which was only used in trpt, a utility removed from the source tree recently.
    Reviewed by: glebius@, guest-ccui@
    Sponsored by: Netflix, Inc.
    Differential Revision: https://reviews.freebsd.org/D38547
* ktls: Accurately track if ifnet ktls is enabled | Andrew Gallatin | 2023-02-09 | 1 | -0/+3
    This allows us to avoid spurious calls to ktls_disable_ifnet().
    When we implemented ifnet kTLS, we set a flag in the tx socket buffer (SB_TLS_IFNET) to indicate ifnet kTLS. This flag meant that now, or in the past, ifnet ktls was active on a socket. Later, I added code to switch ifnet ktls sessions to software in the case of lossy TCP connections that have a high retransmit rate. Because TCP was using SB_TLS_IFNET to know if it needed to do math to calculate the retransmit ratio and potentially call into ktls_disable_ifnet(), it was doing unneeded work long after a session was moved to software.
    This patch carefully tracks whether or not ifnet ktls is still enabled on a TCP connection. Because the inp is now embedded in the tcpcb, and because TCP is the most frequent accessor of this state, it made sense to move this from the socket buffer flags to the tcpcb. Because we now need reliable access to the tcpcb, we take a ref on the inp when creating a tx ktls session.
    While here, I noticed that rack/bbr were incorrectly implementing tfb_hwtls_change(), and applying the change to all pending sends, when it should apply only to future sends.
    This change reduces spurious calls to ktls_disable_ifnet() by 95% or so in a Netflix CDN environment.
    Reviewed by: markj, rrs
    Sponsored by: Netflix
    Differential Revision: https://reviews.freebsd.org/D38380
* tcp_var.h: Fix spelling of independent in comment | John Baldwin | 2023-02-07 | 1 | -1/+1
* tcp: reduce the size of t_rttupdated in tcpcb | Richard Scheffenegger | 2023-01-26 | 1 | -1/+1
    During tcp session start, various mechanisms need to track a few initial RTTs before becoming active. Prevent overflows of the corresponding tracking counter and reduce the size of tcpcb simultaneously.
    Reviewed By: #transport, tuexen, guest-ccui
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D21117
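One common way to shrink a counter while preventing overflow is a saturating increment. The sketch below assumes an 8-bit counter and uses a hypothetical struct and function name; it illustrates the technique and is not taken from the commit's diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the RTT sample counter. */
struct rtt_state_sketch {
    uint8_t rttupdated;     /* number of RTT samples seen, saturating */
};

/*
 * Count one more RTT sample. Once the counter reaches its maximum it
 * stays there instead of wrapping back to zero, so checks of the form
 * "have we seen at least N samples yet?" keep giving the right answer.
 */
static void
rtt_sample_counted(struct rtt_state_sketch *rs)
{
    assert(rs != NULL);
    if (rs->rttupdated < UINT8_MAX)
        rs->rttupdated++;
}
```

Since only the first few samples matter to the mechanisms mentioned in the message, saturating at the maximum loses no information they need.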
* tcp: use single locked callout per tcpcb for the TCP timers | Gleb Smirnoff | 2022-12-07 | 1 | -15/+15
    Use only one callout structure per tcpcb that is responsible for handling all five TCP timeouts. Use locked version of callout, of course. The callout function tcp_timer_enter() chooses soonest timer and executes it with lock held. Unless the timer reports that the tcpcb has been freed, the callout is rescheduled for next soonest timer, if there is any.
    With single callout per tcpcb on connection teardown we should be able to fully stop the callout and immediately free it, avoiding use of callout_async_drain(). There is one gotcha here: callout_stop() can actually touch our memory when a rare race condition happens. See comment above tcp_timer_stop(). Synchronous stop of the callout makes tcp_discardcb() the single entry point for tcpcb destructor, merging the tcp_freecb() to the end of the function.
    While here, also remove lots of lingering checks in the beginning of TCP timer functions. With a locked callout they are unnecessary.
    While here, clean unused parts of timer KPI for the pluggable TCP stacks.
    While here, remove TCPDEBUG from tcp_timer.c, as this allows for more simplification of TCP timers. The TCPDEBUG is scheduled for removal.
    Move the DTrace probes in timers to the beginning of a function, where a tcpcb is always existing.
    Discussed with: rrs, tuexen, rscheff (the TCP part of the diff)
    Reviewed by: hselasky, kib, mav (the callout part)
    Differential revision: https://reviews.freebsd.org/D37321
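The "choose the soonest timer" step can be sketched as a scan over per-timer deadlines. The array layout, the TIMER_OFF sentinel and the function name are assumptions for illustration; the real tcp_timer_enter() and its data structures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_TIMERS = 5 };          /* five TCP timeouts share one callout */
#define TIMER_OFF INT64_MAX     /* hypothetical "not armed" sentinel */

/*
 * Return the index of the armed timer with the earliest deadline, or -1
 * if no timer is armed. A single callout would be scheduled for that
 * deadline, and rescheduled for the next soonest one after it fires.
 */
static int
soonest_timer(const int64_t deadline[N_TIMERS])
{
    int which = -1;
    int64_t best = TIMER_OFF;

    assert(deadline != NULL);
    for (int i = 0; i < N_TIMERS; i++) {
        if (deadline[i] < best) {
            best = deadline[i];
            which = i;
        }
    }
    return (which);
}
```

Keeping all deadlines in one place is what lets a single callout serve every timeout: stopping that one callout on teardown stops all timers at once.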
* tcp: remove tcp_timer_suspend() | Gleb Smirnoff | 2022-12-07 | 1 | -2/+0
    It was a temporary code added together with RACK to fight against TCP timer races.
* tcp: embed inpcb into tcpcb | Gleb Smirnoff | 2022-12-07 | 1 | -17/+24
    For the TCP protocol inpcb storage specify allocation size that would provide space to most of the data a TCP connection needs, embedding into struct tcpcb several structures, that previously were allocated separately.
    The most important one is the inpcb itself. With embedding we can provide strong guarantee that with a valid TCP inpcb the tcpcb is always valid and vice versa. Also we reduce number of allocs/frees per connection. The embedded inpcb is placed in the beginning of the struct tcpcb, since in_pcballoc() requires that. However, later we may want to move it around for cache line efficiency, and this can be done with a little effort. The new intotcpcb() macro is ready for such move.
    The congestion algorithm data, the TCP timers and osd(9) data are also embedded into tcpcb, and temporary struct tcpcb_mem goes away. There was no extra allocation here, but we went through extra pointer every time we accessed this data.
    One interesting side effect is that now TCP data is allocated from SMR-protected zone. Potentially this allows the TCP stacks or other TCP related modules to utilize that for their own synchronization.
    Large part of the change was done with sed script:
        s/tp->ccv->/tp->t_ccv./g
        s/tp->ccv/\&tp->t_ccv/g
        s/tp->cc_algo/tp->t_cc/g
        s/tp->t_timers->tt_/tp->tt_/g
        s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g
    Dependency side effect is that code that needs to know struct tcpcb should also know struct inpcb, that added several <netinet/in_pcb.h>.
    Differential revision: https://reviews.freebsd.org/D37127
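The idea of embedding one control block inside another, and converting between them without depending on the member's position, can be sketched as follows. The struct and function names are hypothetical stand-ins; the real definitions of struct inpcb, struct tcpcb and intotcpcb() are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the generic protocol control block. */
struct pcb_sketch {
    int port;
};

/* Hypothetical stand-in for the TCP control block embedding it. */
struct conn_sketch {
    struct pcb_sketch c_pcb;    /* embedded, not a separate allocation */
    int state;
};

/*
 * Recover the enclosing structure from a pointer to the embedded member.
 * Using offsetof() rather than a plain cast means the conversion keeps
 * working if c_pcb is later moved away from the start of the struct.
 */
static struct conn_sketch *
conn_from_pcb(struct pcb_sketch *pcb)
{
    assert(pcb != NULL);
    return ((struct conn_sketch *)
        ((char *)pcb - offsetof(struct conn_sketch, c_pcb)));
}
```

Because both structures live in one allocation, a valid pointer to either implies the other is valid too, which is the guarantee the message describes.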
* tcp: remove unused t_rttbest | Michael Tuexen | 2022-11-16 | 1 | -2/+0
    No functional change intended.
    Reviewed by: rscheff@
    Sponsored by: Netflix, Inc.
    Differential Revision: https://reviews.freebsd.org/D37401
* tcp: account sent/received IP ECN markings independently | Richard Scheffenegger | 2022-11-10 | 1 | -4/+8
    Have tcpstats (netstat -s) differentiate between received and sent ECN-marked packets. Also account for IP ECN bits (on TCP packets) even when the tcp session has not negotiated ECN support.
    Event: IETF 115 Hackathon
    Reviewed By: glebius, tuexen, #transport
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D37314
* tcp: don't store VNET in every tcpcb, take it from the inpcbinfo | Gleb Smirnoff | 2022-11-08 | 1 | -1/+0
    Reviewed by: rscheff
    Differential revision: https://reviews.freebsd.org/D37125
* tcp: provide macros to access inpcb and socket from a tcpcb | Gleb Smirnoff | 2022-11-08 | 1 | -12/+17
    There should be no functional changes with this commit.
    Reviewed by: rscheff
    Differential revision: https://reviews.freebsd.org/D37123