| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
struct tcpcb embeds a struct osd and a struct callout. Rather than
forcing all consumers to pull in the same headers, include the headers
directly.
No functional change intended.
Reviewed by: glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D44685
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When debugging network issues one common clue is an unexpectedly
incrementing error counter. This is helpful, in that it gives us an
idea of what might be going wrong, but often these counters may be
incremented in different functions.
Add a static probe point for them so that we can use dtrace to get
further information (e.g. a stack trace).
For example:
dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'
This can be disabled by setting the following kernel option:
options KDTRACE_NO_MIB_SDT
Reviewed by: gallatin, tuexen (previous version), gnn (previous version)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D43504
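The idea of routing every counter increment through one shared probe point can be illustrated in plain userland C. This is only a sketch under stated assumptions: struct ipstat_sketch, counter_probe, count_hits and STAT_ADD are hypothetical names invented for illustration, and a function-pointer hook stands in for the kernel's static probe machinery, which works differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter block; the field names are illustrative only. */
struct ipstat_sketch {
	uint64_t ips_badsum;
	uint64_t ips_toosmall;
};

/* Stand-in for a static probe: a hook that is NULL until a consumer attaches. */
typedef void (*counter_probe_fn)(const char *name, uint64_t delta);
static counter_probe_fn counter_probe = NULL;

/* Example consumer: counts how often any counter was bumped. */
static unsigned probe_hits = 0;

static void
count_hits(const char *name, uint64_t delta)
{
	assert(name != NULL);
	(void)delta;
	probe_hits++;
}

/*
 * Bump a counter and report the event through the single shared hook,
 * no matter which function the increment happens in.
 */
#define	STAT_ADD(st, field, delta) do {				\
	(st)->field += (delta);					\
	if (counter_probe != NULL)				\
		counter_probe(#field, (uint64_t)(delta));	\
} while (0)
```

Because every increment funnels through the same hook, a consumer attached there sees all call sites at once, which is the property that makes a stack trace at the probe useful.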
|
|
|
|
|
|
|
|
|
| |
Make the comment consistent with the code.
Reviewed by: rscheff
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D44611
|
|
|
|
|
|
|
|
|
|
|
|
| |
The macro is more obfuscating than helping as it just checks a single flag
of t_flags. All other t_flags bits are checked without a macro.
A bigger problem was that declaration of the macro in tcp_var.h depended
on a kernel option. It is a bad practice to create such definitions in
installable headers.
Reviewed by: rscheff, tuexen, kib
Differential Revision: https://reviews.freebsd.org/D44362
|
|
|
|
|
|
|
|
|
|
| |
and drop the definition for userspace (which matched TCP_RFC7413) since
it depends on presence of the kernel option.
Reviewed by: glebius, rscheff
Sponsored by: NVIDIA networking
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D44349
|
|
|
|
|
| |
Reviewed by: rscheff, tuexen, kib
Differential Revision: https://reviews.freebsd.org/D44340
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings the rack stack up to the current level used at NF. Many fixes
and improvements have been added. I also add in a fix to BBR to deal with
the changes that have been in hpts for a while i.e. only one call no matter
if mbuf queue or tcp_output.
The new file rack_pcm.c does little except BBlogs and is a placeholder for future work on
path capacity measurements.
With a bit of a struggle with git I finally got rack_pcm.c into place (apologies
for not noticing this error). The LINT kernel is running on my box now .. sigh.
Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D43986
|
|
|
|
|
|
|
| |
This commit was incomplete and breaks LINT kernels. The tree has been
broken for 8+ hours.
This reverts commit f6d489f402c320f1a6eaa473491a0b8c3878113e.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This brings the rack stack up to the current level used at NF. Many fixes
and improvements have been added. I also add in a fix to BBR to deal with
the changes that have been in hpts for a while i.e. only one call no matter
if mbuf queue or tcp_output.
Note there is a new file, rack_pcm.c, that I can't figure out how to get in.
It does little except BBlogs and is a placeholder for future work on
path capacity measurements.
Reviewed by: tuexen, glebius
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D43986
|
|
|
|
|
|
|
|
|
|
| |
Improve slowpath processing (reordering, retransmissions)
slightly by calculating maxseg only once. This typically
saves one of two calls to tcp_maxseg().
Reviewed By: glebius, tuexen, cc, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43536
|
|
|
|
| |
This paragraph should have been removed in 446ccdd08e2a.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Keeping the SACK scoreboard intact after the first RTO
and retransmitting all data anew only on subsequent RTOs
allows a more timely and efficient loss recovery under
many adverse circumstances.
Reviewed By: tuexen, #transport
MFC after: 10 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42906
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update all remaining references to the struct tcphdr th_x2 field.
This completes the compatibility of various aspects with AccECN
(TH_AE), after the internal ipfw "re-checksum required" was moved
to use the TH_RES1 flag.
No functional change.
Reviewed By: tuexen, #transport, glebius
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43172
|
|
|
|
|
|
|
|
|
|
| |
Provide accessor functions to all 12 possible TCP header
flags for userspace too.
Reviewed By: zlei
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D43152
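A 12-bit flags accessor can be sketched as follows. This is a simplified illustration, not the header's actual code: struct tcphdr_sketch stores the four upper flag bits in a separate byte instead of a bit-field, and the sk_ function names and SK_ constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SK_TH_SYN	0x002	/* one of the classic eight flag bits */
#define	SK_TH_AE	0x100	/* first of the four upper flag bits */

/* Simplified header: only the two fields that carry flag bits. */
struct tcphdr_sketch {
	uint8_t th_x2;		/* low 4 bits hold flag bits 8..11 */
	uint8_t th_flags;	/* flag bits 0..7 */
};

/* Combine both fields into one 12-bit flags value. */
static inline uint16_t
sk_tcp_get_flags(const struct tcphdr_sketch *th)
{
	assert(th != NULL);
	return ((uint16_t)(((th->th_x2 & 0x0f) << 8) | th->th_flags));
}

/* Split a 12-bit flags value back into the two fields. */
static inline void
sk_tcp_set_flags(struct tcphdr_sketch *th, uint16_t flags)
{
	assert(th != NULL);
	th->th_x2 = (uint8_t)((flags >> 8) & 0x0f);
	th->th_flags = (uint8_t)(flags & 0xff);
}
```

With accessors like these, callers can test and set all twelve flags without touching th_x2 directly.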
|
|
|
|
|
|
|
|
| |
Remove ancient SCCS tags from the tree via automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
|
|
|
|
|
|
|
| |
This allows removing the inclusion of "opt_kern_tls.h" from a system header.
Reviewed by: rscheff, tuexen
Differential Revision: https://reviews.freebsd.org/D42696
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improve Proportional Rate Reduction (RFC6937) by using a
heuristic, which automatically chooses between
conservative CRB and more aggressive SSRB modes.
Only when snd_una advances (a partial ACK), SSRB may be
used. Also, that ACK must not have any indication of
ongoing loss - using the addition of new holes into the
scoreboard as proxy for such an event.
MFC after: 4 weeks
Reviewed By: #transport, kbowling, rrs
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D28822
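The heuristic described above can be sketched on top of the send-count computation from RFC 6937. This is an illustration under assumptions, not the committed code: struct prr_sketch, its field names and prr_sndcnt() are hypothetical, and the two booleans stand for "snd_una advanced" and "this ACK added new holes to the scoreboard".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-recovery state, in bytes. */
struct prr_sketch {
	int64_t ssthresh;	/* target window after the reduction */
	int64_t recover_fs;	/* flight size when recovery started */
	int64_t prr_delivered;	/* bytes delivered during recovery */
	int64_t prr_out;	/* bytes sent during recovery */
	int64_t mss;
};

/* How many bytes may be sent in response to the current ACK. */
static int64_t
prr_sndcnt(const struct prr_sketch *s, int64_t pipe, int64_t delivered_data,
    bool una_advanced, bool new_holes)
{
	int64_t limit, room, sndcnt;

	assert(s->recover_fs > 0);
	if (pipe > s->ssthresh) {
		/* Proportional part: ceil(delivered * ssthresh / RecoverFS). */
		sndcnt = (s->prr_delivered * s->ssthresh + s->recover_fs - 1) /
		    s->recover_fs - s->prr_out;
	} else {
		/* Conservative bound (CRB) is the default. */
		limit = s->prr_delivered - s->prr_out;
		if (una_advanced && !new_holes) {
			/* No sign of ongoing loss: use the SSRB bound. */
			if (delivered_data > limit)
				limit = delivered_data;
			limit += s->mss;
		}
		room = s->ssthresh - pipe;
		sndcnt = room < limit ? room : limit;
	}
	return (sndcnt > 0 ? sndcnt : 0);
}
```

The only difference between the two modes is the size of limit, so the choice affects how quickly the window is rebuilt once pipe has dropped below ssthresh.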
|
|
|
|
|
|
|
|
|
|
|
|
| |
Coverity found that one safety check (kassert) was not
functional, as possible incorrect subtractions during
the accounting wouldn't show up as (invalid) negative
values.
Reported by: gallatin
Reviewed By: cc, #transport
Sponsored By: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D42180
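One common way for such a check to be ineffective is sketched below. This assumes the accounting was done in an unsigned type, which the message above does not state; wrapped_remaining() and checked_remove() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Unsigned arithmetic wraps around: removing more than is present yields
 * a huge positive value, so a "result >= 0" check can never fail.
 */
static uint32_t
wrapped_remaining(uint32_t total, uint32_t removed)
{
	return (total - removed);
}

/*
 * Doing the subtraction in a wider signed type lets the assertion see a
 * negative result before it is stored back.
 */
static void
checked_remove(uint32_t *total, uint32_t removed)
{
	int64_t left = (int64_t)*total - (int64_t)removed;

	assert(left >= 0);
	*total = (uint32_t)left;
}
```

For example, wrapped_remaining(5, 8) does not produce -3 but a value close to UINT32_MAX.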
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add more accounting while processing SACK data, to
keep track of when a packet is deemed lost using
the RFC6675 guidance.
Together with PRR (RFC6937) this allows a sender to
retransmit presumed lost packets faster, and loss
recovery to complete earlier.
Reviewed By: cc, rrs, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D39299
|
|
|
|
| |
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
|
|
|
|
| |
Reviewed by: rscheff, Peter Lei
Sponsored by: Netflix, Inc.
|
|
|
|
|
|
|
|
|
| |
Put optional fields at the end to minimize run-time problems in
case CC modules are built from within their directory.
Reviewed by: cc, gallatin, glebius, imp
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D41059
|
|
|
|
|
|
|
|
|
|
| |
Always define t_osd. Congestion control modules access it
unconditionally. This fixes the build.
However, this is, at best, a temporary band-aid until the
larger issues are sorted.
Sponsored by: Netflix
|
|
|
|
|
|
|
|
|
|
|
| |
Both Windows (TcpMaxDataRetransmissions) and Linux (tcp_retries2)
allow restricting the maximum number of consecutive timer-based
retransmissions. Add that same capability on a per-VNet basis to
FreeBSD.
Reviewed By: cc, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D40424
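A limit on consecutive timer-based retransmissions can be sketched like this. The names (struct rexmit_sketch, on_rexmit_timeout(), on_ack_progress()) and the exact drop condition are assumptions for illustration; the message above does not spell out the tunable's name or semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-connection retransmission state. */
struct rexmit_sketch {
	int  rxtshift;	/* consecutive retransmission timeouts so far */
	bool dropped;	/* connection has been given up on */
};

/* Called when the retransmission timer fires. */
static void
on_rexmit_timeout(struct rexmit_sketch *c, int max_retries)
{
	assert(max_retries >= 0);
	if (c->dropped)
		return;
	/* Give up once the configured number of retries is exceeded. */
	if (++c->rxtshift > max_retries)
		c->dropped = true;
}

/* Called when an ACK acknowledges new data: the streak is over. */
static void
on_ack_progress(struct rexmit_sketch *c)
{
	c->rxtshift = 0;
}
```

In a per-VNet setup, max_retries would come from the network stack instance the connection belongs to rather than from a global constant.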
|
|
|
|
|
|
|
|
|
|
| |
This change is a name change only. TCP Request tracking can track sendfile and even non-sendfile requests. However, the
names in the current code use http, and they should not: the feature is not http specific. Let's change the
names so they more properly reflect what's going on. This also fixes conflicts with http_req, which caused application pain.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D40229
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cause issues with rack.
When using rack, cubic and htcp will grab the srtt, but they think it is in ticks. For rack
it is in microseconds (which we should probably move all stacks to, actually). This causes
issues, so instead let's make a new interface so that any CC module can pull the srtt in
whatever granularity it wants.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D40146
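An accessor that hands out the srtt in the caller's preferred unit could look roughly like the following. This is a sketch with hypothetical names (struct rtt_sketch, sk_get_srtt(), the SK_GRAN_ constants), and it ignores the fixed-point scaling that a real stack may apply to its stored srtt.

```c
#include <assert.h>
#include <stdint.h>

enum srtt_gran_sketch { SK_GRAN_TICKS, SK_GRAN_USEC };

/* Hypothetical view of a connection's smoothed RTT. */
struct rtt_sketch {
	uint32_t srtt;			/* value in the stack's native unit */
	enum srtt_gran_sketch unit;	/* unit the stack keeps srtt in */
	uint32_t hz;			/* ticks per second */
};

/* Return the srtt converted to the granularity the caller asked for. */
static uint32_t
sk_get_srtt(const struct rtt_sketch *r, enum srtt_gran_sketch want)
{
	assert(r->hz > 0);
	if (r->unit == want)
		return (r->srtt);
	if (want == SK_GRAN_USEC)
		return ((uint32_t)((uint64_t)r->srtt * 1000000 / r->hz));
	return ((uint32_t)((uint64_t)r->srtt * r->hz / 1000000));
}
```

A congestion control module that works in ticks would then ask for SK_GRAN_TICKS and get a consistent answer regardless of which stack owns the connection.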
|
|
|
|
|
|
|
|
| |
These flags are TCP specific. While here, also make several LRO
internal functions pass a tcpcb pointer instead of an inpcb one.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39698
|
|
|
|
|
|
|
|
|
|
| |
This makes inpcb lighter and allows future cache line optimizations
of tcpcb. The reason why HPTS originally used inpcb is the compressed
TIME-WAIT state (see 0d7445193ab), that used to free a tcpcb, while the
associated connection is still on the HPTS ring.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39697
|
|
|
|
|
|
|
|
|
| |
Summary: Make sure the values are in order.
Reviewers: rscheff, tuexen, #transport!
Approved by: rscheff, tuexen, glebius
Subscribers: imp, melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D39716
|
|
|
|
|
| |
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39574
|
|
|
|
|
|
|
| |
Just matches rest of the KPI.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39435
|
|
|
|
|
|
|
|
| |
The socket argument is superfluous, as a tcpcb always has one and
only one socket.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39434
|
|
|
|
|
|
|
| |
Isn't in use anymore. Correct comments that mention it.
Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39433
|
|
|
|
|
|
|
|
|
|
| |
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack
has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also,
there is a t_maxpeak_rate that needs to be removed (it is unused).
Reviewed by: tuexen, cc
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39427
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
other infrastructure updates for incoming rack features.
So stack switching has always been a bit of an issue. We currently use a break-before-make setup, which means that
if something goes wrong you have to try to get back to a stack. This patch, among a lot of other things, changes that so
that it is make-before-break. We also expand some of the function blocks in prep for new features in rack that will allow
more controlled pacing. We also add other abilities, such as the pathway for a stack to query a previous stack to acquire
critical state information from it, so things in flight don't get dropped or mishandled when switching stacks. We also add the
concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds, and
of course this even gives us a pathway to go to nanosecond timekeeping if we need to (something for the data center to consider
for sure).
Once all this lands I will then update rack to begin using all these new features.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D39210
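The make-before-break ordering can be sketched in a few lines. All names here (struct stack_ops_sketch, struct conn_sketch, switch_stack() and the demo stacks) are hypothetical, and the sketch leaves out the state hand-over between old and new stack mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function block of a TCP stack. */
struct stack_ops_sketch {
	const char *name;
	int  (*init)(void **statep);	/* returns 0 on success */
	void (*fini)(void *state);
};

struct conn_sketch {
	const struct stack_ops_sketch *ops;
	void *state;
};

/* Make before break: bring up the new stack first, then tear down the old. */
static int
switch_stack(struct conn_sketch *c, const struct stack_ops_sketch *newops)
{
	void *newstate = NULL;
	int error;

	assert(c != NULL && newops != NULL && newops->init != NULL);
	error = newops->init(&newstate);
	if (error != 0)
		return (error);		/* old stack is still fully intact */
	if (c->ops != NULL && c->ops->fini != NULL)
		c->ops->fini(c->state);
	c->ops = newops;
	c->state = newstate;
	return (0);
}

/* Demo stacks used to exercise both outcomes. */
static int demo_state;
static int ok_init(void **statep) { *statep = &demo_state; return (0); }
static int failing_init(void **statep) { (void)statep; return (12); }
static void noop_fini(void *state) { (void)state; }

static const struct stack_ops_sketch stack_a = { "a", ok_init, noop_fini };
static const struct stack_ops_sketch stack_b = { "b", failing_init, noop_fini };
static const struct stack_ops_sketch stack_c = { "c", ok_init, noop_fini };
```

If the new stack's init fails, the connection simply keeps running on the old stack, so there is never a point where it has no working stack to fall back to.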
|
|
|
|
|
|
|
|
|
|
|
|
| |
tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints
we need to move to using the new inline functions. This adds them and moves rack to now use
the tcp_tracepoints.
Reviewed by: tuexen, gallatin
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D38831
|
|
|
|
|
| |
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/512
|
|
|
|
|
|
|
|
| |
tcp_trace was implemented in tcp_debug.c, which was removed recently.
Reviewed by: rscheff@, zlei@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D38712
|
|
|
|
|
|
|
|
|
|
|
| |
Extend the BBLog RTO event to deal with all timers of the base
stack. Also provide information about starting, stopping, and
running off. The expiration of the retransmission timer is
reported as it was done before.
Reviewed by: rscheff@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D38710
|
|
|
|
|
|
|
|
|
|
|
| |
Rearrange the enum tt_which such that TT_REXMIT is 0. This allows
an extension of the BBLog event RTO in a backwards compatible way.
Remove tcptimers, which was only used in trpt, a utility removed
from the source tree recently.
Reviewed by: glebius@, guest-ccui@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D38547
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows us to avoid spurious calls to ktls_disable_ifnet().
When we implemented ifnet kTLS, we set a flag in the tx socket
buffer (SB_TLS_IFNET) to indicate ifnet kTLS. This flag meant that
now, or in the past, ifnet ktls was active on a socket. Later,
I added code to switch ifnet ktls sessions to software in the case
of lossy TCP connections that have a high retransmit rate.
Because TCP was using SB_TLS_IFNET to know if it needed to do math
to calculate the retransmit ratio and potentially call into
ktls_disable_ifnet(), it was doing unneeded work long after
a session was moved to software.
This patch carefully tracks whether or not ifnet ktls is still enabled
on a TCP connection. Because the inp is now embedded in the tcpcb, and
because TCP is the most frequent accessor of this state, it made sense to
move this from the socket buffer flags to the tcpcb. Because we now need
reliable access to the tcpcb, we take a ref on the inp when creating a tx
ktls session.
While here, I noticed that rack/bbr were incorrectly implementing
tfb_hwtls_change(), and applying the change to all pending sends,
when it should apply only to future sends.
This change reduces spurious calls to ktls_disable_ifnet() by 95% or so
in a Netflix CDN environment.
Reviewed by: markj, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D38380
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
During tcp session start, various mechanisms need to
track a few initial RTTs before becoming active.
Prevent overflows of the corresponding tracking counter
and reduce the size of tcpcb simultaneously.
Reviewed By: #transport, tuexen, guest-ccui
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D21117
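A small counter that cannot overflow is usually written as a saturating increment. The sketch below uses a hypothetical helper name and assumes an 8-bit counter; the message above does not state the actual field width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count RTT samples in a single byte. Once the maximum is reached the
 * counter stays there instead of wrapping back to zero, so "have we seen
 * at least N samples" keeps answering yes on long-lived connections.
 */
static inline void
rtt_count_inc(uint8_t *cnt)
{
	assert(cnt != NULL);
	if (*cnt < UINT8_MAX)
		(*cnt)++;
}
```

Consumers that only need to know whether the first few RTTs have passed lose nothing from the saturation, while the structure gets smaller.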
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use only one callout structure per tcpcb that is responsible for handling
all five TCP timeouts. Use locked version of callout, of course. The
callout function tcp_timer_enter() chooses soonest timer and executes it
with lock held. Unless the timer reports that the tcpcb has been freed,
the callout is rescheduled for next soonest timer, if there is any.
With single callout per tcpcb on connection teardown we should be able
to fully stop the callout and immediately free it, avoiding use of
callout_async_drain(). There is one gotcha here: callout_stop() can
actually touch our memory when a rare race condition happens. See
comment above tcp_timer_stop(). Synchronous stop of the callout makes
tcp_discardcb() the single entry point for tcpcb destructor, merging the
tcp_freecb() to the end of the function.
While here, also remove lots of lingering checks in the beginning of
TCP timer functions. With a locked callout they are unnecessary.
While here, clean unused parts of timer KPI for the pluggable TCP stacks.
While here, remove TCPDEBUG from tcp_timer.c, as this allows for more
simplification of TCP timers. The TCPDEBUG is scheduled for removal.
Move the DTrace probes in timers to the beginning of a function, where
a tcpcb always exists.
Discussed with: rrs, tuexen, rscheff (the TCP part of the diff)
Reviewed by: hselasky, kib, mav (the callout part)
Differential revision: https://reviews.freebsd.org/D37321
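The "one callout, pick the soonest timer" scheme can be sketched as follows. This is a userland illustration with hypothetical names (enum timer_sketch, struct timers_sketch, soonest_timer(), timer_fire()); it models deadlines as plain integers and leaves out locking and the actual timer handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum timer_sketch {
	SK_REXMT, SK_PERSIST, SK_KEEP, SK_2MSL, SK_DELACK, SK_NTIMERS
};

/* One deadline per logical timer; 0 means "not armed". */
struct timers_sketch {
	uint64_t deadline[SK_NTIMERS];
};

/* Return the armed timer with the earliest deadline, or -1 if none. */
static int
soonest_timer(const struct timers_sketch *t)
{
	int best = -1;

	assert(t != NULL);
	for (int i = 0; i < SK_NTIMERS; i++) {
		if (t->deadline[i] == 0)
			continue;
		if (best == -1 || t->deadline[i] < t->deadline[best])
			best = i;
	}
	return (best);
}

/*
 * Body of the single callout: run the soonest timer if it has expired,
 * then return the deadline to reschedule for (0 if nothing is armed).
 */
static uint64_t
timer_fire(struct timers_sketch *t, uint64_t now)
{
	int which = soonest_timer(t);

	if (which != -1 && t->deadline[which] <= now)
		t->deadline[which] = 0;	/* the timer's handler would run here */
	which = soonest_timer(t);
	return (which == -1 ? 0 : t->deadline[which]);
}
```

With a single callout there is only one thing to stop on teardown, which is what makes a synchronous stop and an immediate free practical.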
|
|
|
|
|
| |
It was temporary code added together with RACK to fight against
TCP timer races.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the TCP protocol inpcb storage specify allocation size that would
provide space to most of the data a TCP connection needs, embedding
into struct tcpcb several structures, that previously were allocated
separately.
The most important one is the inpcb itself. With embedding we can provide
a strong guarantee that with a valid TCP inpcb the tcpcb is always valid
and vice versa. We also reduce the number of allocs/frees per connection.
The embedded inpcb is placed in the beginning of the struct tcpcb,
since in_pcballoc() requires that. However, later we may want to move
it around for cache line efficiency, and this can be done with a little
effort. The new intotcpcb() macro is ready for such move.
The congestion algorithm data, the TCP timers and osd(9) data are
also embedded into tcpcb, and the temporary struct tcpcb_mem goes away.
There was no extra allocation here, but we went through extra pointer
every time we accessed this data.
One interesting side effect is that now TCP data is allocated from
SMR-protected zone. Potentially this allows the TCP stacks or other
TCP related modules to utilize that for their own synchronization.
Large part of the change was done with sed script:
s/tp->ccv->/tp->t_ccv./g
s/tp->ccv/\&tp->t_ccv/g
s/tp->cc_algo/tp->t_cc/g
s/tp->t_timers->tt_/tp->tt_/g
s/CCV\(ccv, osd\)/\&CCV(ccv, t_osd)/g
A dependency side effect is that code that needs to know struct tcpcb
also needs to know struct inpcb, which added several <netinet/in_pcb.h> includes.
Differential revision: https://reviews.freebsd.org/D37127
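How an embedded inpcb maps back to its enclosing tcpcb can be sketched with offsetof(). The structures and the SK_INTOTCPCB macro below are simplified, hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real structures. */
struct inpcb_sketch {
	int inp_flags;
};

struct tcpcb_sketch {
	struct inpcb_sketch t_inpcb;	/* embedded, currently first */
	int t_state;
};

/*
 * Recover the enclosing tcpcb from a pointer to its embedded inpcb.
 * Using offsetof() rather than a plain cast keeps this correct even if
 * the embedded member is later moved away from the start of the struct.
 */
#define	SK_INTOTCPCB(inp)						\
	((struct tcpcb_sketch *)(void *)((char *)(inp) -		\
	    offsetof(struct tcpcb_sketch, t_inpcb)))

static struct tcpcb_sketch *
inp_to_tp(struct inpcb_sketch *inp)
{
	assert(inp != NULL);
	return (SK_INTOTCPCB(inp));
}
```

Since both objects live in one allocation, a valid pointer to one always implies a valid pointer to the other, and no separate lookup or extra allocation is needed.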
|
|
|
|
|
|
|
|
| |
No functional change intended.
Reviewed by: rscheff@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D37401
|
|
|
|
|
|
|
|
|
|
|
| |
Have tcpstats (netstat -s) differentiate between received and sent
ECN-marked packets. Also account for IP ECN bits (on TCP packets)
even when the tcp session has not negotiated ECN support.
Event: IETF 115 Hackathon
Reviewed By: glebius, tuexen, #transport
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D37314
|
|
|
|
|
| |
Reviewed by: rscheff
Differential revision: https://reviews.freebsd.org/D37125
|
|
|
|
|
|
|
| |
There should be no functional changes with this commit.
Reviewed by: rscheff
Differential revision: https://reviews.freebsd.org/D37123
|