path: root/sys/netinet/tcp_stacks/fastpath.c
Commit message (Author, Age, Files, Lines)
* Delete the example tcp stack "fastpath", which was only put in as an example.
  (Randall Stewart, 2018-07-24, 1 file, -2360/+0)
  Sponsored by: Netflix inc.
  Differential Revision: https://reviews.freebsd.org/D16420
  Notes: svn path=/head/; revision=336672
* Use appropriate MSS value when populating the TCP FO client cookie cache
  (Michael Tuexen, 2018-07-10, 1 file, -6/+24)
  When a client receives a SYN-ACK segment with a TCP fast open cookie, but without an MSS option, an MSS value from uninitialised stack memory is used. This patch ensures that in case no MSS option is included in the SYN-ACK, the appropriate value as given in RFC 7413 is used.
  Reviewed by: kbowling@
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D16175
  Notes: svn path=/head/; revision=336167
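The fallback described in this commit can be sketched as follows. This is a minimal illustration, not the committed code: the struct, the function name, and the macro names are hypothetical, and the fallback values (536 bytes for IPv4, 1220 bytes for IPv6) are assumed to be the conventional defaults, since the commit message does not state the numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed fallback values (hypothetical names): conventional default MSS. */
#define SKETCH_MSS_DFLT_V4 536
#define SKETCH_MSS_DFLT_V6 1220

/* Illustrative subset of the TCP options parsed from a SYN-ACK. */
struct sketch_tcpopt {
	bool     has_mss;  /* true if an MSS option was present */
	uint16_t mss;      /* only meaningful when has_mss is true */
};

/*
 * Pick the MSS to store in the TFO client cookie cache. The point of the
 * fix is that to->mss is never read when the option was absent.
 */
static uint16_t
sketch_tfo_cache_mss(const struct sketch_tcpopt *to, bool is_ipv6)
{
	if (to->has_mss)
		return (to->mss);
	return (is_ipv6 ? SKETCH_MSS_DFLT_V6 : SKETCH_MSS_DFLT_V4);
}
```

The design point is simply that the "option absent" case gets an explicit branch instead of reading a field that was never filled in.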
* Allow alternate TCP stack to populate the TCP FO client cookie cache.
  (Michael Tuexen, 2018-07-07, 1 file, -0/+15)
  Without this patch, TCP FO could be used when using an alternate TCP stack, but only existing entries in the TCP client cookie cache could be used. This cache was not populated by connections using alternate TCP stacks.
  Sponsored by: Netflix, Inc.
  Notes: svn path=/head/; revision=336061
* epoch(9): allow preemptible epochs to compose
  (Matt Macy, 2018-07-04, 1 file, -125/+14)
  - Add tracker argument to preemptible epochs
  - Inline epoch read path in kernel and tied modules
  - Change in_epoch to take an epoch as argument
  - Simplify tfb_tcp_do_segment to not take a ti_locked argument; there's no longer any benefit to dropping the pcbinfo lock, and trying to do so just adds an error-prone branchfest to these functions
  - Remove cases of same function recursion on the epoch, as recursing is no longer free.
  - Remove the TAILQ_ENTRY and epoch_section from struct thread, as the tracker field is now stack or heap allocated as appropriate.
  Tested by: pho and Limelight Networks
  Reviewed by: kbowling at llnw dot com
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D16066
  Notes: svn path=/head/; revision=335924
* This commit brings in a new refactored TCP stack called Rack.
  (Randall Stewart, 2018-06-07, 1 file, -1/+1)
  Rack includes the following features:
  - A different SACK processing scheme (the old sack structures are not used).
  - RACK (Recent acknowledgment), where counting dup-acks is no longer done; instead time is used to know when to retransmit. (see the I-D)
  - TLP (Tail Loss Probe), where we will probe for tail-losses to try not to take a retransmit time-out. (see the I-D)
  - Burst mitigation using TCPHPTS
  - PRR (Proportional Rate Reduction), see the RFC.
  Once built into your kernel, you can select this stack either by socket option, with the stack name "rack", or by setting the global sysctl so the default is rack.
  Note that any connection that does not support SACK will be kicked back to the "default" base FreeBSD stack (currently known as "default").
  To build this into your kernel you will need to enable in your kernel:
     makeoptions WITH_EXTRA_TCP_STACKS=1
     options TCPHPTS
  Sponsored by: Netflix Inc.
  Differential Revision: https://reviews.freebsd.org/D15525
  Notes: svn path=/head/; revision=334804
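Per-socket selection of the stack can be sketched as below. This is a hedged example rather than anything taken from the commit: it assumes the FreeBSD `TCP_FUNCTION_BLK` socket option and `struct tcp_function_set` from `<netinet/tcp.h>`, and the helper name `sketch_select_tcp_stack` is hypothetical. On systems that lack the option, the helper simply fails with `ENOPROTOOPT`. The global default mentioned above is, to my understanding, the `net.inet.tcp.functions_default` sysctl.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>
#include <string.h>

/*
 * Ask the kernel to use the named TCP stack (for example "rack") on one
 * socket. Returns 0 on success, -1 with errno set on failure.
 */
static int
sketch_select_tcp_stack(int fd, const char *name)
{
#ifdef TCP_FUNCTION_BLK
	struct tcp_function_set fs;

	memset(&fs, 0, sizeof(fs));
	/* Leave the last byte zero so the name stays NUL-terminated. */
	strncpy(fs.function_set_name, name, sizeof(fs.function_set_name) - 1);
	return (setsockopt(fd, IPPROTO_TCP, TCP_FUNCTION_BLK, &fs, sizeof(fs)));
#else
	/* Not a platform with selectable TCP stacks. */
	(void)fd;
	(void)name;
	errno = ENOPROTOOPT;
	return (-1);
#endif
}
```

A caller would typically invoke this right after `socket()` and before `connect()` or `listen()`, and fall back to the default stack if it returns -1.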
* This commit brings in the TCP high precision timer system (tcp_hpts).
  (Randall Stewart, 2018-04-19, 1 file, -5/+5)
  It is the forerunner/foundational work of bringing in both Rack and BBR, which use hpts for pacing out packets. The feature is optional and requires the TCPHPTS option to be enabled before the feature will be active. TCP modules that use it must ensure that the base component is compiled into the kernel in which they are loaded.
  MFC after: Never
  Sponsored by: Netflix Inc.
  Differential Revision: https://reviews.freebsd.org/D15020
  Notes: svn path=/head/; revision=332770
* Declare more TCP globals in tcp_var.h, so that alternative TCP stacks can use them.
  (Gleb Smirnoff, 2017-10-11, 1 file, -15/+0)
  Gather all TCP tunables in tcp_var.h in one place and alphabetically sort them, to ease maintenance of the list. Don't copy and paste declarations in tcp_stacks/fastpath.c.
  Notes: svn path=/head/; revision=324539
* Avoid TCP log messages which are false positives.
  (Michael Tuexen, 2017-08-23, 1 file, -36/+40)
  This is https://svnweb.freebsd.org/changeset/base/322812, just for alternate TCP stacks.
  XMFC with: 322812
  Notes: svn path=/head/; revision=322813
* Revert r307901 - Inform CC modules about loss events.
  (Sean Bruno, 2017-07-25, 1 file, -17/+5)
  This was discussed between various transport@ members and it was requested to be reverted and discussed.
  Submitted by: Kevin Bowling <kevin.bowling@kev009.com>
  Reported by: lawrence
  Reviewed by: hiren
  Sponsored by: Limelight Networks
  Notes: svn path=/head/; revision=321480
* Improve comments to describe what the code does.
  (Michael Tuexen, 2017-06-01, 1 file, -2/+4)
  Reported by: jtl
  Sponsored by: Netflix, Inc.
  Notes: svn path=/head/; revision=319433
* When a SYN-ACK is received in SYN-SENT state, RFC 793 requires the validation of SEG.ACK as the first step.
  (Michael Tuexen, 2017-04-26, 1 file, -7/+28)
  If the ACK is not acceptable, a RST segment should be sent and the segment should be dropped. Up to now, the segment was partially processed. This patch moves the check for the SEG.ACK validation up to the front as required.
  Reviewed by: hiren, gnn
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D10424
  Notes: svn path=/head/; revision=317435
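The RFC 793 acceptability test referred to here (an ACK in SYN-SENT is unacceptable if SEG.ACK =< ISS or SEG.ACK > SND.NXT) can be sketched as follows. The function and macro spellings are illustrative and are not taken from fastpath.c; the comparisons use the usual modular 32-bit sequence arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequence-space comparisons that tolerate 32-bit wraparound. */
#define SKETCH_SEQ_LEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) <= 0)
#define SKETCH_SEQ_GT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) > 0)

/*
 * First step of SYN-SENT processing for a segment carrying ACK:
 * returns false when the caller should answer with a RST and drop the
 * segment without processing anything else in it.
 */
static bool
sketch_synsent_ack_acceptable(uint32_t seg_ack, uint32_t iss, uint32_t snd_nxt)
{
	if (SKETCH_SEQ_LEQ(seg_ack, iss) || SKETCH_SEQ_GT(seg_ack, snd_nxt))
		return (false);
	return (true);
}
```

Running this check before any other handling is what keeps an unacceptable segment from being partially processed.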
* The sysctl variable net.inet.tcp.drop_synfin is not honored in all states, for example not in SYN-SENT.
  (Michael Tuexen, 2017-04-12, 1 file, -2/+36)
  This patch adds code to check the sysctl variable in other states than LISTEN. Thanks to ae and gnn for providing comments.
  Reviewed by: gnn
  MFC after: 1 week
  Sponsored by: Netflix, Inc.
  Differential Revision: https://reviews.freebsd.org/D9894
  Notes: svn path=/head/; revision=316743
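The kind of check this commit adds to further states can be sketched as below, assuming the sysctl means "drop segments that have both SYN and FIN set". The names are hypothetical; only the flag bit values (FIN = 0x01, SYN = 0x02) are the standard TCP header bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits, under sketch-local names. */
#define SKETCH_TH_FIN 0x01
#define SKETCH_TH_SYN 0x02

/*
 * Returns true when the segment should be dropped: the knob is enabled
 * and the segment carries SYN and FIN at the same time.
 */
static bool
sketch_should_drop_synfin(bool drop_synfin, uint8_t th_flags)
{
	const uint8_t both = SKETCH_TH_SYN | SKETCH_TH_FIN;

	return (drop_synfin && (th_flags & both) == both);
}
```

The same predicate would be evaluated in each state that needs to honor the knob, not only in LISTEN.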
* Use estimated RTT for receive buffer auto resizing instead of timestamps
  (Steven Hartland, 2017-04-10, 1 file, -60/+2)
  Switched from using timestamps to RTT estimates when performing TCP receive buffer auto resizing, as not all hosts support / enable TCP timestamps.
  Disabled reset of receive buffer auto scaling when not in bulk receive mode, which gives an extra 20% performance increase.
  Also extracted auto resizing to a common method shared between standard and fastpath modules.
  With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from ~3MB/s to ~100MB/s using the default settings.
  Reviewed by: lstewart, gnn
  MFC after: 2 weeks
  Relnotes: Yes
  Sponsored by: Multiplay
  Differential Revision: https://reviews.freebsd.org/D9668
  Notes: svn path=/head/; revision=316676
* Renumber copyright clause 4
  (Warner Losh, 2017-02-28, 1 file, -1/+1)
  Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point.
  Submitted by: Jan Schaumann <jschauma@stevens.edu>
  Pull Request: https://github.com/freebsd/freebsd/pull/96
  Notes: svn path=/head/; revision=314436
* Merge projects/ipsec into head/.
  (Andrey V. Elsukov, 2017-02-06, 1 file, -6/+0)
  Small summary
  -------------
  o Almost all IPsec related code was moved into sys/netipsec.
  o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules.
  o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA.
  o New network pseudo interface if_ipsec(4) added. For now it is built as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs.
  o The network stack now invokes IPsec functions using special methods. Only one header file, <netipsec/ipsec_support.h>, should be included to declare all the needed things to work with IPsec.
  o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods.
  o TCP_SIGNATURE support was reworked to be closer to the RFC.
  o PF_KEY SADB was reworked:
    - now all security associations are stored in the single SPI namespace, and all SAs MUST have unique SPI.
    - several hash tables added to speed up lookups in SADB.
    - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups at the same time.
    - many PF_KEY message handlers were reworked to reflect changes in SADB.
    - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses.
  o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only a limited number (4) of bundled SAs, but they are supported for both INET and INET6.
  o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet.
  o For inbound security policies, added the mode in which the kernel checks the full history of applied IPsec transforms.
  o Reference counting rules for security policies and security associations were changed. The proper SA locking was added into xform code.
  o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting.
  Reviewed by: gnn, wblock
  Obtained from: Yandex LLC
  Relnotes: yes
  Sponsored by: Yandex LLC
  Differential Revision: https://reviews.freebsd.org/D9352
  Notes: svn path=/head/; revision=313330
* Followup to mtod removal in main stack (r311225). Continued removal of mtod() calls from TCP_PROBE macros.
  (George V. Neville-Neil, 2017-01-04, 1 file, -10/+9)
  MFC after: 1 week
  Sponsored by: Limelight Networks
  Notes: svn path=/head/; revision=311243
* Ensure that TCP state changes to state-closing are reported via dtrace.
  (Michael Tuexen, 2016-11-19, 1 file, -1/+0)
  This does not cover state changes from TIME-WAIT.
  Reviewed by: gnn
  MFC after: 3 weeks
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D8443
  Notes: svn path=/head/; revision=308832
* Notify the user via setting errno when a TCP RST segment is received either in the CLOSING or LAST-ACK state.
  (Michael Tuexen, 2016-11-17, 1 file, -0/+2)
  Reviewed by: hiren
  MFC after: 3 weeks
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D8371
  Notes: svn path=/head/; revision=308745
* FreeBSD tcp stack used to inform respective congestion control module about the loss event but not use or obey the recommendations, i.e. values set by it, in some cases.
  (Hiren Panchasara, 2016-10-25, 1 file, -5/+17)
  Here is an attempt to solve that confusion by following relevant RFCs/drafts. Stack only sets congestion window/slow start threshold values when there is no CC module available to take that action. All CC modules are inspected and updated when needed to take appropriate action on loss.
  tcp_stacks/fastpath module has been updated to adapt to these changes.
  Note: Probably, the most significant change would be to not bring congestion window down to 1MSS on a loss signaled by 3-duplicate acks and letting respective CC decide that value.
  In collaboration with: Matt Macy <mmacy at nextbsd dot org>
  Discussed on: transport@ mailing list
  Reviewed by: jtl
  MFC after: 1 month
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D8225
  Notes: svn path=/head/; revision=307901
* The code currently resets the keepalive timer each time a packet is received on a TCP session that has entered the ESTABLISHED state.
  (Jonathan T. Looney, 2016-10-14, 1 file, -4/+0)
  This results in a lot of calls to reset the keepalive timer.
  This patch changes the behavior so we set the keepalive timer for the keepalive idle time (TP_KEEPIDLE). When the keepalive timer fires, it will first check to see if the session has been idle for TP_KEEPIDLE ticks. If not, it will reschedule the keepalive timer for the time the session will have been idle for TP_KEEPIDLE ticks.
  For a session with regular communication, the keepalive timer should fire approximately once every TP_KEEPIDLE ticks. For sessions with irregular communication, the keepalive timer might fire more often. But, the disruption from a periodic keepalive timer should be less than the regular cost of resetting the keepalive timer on every packet.
  (FWIW, this change saved approximately 1.73% of the busy CPU cycles on a particular test system with a heavy TCP output load. Of course, the actual impact is very specific to the particular hardware and workload.)
  Reviewed by: gallatin, rrs
  MFC after: 2 weeks
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D8243
  Notes: svn path=/head/; revision=307319
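The lazy rescheduling described in this commit can be sketched as a pure decision function. This is an illustration under assumptions, not the kernel code: the struct, the function, and the parameter names are hypothetical, time is modeled as a free-running unsigned tick counter, and what happens after a probe is sent (probe interval, probe count) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the timer handler should do when the keepalive timer fires. */
struct sketch_keep_decision {
	bool     send_probe;   /* session was idle long enough: probe now */
	uint32_t rearm_ticks;  /* if not probing: ticks until the next check */
};

/*
 * now       current tick count
 * last_rcv  tick count when the last packet was received
 * keepidle  idle time (in ticks) required before probing
 */
static struct sketch_keep_decision
sketch_keepalive_fired(uint32_t now, uint32_t last_rcv, uint32_t keepidle)
{
	struct sketch_keep_decision d;
	uint32_t idle = now - last_rcv;  /* unsigned math tolerates tick wrap */

	if (idle >= keepidle) {
		d.send_probe = true;
		d.rearm_ticks = 0;       /* probe scheduling is out of scope */
	} else {
		d.send_probe = false;
		/* Fire again exactly when the session will have been idle long enough. */
		d.rearm_ticks = keepidle - idle;
	}
	return (d);
}
```

The receive path then only has to record `last_rcv`; it never touches the timer, which is where the saving described above comes from.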
* Fix build without TCP_HHOOK and with INVARIANTS. Before, mutex.h came via sys/hhook.h -> sys/rmlock.h -> sys/mutex.h.
  (Gleb Smirnoff, 2016-10-13, 1 file, -0/+2)
  Notes: svn path=/head/; revision=307226
* In the TCP stack, the hhook(9) framework provides hooks for kernel modules to add actions that run when a TCP frame is sent or received on a TCP session in the ESTABLISHED state.
  (Jonathan T. Looney, 2016-10-12, 1 file, -0/+8)
  In the base tree, this functionality is only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd, and cc_vegas congestion control modules.
  Presently, we incur overhead to check for hooks each time a TCP frame is sent or received on an ESTABLISHED TCP session.
  This change adds a new compile-time option (TCP_HHOOK) to determine whether to include the hhook(9) framework for TCP. To retain backwards compatibility, I added the TCP_HHOOK option to every configuration file that already defined "options INET". (Therefore, this patch introduces no functional change. In order to see a functional difference, you need to compile a custom kernel without the TCP_HHOOK option.) This change will allow users to easily exclude this functionality from their kernel, should they wish to do so.
  Note that any users who use a custom kernel configuration and use one of the congestion control modules listed above will need to add the TCP_HHOOK option to their kernel configuration.
  Reviewed by: rrs, lstewart, hiren (previous version), sjg (makefiles only)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D8185
  Notes: svn path=/head/; revision=307082
* Remove "long" variables from the TCP stack (not including the modular congestion control framework).
  (Jonathan T. Looney, 2016-10-06, 1 file, -12/+12)
  Reviewed by: gnn, lstewart (partial)
  Sponsored by: Juniper Networks, Netflix
  Differential Revision: (multiple)
  Tested by: Limelight, Netflix
  Notes: svn path=/head/; revision=306769
* Adjust TCP module fastpath after r304803's cc_ack_received() changes.
  (Hiren Panchasara, 2016-08-26, 1 file, -6/+20)
  Reported by: hiren, bz, np
  Reviewed by: rrs
  Sponsored by: Limelight Networks
  Differential Revision: https://reviews.freebsd.org/D7664
  Notes: svn path=/head/; revision=304857
* Cleanup unneeded include "opt_ipfw.h".
  (Andrey V. Elsukov, 2016-06-09, 1 file, -1/+0)
  It was used for conditionally building IPFIREWALL_FORWARD support. But the IPFIREWALL_FORWARD option was removed a long time ago.
  Notes: svn path=/head/; revision=301717
* This small change adopts the excellent suggestion for using named structures when adding a new tcp-stack, which came in late to me via email after the last commit.
  (Randall Stewart, 2016-05-17, 1 file, -25/+8)
  It also makes it so that a new stack may optionally get a callback during a retransmit timeout. This allows the new stack to clear specific state (think sack scoreboards or other such structures).
  Sponsored by: Netflix Inc.
  Differential Revision: http://reviews.freebsd.org/D6303
  Notes: svn path=/head/; revision=300042
* opt_kdtrace.h is not needed for SDT probes as of r258541.
  (Mark Johnston, 2016-05-15, 1 file, -1/+0)
  Notes: svn path=/head/; revision=299864
* sys/net*: minor spelling fixes.
  (Pedro F. Giffuni, 2016-05-03, 1 file, -2/+2)
  No functional change.
  Notes: svn path=/head/; revision=298995
* This cleans up the timers code in TCP to start using the new async_drain functionality.
  (Randall Stewart, 2016-04-28, 1 file, -2/+0)
  This has been tested in NF as well as by Verisign. Still to do in here is to remove all the old flags. They are currently left being maintained but probably are no longer needed.
  Sponsored by: Netflix Inc.
  Differential Revision: http://reviews.freebsd.org/D5924
  Notes: svn path=/head/; revision=298743
* Remove duplicate external declaration of tcprexmtthresh making gcc compiles barf.
  (Bjoern A. Zeeb, 2016-03-13, 1 file, -2/+0)
  Notes: svn path=/head/; revision=296811
* Fix a sneaky bug where we were missing an extern to get the rxt threshold.. and thus created our own defaulted to 0 :-(
  (Randall Stewart, 2016-03-08, 1 file, -1/+1)
  Sponsored by: Netflix Inc
  Notes: svn path=/head/; revision=296476
* Fix dtrace probes (introduced in 287759): debug__input was used for output and drop; connect didn't always fire a user probe; some probes were missing in fastpath
  (George V. Neville-Neil, 2016-03-03, 1 file, -4/+7)
  Submitted by: Hannes Mehnert
  Sponsored by: REMS, EPSRC
  Differential Revision: https://reviews.freebsd.org/D5525
  Notes: svn path=/head/; revision=296352
* This fixes the fastpath code to have a better module initialization sequence when included in loader.conf.
  (Randall Stewart, 2016-02-23, 1 file, -1/+1)
  It also fixes it so that even if someone incorrectly specifies a load order, the lists and such will be initialized on demand at that time, so no one can make that mistake.
  Reviewed by: hiren
  Differential Revision: D5189
  Notes: svn path=/head/; revision=295927
* Rename netinet/tcp_cc.h to netinet/cc/cc.h.
  (Gleb Smirnoff, 2016-01-27, 1 file, -1/+1)
  Discussed with: lstewart
  Notes: svn path=/head/; revision=294931
* - Rename cc.h to more meaningful tcp_cc.h.
  - Declare it a kernel only include, which it already is.
  - Don't include tcp.h implicitly from tcp_cc.h
  (Gleb Smirnoff, 2016-01-21, 1 file, -1/+2)
  Notes: svn path=/head/; revision=294535
* Cleanup TCP files from unnecessary interface related includes.
  (Gleb Smirnoff, 2016-01-21, 1 file, -4/+0)
  Notes: svn path=/head/; revision=294534
* Apply the changes from r293284 to one additional file.
  (Jonathan T. Looney, 2016-01-07, 1 file, -3/+1)
  Discussed with: glebius
  Notes: svn path=/head/; revision=293313
* Remove redundant extern's that make the ppc compile fail.
  (Randall Stewart, 2015-12-16, 1 file, -25/+0)
  Thanks Ed Maste for the heads up.
  Notes: svn path=/head/; revision=292336
* First cut of the modularization of our TCP stack.
  (Randall Stewart, 2015-12-16, 1 file, -0/+2486)
  Still to do is to clean up the timer handling using the async-drain. Other optimizations may be coming to go with this. What's here will allow different tcp implementations (one included).
  Reviewed by: jtl, hiren, transports
  Sponsored by: Netflix Inc.
  Differential Revision: D4055
  Notes: svn path=/head/; revision=292309