summaryrefslogtreecommitdiff
path: root/sys/netinet/tcp_output.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Don't overpromote values when calculating len in tcp_output().Jonathan T. Looney2017-07-051-1/+1
| | | | | | | | | | | | | sbavail() returns u_int and sendwin is a uint32_t. Therefore, min() (which operates on two u_int values) is able to correctly calculate the minimum of these two arguments. Reported by: rrs MFC after: 1 week Sponsored by: Netflix Notes: svn path=/head/; revision=320682
* Add a sysctl to toggle the use of the sockets LOWAT when calculating auto ↵Sean Bruno2017-07-031-2/+12
| | | | | | | | | | | | window growth Submitted by: j@nitrology.com (Jason Wolfe) Reviewed by: gnn hiren Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D11016 Notes: svn path=/head/; revision=320614
* Use estimated RTT for receive buffer auto resizing instead of timestampsSteven Hartland2017-04-101-4/+6
| | | | | | | | | | | | | | | | | | | | | | | Switched from using timestamps to RTT estimates when performing TCP receive buffer auto resizing, as not all hosts support / enable TCP timestamps. Disabled reset of receive buffer auto scaling when not in bulk receive mode, which gives an extra 20% performance increase. Also extracted auto resizing to a common method shared between standard and fastpath modules. With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from ~3MB/s to ~100MB/s using the default settings. Reviewed by: lstewart, gnn MFC after: 2 weeks Relnotes: Yes Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D9668 Notes: svn path=/head/; revision=316676
* Enable route and LLE (ndp) caching in TCP/IPv6Mike Karels2017-03-271-7/+4
| | | | | | | | | | | | | tcp_output.c was using a route on the stack for IPv6, which does not allow route caching or LLE/ndp caching. Switch to using the route (v6 flavor) in the in_pcb, which was already present, which caches both L3 and L2 lookups. Reviewed by: gnn hiren MFC after: 2 weeks Notes: svn path=/head/; revision=316065
* Renumber copyright clause 4Warner Losh2017-02-281-1/+1
| | | | | | | | | | | | Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96 Notes: svn path=/head/; revision=314436
* TCP window updates are only sent if the window can be increased by atMichael Tuexen2017-02-231-0/+2
| | | | | | | | | | | | | | | least 2 * MSS. However, if the receive buffer size is small, this might be impossible. Add back a criterion to send a TCP window update if the window can be increased by at least half of the receive buffer size. This condition was removed in r242252. This patch simply brings it back. PR: 211003 Reviewed by: gnn MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D9475 Notes: svn path=/head/; revision=314155
* Merge projects/ipsec into head/.Andrey V. Elsukov2017-02-061-21/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Small summary ------------- o Almost all IPsec releated code was moved into sys/netipsec. o New kernel modules added: ipsec.ko and tcpmd5.ko. New kernel option IPSEC_SUPPORT added. It enables support for loading and unloading of ipsec.ko and tcpmd5.ko kernel modules. o IPSEC_NAT_T option was removed. Now NAT-T support is enabled by default. The UDP_ENCAP_ESPINUDP_NON_IKE encapsulation type support was removed. Added TCP/UDP checksum handling for inbound packets that were decapsulated by transport mode SAs. setkey(8) modified to show run-time NAT-T configuration of SA. o New network pseudo interface if_ipsec(4) added. For now it is build as part of ipsec.ko module (or with IPSEC kernel). It implements IPsec virtual tunnels to create route-based VPNs. o The network stack now invokes IPsec functions using special methods. The only one header file <netipsec/ipsec_support.h> should be included to declare all the needed things to work with IPsec. o All IPsec protocols handlers (ESP/AH/IPCOMP protosw) were removed. Now these protocols are handled directly via IPsec methods. o TCP_SIGNATURE support was reworked to be more close to RFC. o PF_KEY SADB was reworked: - now all security associations stored in the single SPI namespace, and all SAs MUST have unique SPI. - several hash tables added to speed up lookups in SADB. - SADB now uses rmlock to protect access, and concurrent threads can do SA lookups in the same time. - many PF_KEY message handlers were reworked to reflect changes in SADB. - SADB_UPDATE message was extended to support new PF_KEY headers: SADB_X_EXT_NEW_ADDRESS_SRC and SADB_X_EXT_NEW_ADDRESS_DST. They can be used by IKE daemon to change SA addresses. o ipsecrequest and secpolicy structures were cardinally changed to avoid locking protection for ipsecrequest. Now we support only limited number (4) of bundled SAs, but they are supported for both INET and INET6. o INPCB security policy cache was introduced. Each PCB now caches used security policies to avoid SP lookup for each packet. o For inbound security policies added the mode, when the kernel does check for full history of applied IPsec transforms. o References counting rules for security policies and security associations were changed. The proper SA locking added into xform code. o xform code was also changed. Now it is possible to unregister xforms. tdb_xxx structures were changed and renamed to reflect changes in SADB/SPDB, and changed rules for locking and refcounting. Reviewed by: gnn, wblock Obtained from: Yandex LLC Relnotes: yes Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D9352 Notes: svn path=/head/; revision=313330
* Correct comment grammar and make it easier to understand.Cy Schubert2017-01-301-2/+2
| | | | | | | MFC after: 1 week Notes: svn path=/head/; revision=312982
* Fix DTrace TCP tracepoints to not use mtod() as it is both unnecessary andGeorge V. Neville-Neil2017-01-041-1/+1
| | | | | | | | | | | | | | dangerous. Those wanting data from an mbuf should use DTrace itself to get the data. PR: 203409 Reviewed by: hiren MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D9035 Notes: svn path=/head/; revision=311225
* We currently don't do TSO if ip options are present. In case of IPv6, we look atHiren Panchasara2016-12-111-19/+21
| | | | | | | | | | | | | | | in6p_options to check that. That is incorrect as we carry ip options in in6p_outputopts. Also, just checking for in6p_outputopts being NULL won't suffice as we combine ip options and ip header fields both in that one field. The commit fixes this by using ip6_optlen() which correctly calculates length of only ip options for IPv6. Reviewed by: ae, bz MFC after: 3 weeks Sponsored by: Limelight Networks Notes: svn path=/head/; revision=309858
* The TFO server-side code contains some changes that are not conditioned onJonathan T. Looney2016-10-121-4/+4
| | | | | | | | | | | | | | | | the TCP_RFC7413 kernel option. This change removes those few instructions from the packet processing path. While not strictly necessary, for the sake of consistency, I applied the new IS_FASTOPEN macro to all places in the packet processing path that used the (t_flags & TF_FASTOPEN) check. Reviewed by: hiren Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8219 Notes: svn path=/head/; revision=307153
* In the TCP stack, the hhook(9) framework provides hooks for kernel modulesJonathan T. Looney2016-10-121-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to add actions that run when a TCP frame is sent or received on a TCP session in the ESTABLISHED state. In the base tree, this functionality is only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd, and cc_vegas congestion control modules. Presently, we incur overhead to check for hooks each time a TCP frame is sent or received on an ESTABLISHED TCP session. This change adds a new compile-time option (TCP_HHOOK) to determine whether to include the hhook(9) framework for TCP. To retain backwards compatibility, I added the TCP_HHOOK option to every configuration file that already defined "options INET". (Therefore, this patch introduces no functional change. In order to see a functional difference, you need to compile a custom kernel without the TCP_HHOOK option.) This change will allow users to easily exclude this functionality from their kernel, should they wish to do so. Note that any users who use a custom kernel configuration and use one of the congestion control modules listed above will need to add the TCP_HHOOK option to their kernel configuration. Reviewed by: rrs, lstewart, hiren (previous version), sjg (makefiles only) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D8185 Notes: svn path=/head/; revision=307082
* Remove "long" variables from the TCP stack (not including the modularJonathan T. Looney2016-10-061-32/+33
| | | | | | | | | | | | congestion control framework). Reviewed by: gnn, lstewart (partial) Sponsored by: Juniper Networks, Netflix Differential Revision: (multiple) Tested by: Limelight, Netflix Notes: svn path=/head/; revision=306769
* If the new window size is less than the old window size, skip theJonathan T. Looney2016-10-061-3/+4
| | | | | | | | | | | | | calculations to check if we should advertise a larger window. Reviewed by: gnn MFC after: 2 weeks Sponsored by: Juniper Networks, Netflix Differential Revision: https://reviews.freebsd.org/D7076 Tested by: Limelight, Netflix Notes: svn path=/head/; revision=306768
* Correctly calculate snd_max in persist case.Jonathan T. Looney2016-10-061-1/+1
| | | | | | | | | | | | | | In the persist case, take the SYN and FIN flags into account when updating the sequence space sent. Reviewed by: gnn MFC after: 2 weeks Sponsored by: Juniper Networks, Netflix Differential Revision: https://reviews.freebsd.org/D7075 Tested by: Limelight, Netflix Notes: svn path=/head/; revision=306767
* Remove the 4.3BSD compatible macro m_copy(), use m_copym() instead.Kevin Lo2016-09-151-1/+1
| | | | | | | | Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D7878 Notes: svn path=/head/; revision=305824
* tcp: Don't prematurely drop receiving-only connectionsSepherosa Ziehau2016-05-301-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If the connection was persistent and receiving-only, several (12) sporadic device insufficient buffers would cause the connection be dropped prematurely: Upon ENOBUFS in tcp_output() for an ACK, retransmission timer is started. No one will stop this retransmission timer for receiving- only connection, so the retransmission timer promises to expire and t_rxtshift is promised to be increased. And t_rxtshift will not be reset to 0, since no RTT measurement will be done for receiving-only connection. If this receiving-only connection lived long enough (e.g. >350sec, given the RTO starts from 200ms), and it suffered 12 sporadic device insufficient buffers, i.e. t_rxtshift >= 12, this receiving-only connection would be dropped prematurely by the retransmission timer. We now assert that for data segments, SYNs or FINs either rexmit or persist timer was wired upon ENOBUFS. And don't set rexmit timer for other cases, i.e. ENOBUFS upon ACKs. Discussed with: lstewart, hiren, jtl, Mike Karels MFC after: 3 weeks Sponsored by: Microsoft OSTC Differential Revision: https://reviews.freebsd.org/D5872 Notes: svn path=/head/; revision=300981
* Change net.inet.tcp.ecn.enable sysctl mib from a binary off/onDon Lewis2016-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | control to a three way setting. 0 - Totally disable ECN. (no change) 1 - Enable ECN if incoming connections request it. Outgoing connections will request ECN. (no change from present != 0 setting) 2 - Enable ECN if incoming connections request it. Outgoing conections will not request ECN. Change the default value of net.inet.tcp.ecn.enable from 0 to 2. Linux version 2.4.20 and newer, Solaris, and Mac OS X 10.5 and newer have similar capabilities. The actual values above match Linux, and the default matches the current Linux default. Reviewed by: eadler MFC after: 1 month MFH: yes Sponsored by: https://reviews.freebsd.org/D6386 Notes: svn path=/head/; revision=300240
* sys/net*: minor spelling fixes.Pedro F. Giffuni2016-05-031-4/+4
| | | | | | | No functional change. Notes: svn path=/head/; revision=298995
* FreeBSD previously provided route caching for TCP (and UDP). Re-addGeorge V. Neville-Neil2016-03-241-7/+3
| | | | | | | | | | | | | route caching for TCP, with some improvements. In particular, invalidate the route cache if a new route is added, which might be a better match. The cache is automatically invalidated if the old route is deleted. Submitted by: Mike Karels Reviewed by: gnn Differential Revision: https://reviews.freebsd.org/D4306 Notes: svn path=/head/; revision=297225
* to_flags is currently a 64-bit integer; however, we only use 7 bits.Jonathan T. Looney2016-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | Furthermore, there is no reason this needs to be a 64-bit integer for the forseeable future. Also, there is an inconsistency between to_flags and the mask in tcp_addoptions(). Before r195654, to_flags was a u_long and the mask in tcp_addoptions() was a u_int. r195654 changed to_flags to be a u_int64_t but left the mask in tcp_addoptions() as a u_int, meaning that these variables will only be the same width on platforms with 64-bit integers. Convert both to_flags and the mask in tcp_addoptions() to be explicitly 32-bit variables. This may save a few cycles on 32-bit platforms, and avoids unnecessarily mixing types. Differential Revision: https://reviews.freebsd.org/D5584 Reviewed by: hiren MFC after: 2 weeks Sponsored by: Juniper Networks Notes: svn path=/head/; revision=297193
* Fix dtrace probes (introduced in 287759): debug__input was usedGeorge V. Neville-Neil2016-03-031-1/+1
| | | | | | | | | | | | for output and drop; connect didn't always fire a user probe some probes were missing in fastpath Submitted by: Hannes Mehnert Sponsored by: REMS, EPSRC Differential Revision: https://reviews.freebsd.org/D5525 Notes: svn path=/head/; revision=296352
* Rename netinet/tcp_cc.h to netinet/cc/cc.h.Gleb Smirnoff2016-01-271-1/+1
| | | | | | | Discussed with: lstewart Notes: svn path=/head/; revision=294931
* Persist timers TCPTV_PERSMIN and TCPTV_PERSMAX are hardcoded with 5 seconds andHiren Panchasara2016-01-261-1/+1
| | | | | | | | | | | | | | 60 seconds, respectively. Turn them into sysctls that can be tuned live. The default values of 5 seconds and 60 seconds have been retained. Submitted by: Jason Wolfe (j at nitrology dot com) Reviewed by: gnn, rrs, hiren, bz MFC after: 1 week Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D5024 Notes: svn path=/head/; revision=294840
* - Rename cc.h to more meaningful tcp_cc.h.Gleb Smirnoff2016-01-211-1/+2
| | | | | | | | - Declare it a kernel only include, which it already is. - Don't include tcp.h implicitly from tcp_cc.h Notes: svn path=/head/; revision=294535
* There is a bug in tcp_output()'s implementation of the TCP_SIGNATUREGleb Smirnoff2016-01-141-2/+4
| | | | | | | | | | | | | | | (RFC 2385/TCP-MD5) kernel option. If a tcpcb has TF_NOOPT flag, then tcp_addoptions() is not called, and to.to_signature is an uninitialized stack variable. The value is later used as write offset, which leads to writing to random address. Submitted by: rstone, jtl Security: SA-16:05.tcp Notes: svn path=/head/; revision=293910
* Historically we have two fields in tcpcb to describe sender MSS: t_maxopd,Gleb Smirnoff2016-01-071-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | and t_maxseg. This dualism emerged with T/TCP, but was not properly cleaned up after T/TCP removal. After all permutations over the years the result is that t_maxopd stores a minimum of peer offered MSS and MTU reduced by minimum protocol header. And t_maxseg stores (t_maxopd - TCPOLEN_TSTAMP_APPA) if timestamps are in action, or is equal to t_maxopd otherwise. That's a very rough estimate of MSS reduced by options length. Throughout the code it was used in places, where preciseness was not important, like cwnd or ssthresh calculations. With this change: - t_maxopd goes away. - t_maxseg now stores MSS not adjusted by options. - new function tcp_maxseg() is provided, that calculates MSS reduced by options length. The functions gives a better estimate, since it takes into account SACK state as well. Reviewed by: jtl Differential Revision: https://reviews.freebsd.org/D3593 Notes: svn path=/head/; revision=293284
* Implementation of server-side TCP Fast Open (TFO) [RFC7413].Patrick Kelsey2015-12-241-1/+70
| | | | | | | | | | | | | TFO is disabled by default in the kernel build. See the top comment in sys/netinet/tcp_fastopen.c for implementation particulars. Reviewed by: gnn, jch, stas MFC after: 3 days Sponsored by: Verisign, Inc. Differential Revision: https://reviews.freebsd.org/D4350 Notes: svn path=/head/; revision=292706
* There are times when it would be really nice to have a record of the last fewHiren Panchasara2015-10-141-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | packets and/or state transitions from each TCP socket. That would help with narrowing down certain problems we see in the field that are hard to reproduce without understanding the history of how we got into a certain state. This change provides just that. It saves copies of the last N packets in a list in the tcpcb. When the tcpcb is destroyed, the list is freed. I thought this was likely to be more performance-friendly than saving copies of the tcpcb. Plus, with the packets, you should be able to reverse-engineer what happened to the tcpcb. To enable the feature, you will need to compile a kernel with the TCPPCAP option. Even then, the feature defaults to being deactivated. You can activate it by setting a positive value for the number of captured packets. You can do that on either a global basis or on a per-socket basis (via a setsockopt call). There is no way to get the packets out of the kernel other than using kmem or getting a coredump. I thought that would help some of the legal/privacy concerns regarding such a feature. However, it should be possible to add a future effort to export them in PCAP format. I tested this at low scale, and found that there were no mbuf leaks and the peak mbuf usage appeared to be unchanged with and without the feature. The main performance concern I can envision is the number of mbufs that would be used on systems with a large number of sockets. If you save five packets per direction per socket and have 3,000 sockets, that will consume at least 30,000 mbufs just to keep these packets. I tried to reduce the concerns associated with this by limiting the number of clusters (not mbufs) that could be used for this feature. Again, in my testing, that appears to work correctly. Differential Revision: D3100 Submitted by: Jonathan Looney <jlooney at juniper dot net> Reviewed by: gnn, hiren Notes: svn path=/head/; revision=289276
* Update TSO limits to include all headers.Hans Petter Selasky2015-09-141-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make driver programming easier the TSO limits are changed to reflect the values used in the BUSDMA tag a network adapter driver is using. The TCP/IP network stack will subtract space for all linklevel and protocol level headers and ensure that the full mbuf chain passed to the network adapter fits within the given limits. Implementation notes: If a network adapter driver needs to fixup the first mbuf in order to support VLAN tag insertion, the size of the VLAN tag should be subtracted from the TSO limit. Else not. Network adapters which typically inline the complete header mbuf could technically transmit one more segment. This patch does not implement a mechanism to recover the last segment for data transmission. It is believed when sufficiently large mbuf clusters are used, the segment limit will not be reached and recovering the last segment will not have any effect. The current TSO algorithm tries to send MTU-sized packets, where the MTU typically is 1500 bytes, which gives 1448 bytes of TCP data payload per packet for IPv4. That means if the TSO length limitiation is set to 65536 bytes, there will be a data payload remainder of (65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to recover total TSO length due to inlining mbuf header data will not have any effect, because adding or removing the ETH/IP/TCP headers to or from 324 bytes will not cause more or less TCP payload to be TSO'ed. Existing network adapter limits will be updated separately. Differential Revision: https://reviews.freebsd.org/D3458 Reviewed by: rmacklem MFC after: 2 weeks Notes: svn path=/head/; revision=287775
* dd DTrace probe points, translators and a corresponding scriptGeorge V. Neville-Neil2015-09-131-0/+1
| | | | | | | | | | | | to provide the TCPDEBUG functionality with pure DTrace. Reviewed by: rwatson MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: D3530 Notes: svn path=/head/; revision=287759
* Remove stale comment.Kristof Provost2015-07-251-1/+0
| | | | | | | | | The IPv6 pseudo header checksum was added by bz in r235961. Sponsored by: Essen FreeBSD Hackathon Notes: svn path=/head/; revision=285874
* Fix resource exhaustion due to sessions stuck in LAST_ACK state.Xin LI2015-07-211-2/+9
| | | | | | | | | | Submitted by: Jonathan Looney (Juniper SIRT) Reviewed by: lstewart Security: CVE-2015-5358 Security: SA-15:13.tcp Notes: svn path=/head/; revision=285777
* Avoid a situation where we do not set persist timer after a zero windowHiren Panchasara2015-06-291-0/+24
| | | | | | | | | | | | | | | | | condition. If you send a 0-length packet, but there is data is the socket buffer, and neither the rexmt or persist timer is already set, then activate the persist timer. PR: 192599 Differential Revision: D2946 Submitted by: jlott at averesystems dot com Reviewed by: jhb, jch, gnn, hiren Tested by: jlott at averesystems dot com, jch MFC after: 2 weeks Notes: svn path=/head/; revision=284941
* To ease changes to underlying mbuf structure and the mbuf allocator, reduceRobert Watson2015-01-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | the knowledge of mbuf layout, and in particular constants such as M_EXT, MLEN, MHLEN, and so on, in mbuf consumers by unifying various alignment utility functions (M_ALIGN(), MH_ALIGN(), MEXT_ALIGN() in a single M_ALIGN() macro, implemented by a now-inlined m_align() function: - Move m_align() from uipc_mbuf.c to mbuf.h; mark as __inline. - Reimplement M_ALIGN(), MH_ALIGN(), and MEXT_ALIGN() using m_align(). - Update consumers around the tree to simply use M_ALIGN(). This change eliminates a number of cases where mbuf consumers must be aware of whether or not mbufs returned by the allocator use external storage, but also assumptions about the size of the returned mbuf. This will make it easier to introduce changes in how we use external storage, as well as features such as variable-size mbufs. Differential Revision: https://reviews.freebsd.org/D1436 Reviewed by: glebius, trasz, gnn, bz Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=276692
* In preparation of merging projects/sendfile, transform bare access toGleb Smirnoff2014-11-121-13/+18
| | | | | | | | | | | | | | | sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused() Right now they are equal, but once notion of "not ready socket buffer data", will be checked in, they are going to be different. Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=274421
* Fix some minor TSO issues:Hans Petter Selasky2014-11-111-8/+8
| | | | | | | | | | | | | - Improve description of TSO limits. - Remove a not needed KASSERT() - Remove some not needed variable casts. Sponsored by: Mellanox Technologies Discussed with: lstewart @ MFC after: 1 week Notes: svn path=/head/; revision=274376
* Remove SYSCTL_VNET_* macros, and simply put CTLFLAG_VNET where needed.Gleb Smirnoff2014-11-071-6/+6
| | | | | | | Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=274225
* Catch ipv6 case when attempting to do PLPMTUD blackhole detection.Sean Bruno2014-10-131-0/+5
| | | | | | | | | Submitted by: Mikhail <mp@lenta.ru> MFC after: 2 weeks Relnotes: yes Notes: svn path=/head/; revision=273062
* Implement PLPMTUD blackhole detection (RFC 4821), inspired by codeSean Bruno2014-10-071-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from xnu sources. If we encounter a network where ICMP is blocked the Needs Frag indicator may not propagate back to us. Attempt to downshift the mss once to a preconfigured value. Default this feature to off for now while we do not have a full PLPMTUD implementation in our stack. Adds the following new sysctl's for control: net.inet.tcp.pmtud_blackhole_detection -- turns on/off this feature net.inet.tcp.pmtud_blackhole_mss -- mss to try for ipv4 net.inet.tcp.v6pmtud_blackhole_mss -- mss to try for ipv6 Adds the following new sysctl's for monitoring: -- Number of times the code was activated to attempt a mss downshift net.inet.tcp.pmtud_blackhole_activated -- Number of times the blackhole mss was used in an attempt to downshift net.inet.tcp.pmtud_blackhole_min_activated -- Number of times that we failed to connect after we downshifted the mss net.inet.tcp.pmtud_blackhole_failed Phabricator: https://reviews.freebsd.org/D506 Reviewed by: rpaulo bz MFC after: 2 weeks Relnotes: yes Sponsored by: Limelight Networks Notes: svn path=/head/; revision=272720
* Minor code styling.Hans Petter Selasky2014-10-061-17/+16
| | | | | | | Suggested by: glebius @ Notes: svn path=/head/; revision=272595
* Improve transmit sending offload, TSO, algorithm in general.Hans Petter Selasky2014-09-221-11/+96
| | | | | | | | | | | | | | | | | | | | The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. Reviewed by: adrian, rmacklem Sponsored by: Mellanox Technologies MFC after: 1 week Notes: svn path=/head/; revision=271946
* Revert r271504. A new patch to solve this issue will be made.Hans Petter Selasky2014-09-131-74/+3
| | | | | | | Suggested by: adrian @ Notes: svn path=/head/; revision=271551
* Improve transmit sending offload, TSO, algorithm in general.Hans Petter Selasky2014-09-131-3/+74
| | | | | | | | | | | | | | | | | | | The current TSO limitation feature only takes the total number of bytes in an mbuf chain into account and does not limit by the number of mbufs in a chain. Some kinds of hardware is limited by two factors. One is the fragment length and the second is the fragment count. Both of these limits need to be taken into account when doing TSO. Else some kinds of hardware might have to drop completely valid mbuf chains because they cannot loaded into the given hardware's DMA engine. The new way of doing TSO limitation has been made backwards compatible as input from other FreeBSD developers and will use defaults for values not set. MFC after: 1 week Sponsored by: Mellanox Technologies Notes: svn path=/head/; revision=271504
* Fix a typo.Hiren Panchasara2014-07-031-1/+1
| | | | Notes: svn path=/head/; revision=268241
* - Remove rt_metrics_lite and simply put its members into rtentry.Gleb Smirnoff2014-03-051-2/+2
| | | | | | | | | | | | | | | | | | | - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. The change is mostly targeted for stable/10 merge. For head, rt_pksent is expected to just disappear. Discussed with: melifaro Sponsored by: Netflix Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=262763
* dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINEAndriy Gapon2013-11-261-2/+2
| | | | | | | | | | | In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks Notes: svn path=/head/; revision=258622
* - For kernel compiled only with KDTRACE_HOOKS and not any lock debuggingAttilio Rao2013-11-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invokation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip Notes: svn path=/head/; revision=258541
* Implement the ip, tcp, and udp DTrace providers. The probe definitions useMark Johnston2013-08-251-0/+20
| | | | | | | | | | | | dynamic translation so that their arguments match the definitions for these providers in Solaris and illumos. Thus, existing scripts for these providers should work unmodified on FreeBSD. Tested by: gnn, hiren MFC after: 1 month Notes: svn path=/head/; revision=254889
* Allow drivers to specify a maximum TSO length in bytes if they areAndre Oppermann2013-06-031-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | limited in the amount of data they can handle at once. Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to change the limit. The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything less wouldn't be very useful anymore. The upper limit is still at IP_MAXPACKET (65536 bytes). Raising it requires further auditing of the IPv4/v6 code path's as the length field in the IP header would overflow leading to confusion in firewalls and others packet handler on the real size of the packet. The placement into "struct ifnet" is a bit hackish but the best place that was found. When the stack/driver boundary is updated it should be handled in a better way. Submitted by: cperciva (earlier version) Reviewed by: cperciva Tested by: cperciva MFC after: 1 week (using spare struct members to preserve ABI) Notes: svn path=/head/; revision=251296